Ten years of ClickHouse in open source
112 points | 3 days ago | 11 comments | clickhouse.com | HN
drchaim
2 hours ago
I discovered ClickHouse around 2017-18 and built a PoC to replace Elasticsearch: 5x better storage and qps, in a couple of weeks.

Managers rejected it because it wasn't well known and was seen as "some database made by Russians."

On a personal level, it's quite sad to have seen that train coming so early and not been able to get on board.

ashu1461
2 hours ago
Same here. We are also stuck with ES. We wish we could migrate to ClickHouse, but we can't because of the legacy load.
himata4113
3 hours ago
ClickHouse has recently been a breath of fresh air after using TimescaleDB for a long time. psql is the greatest there is, and I really enjoyed being able to rely on a single database system to run everything. But migration, maintenance, and deployment are really a pain, and development on TimescaleDB feels a bit wishy-washy, with structural changes from version to version. Sometimes it really feels like an alpha product.
k_bx
1 hour ago
I used TimescaleDB a very long time ago, and things have changed quite a lot since (it's even named differently now).

In my current setup I was thinking of doing both: upgrading PostgreSQL to TimescaleDB (to archive old data, etc.) and deploying ClickHouse in parallel. I'm still considering whether to go big on PeerDB to get a ClickHouse mirror, or just deploy it separately without an additional layer of fragility.

Would you not recommend using timescaledb at all? I definitely want to avoid alpha-quality software pain, since PostgreSQL is one of the most rock-solid parts of the stack at the moment.

himata4113
42 minutes ago
I would just run both and decommission the old one when a) all data is migrated, or b) old data is no longer relevant and can be archived.
__s
1 hour ago
I worked on PeerDB. If you're able to batch changes on your end and push to both Postgres and ClickHouse, do that. Only move to PeerDB when you know you need CDC.
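
A minimal sketch of the "batch on your end and push to both" approach, in TypeScript. The `Row`, `Sink`, `DualWriter`, and `MemorySink` names are hypothetical, and the sinks are in-memory stand-ins rather than real PostgreSQL or ClickHouse clients. It is synchronous for brevity and has no retry or reconciliation logic, which a real dual-write path would likely need for the case where one store accepts a batch and the other rejects it.

```typescript
// Hypothetical row shape and sink interface; neither comes from PeerDB
// or from any PostgreSQL or ClickHouse client library.
interface Row {
  id: number;
  payload: string;
}

interface Sink {
  name: string;
  writeBatch(rows: Row[]): void;
}

// Buffers rows in the application and sends each batch to every sink.
class DualWriter {
  private sinks: Sink[];
  private batchSize: number;
  private buffer: Row[] = [];

  constructor(sinks: Sink[], batchSize: number) {
    this.sinks = sinks;
    this.batchSize = batchSize;
  }

  // Adds one row and flushes automatically once the batch is full.
  add(row: Row): void {
    this.buffer.push(row);
    if (this.buffer.length >= this.batchSize) {
      this.flush();
    }
  }

  // Sends the buffered rows to all sinks and empties the buffer.
  flush(): void {
    if (this.buffer.length === 0) {
      return;
    }
    const batch = this.buffer;
    this.buffer = [];
    for (let i = 0; i < this.sinks.length; i++) {
      this.sinks[i].writeBatch(batch);
    }
  }
}

// In-memory stand-in for a database client, only here to make the sketch runnable.
class MemorySink implements Sink {
  name: string;
  batches: Row[][] = [];

  constructor(name: string) {
    this.name = name;
  }

  writeBatch(rows: Row[]): void {
    this.batches.push(rows.slice());
  }
}

const pgSink = new MemorySink("postgres");
const chSink = new MemorySink("clickhouse");
const writer = new DualWriter([pgSink, chSink], 2);

writer.add({ id: 1, payload: "a" });
writer.add({ id: 2, payload: "b" }); // the batch of two is flushed here
writer.add({ id: 3, payload: "c" });
writer.flush(); // flush the remaining partial batch
```

Both sinks end up with the same two batches. Once changes originate somewhere the application cannot batch them, this pattern stops applying, which is presumably the point at which CDC becomes worth the extra moving parts.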
jaysh
3 hours ago
ClickHouse replacing Loki finally made our observability stack feel 'right'. It really is a powerhouse for logs and general analytical queries.
oulipo2
2 hours ago
How do you use it for visualization? Do you use ClickStack or something else?
jaysh
6 minutes ago
Still via Grafana. I ran it side by side with Loki, and despite trying to optimise Loki while using ClickHouse out of the box, it really was shocking how much faster ClickHouse was for every single query (e.g. "in the last 12 hours, give me the frequency of logs with a particular JSON event" or even "find this log entry, then join back and find the number of times a different entry appears within the same correlation_id").
usrme
7 minutes ago
Same question here!
spprashant
1 hour ago
If your data is too big for Postgres, it seems like moving straight to ClickHouse is the best option. We have been through a whole array of distributed database technologies, and ClickHouse might be the first one that doesn't have too many compromises.
brunojppb
1 hour ago
ClickHouse has been a game changer for some of the companies I have worked at in the past. This reminds me of this podcast episode (1) from the Rust in Production pod about their Rust adoption.

1. https://open.spotify.com/episode/0TBKDUhO0KihBxEzZqnQx1

orta
3 hours ago
I've been using ClickHouse for the last year for in-house analytics and found it a really pleasant experience. Thanks for all the progress you've made.
dandellion
3 hours ago
Same. We replicated some data from Postgres, it was easy to set up, similar enough that the transition was trivial, and really good performance out of the box. One of those good "use the right tool for the job" experiences.
lazyasciiart
2 hours ago
> You can open a pull request as an experiment, without aiming for it to be merged - it will be tested with the same level of scrutiny as production releases. Found a new memory allocator, a new compression library, a new hash table, a data format, or a sorting algorithm? - bring it to ClickHouse, and it will expose it inside-out

Wow

benjamkovi
1 hour ago
ClickHouse dev here, but this is true. ClickHouse has helped find several bugs in our third-party libraries (jemalloc and librdkafka for sure; there are many more, but I only worked on these), in the Linux kernel, and basically everywhere. We have very rigorous fuzzers (yes, multiple fuzzers on multiple levels) and run tests in an insane number of configurations. I think the last number I heard, a year ago, was around 400 hours for a complete CI run for a single commit (not PR, but commit). So yeah, pretty insane, in a good way.
baq
3 hours ago
clickhouse is the low key amazing tech people are busy using instead of posting about. keep it up!
Talpur1
2 hours ago
10 years! Quite a long journey. The observability part especially is the need of the hour.
ddorian43
3 hours ago
ClickHouse is *really* gatekeeping "zero copy replication", where you store data on object storage and have high availability, from the open source version.
pepperoni_pizza
40 minutes ago
I think that is just the nature of the open core business. But like most such businesses, they're not very clear that this is what they are, pretending to be an open source business instead.
orian
21 minutes ago
This is the main driver for their cloud ;-)
nvartolomei
1 hour ago
How? Have you tried contributing a reasonable implementation with test coverage and it was rejected?
haeseong
1 hour ago
The query speed deserves the praise, but the JSON ingestion path has quiet footguns nobody mentions here. Every numeric column comes back as a string over JSONEachRow, so a forgotten Number() cast silently turns arithmetic into string concatenation, and with input_format_skip_unknown_fields enabled a single typo in a column name drops that field with no error at all. Worth wiring an assertion that inserts a row and reads it back into CI before trusting the dashboards.
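
The insert-then-read-back assertion can be sketched without a server. `FakeTable` below is a hypothetical in-memory stand-in, not a ClickHouse client: it only imitates the two behaviours described above (unknown fields dropped silently, values returned as strings). Against a real server, which numeric types arrive quoted likely depends on the column type and output settings (64-bit integers are the usual case, as far as I know), so treat the all-strings behaviour here as an assumption.

```typescript
type WireRow = { [column: string]: string };
type InputRow = { [column: string]: number | string };

// Hypothetical stand-in for a table reached over JSONEachRow.
// Unknown fields are dropped without an error, and every stored
// value is returned as a string.
class FakeTable {
  private columns: string[];
  private rows: WireRow[] = [];

  constructor(columns: string[]) {
    this.columns = columns;
  }

  insert(row: InputRow): void {
    const stored: WireRow = {};
    for (let i = 0; i < this.columns.length; i++) {
      const col = this.columns[i];
      if (Object.prototype.hasOwnProperty.call(row, col)) {
        stored[col] = String(row[col]);
      }
    }
    this.rows.push(stored);
  }

  selectAll(): WireRow[] {
    return this.rows.slice();
  }
}

// Inserts a row, reads it back, and returns a list of problems found.
function roundTripProblems(table: FakeTable, row: InputRow): string[] {
  table.insert(row);
  const all = table.selectAll();
  const back = all[all.length - 1];
  const problems: string[] = [];
  const keys = Object.keys(row);
  for (let i = 0; i < keys.length; i++) {
    const key = keys[i];
    const expected = row[key];
    if (!Object.prototype.hasOwnProperty.call(back, key)) {
      problems.push("field dropped: " + key);
    } else if (typeof expected === "number") {
      // Explicit cast before comparing, since the wire value is a string.
      if (Number(back[key]) !== expected) {
        problems.push("numeric mismatch: " + key);
      }
    } else if (back[key] !== expected) {
      problems.push("string mismatch: " + key);
    }
  }
  return problems;
}

const metrics = new FakeTable(["bytes", "requests"]);
const cleanInsert = roundTripProblems(metrics, { bytes: 10, requests: 2 });
const typoInsert = roundTripProblems(metrics, { bytes: 10, reqests: 2 }); // misspelled column

// The footgun itself: "+" on two wire values concatenates instead of adding.
const firstRow = metrics.selectAll()[0];
const concatenated = firstRow["bytes"] + firstRow["requests"]; // "102"
const summed = Number(firstRow["bytes"]) + Number(firstRow["requests"]); // 12
```

In CI, a check like this would fail the build when `typoInsert` is non-empty, which is the case that otherwise passes with no error at all.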
charrondev
24 minutes ago
We’ve done our JSON ingestion by keeping a schema in the app for all the types we expect, and injecting the types into the query builder.

Then as needed we have materialized columns on our different tables.
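
One possible reading of "injecting the types into the query builder", as a rough sketch: the app keeps a column-to-type map and the builder emits typed placeholders in ClickHouse's `{name:Type}` query parameter syntax. The `eventSchema` map, the `events` table, and `buildSelect` are all hypothetical, and this is not necessarily how the setup described above works.

```typescript
// Hypothetical app-side schema: column name -> ClickHouse type name.
const eventSchema: { [column: string]: string } = {
  user_id: "UInt64",
  event: "String",
  duration_ms: "Float64",
};

interface BuiltQuery {
  sql: string;
  params: { [name: string]: string | number };
}

// Builds a SELECT with one typed placeholder per filter.
// Table and column names come from the app's own schema, never from user input.
function buildSelect(
  table: string,
  schema: { [column: string]: string },
  filters: { [column: string]: string | number }
): BuiltQuery {
  const conditions: string[] = [];
  const params: { [name: string]: string | number } = {};
  const names = Object.keys(filters);
  for (let i = 0; i < names.length; i++) {
    const name = names[i];
    // Reject columns the schema does not know, instead of passing a typo through.
    if (!Object.prototype.hasOwnProperty.call(schema, name)) {
      throw new Error("unknown column: " + name);
    }
    conditions.push(name + " = {" + name + ":" + schema[name] + "}");
    params[name] = filters[name];
  }
  const where = conditions.length > 0 ? " WHERE " + conditions.join(" AND ") : "";
  const sql = "SELECT " + Object.keys(schema).join(", ") + " FROM " + table + where;
  return { sql: sql, params: params };
}

const builtQuery = buildSelect("events", eventSchema, { user_id: 42 });
// builtQuery.sql:
// SELECT user_id, event, duration_ms FROM events WHERE user_id = {user_id:UInt64}
```

Keeping the types in one map means a misspelled column fails in the app, before any query is sent.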
