Database Benchmarks Lie (If You Let Them) (exasol.com)
11 points | 1 hour ago | 3 comments

exagolo
1 hour ago
Traditional database benchmarks focus on throughput and latency – how many queries per second can be processed, and how execution time changes as hardware resources increase. This benchmark revealed something different: reliability under realistic conditions is the first scalability constraint.
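
To make that concrete, here is a minimal sketch of what a reliability-first benchmark loop looks like (not from the article; run_query is a hypothetical placeholder for whatever client call your engine exposes). The point is simply that failures are recorded as a first-class result instead of silently falling out of the latency statistics:

    # Sketch of a reliability-first benchmark loop. run_query() is a
    # hypothetical placeholder; swap in your real database client.
    import time
    from concurrent.futures import ThreadPoolExecutor

    def run_query(sql: str) -> None:
        time.sleep(0.1)  # placeholder: simulate query work

    def one_run(sql: str):
        start = time.monotonic()
        try:
            run_query(sql)
            return ("ok", time.monotonic() - start)
        except Exception:  # out-of-memory aborts, timeouts, killed queries
            return ("failed", time.monotonic() - start)

    def benchmark(sql: str, concurrency: int = 100, iterations: int = 1000):
        with ThreadPoolExecutor(max_workers=concurrency) as pool:
            results = list(pool.map(lambda _: one_run(sql), range(iterations)))
        ok = sorted(t for s, t in results if s == "ok")
        failed = sum(1 for s, _ in results if s == "failed")
        # Report the failure count next to latency: a fast median means
        # nothing if part of the workload never finished.
        median = ok[len(ok) // 2] if ok else float("nan")
        print(f"completed={len(ok)} failed={failed} median_ok_latency={median:.3f}s")

Something like benchmark("SELECT ...", concurrency=100) then makes "how many queries failed" part of the headline number rather than a footnote.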
dataDominSA
1 hour ago
This is the article I wish existed when we were evaluating platforms. "Reliability under realistic conditions is the first scalability constraint". Speed means nothing if queries don't finish.
hero-24
1 hour ago
In my experience, planning is often the first headache I have to deal with (join order, hash sizing, operator choice), before concurrency and memory even come into play.
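
To give a hypothetical example of what I mean (run_query and the orders/customers tables are made up, and EXPLAIN output differs per engine), this is the kind of check I end up doing by hand before concurrency or memory is even in the picture:

    # Hypothetical sketch: inspect the plan (join order, operator choice,
    # estimated hash sizes) before blaming concurrency or memory.
    # run_query() and the orders/customers schema are invented for illustration.
    PLAN_SQL = """
    EXPLAIN
    SELECT c.country, count(*) AS orders_per_country
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    WHERE o.order_date >= '2024-01-01'
    GROUP BY c.country
    """

    def show_plan(run_query):
        # A reversed join order or an oversized hash build side shows up
        # here long before any load test does.
        for row in run_query(PLAN_SQL):
            print(row)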
exagolo
1 hour ago
You mean the "execution plan" for your queries? Ideally, those kinds of decisions are made automatically by the database.
hero-24
10 minutes ago
ideally? yes. in practice? big nope.

How do you actually interpret what you're seeing here? Does it look more like optimizer fragility (plans that assume ideal memory conditions) or more like runtime memory management limits (good plans, but no adaptive behavior under pressure)?

exagolo
6 minutes ago
I think the issue in the tests was ClickHouse's lack of proper resource management, which led to queries failing under pressure. That said, I have to admit the level of pressure was minimal: just a few concurrent users shouldn't be considered pressure, and having far more RAM than the whole database size means very little memory pressure either. The schema is also quite simple, just two fact tables and a few dimension tables.

I think any database should be able to handle 100 concurrent queries robustly, even if that means slowing down query execution.
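
To spell out what "slow down instead of fail" could look like, here is a client-side sketch (run_query is a hypothetical placeholder; a real engine should enforce this server-side through admission control / query queueing):

    # Sketch of "degrade, don't fail": excess queries wait for a slot instead
    # of being rejected or killed. run_query() is a hypothetical placeholder;
    # a real database should do this server-side via admission control.
    import time
    import threading
    from concurrent.futures import ThreadPoolExecutor

    MAX_IN_FLIGHT = 16      # assumed number of queries the engine can serve safely at once
    slots = threading.Semaphore(MAX_IN_FLIGHT)

    def run_query(sql: str) -> None:
        time.sleep(1.0)     # placeholder: simulate query work

    def admitted(sql: str) -> None:
        with slots:         # queries 17..100 wait here instead of erroring out
            run_query(sql)

    def fire(sql: str, concurrent_users: int = 100) -> None:
        with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
            for _ in range(concurrent_users):
                pool.submit(admitted, sql)

With 100 users and 16 slots every query still finishes; latency stretches instead of queries dying, which is the behavior I would expect from the engine itself rather than from the client.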
