Another huge benefit we're realizing as we move some of our heaviest tables to this pattern is that it makes it really easy to index a core set of fields in Elasticsearch, along with a tag to associate each document with a particular domain model. This has drastically cut down on our search-specific denormalization and let us avoid expensive index schema updates.
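For concreteness, here's a rough sketch of the shape I mean (table and column names are hypothetical, not from the article):

    -- Hypothetical consolidated table: one shared core of fields,
    -- plus a tag column identifying the domain model.
    CREATE TABLE searchable_items (
        id          bigint PRIMARY KEY,
        entity_type text NOT NULL,  -- e.g. 'invoice', 'customer'
        title       text,
        body        text,
        updated_at  timestamptz
    );

    -- Each row maps onto an Elasticsearch document with just these
    -- core fields plus entity_type, so adding a new domain model
    -- doesn't require an index schema update.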
Edit: I guess the main difference is that it's just calculating separate sets and then merging them, which isn't really De Morgan's laws so much as a compute-and-merge approach.
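A minimal sketch of that set-merge idea (hypothetical table; UNION deduplicates, and each branch can use its own index):

    -- One predicate with OR:
    SELECT id FROM orders WHERE status = 'open' OR customer_id = 42;

    -- The same result computed as separate sets, then merged:
    SELECT id FROM orders WHERE status = 'open'
    UNION
    SELECT id FROM orders WHERE customer_id = 42;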
There _are_ optimizers that try to represent this kind of AND/OR expression as sets of distinct ranges, in order to at least be able to consider whether a thousand different index scans are worth it or not, with rather limited success (it doesn't integrate all that well once more than one table is involved, and getting cardinalities for that many index scans can be pretty expensive).
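For illustration, the kind of predicate this targets (hypothetical table):

    -- The optimizer can normalize this into disjoint ranges on a:
    -- [1,10], [20,30], and [100,100].
    SELECT * FROM t
    WHERE (a BETWEEN 1 AND 10 AND b = 5)
       OR (a BETWEEN 20 AND 30)
       OR a = 100;
    -- Deciding whether three (or a thousand) index scans beat one
    -- full scan requires a cardinality estimate per range, which is
    -- where this gets expensive.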
Has anyone put machine learning in an SQL query optimizer yet?
Query optimizers definitely try to estimate cardinalities of joins. It's a really, really hard problem, but the typical estimate is _much_ better than “eh, no idea”.
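For context, the classic textbook starting point for an equi-join estimate looks roughly like this (a sketch, not any particular engine's formula):

    -- For SELECT * FROM r JOIN s ON r.a = s.a, a common baseline is:
    --
    --   rows(r JOIN s) ~ rows(r) * rows(s) / max(ndv(r.a), ndv(s.a))
    --
    -- where ndv() is the number of distinct values, taken from table
    -- statistics. Real optimizers refine this with histograms,
    -- most-common-value lists, and correlation corrections.
    SELECT * FROM r JOIN s ON r.a = s.a;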
Generalized guidance that doesn't even mention the database server as a baseline, doesn't show plans, and doesn't check whether the database can be guided is just bad teaching.
It looks like the author discovered a trick and is trying to hammer everything into it by overgeneralizing.
I'm not super familiar with the term index merge - this seems to be the term for a BitmapOr/BitmapAnd?
Is there another optimization I'm missing?
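For comparison, a query that Postgres will often plan with exactly those operators (hypothetical table t with separate indexes on a and b):

    -- Typical plan shape for an OR over two indexed columns:
    --   Bitmap Heap Scan on t
    --     -> BitmapOr
    --          -> Bitmap Index Scan on t_a_idx
    --          -> Bitmap Index Scan on t_b_idx
    EXPLAIN SELECT * FROM t WHERE a = 1 OR b = 2;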
The article links to my code for my timings here: https://github.com/ethan-seal/ors_expensive
There is an optimization that went into the new release of PostgreSQL that I'm excited about and that may affect this - I'm not sure. See https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit...
Different databases will use similar terms for different operations, but I would guess that the comment refers to something similar to MySQL's index merge (which is essentially reading the row IDs of all the relevant ranges, then deduplicating them, then doing the final scan; it's similar to but less flexible than Postgres' BitmapOr).
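A sketch of what triggers it in MySQL (hypothetical table with indexes idx_a and idx_b):

    -- With separate indexes on a and b, EXPLAIN typically reports
    -- type: index_merge with Extra: Using union(idx_a,idx_b).
    EXPLAIN SELECT * FROM t WHERE a = 1 OR b = 2;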
I find query planning (and databases in general) to be very difficult to reason about, basically magic. Does anyone have some recommended reading or advice?
A good table has the right indexes, and a good API on top of that table uses the RIGHT indexes in its criteria to get a good result.
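A made-up example of matching the index to the API's criteria:

    -- If the API filters by tenant and status and pages by recency,
    -- the index should cover exactly that access pattern:
    CREATE INDEX orders_tenant_status_created_idx
        ON orders (tenant_id, status, created_at);

    -- This query can then use the index for both the filter and the
    -- ORDER BY, avoiding a sort:
    SELECT * FROM orders
    WHERE tenant_id = 7 AND status = 'open'
    ORDER BY created_at DESC
    LIMIT 20;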