I suppose we need a new rule: "Any sufficiently successful data store eventually sprouts at least one ad hoc, informally-specified, inconsistency-ridden, slow implementation of half of a relational database."
PS: I've worked at Elastic for a long time, so it is fun to see the arguments for a young product.
Problems we faced using Elasticsearch (e-commerce company): under high load, RAM usage climbs, the DB goes down, and more RAM is needed. Luckily we had ES experts on the infra team who helped us a lot.
To read right after a write, you need to refresh the index or wait for a refresh. More inserts mean more index refreshes, which ES is not designed for, so inserts become slow. You need to find a way to insert in bulk.
The API starts, cannot find the ES alias because of a connection issue, and creates a new alias (our code did that whenever it couldn't find the alias; bad idea). Oops, all the data behind the alias is gone.
The most important thing when using ES as the main DB is to use the "keyword" type for every field that you don't run text search on.
No transactions: if the second insert fails, you need to delete the first insert by hand. It makes the code look ugly.
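For anyone wondering what "insert in bulk" looks like on the wire, here is a minimal sketch that only builds the request for the `_bulk` endpoint and does not send it. The index name `orders` and the helper `build_bulk_request` are made up for illustration; the `refresh=wait_for` query parameter is the one discussed elsewhere in this thread.

```python
import json


def build_bulk_request(index, docs, refresh="wait_for"):
    """Return (path, body) for one _bulk call that indexes every doc.

    docs is an iterable of (doc_id, source_dict) pairs.
    """
    lines = []
    for doc_id, source in docs:
        # Action line: says which index and _id the next line belongs to.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # Source line: the document itself.
        lines.append(json.dumps(source))
    # The bulk body is newline-delimited JSON and must end with a newline.
    body = "\n".join(lines) + "\n"
    path = f"/_bulk?refresh={refresh}"
    return path, body


# "orders" is a hypothetical index name.
path, body = build_bulk_request(
    "orders",
    [("1", {"status": "paid"}), ("2", {"status": "shipped"})],
)
print(path)
print(body, end="")
```

One request carrying many documents means one wait for a refresh instead of one per insert, which is roughly the point of batching here.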
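A rough sketch of the guard that would have avoided this, assuming the startup code can tell "the cluster said the alias does not exist" apart from "the cluster could not be reached". Both callables and the `AliasLookupUnavailable` error are hypothetical stand-ins, not part of any ES client.

```python
class AliasLookupUnavailable(Exception):
    """Hypothetical error: the cluster was unreachable, so nothing is known."""


def ensure_alias(alias_exists, create_alias):
    """Create the alias only when the cluster positively reports it missing.

    alias_exists: callable returning True/False, raising ConnectionError
                  when the cluster cannot be reached (assumed behaviour).
    create_alias: callable that creates the alias.
    """
    try:
        exists = alias_exists()
    except ConnectionError as exc:
        # "Could not ask" is not "does not exist": fail startup, create nothing.
        raise AliasLookupUnavailable("cannot verify alias, refusing to create") from exc
    if not exists:
        create_alias()
        return "created"
    return "present"
```

Failing startup loudly is the design choice here; a crashed API is a lot cheaper than an alias that silently points somewhere new.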
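As an illustration, a small sketch that builds an index-creation body with explicit field types. The field names are invented, and the typeless `mappings.properties` layout assumes a reasonably recent ES version.

```python
import json


def build_mapping(keyword_fields, text_fields):
    """Build an index-creation body with explicit field types."""
    properties = {}
    for name in keyword_fields:
        # Exact-match fields: filtering, sorting, aggregations.
        properties[name] = {"type": "keyword"}
    for name in text_fields:
        # Only fields that really need full-text search get analyzed.
        properties[name] = {"type": "text"}
    return {"mappings": {"properties": properties}}


# Hypothetical field names for an orders index.
body = build_mapping(["order_id", "status"], ["description"])
print(json.dumps(body, indent=2))
```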
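The "delete by hand" part tends to look something like the sketch below. `FakeStore` is an in-memory stand-in so the example runs on its own; a real version would call whatever ES client you use, and `insert_both` is a made-up helper name.

```python
class FakeStore:
    """In-memory stand-in for an index; fails on one chosen id for the demo."""

    def __init__(self, fail_on=None):
        self.docs = {}
        self.fail_on = fail_on

    def index(self, doc_id, doc):
        if doc_id == self.fail_on:
            raise RuntimeError(f"indexing {doc_id} failed")
        self.docs[doc_id] = doc

    def delete(self, doc_id):
        self.docs.pop(doc_id, None)


def insert_both(store, first, second):
    """Insert two documents; undo the first if the second fails."""
    first_id, first_doc = first
    second_id, second_doc = second
    store.index(first_id, first_doc)
    try:
        store.index(second_id, second_doc)
    except Exception:
        # Compensating action: there is no rollback, so remove the first doc.
        store.delete(first_id)
        raise


store = FakeStore(fail_on="b")
try:
    insert_both(store, ("a", {"n": 1}), ("b", {"n": 2}))
except RuntimeError:
    pass
print(store.docs)
```

This is still not a transaction: the compensating delete can fail too, and other readers may see the first document in between.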
Advantages: you can search, every field is indexed, super fast reads. Fast development. Easy to learn. We never faced data loss, even when the DB crashed.
They messed up a $30 million project big time at a previous company. My CTO swore to never recommend them.
Even if they don't understand what ES is and what a "normal" database is, I'm sure some of those people ran into issues where their "db" either got corrupted or lost data, even when testing and building their system around it. This is and was general knowledge at the time; it was no secret that from time to time things got corrupted and indexes needed to be rebuilt.
It doesn't happen all the time, but way more than zero times, and that's understandable because Lucene is not a DB engine or a "DB grade" storage engine; they had other, more important things to solve in their domain.
So when I read stories of data loss and things going south, I don't have sympathy for anyone involved other than the unsuspecting final clients. These people knew, or more or less knew, and chose to ignore it and be lazy.
I agree.
It's been a while since I touched it, but as far as I can remember ES has never pretended to be your primary store of information. It was mostly juniors that reached for it for transaction processing, and I had to disabuse them of the notion that it was fit for purpose there.
ES is for building a searchable replica of your data. Every ES deployment I made or consulted on sourced its data from some other durable store, and the only things that wrote to it were replication processes or backfills.
Best example is IoT marketing, as if it can handle the load without a bazillion shards, and since when does a text engine want telemetry?
Feel like the kid in A Christmas Story --
>simplicity, and world-class performance, get started with XXXXXXXX.
A crummy commercial?
Which is why you supply the parameter
refresh: "wait_for"
in your writes. This makes the request wait until a refresh has made the change visible to search before it completes. Unlike refresh: "true", it does not force an immediate refresh.
> "schema migrations require moving the entire system of record into a new structure, under load, with no safety net"
Use index aliases. Create a new index using the new mapping, then make a reindex request from the old index to the new one. When it finishes, change the alias to point to the new index.
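Spelled out as requests, the three steps look roughly like this. It is a sketch that only builds the calls as (method, path, body) tuples and sends nothing; the index and alias names are invented, and `build_migration_requests` is a hypothetical helper.

```python
def build_migration_requests(alias, old_index, new_index, new_properties):
    """Return the ordered (method, path, body) calls for an alias-based migration."""
    return [
        # 1. Create the new index with the new mapping.
        ("PUT", f"/{new_index}", {"mappings": {"properties": new_properties}}),
        # 2. Copy every document from the old index into the new one.
        ("POST", "/_reindex", {
            "source": {"index": old_index},
            "dest": {"index": new_index},
        }),
        # 3. Repoint the alias; both actions go in one _aliases request.
        ("POST", "/_aliases", {
            "actions": [
                {"remove": {"index": old_index, "alias": alias}},
                {"add": {"index": new_index, "alias": alias}},
            ]
        }),
    ]


# Hypothetical names: alias "orders" moving from orders_v1 to orders_v2.
requests_to_send = build_migration_requests(
    "orders", "orders_v1", "orders_v2", {"status": {"type": "keyword"}}
)
for method, path, _body in requests_to_send:
    print(method, path)
```

As far as I know, documents written to the old index while the reindex is running are not carried over automatically, so writes usually need to be paused or replayed before the alias switch.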
The other criticisms are more valid, but not entirely: for example, no database "just works" without carefully tuning the memory-related configuration for your workload, schema and data.
I've seen some examples of people using ES as a database, which I'd advise against for pretty much the reasons TFA brings up, unless I can get by on YAGNI reasoning alone.
I would never. Ever. Bet my savings on ES being stable enough to always be online to take in data, or predictable in retaining the data it took in.
It feels very best-effort, and as a consultant I recommend orgs use some other system for retaining their logs, even a raw filesystem with rolling zips, before relying on ES, unless they have a dedicated team constantly monitoring it.
- No edge-case is thrown at them
- No part of the system is stressed (software modules, OS, firmware, hardware)
- No plug is pulled
Crank the requests to 11 or import a billion rows of data with another billion relations and watch what happens. The main problem isn't the system refusing to serve a request or throwing "No soup for you!" errors; it's data corruption and/or wrong responses.
Now I work for a company whose log storage product has ES inside, and it seems to shit the bed more often than it should - again, could be bugs, could be running "clusters" of 1 or 2 instead of 3.
Turns out running complicated large distributed systems requires a bit more than a ./apply, who would have guessed it?