I don't want to commit to a real opinion, but the cynic in me sees a bitter lesson you could take from this is that the database should default to a low isolation level--the damn developers aren't even using transactions right, so why waste performance handling transactions in the strictest possible way?
Sure, sometimes it's too slow, but it should be the default.
Very few people can write correct database code at the other serialization levels. Most think they can, but it's harder than correct multi-threading, because databases do weird unintuitive things for performance.
Naively one would expect that no individual UPDATE or INSERT statement could deadlock in isolation… but there you go.
If that is possible, the possibilities across multiple concurrent data mutating queries are beyond human comprehension!
Serializable should absolutely be the default!
Similarly, all columns should be automatically indexed to at least some degree, like Postgres BRIN indexes at a minimum.
Time and experience have shown that the vast majority of developers are pathologically unable to properly define all required indexes ahead of time.
That being said also "required indexes ahead of time" is impossible - because query patterns change and things get released unless you mean something like "the required indexes for our obvious query patterns we just freaking released.
I dont think most columns need to be indexed because that is mostly nonsense, most columns never have a where clause applied to them or are sorted on themselves, so an index provides ??? value.
As a consultant I come across a lot of CotS software, in-house or otherwise bespoke software, etc. Roughly 40% of the former has 100% of the minimum required number of indexes and approximately 5% of the latter. By "minimum", I mean the indexes required to avoid full scans of tables that will become large enough for this to be a problem in production.
"Disciple doesn't scale." is one of my favourite sayings now, for a reason!
1) Developers almost always work with toy data, and are hence insulated from poor indexing decisions. Problems turn up 'x' years from now. It is well established that humans learn poorly when consequences are delayed... by mere hours, let alone years!
2) DBAs and developers often have an adversarial relationship. A common consequence of this workplace dynamic is that developers aren't granted the required access to tune indexes, especially in production, which is where the issues manifest.
3) I've heard anecdotes, including here, along the lines of "XYZ cloud native / webscale database is so much faster than ABC traditional RDBMS!". Very often the difference is just that XYZ auto-indexes by default. CosmosDB, Google Firestore, Kusto, Elastic, Druid, and many columnar formats are in this category of "magically" faster!
I'm now 99% convinced that RDBMS needs to be reinvented for the modern fast-paced, vibe-coded, "I'm a fullstack(lol) dev" world where people simply don't have the bandwidth to pay attention to minutiae like on-disk sort order and filtered secondary indexes. A better fit for today's world would be a system that is: columnar by default like SAP HANA, compressed[1] by default, indexed by default (thanks to being columnar!), serializable by default, and "include batteries" like native queue capabilities so that nobody has to figure out cross-RDBMS complications like distributed transactions, outbox patterns, or deal with the consequences of a DBA rolling back one of two databases to a backup.
Big point: Serializable not exist alone in a decent ACID datastore, and no, less strict rules for the MOST important thing you have(your data) is NOT a good idea.
Over and over again Acid RDBMS have proven that trying to "relax" the rules in pursuit of performance or worse, mystical holy grails that have never been right or correct for a primary datastore, is a mistake. And then people goes back to them, because is the best tool for ALL the primary data store jobs. ALL OF THEM.
Is like the mythical C developer that "not need safety", that at least has more chance of be possible (after MANY passes over the code) that a datastore without safeguards.
but ignoring that, serializable isolation level means the database acts AS IF the transactions are serial. but most databases in fact will execute them concurently, with careful tracking to make sure they appear serial
Sometimes though, yes definitely. It's hard to claim anything universal at all about databases.
(Unless you mean "being able to choose between different isolation levels", then yes, completely agreed. Very very few use anything but the default, somewhere below serializable, and it always concerns me unless they can describe exactly what they're intentionally allowing and why it's okay for their system. Most cannot.)
As an example: Oracle and PostgreSQL don't have dirty reads: READ UNCOMMITTED does nothing. MySQL's concurrency model depends on the engine.
mybalance = database.read("account-number")
newbalance = mybalance - amount
database.write("account-number", newbalance)
dispense_cash(amount) // or send bitcoins to customer
and MongoDB didn't even have a way to do this atomically? An RDBMS with read-committed would handle this fine if you did "read for update" on that row.This is a rabbit hole worth going down, but it shouldn't be the default. This is a classic case of Chesterton's Fence.
I expected the article to substantiate the claim that serializable brings a large performance hit as in my experience it isn't so. The article basically makes the same point.
With serializable, you need to be a little careful not to have hot rows. Avoid them by sharding commonly written values. Another way to improve performance is to use true time for ordering non read-then-write transactions. It's a little finicky if the database doesn't provide such guarantees out of the box. Take Google's Spanner as an example. It offers the serializable isolation level and it's pretty performant (as long as you account for hot spots).
Unfortunately, serializable isolation requires detecting or preventing read-write conflicts (i.e. one transaction writing a row that a concurrent transaction has read). This is the performance impact of serializability: you need to be very careful what you read, because if you read too many rows you prevent any concurrent transactions from updating those same rows. Read-only transactions are OK (because MVCC), and read-only tables are OK (because there's no read-write conflict if a table is mostly read only), but tables that are both written and heavily read are where you get performance problems.
With snapshot isolation (e.g. Oracle's serializable, Postgres repeatable read), only write-write conflicts matter. There it doesn't matter what a transaction reads, and reads never need to block (or abort) writers. So what you say is true for snapshot, but not for serializable.
Interestingly, serializable's lack of need to detect write-write conflicts means that (in some implementations) it can be faster than snapshot for blind writes (i.e. anything that's not a read-modify-write under the covers).