Pick Your Distributed Poison (hazelweakly.me)
104 points | 12 days ago | 7 comments | Hacker News
rubiquity
8 days ago
I don’t agree with much in this writing other than that eventual consistency is a bad choice. Distributed systems are hard but in 2024 there are enough known patterns and techniques to make them less icky. Systems built on total ordering are much more tractable than weaker protocols. Mahesh Balakrishnan’s recent paper[0] on the Shared Log abstraction is a great recipe, for example.
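
To make the shape concrete, here's a minimal sketch of the shared-log idea (the names are mine, not the paper's API): every write is an append that gets a unique position in one total order, and a replica is just a deterministic state machine replaying that order.

  # Illustrative only: a real shared log is replicated and durable,
  # but the contract is the same single total order.
  class SharedLog:
      def __init__(self):
          self._entries = []

      def append(self, record):
          # Returns the record's position in the total order.
          self._entries.append(record)
          return len(self._entries) - 1

      def read(self, pos):
          return self._entries[pos]

      def tail(self):
          return len(self._entries)

  # Two replicas that have replayed the same prefix agree exactly,
  # which is what makes these systems tractable to reason about.
  def catch_up(log, apply_fn, cursor):
      while cursor < log.tail():
          apply_fn(log.read(cursor))
          cursor += 1
      return cursor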

As an aside, I’ve never enjoyed the defeatist culture that permeated operations and distributed systems pop culture, which this post seems to reinforce.

0 - https://maheshba.bitbucket.io/papers/osr2024.pdf

from-nibly
8 days ago
I think the defeatism comes from practicality. I'm at a 30-person IT org trying to do distributed systems and eventual consistency. Don't take on complexity unless you have to. And eventual consistency requires a LOT of scale before it becomes a "have to".
delusional
8 days ago
I don't think it's necessarily a question of scale. Where I work, we have a lot of strategic partnerships, and all those partners have their own IT systems with their own master data. It's intractable to enforce strong consistency between all of these disparate systems that don't speak to one another, and you expressly don't want to take the whole substrate offline when a single partner has a network issue. The best you can really do is eventual consistency.
from-nibly
8 days ago
I guess I'm talking about internal eventual consistency. I'm not arguing against using another system outside your own single-instance database.
ffsm8
8 days ago
> I don’t agree with much in this writing other than that eventual consistency is a bad choice

Does it really matter whether it's bad or not? As far as I know, every database that scales beyond a single node (for performance) is eventually consistent. Otherwise you've gotta wait for a sync between nodes before a response can be given, which would effectively force your cluster to have worse performance than running on a single node again.

jashmatthews
8 days ago
There are a bunch that offer strong consistency, e.g. Cloud Spanner and DynamoDB.
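
DynamoDB even makes the choice per request; a hedged boto3 sketch (the table and key names are made up):

  import boto3

  # Strongly consistent read: reflects all prior successful writes,
  # at the cost of a bit more latency/capacity than the default.
  table = boto3.resource("dynamodb").Table("orders")
  resp = table.get_item(
      Key={"order_id": "1234"},
      ConsistentRead=True,
  )
  item = resp.get("Item")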
mping
8 days ago
Don't forget FoundationDB
sriram_malhar
8 days ago
> As far as I know, every database that scales beyond a single node (for performance) is eventually consistent

That's SO not true. Spanner and Amazon's S3 are some of the biggest databases on the planet, and they are strongly consistent.

> Otherwise you've gotta wait for a sync between nodes before a response can be given, which would effectively force your cluster to have worse performance then running on single node again.

Yes, you are trading latency for fault-tolerance, but so what? What if the resulting latency is still more than good enough? There is no shortage of real large-scale applications where this is the case.
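
To make the trade concrete, here's a toy majority-quorum write (the replica objects and their write() method are assumed): the caller pays the latency of the slowest replica in the fastest majority, not of the whole cluster, and a slow or dead minority doesn't block the commit.

  from concurrent.futures import ThreadPoolExecutor, as_completed

  def quorum_write(replicas, record):
      needed = len(replicas) // 2 + 1
      pool = ThreadPoolExecutor(max_workers=len(replicas))
      futures = [pool.submit(r.write, record) for r in replicas]
      acks = 0
      try:
          for fut in as_completed(futures):
              if fut.result():
                  acks += 1
                  if acks >= needed:
                      return True  # committed on a majority
          return False  # quorum unreachable
      finally:
          pool.shutdown(wait=False)  # don't block on the slow minority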

walterbell
8 days ago
> Reproducible and bootstrappable systems get a lot of love among neurodivergent people. For good reason: they’re very friendly to those with little working memory but vast amounts of working context.

This is novel yet obvious in retrospect, has it been articulated or surveyed elsewhere?

nyanpasu64
8 days ago
What's the distinction between working memory and working context?
glic3rinu
8 days ago
I was wondering about that too. Compared to my peers I consider myself as having below-average short-term memory (not really good at raw computation), but in contrast I am pretty good at abstract thinking, e.g. I can easily visualize the end-to-end flow of a system, identify all the pain points, and produce very good designs (in terms of simplicity). So perhaps this "working context" is about knowledge, pattern recognition, analytical thinking, ...
kukkeliskuu
8 days ago
Typically there are no guarantees that a distributed system with "eventual consistency" is in a consistent state as a whole at any point in time. Downstream systems need to be built on the assumption that the system is perpetually inconsistent.
GeneralMayhem
8 days ago
That's true in the general case, but there are usually ways to make it not matter.

If you have a vector clock with snapshot reads, then you may not know consistently what the system looks like now, but you can know what it looked like as of some arbitrary previous point in time. If your use case doesn't require read-modify-writes (e.g., metric calculations where the source of truth is out of your control and you just need to aggregate deltas), then that's good enough - you can find a set of results that are in sync with each other, even if they're not fully up to date.
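
A sketch of that snapshot read (a single timestamp stands in for the vector-clock bookkeeping, and shard.read_at() is an assumed API):

  # Pick one timestamp t and read "as of t" on every shard. The view
  # may lag "now", but all its parts are consistent with each other.
  def read_as_of(shards, keys, t):
      view = {}
      for shard in shards:
          for key in keys:
              # assumed API: latest version of key with version <= t
              view[(shard.name, key)] = shard.read_at(key, t)
      return view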

If you have limited read-modify-writes, but you can at least engineer synchronous transactions on subsets of the data (e.g., single-row-level transactions) with asynchronous replication and fan-out of side-effects, then you can turn more complex problems into something that looks like the first case.
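
One common shape for that second pattern is a transactional outbox; everything below (db, bus, the table names) is made up for illustration:

  # The row update and its outgoing side-effect commit or abort
  # together in one local ACID transaction; a worker fans out later.
  def add_points(db, user_id, delta):
      with db.transaction():  # synchronous, single-row scope
          row = db.select_for_update("users", user_id)
          db.update("users", user_id, points=row.points + delta)
          db.insert("outbox", {"user": user_id, "delta": delta})

  def fanout_worker(db, bus):
      for event in db.poll("outbox"):
          bus.publish(event)  # may be retried, so consumers deduplicate
          db.delete("outbox", event)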

Those tricks alone are enough to remove most of the weirdness of eventual consistency. But the broader point is that distributed system design is less about "solving" the problem once and for all, and more about learning a toolbox of tricks to handle and describe certain patterns, of which any given project may have any number in any combination.

kukkeliskuu
6 days ago
Sounds plausible. I am fighting a different battle, however.

For my corporate customers, what you are describing is way too sophisticated. They do what others do. As Kafka is the de facto messaging platform, everybody uses it.

For example, if source data is multiplexed via Kafka into one sink, people typically say that the sink is "eventually consistent". Another case is when an operation writes into multiple microservices without transactions. People are not even consistent in their speech, let alone in how they handle their data.

It seems to me that 99% of such corporate systems would work better if people optimized for integrity (i.e. communicated with real transactions on a real database) instead of optimizing for performance and scalability.

tazu
8 days ago
I'm having a hard time understanding what this is about. Is it about Kubernetes tradeoffs?
NAHWheatCracker
8 days ago
It's just about tradeoffs in system design, nothing as specific as K8s.

I think the author just wanted to write a glib little blog post.

ajb
8 days ago
It's about the tradeoffs between guarantees in distributed systems generally.

But I don't really think the allusive writing style is a good fit for tech. Actually, it's pretty annoying even with non-tech writing. YMMV. Here, it's hard to understand the post without already knowing most of what it's saying.

Of course, it's a blog post, the author doesn't owe anyone anything.

User23
8 days ago
Distributed systems, like interrupts, are something you can’t bullshit and get right. The author is right, though, that nondeterminism makes systems easier to reason about if you have the right cognitive tools (i.e., math).

Here[1] is an example of what rigorous reasoning about distributed systems looks like.

  the non-deterministic algorithm emerges when, abstracting from their mutual differences, we concentrate on what the many algorithms of the class have in common.

 
[1] https://www.cs.utexas.edu/~EWD/transcriptions/EWD06xx/EWD687...
throw1234xxdgv
8 days ago
I agree completely, and would even say this is the case with everything concurrent (interrupts, distributed systems, IPC/threads, etc.).
jiveturkey
8 days ago
Is this a lay description of CAP theorem? Or is it actually broader than that? I'm having a hard time getting my head around it.
Jenk
8 days ago
I think it is the CAP theorem, and perhaps the author isn't familiar with the name?
mlhpdx
8 days ago
That’s really the secret, isn’t it? Deciding, deliberately, what to leave out of a system so that it is possible for the people involved to maintain it. That is perhaps a good definition of software architecture.
ta_1138
8 days ago
And yet, many an architecture review is all about a group of people trying to justify their existence by adding features. I was in a design review this week where 4 different architects were involved, along with the expected assortment of managers and product people. Every single suggestion was an addition. Having written systems like the one in question for decades, I considered the initial proposal as massively overengineered in the first place: An excuse to have a project that lets someone get upleveled in the next review cycle.