Pick Your Distributed Poison (hazelweakly.me)
104 points | 12 days ago | 7 comments | Hacker News
rubiquity
8 days ago
I don’t agree with much in this writing other than that eventual consistency is a bad choice. Distributed systems are hard but in 2024 there are enough known patterns and techniques to make them less icky. Systems built on total ordering are much more tractable than weaker protocols. Mahesh Balakrishnan’s recent paper[0] on the Shared Log abstraction is a great recipe, for example.
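
To make the shape concrete, here's a minimal sketch of the shared-log idea (the names are mine, not the paper's API): every write is an append that gets a unique position in one total order, and a replica is just a deterministic state machine replaying that order.

  # Illustrative only: a real shared log is replicated and durable,
  # but the contract is the same single total order.
  class SharedLog:
      def __init__(self):
          self._entries = []

      def append(self, record):
          # Returns the record's position in the total order.
          self._entries.append(record)
          return len(self._entries) - 1

      def read(self, pos):
          return self._entries[pos]

      def tail(self):
          return len(self._entries)

  # Two replicas that have replayed the same prefix agree exactly,
  # which is what makes these systems tractable to reason about.
  def catch_up(log, apply_fn, cursor):
      while cursor < log.tail():
          apply_fn(log.read(cursor))
          cursor += 1
      return cursor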

As an aside, I’ve never enjoyed the defeatist culture that permeated operations and distributed systems pop culture, which this post seems to reinforce.

0 - https://maheshba.bitbucket.io/papers/osr2024.pdf

from-nibly
8 days ago
I think the defeatism comes from practicality. I'm at a 30-person IT org trying to do distributed systems and eventual consistency. Don't take on complexity unless you have to. And eventual consistency requires a LOT of scale before it becomes a "have to".
delusional
8 days ago
I don't think it's necessarily a question of scale. Where I work, we have a lot of strategic partnerships, and all those partners have their own IT systems with their own master data. It's intractable to enforce strong consistency between all of these disparate systems that don't speak to one another, and you expressly don't want to take the whole substrate offline when a single partner has a network issue. The best you can really do is eventual consistency.
from-nibly
8 days ago
I guess I'm talking about internal eventual consistency. I'm not arguing against using another system outside your own single-instance database.
ffsm8
8 days ago
> I don’t agree with much in this writing other than that eventual consistency is a bad choice

Does it really matter whether it's bad or not? As far as I know, every database that scales beyond a single node (for performance) is eventually consistent. Otherwise you've gotta wait for a sync between nodes before a response can be given, which would effectively force your cluster to have worse performance than running on a single node again.

jashmatthews
8 days ago
There are a bunch that offer strong consistency, e.g. Cloud Spanner and DynamoDB.
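
DynamoDB even makes the choice per request; a hedged boto3 sketch (the table and key names are made up):

  import boto3

  # Strongly consistent read: reflects all prior successful writes,
  # at the cost of a bit more latency/capacity than the default.
  table = boto3.resource("dynamodb").Table("orders")
  resp = table.get_item(
      Key={"order_id": "1234"},
      ConsistentRead=True,
  )
  item = resp.get("Item")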
mping
8 days ago
Don't forget FoundationDB
sriram_malhar
8 days ago
> As far as I know, every database that scales beyond a single node (for performance) is eventually consistent

That's SO not true. Spanner and Amazon's S3 are some of the biggest databases on the planet, and they are strongly consistent.

> Otherwise you've gotta wait for a sync between nodes before a response can be given, which would effectively force your cluster to have worse performance then running on single node again.

Yes, you are trading latency for fault-tolerance, but so what? What if the resulting latency is still more than good enough? There is no shortage of real large-scale applications where this is the case.
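
To make the trade concrete, here's a toy majority-quorum write (the replica objects and their write() method are assumed): the caller pays the latency of the slowest replica in the fastest majority, not of the whole cluster, and a slow or dead minority doesn't block the commit.

  from concurrent.futures import ThreadPoolExecutor, as_completed

  def quorum_write(replicas, record):
      needed = len(replicas) // 2 + 1
      pool = ThreadPoolExecutor(max_workers=len(replicas))
      futures = [pool.submit(r.write, record) for r in replicas]
      acks = 0
      try:
          for fut in as_completed(futures):
              if fut.result():
                  acks += 1
                  if acks >= needed:
                      return True  # committed on a majority
          return False  # quorum unreachable
      finally:
          pool.shutdown(wait=False)  # don't block on the slow minority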

walterbell
8 days ago
> Reproducible and bootstrappable systems get a lot of love among neurodivergent people. For good reason: they’re very friendly to those with little working memory but vast amounts of working context.

This is novel yet obvious in retrospect, has it been articulated or surveyed elsewhere?

nyanpasu64
8 days ago
What's the distinction between working memory and working context?
glic3rinu
8 days ago
I was wondering about that too. Compared to my peers I consider myself as having below-average short-term memory (not really good at raw computation), but in contrast I am pretty good at abstract thinking, e.g. I can easily visualize the end-to-end flow of a system, identify all the pain points, and produce very good designs (in terms of simplicity). So perhaps this "working context" is about knowledge, pattern recognition, analytical thinking, ...
kukkeliskuu
8 days ago
Typically there are no guarantees that a distributed system with "eventual consistency" is in a consistent state as a whole at any point in time. Downstream systems need to be built on the assumption that the system is perpetually inconsistent.
GeneralMayhem
8 days ago
That's true in the general case, but there are usually ways to make it not matter.

If you have a vector clock with snapshot reads, then you may not know consistently what the system looks like now, but you can know what it looked like as of some arbitrary previous point in time. If your use case doesn't require read-modify-writes (e.g., metric calculations where the source of truth is out of your control and you just need to aggregate deltas), then that's good enough - you can find a set of results that are in sync with each other, even if they're not fully up to date.
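
A sketch of that snapshot read (a single timestamp stands in for the vector-clock bookkeeping, and shard.read_at() is an assumed API):

  # Pick one timestamp t and read "as of t" on every shard. The view
  # may lag "now", but all its parts are consistent with each other.
  def read_as_of(shards, keys, t):
      view = {}
      for shard in shards:
          for key in keys:
              # assumed API: latest version of key with version <= t
              view[(shard.name, key)] = shard.read_at(key, t)
      return view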

If you have limited read-modify-writes, but you can at least engineer synchronous transactions on subsets of the data (e.g., single-row-level transactions) with asynchronous replication and fan-out of side-effects, then you can turn more complex problems into something that looks like the first case.
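
One common shape for that second pattern is a transactional outbox; everything below (db, bus, the table names) is made up for illustration:

  # The row update and its outgoing side-effect commit or abort
  # together in one local ACID transaction; a worker fans out later.
  def add_points(db, user_id, delta):
      with db.transaction():  # synchronous, single-row scope
          row = db.select_for_update("users", user_id)
          db.update("users", user_id, points=row.points + delta)
          db.insert("outbox", {"user": user_id, "delta": delta})

  def fanout_worker(db, bus):
      for event in db.poll("outbox"):
          bus.publish(event)  # may be retried, so consumers deduplicate
          db.delete("outbox", event)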

Those tricks alone are enough to remove most of the weirdness of eventual consistency. But the broader point is that distributed system design is less about "solving" the problem once and for all, and more about learning a toolbox of tricks to handle and describe certain patterns, of which any given project may have any number in any combination.

kukkeliskuu
6 days ago
Sounds plausible. I am fighting a different battle, however.

For my corporate customers, what you are describing is way too sophisticated. They do what others do. As Kafka is the de facto messaging platform, everybody uses it.

For example, if source data is multiplexed via Kafka into one sink, people typically say that the sink is "eventually consistent". Another case is when an operation writes into multiple microservices without transactions. People are not even consistent in their speech, let alone in how they handle their data.

It seems to me that 99% of such corporate systems would work better if people optimized for integrity (i.e. communicated with real transactions on a real database) instead of optimizing for performance and scalability.

tazu
8 days ago
I'm having a hard time understanding what this is about. Is it about Kubernetes tradeoffs?
NAHWheatCracker
8 days ago
It's just about tradeoffs in system design, nothing as specific as K8s.

I think the author just wanted to write a glib little blog post.

ajb
8 days ago
It's about the tradeoffs between guarantees in distributed systems generally.

But I don't really think the allusive writing style is a good fit for tech. Actually, it's pretty annoying even with non-tech writing. YMMV. Here, it's hard to understand the post without already knowing most of what it's saying.

Of course, it's a blog post, the author doesn't owe anyone anything.

User23
8 days ago
Distributed systems, like interrupts, are something you can’t bullshit and get right. The author is right, though, that nondeterminism makes systems easier to reason about if you have the right cognitive tools (i.e., math).

Here[1] is an example of what rigorous reasoning about distributed systems looks like.

  the non-deterministic algorithm emerges when, abstracting from their mutual differences, we concentrate on what the many algorithms of the class have in common.

 
[1] https://www.cs.utexas.edu/~EWD/transcriptions/EWD06xx/EWD687...
throw1234xxdgv
8 days ago
I agree completely, and would even say this is the case with everything concurrent (interrupts, distributed systems, IPC/threads, etc.).
jiveturkey
8 days ago
Is this a lay description of CAP theorem? Or is it actually broader than that? I'm having a hard time getting my head around it.
Jenk
8 days ago
I think it is the CAP theorem, and perhaps the author isn't familiar with the name?
mlhpdx
8 days ago
That’s really the secret, isn’t it? Deciding, deliberately, what to leave out of a system so that it is possible for the people involved to maintain it. That is perhaps a good definition of software architecture.
ta_1138
8 days ago
And yet, many an architecture review is all about a group of people trying to justify their existence by adding features. I was in a design review this week where 4 different architects were involved, along with the expected assortment of managers and product people. Every single suggestion was an addition. Having written systems like the one in question for decades, I considered the initial proposal as massively overengineered in the first place: An excuse to have a project that lets someone get upleveled in the next review cycle.