One of the technical questions was "if you have a db and a message queue, how do you get your update to alter both or neither (i.e. transactionally)"?
I thought about it for a couple of minutes, then came back with something like "I can't, and you can't either." Then I proposed the usual spiel about using a replicated-state-machine/write-ahead-log/event-sourcing (whatever it might be called at the time) and leaning into eventual consistency as the only practical solution.
He asked if I'd heard about the outbox pattern, so I let him describe it. Sure enough it sounded like this article. The secret to transacting across the database D and the message queue Q:
(D,Q)
is to split D into two parts (the State and the Outbox), transact across those instead (S,O) Q
and then just pretend that you have a transaction across D and Q.If you can't de-duplicate messages it's not possible, that's true.
The motive seems to be a naive process that enqueues a message and then commits to a database - two independent actions. But a well-behaved process would commit to a database, and then only if successful enqueue a message. That's better but still not atomic - commit, crash, and no message queued.
So the solution is a two-table write - the outbox pattern. But the process that reads the outbox must commit both a query and delete before sending the message. That's the same risk as the agreement well-behaved program - commit, crash, and no message queued. Except now you introduced another pipeline element so your overall complexity increases, and so too risk.
What if you never delete messages from the outbox? Well, what you have now is no longer an outbox nor a database nor useful for large volumes. What if you implement a database to track procesed messages. Return to square one - that's the same problem you were initially trying to solve.
What if you fetch, enqueue, and then delete? Ohh... that works. In case of a crash the message remains in the outbox. It may be processed in duplicate, but eventually if successfully it will be deleted from the outbox.
The message broker then receives a possibly duplicate message. It must consult its internal database, and if the message is unique, route it. So right back at square one. Can't have atomicity and uniqueness.
FWIW The article literally talks about the challenges with getting this to actually work and recommends removing it and just using the DB for everything.
From the end of the article:
The enqueue_workflow UDF creates this row in the same transaction as the user database update, guaranteeing atomicityIf the 2nd system write can fail for non-transient reasons, the outbox pattern doesn’t work and you need either 2 phase commit or a distributed saga.
I wrote about this here a few years ago.
Distributed coherency is not something you can abstract away, the abstractions all leak.
But as someone who builds distributed systems, I can tell you that transactions should be local. Anytime you want to lock something across the network (eg Canisters in ICP) so you can read it, that’s probably a code smell. You probably want to have evented reactive things ripple out, you do need idempotency, but you shouldn’t design your system to read remote state if you can help it. Only subscribe to remote messages.
It’s simple and easy to follow. At scale use multi tenancy.
However the way Postgres keeps around obsolete rows (deleted or modified) until they're vacuumed can cause problems for high throughput queues. So for those systems the complexity might be worth it. But I bet 90% of the time the choice to use a separate queue is premature optimization. And hopefully OrioleDB (undo based storage engine for postgres) will avoid most of these pitfalls reducing the need for separate queues even further.
Step 2: propose a source of truth that everyone can listen to. Hearing the same facts in the same order should put everyone in the same state (eventual consistency)
Step 3 (you are here): try to do better than EC, by merging the external queue into one of the nodes, making it the master.
Step 4: Now there's no distance between the nodes, so no need to solve the distributed systems problem and you can retire the queue.
I have rolled my own little durable workflows in Postgres before, in fact before I even knew durable workflows were a thing with solutions like Temporal. That's fine for many cases where you aren't doing enough steps for it to be tedious, and/or you want permanent records. Would do it again, but not for atomicity reasons.
Other comments have already discussed the issue with the outbox UDF, your external system has to poll and retry either way. It works though. Maybe I'm misunderstanding this?
Is it really a distributed system or just a bunch of services with a central database?
*: edit, maybe a better example here is a rail system with a single central dispatcher is centralized but may still be distributed
There are always tradeoffs of course, but building a truly decentralized system requires some really difficult compromises to correctness. The two general's problem is a great piece of reading on this topic - distribution always requires compromises in general, but to fully remove an authority on truth gets quite tricky.
A similar pattern has spilled out of projects like Warpstream[2], which I suspect is using Postgres behind the scenes of their control plane.
It is!
And the solution is to add an extra general on the left side. Let's call him Outus Boxus. The two generals on the left side can communicate in perfect lockstep. Then if you need the general on the right to find out about something, you can send a few workers to tell him or something...
More seriously though, you can have a DS for two reasons: tech or political.
Tech means scaling or reliability. So clients can be serviced by any of the nodes.
Political means different actors don't have a central authority. You can't stick two banks into one db.
This technique doesn't seem to address either aspect.
Neal Ford calls this a distributed monolith because a change to a database schema can break every single service at once, but there are very valid uses of this method.
There are decades of books on the foot guns as we used this even back in the client-server days.
One suggestion I have is to research where the first version of SoA failed, especially as these systems tend to erode into Enterprise Service Busses.
Products like Apache airflow tend to have value not because of the persistence layer, but because they force workflows into DAGs, which is an enforceable structural constraint, while SQL, being declarative, can sometimes force you into trying to enforce governance through observing behavior.
The former is not subject to Rice’s theorem, while the latter is.
If you actively control for these it will greatly increase the lifetime of this system before (or if) you reach the point you have to replace the system.
I've asked myself this question every single time I've had to use Zookeeper.
Apache Kafka being the poster child of the problem, with HBase in a close second.
In most services, I often swap out the message broker or the workflow engine, but the database almost always stays the same.
I'm not sure if I've understood this correctly.
There will always be a window for potential loss due to solar flares/whatever but the key in designing a system like this is to make sure you're aware of how the system can fail, accept that outcome and then work to, as much as possible, shrink the distance in cycles/logic between each persistence committal. Logic should be front-loaded to do as much prep work as possible before any irreversible actions happen and then those irreversible actions should be ordered to your preference and dispatched as quickly and cheaply as possible in a safe manner.
Here's another blog post about how a Postgres-backed task queue can run at scale: https://www.dbos.dev/blog/making-postgres-queues-scale
When workers query the db for jobs the rows get locked by the select and there are no race conditions or duplicate assigned jobs
It seems this article is trending toward that view: If you can maintain transactional consistency along with application workflow state, then would this generalize to maintaining distributed application state in general?
The follow-up would be: Would this be preferable to Valkey/Redis?
As to which technical solution would be optimal there are a bunch of factors to consider and I think preferences around features could lead you to a variety of options. Postgres is excellent as long as you're minimizing the amount of data piping directly through it or operating at a reasonable scale.
Yes, in the sense of 'too good to be true'
This sounds a lot like reinventing a message queue. Someone trying this in the future might learn painful lessons about ordering, commits, partitioning, dead-letter-queues, replayability, don't-call-me-I'll-call-you, and anything else a Kafka-like comes with out of the box.
And doesn’t that mean the job potentially runs twice? Yes.
In DBOS there are two kinds of “things that run”: workflows, and steps (workflows are made of steps).
Workflows must be deterministic (so it’s fine if it runs twice). Steps don’t have to be deterministic but have at-least-once execution (so it’s best if these are idempotent).