If you are writing events with the intention of having them invoke some specific actions, then you should prefer to invoke those things directly. You should be describing a space of things that have occurred, not commands to be carried out.
By default I would only include business keys in my event data. This gets you out of the business of making the event serve as an aggregate view for many consumers. If you provide the keys of the affected items, each consumer can perform their own targeted lookups as needed. Making assumptions about what views each will need is where things get super nasty in my experience (i.e. modifying events every time you add consumers).
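For illustration, a minimal sketch of what a keys-only event might look like (field names are hypothetical; the seat/booking example is borrowed from the article's domain):

    seat_selected = {
        "event_type": "SeatSelected",
        "occurred_at": "2024-05-01T12:00:00Z",
        "booking_id": "b-123",   # business key of the affected booking
        "seat_id": "41A",        # business key of the affected seat
        # no denormalised booking or passenger details; each consumer
        # looks up exactly what it needs using the keys
    }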
That puts you into tricky race-condition territory: the data targeted by an event might have changed (or been deleted) between the time it was emitted and the time you're processing it. It's not always a problem, but you have to analyse whether it could be, every time.
It also means that you're losing information on what this event actually represents: looking at an old event you wouldn't know what it actually did, as the data has changed since then.
It also introduces a synchronous dependency between services: your consumer has to query the service that dispatched the event for additional information (which is complexity and extra load).
Ideally you'd design your event so that downstream consumers don't need extra information, or at least so that the information they need is independent of the data described by the event. E.g. a consumer needs the user name to format an email in reaction to the "user_changed_password" event? No problem querying the service for the name: these are independent concepts, updates to them (password and name) can happen concurrently, and it doesn't really matter if a race condition happens.
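As a rough sketch (the service client and field names are made up), the consumer for that scenario could look like this:

    def handle_user_changed_password(event, user_service, mailer):
        # the event carries only the user id; the display name is fetched
        # separately because it's independent of the password change
        profile = user_service.get_profile(event["user_id"])
        mailer.send(
            to=profile["email"],
            subject="Your password was changed",
            body=f"Hi {profile['display_name']}, your password was just changed.",
        )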
1. take money from account A
2. if failed, put money back into account A
3. put money into account B
4. if failed, put money back into account A
In other words, perform compensating actions instead of doing transactions.
This also requires that you have some kind of mechanism to handle an application crash between 2 and 3, but that is something else entirely. I've been working on this for a couple of years now and getting close to something really interesting ... but not quite there yet.
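A bare-bones sketch of that compensation flow, with a made-up account API (crash recovery between the debit and the credit is deliberately left out, as noted above):

    def transfer(accounts, a, b, amount):
        accounts.debit(a, amount)           # take money from account A
        try:
            accounts.credit(b, amount)      # put money into account B
        except Exception:
            accounts.credit(a, amount)      # compensate: put the money back into A
            raise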
Like a distributed transaction or lock. This is the entire problem space, your example above is very naive.
"Accountants don't use erasers"
The ledger is the source of truth in accounting, if you use event streams as the source of truth you can gain the same advantages and disadvantages.
An event being past tense ONLY is a very critical part.
There are lots of ways to address this, all with their own tradeoffs, and it is a case of picking the least-worst option for a particular context.
But over-reliance on ACID DBMSs and the false claim that ATMs use DTC really do hamper the ability to teach these concepts.
E.g. you perform step 2, but fail to record it. When resuming from crash, you perform step 2 again. Now A has too much money in their account.
Sagas require careful consideration to make sure you can provide one of these guarantees during a "commit" (the order in which you ACK a message, send resulting messages, and record your own state -- if necessary) as these operations are non-atomic. If you mess it up, you can end up providing the wrong level of guarantee by accident. For example:
1. fire resulting messages
2. store state (which includes ids of processed messages for idempotency)
3. ACK original event
In this case, you guarantee that you will always send results at-least-once if a crash happens between 1&2. Once we get past 2, we provide exactly-once semantics, but we can only guarantee at-least-once. If we change the order to:
1. store state
2. fire messages
3. ACK original message
We now only provide at-most-once semantics. In this case, if we crash between 1&2, when we resume, we will see that we've already handled the current message and not process it again, despite never having sent any result yet. We end up with at-most-once if we swap 1&3 as well.
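For the sake of illustration, here is the at-least-once ordering as a sketch against a made-up broker client and state store (process() stands in for the actual business logic):

    def handle(msg, broker, state_store):
        if state_store.has_processed(msg.id):   # idempotency: fully-processed duplicates are dropped
            broker.ack(msg)
            return
        results = process(msg)
        for out in results:
            broker.publish(out)                 # 1. fire resulting messages
        state_store.record(msg.id, results)     # 2. store state (incl. processed message ids)
        broker.ack(msg)                         # 3. ACK the original message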
So, yes, Sagas are great, but still pretty easy to mess up.
1. Debit A, Credit in-flight.
2. Credit B, Debit in-flight.
If 1. fails, nothing happened and everything is consistent.
If 2. fails, you know (because you have money left on in-flight), and you can retry later, or refund A.
This way your total balance never decreases at any point, so everything is always consistent.
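A sketch with a made-up ledger API, just to show that each step is a balanced double entry:

    def transfer_via_inflight(ledger, a, b, amount, transfer_id):
        # 1. Debit A, credit in-flight -- money leaves A but stays on the books
        ledger.post(transfer_id, debit=a, credit="in-flight", amount=amount)
        # 2. Credit B, debit in-flight -- if this never runs, the in-flight
        #    balance shows exactly what needs to be retried or refunded
        ledger.post(transfer_id, debit="in-flight", credit=b, amount=amount)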
This isn't an easy-to-solve problem when it comes to distributed computing.
The important thing is not having money go missing.
Which distributed transaction scenario have you ever dealt with that wasn't correctly handled by a two-phase commit or at worst a three-phase commit?
That's kind of the first rule of any event-based system. It doesn't really matter the architecture, if you decide to name the things "event", everybody's head will break if you make them mutable.
If you decide to add mutation there in some way, you will need to rewrite the event stream, replacing entire events.
If the consumer goes to your database and asks "what's the data for customer 123 at event F52A?" it better always get back the same data or "that event doesn't exist, everything you know is wrong".
Sure, if the database supports this sort of temporal query, then you're good with such id-only events. But that's not exactly the default for most databases / data models.
Events are part of a stream that define your data. The stream doesn't have to be complete, but if it doesn't make sense to do things like buffer or edit it, it's probably something else and using that name will mislead people.
So the entity was updated. What's the problem?
Surely, some data needs to change if a password is updated?
Then I would argue it isn't a meaningful event. If some attributes of the event could become "out of date" such that the logical event risks invalidation in the future, you have probably put too much data into the event.
For example, including a user's preferences (e.g., display name) in a logon event - while convenient - means that if those preferences ever change, the event is invalid to reference for those facts. If you only include the user's id, your event should be valid forever (for most rational systems).
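To make that concrete (field names invented), compare:

    # stays valid forever: references the user only by its stable key
    logon_minimal = {
        "event_type": "UserLoggedOn",
        "user_id": "u-123",
        "occurred_at": "2024-05-01T12:00:00Z",
    }

    # risks going stale: embeds preferences that can change after the fact
    logon_overloaded = {
        "event_type": "UserLoggedOn",
        "user_id": "u-123",
        "display_name": "Ada",
        "theme": "dark",
    }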
> your consumer has to query the service that dispatched the event
An unfortunate but necessary consequence of integrating multiple systems together. You can't take out a global database lock for every event emitted.
Also, CAP is a thing too.
Sure, try to keep transactions single-node. If you can't, let me give you the advice of people FAR smarter than I:
- DO NOT DESIGN YOUR OWN DISTRIBUTED TRANSACTION SERVICE
Use a vetted one.
> The triggering of the action is a direct consequence of the information an event contains. Whether or not an action is triggered should not be the responsibility of the event.
I agree, but still for different consumers events will have different consequences - in some consumers it'll trigger an action that is part of a higher-level process (and possibly further events), in others it'll only lead to data being updated.
> If you are writing events with the intention of having them invoke some specific actions, then you should prefer to invoke those things directly. You should be describing a space of things that have occurred, not commands to be carried out.
With this I don't agree. I think that's the core of event-driven architecture that events drive the process, i.e. will trigger certain actions. That's not contradicting them describing what has occurred, and doesn't make them commands.
> By default I would only include business keys in my event data. This gets you out of the business of making the event serve as an aggregate view for many consumers. If you provide the keys of the affected items, each consumer can perform their own targeted lookups as needed. Making assumptions about what views each will need is where things get super nasty in my experience (i.e. modifying events every time you add consumers).
This is feedback I got multiple times; the "notification plus callback" approach seems to be a popular pattern. It has its own problems though, both conceptual (the event representing an immutable set of facts) and technical (high volume of events). I think digging into the pros and cons of that pattern will be one of my next blog posts! Stay tuned!
In an event-driven system, there is neither guarantee nor expectation that an event will trigger an action; it might, but it might not. Events are simply a log [0] of "things" happening in various subsystems, published to various channels for other subsystems to ignore or act upon on their own terms.
Let's say that we have two subsystems - A and B. When something happens on A, it will emit a corresponding event (e.g. SomethingHappened) to a specific channel (e.g. EventsFromA); if B is listening to that channel, it can "recognise" that event and initiate (i.e. "trigger") some action of its own.
However, if A explicitly wants B to do something, it's a command, i.e. a direct coupling by definition. As GP states, that is better handled as a direct request from A to B.
Theoretically, there is a possible scenario where A "knows" that a certain action needs to happen in the system, but does not know which subsystem has that capability, i.e. has no knowledge that B can do that. In that case it can "request" something to happen, e.g. by submitting an event like "UserCreationRequested"; however, there is no guarantee that any service will "see" that event and act upon it.
[0] https://engineering.linkedin.com/distributed-systems/log-wha...
SeatTimeLimitedReserved {41A, 15m}
SeatAssignedTo {UserA}
SeatBooked {41A}
If a consumer needs more data, there should be a new event.
Message queues aren't a networking protocol. Anyone can subscribe to consume the events.
Both events would "describe a space of things that occurred" as @bob1029 suggests.
The seat selection process for an actual airline probably needs to be more involved. @withinboredom recommends:
- SeatTimeLimitedReserved {41A, 15m}
- SeatAssignedTo {UserA}
- SeatBooked {41A}
In which case, only SeatBooked would trigger a BookingUpdated event.

Most issues I've seen with events are caused by giving events imprecise names, names that mean more or less than what the events attest to.
For example, a UI should not emit a SetCreditLimitToAGazillion event because of a user interaction. Downstream programmers are likely to get confused and think that the state of the user's credit limit has been set to a gazillion, or needs to be set to a gazillion. Instead, the event should be UserRequestedCreditLimitSetToAGazillion. That accurately describes what the UI observed and is attesting to, and it is more likely to be interpreted correctly by downstream systems.
In the article's example, SeatSelected sounds ambiguous to me. Does it only mean the user saw that the seat was available and attempted to reserve it? Or does it mean that the system has successfully reserved the seat for that passenger? Is the update finalized, or is the user partway through a multistep process that they might cancel before confirming? Depending on the answer, we might need to release the user's prior seat for other passengers, or we might need to reserve both seats for a few minutes, pending a confirmation of the change or a timeout of their hold on the new seat. The state of the reservation may or may not need to be updated. (There's nothing wrong with using a name like that in a toy example like the article does, but I want to make the point that event names in real systems need to be much more precise.)
Naming events accurately is the best protection against a downstream programmer misinterpreting them. But you still need to design the system and the events to make sure they can be used as intended, both for triggering behavior and for reporting the state and history of the system. You don't get anything automatically. You can't design a set of events for triggering behavior and expect that you'll be able to tell the state of the system from them, or vice-versa.
But bad names manifest as a multitude of problems much later on.
I wonder if this is an area LLMs can help us with because really a lot of us do struggle with it. I'm going to investigate!
We do a lot of event-driven architecture with ActiveMQ. We try to stick to messaging-as-signalling rather than messaging-as-data-transfer. These are the terms we came up with; I'm sure Martin Fowler or someone else has described it better!
So we have SystemA that is completing some processing of something. It's going to toss a message onto a queue that SystemB is listening on. We use an XA transaction to make sure the database and broker commit together [1]. SystemB then receives the event from the queue and can begin its little tiny bit of business processing.
If one divides their "things" up into logical business units of "things that must all happen, or none happen", you end up with a pretty minimalistic architecture that's easy to understand but also offers retry capabilities if a particular system errors out on a single message.
It also allows you to take SystemB offline and let its work pile up, then resume it later. Or you can kick off arbitrary events to test parts of the system.
[1]: although if this didn't happen, say during a database failure at just the right time, the right usage of row locking, transactions, and indexes on the database prevents duplicates. This is so rare in practice, but we protect against it anyway.
I was thinking of trigger-messages vs data-messages.
Then again, we've just begun dabbling with AMQP as part of transitioning our legacy application to a new platform, so I'm a n00b in the field.
We do have what you might consider hybrid messages, where we store incoming data in an object store and send a trigger-message with the key to process the data. This keeps the queue lean and makes it easy for support to inspect submitted data after it's been processed, to determine if we have a bug or it's garbage in garbage out.
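Roughly (the object store and queue clients are made up), that hybrid pattern looks like this:

    import json, uuid

    def submit(data, object_store, queue):
        key = f"submissions/{uuid.uuid4()}"
        object_store.put(key, json.dumps(data))        # heavy payload goes to the object store
        queue.send({"type": "SubmissionReceived",      # trigger-message carries only the key
                    "object_key": key})

    def handle(message, object_store):
        data = json.loads(object_store.get(message["object_key"]))
        # ... process data; the stored object stays available for support to inspect afterwards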
I've never used EDA, just read about it, so I'm curious what you disagree with from the article.
It seems that the logic is reasonable, that subscribers have varying needs and publishers would need to account for those needs over time as the functionality (and data) required by subscribers evolves.
https://learn.microsoft.com/en-gb/archive/blogs/nickmalik/ki...
1. You should be able to recreate the complete business state from the complete event sequence only. No reaching out to other servers/services/DBs to get data.
2. The events should be as small as possible. So only carry the minimum data needed to implement rule #1
That’s it. It works really well in practice.
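As an illustration of rule #1, using the seat events mentioned elsewhere in this thread (event shapes are hypothetical), state is rebuilt from the sequence alone:

    def rebuild_booking(events):
        booking = {}
        for event in events:
            if event["type"] == "SeatTimeLimitedReserved":
                booking["seat"] = event["seat"]
                booking["hold"] = event["hold"]
            elif event["type"] == "SeatAssignedTo":
                booking["passenger"] = event["user"]
            elif event["type"] == "SeatBooked":
                booking["status"] = "booked"
        return booking     # no external lookups needed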
If you're a using a message queue, the message should convey necessary information such that, if all messages were replayed, the consumer would reach the same state. Anything other than that, you'll be in a world of pain at scale.
Events and messages are entirely different things. They might look similar, but their responsibilities are completely different. The scenario you're describing matches the usecases for messages, not events.
Being notified only _when_ something happened isn't always useful if the world is changing underneath you (it _can be_ useful in particular situations, when you know the state is final, but not as a general architecture principle).
Not really. Messages vs events is a foundational design trait whose discussion involves topics such as adopting distinct messaging patterns such as message queues or pub-sub. They have completely different responsibilities and solve completely different problems.
Is it a common occurrence, and if it happens, is it hard to debug/fix? Do Kafka and other popular event systems have something to defend against it?
It always bothers me that the systems I've worked on have their data flows mapped out basically in semi-up-to-date Miro diagrams. If that. There's no overarching machine-readable and verifiable spec.
Regarding if it's a problem or a regular occurrence: No, really not. I have never seen this being a problem, I think that fear is unfounded.
Like, the problem sounds bad enough to warrant it. If not, how do you choose when to apply it?
Our architects have a habit of ignoring these kinds of issues, and when you suggest making things like this a requirement, they accuse you of excessive concern!
Geometrically speaking.
So, what should be in an event? To me, it's the minimum data that is sufficient, on its own, to be understandable.
[1] https://www.decodable.co/blog/taxonomy-of-data-change-events
Your categorization makes total sense and fits well with what I called the "spectrum". I only mentioned the "id-only" events to show what the one end of the spectrum would look like. What I call the "trigger" events would be what you call "delta" events. I should have written that more clearly.
Interestingly, a few people advocated for id-only events as a response to the article. I have some issues with that pattern... I'm already thinking about a follow-up article to elaborate on that.
Yet I am troubled by it, and must disagree with some of the premises and conclusions. Only you know your specific constraints, so my 'armchair architecting' may be way off target. If so, I apologize, but I was particularly disturbed by this statement:
"events that travel between services have a dual role: They trigger actions and carry data."
Yes, sort of, but mostly no. Events do not "trigger" anything. The recipient of an event may perform an action in response to the event, but events cannot know how they will be used or by whom. Every message carries data, but a domain event is specifically constrained to be an immutable record of the fact that something of interest in the domain has happened.
The notion of modeling 'wide' vs 'short' events seems to ignore the domain while conflating very different kinds of messages - data/documents/blobs, implementation-level/internal events, domain events, and commands.
Modeling decisions should be based on the domain, not wide vs short. Domain events should have names that are meaningful in the problem domain, and they should not contain extraneous data nor references to implementation details/concepts. This leads to a few suggestions:
* Avoid Create/Read/Update/Delete (CRUD) event names as these are generic implementation-level events, not domain events. Emit such events "under the hood" for replication/notification if you must, but keep them out of the domain model.
* Name the domain event specifically; CRUD events are generally undesirable because they (a) are an implementation detail and (b) are not specific enough to understand without more information. Beware letting an implementation decision or limitation corrupt the domain model. In this example, the BookingUpdated event adds no value/information, makes filtering for the specific event types more complex, and pollutes the domain language with an unnecessary and potentially fragile implementation detail (the name of the db table Booking, which could just as easily have been Reservation or Order etc). SeatSelected is a great domain event name for a booking/reservations system. BookingSeatSelected if there is further scope beyond Booking that might have similar event names. BookingUpdated is an implementation-level, internal event, not part of the problem domain.
* What data is necessary to accurately record this domain event? A certain minimal set of relevant data items will be required to capture the event in context. Trying to anticipate and shortcut the needs of other/future services by adding extraneous data is risky. Including a full snapshot of the object even more risky, as this makes all consumers dependent on the entire object schema.
* The notion of "table-stream duality" as presented is likewise troublesome, as that is an implementation design choice, not part of the domain model. I don't think that it is a goal worthy of breaking your domain model, and suggest that it should not be considered at all in the domain model's design. Doing so is a form of premature optimization :)
* That said, separating entity and event streams would keep table-stream duality but require more small tables, i.e. one domain event type per stream and another Booking stream to hold entity state as necessary. A Booking service can subscribe to SeatSelected et al events (presumably from the UI's back-end service) and maintain a separate table for booking-object versions/state. A SeatReserved event can be emitted by the Booking service, and no one has to know about the BookingUpdated event but replication hosts.
Thanks again for writing and posting this, it really made me think. Good luck with your project!
> Yes, sort of, but mostly no. Events do not "trigger" anything. The recipient of an event may perform an action in response to the event, but events cannot know how they will be used or by whom.
I don't see the difference. Maybe it's a language thing. But I'd say if a recipient receives an event and performs an action as a consequence, it's fair to say the event triggered the action. The fact that the event triggers something doesn't mean the event or the publisher must know at runtime what's being triggered.
Regarding your suggestions, I think you're proving my point. Of course the whole "there are two types of..." is a generalization, but given that, you seem to fall in the first category, the one I called "DDD engineer/architect".
My response to the first three would be: Why? I know some literature suggests this. I've applied this pattern in the past. And I wrote "This is totally legitimate and will work.". But we also need to ask ourselves: What's the actual value? Why does the kind of event / the business reason have to be encoded as the name/type of the event? Honest question. Doesn't having it in the event payload carry the same information, just in a different place?
I don't want to be following what might be seen as "best practices" just for the sake of it, without understanding why.
I know of a few systems that started off with domain events that were named & typed "properly" according to the business event. And after a while, the need for wide events carrying the full state of the source entity arose. If you look at talks and articles from other EDA practitioners (e.g. the ones on https://github.com/lutzh/awesome-event-driven-architecture#r...), you'll see that's not uncommon. This regularly leads to having to provide the wide events in addition to the "short" events. This is extra effort and has its own drawbacks. I just want to save the readers the extra work.
You're welcome, sorry for the delayed response.
>> ...Events do not "trigger" ...

> I don't see the difference
Yes, sorry, I'm being pedantic without context - to the uninitiated, the expression "events trigger actions" may be confusing, as it implies that events are active/actors/participants with 1:1 correspondence with reactions, omitting the recipient's agency.
>"two types of..."
...meh; I am/was both, and many other roles, but you are correct in that I hold the problem/solution domain as primary, and prefer to keep the implementation domain out of it as much as possible.
>Doesn't having it in the event payload carry the same information, just in a different place?
Yes, grouping and filtering is absolutely 100% functionally equivalent. But it is not free.
>Why?
Thanks for asking!
BookingUpdated(Reason) looks to me like an unnecessary coupling/corruption of the implementation model and domain model. This may cause additional cognitive load (user confusion/search/explanation) and possibly impact the event-routing mechanisms significantly.
For example:
* a consumer desiring only SeatReserved events will not find that as a topic. Instead, they will have to (unnecessarily) learn something about the implementation model (BookingUpdated:Reason==SeatReserved) in order to find what they want.
Slightly annoying, maybe no big deal w/better topic search or docs, just one example of a tiny unintended consequence.
* where is the selection/filtering performed? broker or consumer? Filtering is probably not free for a single-topic-per-stream implementation; something pays the price for it.
Possibly also no big deal under ordinary circumstances... but here's one way things might go wrong:
* SeatReserved events likely happen more often than the other types due to timeouts, conflicts, and retries. Ordinarily not a problem, but when hot tickets first go on sale the flood of traffic from people and bots competing for the best seats may cause SeatReserved events to increase far out of proportion with the others.
But hey, that's what autoscaling cloud services are for, right? If the broker handles filtering, that cloud bill might be a bit scary. If consumers handle filtering, every service consuming any type of BookingUpdated event will also have to scale up too, and that bill might be terrifying. :)
With independent topics/streams/tables for each discrete concept, SeatReserved can scale independently, its traffic cannot directly affect services that do not care about it, and the names of topics and events directly reflect the problem/solution domain.
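Just to illustrate the difference (the consumer API is made up):

    # single BookingUpdated stream: every consumer receives and filters all booking traffic
    def consume_from_shared_stream(consumer, handle_seat_reserved):
        for msg in consumer.subscribe("BookingUpdated"):
            if msg["reason"] == "SeatReserved":
                handle_seat_reserved(msg)

    # topic per event type: a consumer only ever sees what it asked for
    def consume_from_dedicated_topic(consumer, handle_seat_reserved):
        for msg in consumer.subscribe("SeatReserved"):
            handle_seat_reserved(msg)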
> EDA resource repo
Excellent collection, thanks for sharing it!
While the need for "wide" events can be a symptom of other design issues, decorating base events is a good solution when they are necessary. If you really need an aggregated BookingUpdated(Reason) event, for example, you can generate it downstream and preserve the independence of the individual event types.
Offhand, I can only think of one situation where capturing the full state of the source entity in an event would be necessary: when it's ephemeral - but that sounds like a larger discussion (perhaps after enjoying a few videos from your collection).
Thanks again!