The Dual Nature of Events in Event-Driven Architecture
112 points | 2 months ago | 11 comments | reactivesystems.eu | HN
bob1029
2 months ago
[-]
The triggering of the action is a direct consequence of the information an event contains. Whether or not an action is triggered should not be the responsibility of the event.

If you are writing events with the intention of having them invoke some specific actions, then you should prefer to invoke those things directly. You should be describing a space of things that have occurred, not commands to be carried out.

By default I would only include business keys in my event data. This gets you out of having to make the event serve as an aggregate view for many consumers. If you provide the keys of the affected items, each consumer can perform their own targeted lookups as needed. Making assumptions about what views each will need is where things get super nasty in my experience (i.e. modifying events every time you add consumers).

reply
williamdclt
2 months ago
[-]
> each consumer can perform their own targeted lookups as needed

That puts you into tricky race condition territory: the data targeted by an event might have changed (or been deleted) between the time it was emitted and the time you're processing it. It's not always a problem, but you have to analyse whether it could be every time.

It also means that you're losing information on what this event actually represents: looking at an old event you wouldn't know what it actually did, as the data has changed since then.

It also introduces a synchronous dependency between services: your consumer has to query the service that dispatched the event for additional information (which is complexity and extra load).

Ideally you'd design your event so that downstream consumers don't need extra information, or at least the information they need is independent from the data described by the event: e.g. a consumer needs the user name to format an email in reaction to the "user_changed_password" event? No problem querying the service for the name; these are independent concepts. Updates to these things (password & name) can happen concurrently, but it doesn't really matter if a race condition happens.

reply
candiddevmike
2 months ago
[-]
There should be some law that says a strictly serialized process should never be broken into discrete services. Distributed locks and transactions are hell.
reply
withinboredom
2 months ago
[-]
The best way to avoid distributed locks and transactions is to manually do the work. For example, instead of doing a distributed lock on two accounts when transferring funds, you might do this (which is the same as a distributed transaction, without the lock):

1. take money from account A

2. if failed, put money back into account A

3. put money into account B

4. if failed, put money back into account A

In other words, perform compensating actions instead of doing transactions.

This also requires that you have some kind of mechanism to handle an application crash between 2 and 3, but that is something else entirely. I've been working on this for a couple of years now and getting close to something really interesting ... but not quite there yet.
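The four steps above can be sketched in Python roughly as follows. This is only an illustration: `Account`, `InsufficientFunds`, and `transfer` are hypothetical names, the accounts are in-memory objects rather than remote services, and a withdrawal here either fully succeeds or raises, so step 2 has nothing to undo. A crash between the steps is deliberately not handled.

```python
class InsufficientFunds(Exception):
    pass


class Account:
    """Hypothetical in-memory account; a real one would live in a remote service."""

    def __init__(self, balance, accept_deposits=True):
        self.balance = balance
        self.accept_deposits = accept_deposits

    def withdraw(self, amount):
        if amount > self.balance:
            raise InsufficientFunds(amount)
        self.balance -= amount

    def deposit(self, amount):
        if not self.accept_deposits:
            raise RuntimeError("deposit rejected")
        self.balance += amount


def transfer(source, target, amount):
    """Move money using a compensating action instead of a lock or transaction."""
    source.withdraw(amount)        # step 1: take money from A (raises if it fails)
    try:
        target.deposit(amount)     # step 3: put money into B
    except Exception:
        source.deposit(amount)     # step 4: compensate by refunding A
        return False
    return True
```

The compensation itself can fail too, which is why this only works together with some durable record of how far the transfer got.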

reply
candiddevmike
2 months ago
[-]
> This also requires that you have some kind of mechanism to handle an application crash between 2 and 3, but that is something else entirely

Like a distributed transaction or lock. This is the entire problem space, your example above is very naive.

reply
withinboredom
2 months ago
[-]
You _can_ use them, but even that won't save you. If you take a distributed lock, and crash, you still have the same issue. What I wrote is essentially a distributed transaction and what happens in a distributed transaction with "read uncommitted" isolation levels. A database that supports this handles all the potential failure cases for you. However, that doesn't magically make the errors disappear or may not be even fully/properly handled (e.g., a node running out of disk space in the middle of the transaction) by your code/database. It isn't naive, it is literally a pseudo-code of what you are proposing.
reply
nyrikki
2 months ago
[-]
It is not a DTC. Despite the DB world wrongly using ATMs as an example for decades, follow their model for actually moving money and you would be sent to jail.

"Accountants don't use erasers"

The ledger is the source of truth in accounting, if you use event streams as the source of truth you can gain the same advantages and disadvantages.

An event being past tense ONLY is a very critical part.

There are lots of ways to address this, all with their own tradeoffs, and it is a case of the least-worst option for a particular context.

But over-reliance on ACID DBMSs and the false claim that ATMs use DTC really does hamper the ability to teach these concepts.

reply
singron
2 months ago
[-]
The better version of this is sagas, which is a kind of a simplified distributed transaction. If you do this without actually using sagas, you can really mess this up.

E.g. you perform step 2, but fail to record it. When resuming from crash, you perform step 2 again. Now A has too much money in their account.

reply
withinboredom
2 months ago
[-]
Sagas are great for this and should be used when able, IMHO. It's still possible to mess it up, as there are basically two guarantees you can make in a distributed system: at-least-once, and at-most-once. Thus, you will either need to accept the possibility of lost messages or be able to make your event consumers idempotent to provide an illusion of exactly-once.

Sagas require careful consideration to make sure you can provide one of these guarantees during a "commit" (the order in which you ACK a message, send resulting messages, and record your own state -- if necessary) as these operations are non-atomic. If you mess it up, you can end up providing the wrong level of guarantee by accident. For example:

1. fire resulting messages

2. store state (which includes ids of processed messages for idempotency)

3. ACK original event

In this case, you guarantee that you will always send results at-least-once if a crash happens between 1&2. Once we get past 2, we provide exactly-once semantics, but we can only guarantee at-least-once. If we change the order to:

1. store state

2. fire messages

3. ACK original message

We now only provide at-most-once semantics. In this case, if we crash between 1&2, when we resume, we will see that we've already handled the current message and not process it again, despite never having sent any result yet. We end up with at-most-once if we swap 1&3 as well.

So, yes, Sagas are great, but still pretty easy to mess up.
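A rough Python sketch of the first ordering (fire, store, ACK) may make the guarantee easier to see. Everything here is hypothetical: `Consumer`, its `outbox` list standing in for a broker, and the `crash_after_fire` flag used to simulate a crash between steps 1 and 2.

```python
class Consumer:
    """Hypothetical consumer; `outbox` stands in for a broker, `processed_ids` for durable state."""

    def __init__(self):
        self.processed_ids = set()
        self.outbox = []

    def handle(self, message_id, payload, crash_after_fire=False):
        if message_id in self.processed_ids:
            return "duplicate-skipped"      # idempotency check: redelivery after step 2 is harmless
        self.outbox.append(("result", message_id, payload))   # 1. fire resulting messages
        if crash_after_fire:
            raise RuntimeError("simulated crash between step 1 and step 2")
        self.processed_ids.add(message_id)  # 2. store state
        return "acked"                      # 3. ACK original event
```

If the simulated crash happens, the broker redelivers the message and the result is fired a second time, so downstream consumers see it at least once and have to deduplicate on the message id themselves.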

reply
hcarvalhoalves
2 months ago
[-]
Here's how you can do this. You have 3 accounts: A, B and in-flight.

1. Debit A, Credit in-flight.

2. Credit B, Debit in-flight.

If 1. fails, nothing happened and everything is consistent.

If 2. fails, you know (because you have money left on in-flight), and you can retry later, or refund A.

This way at no point your total balance decreases, so everything is always consistent.
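As a minimal Python sketch of the in-flight account idea, assuming all three accounts live in the same ledger so that each posting can be applied atomically. `Ledger`, `post`, and the `opening` account used to fund A are hypothetical names for illustration.

```python
from collections import defaultdict


class Ledger:
    """Hypothetical single-node ledger: each posting debits one account and credits another."""

    def __init__(self):
        self.balances = defaultdict(int)
        self.entries = []   # append-only journal of postings

    def post(self, debit, credit, amount):
        # Both sides are applied together; in a real system this would be one local DB transaction.
        self.balances[debit] -= amount
        self.balances[credit] += amount
        self.entries.append((debit, credit, amount))


ledger = Ledger()
ledger.post("opening", "A", 100)       # fund account A from a hypothetical opening-balance account
ledger.post("A", "in-flight", 40)      # step 1: debit A, credit in-flight
# If step 2 never runs, the 40 stays visible on in-flight and can be retried or refunded to A.
ledger.post("in-flight", "B", 40)      # step 2: debit in-flight, credit B
```

Because every posting has two sides, the balances always sum to zero, which is the sense in which no money goes missing.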

reply
withinboredom
2 months ago
[-]
It can fail at your commas in 1&2, then you are just as broke as everyone else.

This isn't an easy-to-solve problem when it comes to distributed computing.

reply
hcarvalhoalves
1 month ago
[-]
It should be an atomic transaction, double-entry style, so it can’t fail between commas.

The important thing is not having money go missing.

reply
withinboredom
1 month ago
[-]
The only way that is possible is if the tables exist on the same database server. Otherwise, you are right back at distributed transaction problems.
reply
hcarvalhoalves
1 month ago
[-]
That's the whole point of double-entry. You don't split the entries.
reply
withinboredom
1 month ago
[-]
Then this is off-topic af. We are talking about distributed systems here.
reply
chipdart
2 months ago
[-]
> Distributed locks and transactions are hell.

Which distributed transaction scenario have you ever dealt with that wasn't correctly handled by a two-phase commit or at worst a three-phase commit?

reply
immibis
2 months ago
[-]
The scenario where one of the processes crashes cannot be handled by any number of commit phases.
reply
marcosdumay
2 months ago
[-]
Your event data must not be mutable.

That's kind of the first rule of any event-based system. It doesn't really matter the architecture, if you decide to name the things "event", everybody's head will break if you make them mutable.

If you decide to add mutation there in some way, you will need to rewrite the event stream, replacing entire events.

reply
gunnarmorling
2 months ago
[-]
It's not about mutability of events, but about mutating the underlying data itself. If the event only says "customer 123 has been updated", and a consumer of that event goes back to the source of the event to query the full state of that customer 123, it may have been updated again (or even deleted) since the event was emitted. Depending on the use case, this may or may not be a problem. If the consumer is only interested in the current state of the data, this typically is acceptable, but if it is interested in the complete history of changes, it is not.
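A tiny Python illustration of that race, using a plain dict as a hypothetical stand-in for the source database and made-up field names:

```python
customers = {123: {"name": "Ada", "tier": "basic"}}   # hypothetical source-of-truth table

# The producer updates the customer and emits both styles of event.
customers[123]["tier"] = "gold"
id_only_event = {"type": "CustomerUpdated", "customer_id": 123}
full_event = {"type": "CustomerUpdated", "customer_id": 123, "state": dict(customers[123])}

# Before the consumer gets to the event, the customer changes again.
customers[123]["tier"] = "platinum"

# The id-only consumer looks the record up and sees the later state, not the one the event described.
looked_up = customers[id_only_event["customer_id"]]
# The full event still carries the state as of emission.
snapshot = full_event["state"]
```

Here `looked_up["tier"]` is "platinum" while `snapshot["tier"]` is still "gold", so only the full event preserves the intermediate change.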
reply
marcosdumay
2 months ago
[-]
Making a wacky 2-steps announcement protocol doesn't change the nature of your events.

If the consumer goes to your database and asks "what's the data for customer 123 at event F52A?" it better always get back the same data or "that event doesn't exist, everything you know is wrong".

reply
gunnarmorling
2 months ago
[-]
> ... at event F52A

Sure, if the database supports this sort of temporal query, then you're good with such id-only events. But that's not exactly the default for most databases / data models.
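For illustration, a store that can answer "customer 123 at event F52A" might look roughly like this in Python. `VersionedStore`, `write`, and `read_at` are hypothetical names, and keeping every version in a dict is an assumption made to keep the sketch short.

```python
class VersionedStore:
    """Hypothetical store that keeps every version of a record, keyed by the event that produced it."""

    def __init__(self):
        self._versions = {}   # (entity_id, event_id) -> snapshot of the state

    def write(self, entity_id, event_id, state):
        self._versions[(entity_id, event_id)] = dict(state)

    def read_at(self, entity_id, event_id):
        try:
            # Return a copy so callers cannot alter the stored version.
            return dict(self._versions[(entity_id, event_id)])
        except KeyError:
            raise LookupError(f"no version of {entity_id} at event {event_id}") from None
```

A lookup for a known event always returns the same data, and an unknown event fails loudly instead of silently returning the current state.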

reply
marcosdumay
2 months ago
[-]
I'm understanding what you have isn't really "events", but some kind of "notifications".

Events are part of a stream that define your data. The stream doesn't have to be complete, but if it doesn't make sense to do things like buffer or edit it, it's probably something else and using that name will mislead people.

reply
chipdart
2 months ago
[-]
> (...) and a consumer of that event goes back to the source of the event to query the full state of that customer 123, it may have been updated again (or even deleted) since the event was emitted.

So the entity was updated. What's the problem?

reply
phi-go
2 months ago
[-]
I understand gp to say that the database data is changed not the data in the event.

Surely, some data needs to change if a password is updated?

reply
bob1029
2 months ago
[-]
> an event might have changed (or be deleted) between the time it was emitted

Then I would argue it isn't a meaningful event. If some attributes of the event could become "out of date" such that the logical event risks invalidation in the future, you have probably put too much data into the event.

For example, including a user's preferences (e.g., display name) in a logon event - while convenient - means that if those preferences ever change, the event is invalid to reference for those facts. If you only include the user's id, your event should be valid forever (for most rational systems).

> your consumer has to query the service that dispatched the event

An unfortunate but necessary consequence of integrating multiple systems together. You can't take out a global database lock for every event emitted.

reply
AtlasBarfed
2 months ago
[-]
https://medium.com/geekculture/the-eight-fallacies-of-distri...

Also, CAP is a thing too.

Sure, try to keep transactions single-node. If you can't let me give you the advice of people FAR smarter than I:

- DO NOT DESIGN YOUR OWN DISTRIBUTED TRANSACTION SERVICE

Use a vetted one.

reply
lutzh
2 months ago
[-]
Thanks for your feedback, I appreciate it!

> The triggering of the action is a direct consequence of the information an event contains. Whether or not an action is triggered should not be the responsibility of the event.

I agree, but still for different consumers events will have different consequences - in some consumers it'll trigger an action that is part of a higher-level process (and possibly further events), in others it'll only lead to data being updated.

> If you are writing events with the intention of having them invoke some specific actions, then you should prefer to invoke those things directly. You should be describing a space of things that have occurred, not commands to be carried out.

With this I don't agree. I think that's the core of event-driven architecture that events drive the process, i.e. will trigger certain actions. That's not contradicting them describing what has occurred, and doesn't make them commands.

> By default I would only include business keys in my event data. This gets you out of having to make the event serve as an aggregate view for many consumers. If you provide the keys of the affected items, each consumer can perform their own targeted lookups as needed. Making assumptions about what views each will need is where things get super nasty in my experience (i.e. modifying events every time you add consumers).

This is feedback I got multiple times, the "notification plus callback" seems to be a popular pattern. It has its own problems though, both conceptual (event representing an immutable set of facts) and technical (high volume of events). I think digging into the pros and cons of that pattern will be one of my next blog posts! Stay tuned!

reply
BerislavLopac
1 month ago
[-]
> I think that's the core of event-driven architecture that events drive the process, i.e. will trigger certain actions.

In an event-driven system, there is neither guarantee nor expectation that an event will trigger an action; it might, but it might not. Events are simply a log [0] of "things" happening in various subsystems, published to various channels for other subsystems to ignore or act upon on their own terms.

Let's say that we have two subsystems - A and B. When something happens on A, it will emit a corresponding event (e.g. SomethingHappened) to a specific channel (e.g. EventsFromA); if B is listening to that channel, it can "recognise" that event and initiate (i.e. "trigger") some action of its own.

However, if A explicitly wants B to do something, it's a command, i.e. a direct coupling by definition. As GP states, that is better handled as a direct request from A to B.

Theoretically, there is a possible scenario where A "knows" that a certain action needs to happen in the system, but does not know which subsystem has that capability, i.e. has no knowledge that B can do that. In that case it can "request" something to happen, e.g. by submitting an event like "UserCreationRequested"; however, there is no guarantee that any service will "see" that event and act upon it.

[0] https://engineering.linkedin.com/distributed-systems/log-wha...
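The A/B scenario above could be sketched in Python like this. `EventBus` is a hypothetical in-memory stand-in for a broker, and `EventsFromC` is an extra channel invented here to show an event nobody listens to.

```python
from collections import defaultdict


class EventBus:
    """Hypothetical in-memory channel registry; a real system would use a broker."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, channel, handler):
        self._subscribers[channel].append(handler)

    def publish(self, channel, event):
        # The publisher neither knows nor cares whether anyone is listening.
        for handler in self._subscribers[channel]:
            handler(event)
        return len(self._subscribers[channel])


bus = EventBus()
actions_taken_by_b = []


def b_handler(event):
    # Subsystem B decides on its own terms which events it reacts to.
    if event["type"] == "SomethingHappened":
        actions_taken_by_b.append(f"B reacted to {event['id']}")


bus.subscribe("EventsFromA", b_handler)
delivered = bus.publish("EventsFromA", {"type": "SomethingHappened", "id": 1})
ignored = bus.publish("EventsFromA", {"type": "SomethingElse", "id": 2})
unheard = bus.publish("EventsFromC", {"type": "UserCreationRequested", "id": 3})
```

B acts on the first event, receives but ignores the second, and the third reaches no subscriber at all, which is the "no guarantee" case.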

reply
withinboredom
2 months ago
[-]
Events shouldn't carry any data in my opinion, except parameterized data. In the context of a booking, for example, it would be SeatBooked {41A} instead of 41ABooked, though the latter is a better event, but harder to program for. The entire flow might look like this:

SeatTimeLimitedReserved {41A, 15m}

SeatAssignedTo {UserA}

SeatBooked {41A}

If a consumer needs more data, there should be a new event.

reply
monksy
2 months ago
[-]
Say that again louder for people in the back.

Message queues aren't a networking protocol. Anyone can subscribe to consume the events.

reply
jwarden
2 months ago
[-]
Another architecture might be that the service responsible for Seat Selection emits a `SeatSelected` event, and another service responsible for updating bookings emits a `BookingUpdated(Reason: SeatSelected)` "fat" event. Same for `PaymentReceived` and `TicketIssued`.

Both events would "describe a space of things that occurred" as @bob1029 suggests.

The seat selection process for an actual airline probably needs to be more involved. @withinboredom recommends:

  - SeatTimeLimitedReserved {41A, 15m}
  - SeatAssignedTo {UserA}
  - SeatBooked {41A}
In which case, only SeatBooked would trigger a BookingUpdated event.
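A small Python sketch of that last variant, where only SeatBooked leads to a BookingUpdated event. The `booking_service` function and the dict-shaped events are assumptions for illustration; the article does not define what the booking aggregate contains.

```python
def booking_service(seat_events):
    """Hypothetical booking service: turns fine-grained seat events into 'fat' BookingUpdated events."""
    booking = {"seat": None, "user": None}
    emitted = []
    for event in seat_events:
        if event["type"] == "SeatAssignedTo":
            booking["user"] = event["user"]    # remembered, but not announced yet
        elif event["type"] == "SeatBooked":
            booking["seat"] = event["seat"]
            emitted.append(
                {"type": "BookingUpdated", "reason": "SeatBooked", "booking": dict(booking)}
            )
        # SeatTimeLimitedReserved is a temporary hold and changes nothing in the booking.
    return emitted


seat_events = [
    {"type": "SeatTimeLimitedReserved", "seat": "41A", "minutes": 15},
    {"type": "SeatAssignedTo", "user": "UserA"},
    {"type": "SeatBooked", "seat": "41A"},
]
updates = booking_service(seat_events)
```

Three seat events go in and a single fat event comes out, carrying the reason and the booking state at that point.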
reply
lutzh
2 months ago
[-]
Thanks for your feedback. I realize I should have elaborated the example a bit more, it's too vague. So, as I wrote in some other reply as well, please don't over-interpret it. The point was only to say that in order to differentiate the events, we don't necessarily need distinct types (which would result in multiple schemas on a topic), but can instead encode it in one type/schema. Like mapping in ORM - instead of "table per subclass", you can use "table per class hierarchy".
reply
dkarl
2 months ago
[-]
Events are published observations of facts. If you want to be able to use them as triggers, or to build state, then you have to choose the events and design the system to make that possible, but you always have to ensure that systems only publish facts that they have observed.

Most issues I've seen with events are caused by giving events imprecise names, names that mean more or less than what the events attest to.

For example, a UI should not emit a SetCreditLimitToAGazillion event because of a user interaction. Downstream programmers are likely to get confused and think that the state of the user's credit limit has been set to a gazillion, or needs to be set to a gazillion. Instead, the event should be UserRequestedCreditLimitSetToAGazillion. That accurately describes what the UI observed and is attesting to, and it is more likely to be interpreted correctly by downstream systems.

In the article's example, SeatSelected sounds ambiguous to me. Does it only mean the user saw that the seat was available and attempted to reserve it? Or does it mean that the system has successfully reserved the seat for that passenger? Is the update finalized, or is the user partway through a multistep process that they might cancel before confirming? Depending on the answer, we might need to release the user's prior seat for other passengers, or we might need to reserve both seats for a few minutes, pending a confirmation of the change or a timeout of their hold on the new seat. The state of the reservation may or may not need to be updated. (There's nothing wrong with using a name like that in a toy example like the article does, but I want to make the point that event names in real systems need to be much more precise.)

Naming events accurately is the best protection against a downstream programmer misinterpreting them. But you still need to design the system and the events to make sure they can be used as intended, both for triggering behavior and for reporting the state and history of the system. You don't get anything automatically. You can't design a set of events for triggering behavior and expect that you'll be able to tell the state of the system from them, or vice-versa.

reply
lutzh
2 months ago
[-]
Thanks for your feedback! Very good point on the naming. fwiw the idea was if you buy a cinema ticket, you are usually presented with some sort of seating plan and select the seat (basically putting them into the shopping cart). So SeatSelected would be the equivalent of "ItemAdded" to the shopping cart in an e-commerce application I guess. Please don't over-interpret the example. There isn't even a definition what that booking aggregate contains. The point was really only to say that in order to differentiate the events, we don't necessarily need distinct types (which would result in multiple schemas on a topic), but can instead encode it in one type/schema. Think of it like mapping in ORM - instead of "table per subclass", you can use "table per class hierarchy".
reply
switch007
2 months ago
[-]
Such an important point about naming things accurately. It can be hard, so we often take the easy path

But bad names manifest as a multitude of problems much later on.

I wonder if this is an area LLMs can help us with because really a lot of us do struggle with it. I'm going to investigate!

reply
exabrial
2 months ago
[-]
Well put!

We do a lot of event driven architecture with ActiveMQ. We try to stick to messaging-as-signalling rather than messaging-as-data-transfer. These are the terms we came up with, I'm sure Martin Fowler or someone else has described it better!

So we have SystemA that is completing some processing of something. It's going to toss a message onto a queue that SystemB is listening on. We use an XA to make sure the database and broker commit together [1]. SystemB then receives the event from the queue and can begin its little tiny bit of business processing.

If one divides their "things" up into logical business units of "things that must all happen, or none happen" you end up with a pretty minimalistic architecture that's easy to understand but also offers retry capabilities if a particular system errors out on a single message.

It also allows you to take SystemB offline and let its work pile up, then resume it later. Or you can kick off arbitrary events to test parts of the system.

[1] Although if this didn't happen, say during a database failure at just the right time, the right usage of row locking, transactions, and indexes on the database prevent duplicates. This is so rare in practice but we protect against it anyway.
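One way the duplicate protection from the footnote could look, sketched in Python with the standard `sqlite3` module. The `processed` table, its columns, and `handle` are hypothetical, and this shows only the unique-key part, not XA or row locking.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed (message_id TEXT PRIMARY KEY, result TEXT)")


def handle(message_id, result):
    """Record the work and the message id in one local transaction; duplicates roll back."""
    try:
        with conn:   # commits on success, rolls back if the block raises
            conn.execute("INSERT INTO processed VALUES (?, ?)", (message_id, result))
        return True
    except sqlite3.IntegrityError:
        return False   # redelivered message: the primary key rejected it


first = handle("msg-1", "done")
second = handle("msg-1", "done")
```

The second delivery of the same message is rejected by the key, so the business effect is recorded only once.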

reply
magicalhippo
2 months ago
[-]
> We try to stick to messaging-as-signalling rather than messaging-as-data-transfer.

I was thinking of trigger-messages vs data-messages.

Then again we've just begun dabbling with AMQP as part of transitioning our legacy application to a new platform, so a n00b in the field.

We do have what you might consider hybrid messages, where we store incoming data in an object store and send a trigger-message with the key to process the data. This keeps the queue lean and makes it easy for support to inspect submitted data after it's been processed, to determine if we have a bug or it's garbage in garbage out.
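The hybrid message described above could be sketched in Python as follows. The dict and deque are hypothetical stand-ins for an object store and an AMQP queue, and `submit`, `process_next`, and the `DataSubmitted` type are invented for illustration.

```python
import uuid
from collections import deque

object_store = {}   # hypothetical stand-in for a blob/object store
queue = deque()     # hypothetical stand-in for an AMQP queue


def submit(data):
    """Store the payload, then enqueue only a small trigger message carrying its key."""
    key = str(uuid.uuid4())
    object_store[key] = data
    queue.append({"type": "DataSubmitted", "key": key})
    return key


def process_next():
    """Consume one trigger message and load the payload it points to."""
    message = queue.popleft()
    data = object_store[message["key"]]   # left in place so support can inspect it later
    return len(data)


key = submit(b"large incoming payload")
size = process_next()
```

The queue only ever holds the key, and the payload stays in the store after processing so it can be inspected when chasing a bug.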

reply
remram
2 months ago
[-]
So... you're not using event-driven architecture. That's fine, but I don't know why it supports this article's weird point about event-driven architecture.
reply
RaftPeople
2 months ago
[-]
> So... you're not using event-driven architecture. That's fine, but I don't know why it supports this article's weird point about event-driven architecture.

I've never used EDA, just read about it, so I'm curious what you disagree with from the article.

It seems that the logic is reasonable, that subscribers have varying needs and publishers would need to account for those needs over time as the functionality (and data) required by subscribers evolves.

reply
svilen_dobrev
2 months ago
[-]
reply
lutzh
2 months ago
[-]
Good catch! Indeed, without me realizing it, this trigger/data duality is pretty much what Event Message and Document Message are in "Enterprise Integration Patterns" (which is what the post you linked refers to). As it happens, in the book the authors also speak about "a combined document/event message", which is how we mostly use events in EDA today, I think.
reply
deterministic
2 months ago
[-]
I design events following these rules:

1. You should be able to recreate the complete business state from the complete event sequence only. No reaching out to other servers/services/DBs to get data.

2. The events should be as small as possible. So only carry the minimum data needed to implement rule #1

That’s it. It works really well in practice.
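Rule 1 amounts to folding the event sequence into state. A minimal Python sketch, with hypothetical event types and an `apply` function made up for illustration:

```python
from functools import reduce


def apply(state, event):
    """Pure transition function: previous state plus one event gives the next state."""
    kind = event["type"]
    if kind == "AccountOpened":
        return {**state, event["account"]: 0}
    if kind == "MoneyDeposited":
        return {**state, event["account"]: state[event["account"]] + event["amount"]}
    if kind == "MoneyWithdrawn":
        return {**state, event["account"]: state[event["account"]] - event["amount"]}
    return state   # unknown event types are ignored


events = [
    {"type": "AccountOpened", "account": "A"},
    {"type": "MoneyDeposited", "account": "A", "amount": 100},
    {"type": "MoneyWithdrawn", "account": "A", "amount": 30},
]
state = reduce(apply, events, {})
```

Each event carries only what `apply` needs (rule 2), and replaying the same sequence always yields the same state without any outside lookup.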

reply
hcarvalhoalves
2 months ago
[-]
The "produce a trigger event then have the consumer reach back to the producer and fetch data" can be an anti-pattern. You expose yourself to all kinds of race conditions and cache incoherency, then you start trying to fix with cache invalidation or pinning readers, and the result is you can't scale readers well.

If you're using a message queue, the message should convey necessary information such that, if all messages were replayed, the consumer would reach the same state. Anything other than that, you'll be in a world of pain at scale.

reply
chipdart
2 months ago
[-]
> If you're using a message queue (...)

Events and messages are entirely different things. They might look similar, but their responsibilities are completely different. The scenario you're describing matches the usecases for messages, not events.

reply
hcarvalhoalves
2 months ago
[-]
The situation of messages vs events is analogous to append-only database vs update-in-place database. You get exposed to the same issues at scale if you rely on the latter.

Being notified only _when_ something happened isn't always useful if the world is changing underneath you (it _can be_ useful in particular situations, when you know state is final, but not as a general architecture principle).

reply
chipdart
2 months ago
[-]
> The situation of messages vs events is analogous to append-only database vs update-in-place database. You get exposed to the same issues at scale if you rely on the latter.

Not really. Messages vs events is a foundational design trait whose discussion involves topics such as adopting distinct messaging patterns such as message queues or pub-sub. They have completely different responsibilities and solve completely different problems.

reply
hcarvalhoalves
2 months ago
[-]
I don't see how using pub-sub or not changes how you model the data. It should be orthogonal. Do you have a good example?
reply
kaba0
2 months ago
[-]
I am going on a bit of a tangent here, but I always wondered: those of you who use absolutely huge event-driven architectures, have you ever got yourself into a loop? I can't help but worry about this, as event systems are fundamentally Turing-complete, and with a complex enough system it doesn't seem too hard to accidentally send an event because of A, which will eventually, multiple other events later, cause A again.

Is it a common occurrence, and if it happens, is it hard to debug/fix? Do Kafka and other popular event systems have something to defend against it?

reply
Hilift
2 months ago
[-]
Debugging Windows events is challenging. I once created a reproducible deadlock using events. I had a Windows application with a notebook computer base, so I had to provide for and handle the power save/resume event for a service in the application. This was simple, just ensure that the service stopped and didn't get into a funky state, and restarted cleanly after resume. During testing, I accidentally discovered the power save event was going into a loop or hanging. Fortunately, Microsoft has a tool for this (DebugDiag - C#), and to my amazement landed right on the problem area. This event (like most) can and was firing multiple times..., so had to add some extra locks. Also, the power save/resume code worked 100% on a virtual, it was real hardware where the issue manifested.
reply
plaguuuuuu
2 months ago
[-]
Never seen it, but totally possible.

It always bothers me that the systems I've worked on have their data flows mapped out basically in semi-up-to-date miro diagrams. If that. There's no overarching machine readable and verifiable spec.

reply
lutzh
2 months ago
[-]
Reg. the "technical" question: Kafka or any log-based message broker (or any message queue) would not prevent you from that. Any service can publish/send and/or subscribe/receive.

Regarding if it's a problem or a regular occurrence: No, really not. I have never seen this being a problem, I think that fear is unfounded.

reply
hcarvalhoalves
2 months ago
[-]
Yes, it happens. A way to deal with it is carrying some counter on the message metadata and incrementing it every time a consumer passes it along, so you can detect recursions. Another is having messages carry a unique id, and consumers record already seen messages.
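Both defences can be sketched together in Python. `MAX_HOPS`, `LoopGuard`, and `forward` are hypothetical names, and the hop limit of 5 is an arbitrary value picked for the example.

```python
MAX_HOPS = 5   # hypothetical limit; tune per system


class LoopGuard:
    """Drops messages that have been forwarded too often or that were already seen."""

    def __init__(self):
        self.seen_ids = set()

    def accept(self, message):
        if message["hops"] > MAX_HOPS or message["id"] in self.seen_ids:
            return False
        self.seen_ids.add(message["id"])
        return True


def forward(message, new_id):
    """Derive a follow-up message, carrying the hop counter along in its metadata."""
    return {"id": new_id, "hops": message["hops"] + 1, "body": message["body"]}


guard = LoopGuard()
message = {"id": 0, "hops": 0, "body": "A happened"}
accepted = 0
while guard.accept(message):
    accepted += 1
    message = forward(message, new_id=message["id"] + 1)   # simulates A -> B -> A -> ...
```

The simulated cycle stops once the counter exceeds the limit, and a redelivered message with an already recorded id is rejected regardless of its hop count.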
reply
switch007
2 months ago
[-]
Do you consider it a requirement for every message?

Like, the problem sounds bad enough to warrant it. If not, how do you choose when to apply it?

Our architects have a habit of ignoring these kinds of issues, and when you suggest making things like this a requirement they accuse you of excessive concern!

reply
hcarvalhoalves
2 months ago
[-]
I've worked with a codebase in the past that would handle this transparently with some sort of middleware for consumer/producer.
reply
revskill
2 months ago
[-]
Event is a point in time. State is a range in time.

Geometrically speaking.

So, what should be in an event ? To me, it's the minimum but sufficient data on its own to be understandable.

reply
lutzh
2 months ago
[-]
Article observing that events in event-driven architecture are both triggers of actions and carriers of data, and that these roles may conflict in the event design. Submitted by author.
reply
gunnarmorling
2 months ago
[-]
Nice one. I wrote about this a while ago, from a slightly different perspective, focusing on data change events [1]. Making a similar differentiation there between id-only events (which you describe as triggers of action; from a data change feed perspective, that action typically would be a re-select of the current state of the represented record), full events (your carriers of data) and patch events (carriers of data with only the subset of attributes whose value has changed).

[1] https://www.decodable.co/blog/taxonomy-of-data-change-events
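The three kinds of data change events can be illustrated with plain Python dicts. The record and its field names are made up for the example:

```python
before = {"id": 123, "name": "Ada", "tier": "basic"}
after = {"id": 123, "name": "Ada", "tier": "gold"}

# Id-only event: says which record changed; the consumer must re-select the current state.
id_only_event = {"id": after["id"]}

# Full event: carries the complete state of the record after the change.
full_event = dict(after)

# Patch event: carries the key plus only the attributes whose values changed.
patch_event = {"id": after["id"], **{k: v for k, v in after.items() if before.get(k) != v}}
```

For this change the patch event contains just the id and the new tier, while the full event repeats the unchanged name as well.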

reply
lutzh
2 months ago
[-]
Thanks for your feedback Gunnar, I appreciate it!

Your categorization makes total sense and fits well with what I called the "spectrum". I only mentioned the "id-only" events to show what the one end of the spectrum would look like. What I call the "trigger" events would be what you call "delta" events. I should have written that more clearly.

Interestingly, a few people advocated for id-only events as a response to the article. I have some issues with that pattern, and I'm already thinking about a follow-up article to elaborate on that.

reply
cushpush
2 months ago
[-]
I don't understand where this distinction is useful - signals are data, just a tiny bit, and data is a signal, just a lot of it. Are you talking about transmission-size? Or are we resigned to the fact that intra-computer communication is inefficient and it's enough to postulate about the sizes of bandages?
reply
stevenalowe
2 months ago
[-]
Thank you for this well-written, well-reasoned, and thoroughly enjoyable article!

Yet I am troubled by it, and must disagree with some of the premises and conclusions. Only you know your specific constraints, so my 'armchair architecting' may be way off target. If so, I apologize, but I was particularly disturbed by this statement:

"events that travel between services have a dual role: They trigger actions and carry data."

Yes, sort of, but mostly no. Events do not "trigger" anything. The recipient of an event may perform an action in response to the event, but events cannot know how they will be used or by whom. Every message carries data, but a domain event is specifically constrained to be an immutable record of the fact that something of interest in the domain has happened.

The notion of modeling 'wide' vs 'short' events seems to ignore the domain while conflating very different kinds of messages - data/documents/blobs, implementation-level/internal events, domain events, and commands.

Modeling decisions should be based on the domain, not wide vs short. Domain events should have names that are meaningful in the problem domain, and they should not contain extraneous data nor references to implementation details/concepts. This leads to a few suggestions:

* Avoid Create/Read/Update/Delete (CRUD) event names as these are generic implementation-level events, not domain events. Emit such events "under the hood" for replication/notification if you must, but keep them out of the domain model.

* Name the domain event specifically; CRUD events are generally undesirable because they (a) are an implementation detail and (b) are not specific enough to understand without more information. Beware letting an implementation decision or limitation corrupt the domain model. In this example, the BookingUpdated event adds no value/information, makes filtering for the specific event types more complex, and pollutes the domain language with an unnecessary and potentially fragile implementation detail (the name of the db table Booking, which could just as easily have been Reservation or Order etc). SeatSelected is a great domain event name for a booking/reservations system. BookingSeatSelected if there is further scope beyond Booking that might have similar event names. BookingUpdated is an implementation-level, internal event, not part of the problem domain.

* What data is necessary to accurately record this domain event? A certain minimal set of relevant data items will be required to capture the event in context. Trying to anticipate and shortcut the needs of other/future services by adding extraneous data is risky. Including a full snapshot of the object is even riskier, as this makes all consumers dependent on the entire object schema.

* The notion of "table-stream duality" as presented is likewise troublesome, as that is an implementation design choice, not part of the domain model. I don't think that it is a goal worthy of breaking your domain model, and suggest that it should not be considered at all in the domain model's design. Doing so is a form of premature optimization :)

* That said, separating entity and event streams would keep table-stream duality but require more small tables, i.e. one domain event type per stream and another Booking stream to hold entity state as necessary. A Booking service can subscribe to SeatSelected et al events (presumably from the UI's back-end service) and maintain a separate table for booking-object versions/state. A SeatReserved event can be emitted by the Booking service, and no one has to know about the BookingUpdated event but replication hosts.
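
A rough Python sketch of that last arrangement, under stated assumptions: the field names, the in-memory "table", and the seat-conflict rule are all hypothetical, and a real system would use a broker and a database rather than lists and dicts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeatSelected:      # domain event, assumed to come from the UI's back-end service
    booking_id: str
    seat: str


@dataclass(frozen=True)
class SeatReserved:      # domain event emitted by the Booking service
    booking_id: str
    seat: str


class BookingService:
    def __init__(self):
        self.bookings = {}              # entity state: booking_id -> versioned record
        self.seat_reserved_stream = []  # one domain event type per stream
        self._taken = set()             # seats already reserved (assumed conflict rule)

    def on_seat_selected(self, event):
        if event.seat in self._taken:
            return  # conflict: no state change and no SeatReserved event
        self._taken.add(event.seat)
        current = self.bookings.get(event.booking_id, {"version": 0, "seats": ()})
        self.bookings[event.booking_id] = {
            "version": current["version"] + 1,
            "seats": current["seats"] + (event.seat,),
        }
        self.seat_reserved_stream.append(SeatReserved(event.booking_id, event.seat))


service = BookingService()
for e in (SeatSelected("b-1", "12A"), SeatSelected("b-2", "12A"), SeatSelected("b-1", "12B")):
    service.on_seat_selected(e)
```

Consumers read only the SeatReserved stream; the versioned booking records stay internal to the service, which is where any BookingUpdated-style replication event would live.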

Thanks again for writing and posting this, it really made me think. Good luck with your project!

reply
lutzh
2 months ago
[-]
Thank you for your kind words!

> Yes, sort of, but mostly no. Events do not "trigger" anything. The recipient of an event may perform an action in response to the event, but events cannot know how they will be used or by whom.

I don't see the difference. Maybe it's a language thing. But I'd say if a recipient receives an event and performs an action as a consequence, it's fair to say the event triggered the action. The fact that the event triggers something doesn't mean the event or the publisher must know at runtime what's being triggered.

Regarding your suggestions, I think you're proving my point. Of course the whole "there are two types of.." is a generalization, but given that, you seem to fall into the first category, the one I called "DDD engineer/architect".

My response to the first three would be: Why? I know some literature suggests this. I've applied this pattern in the past. And I wrote "This is totally legitimate and will work.". But we also need to ask ourselves: What's the actual value? Why does the kind of event / the business reason have to be encoded as the name/type of the event? Honest question. Doesn't having it in the event payload carry the same information, just in a different place?

I don't want to be following what might be seen as "best practices" just for the sake of it, without understanding why.

I know of a few systems that started off with domain events that were named & typed "properly" according to the business event. And after a while, the need for wide events carrying the full state of the source entity arose. If you look at talks and articles from other EDA practitioners (e.g. the ones on https://github.com/lutzh/awesome-event-driven-architecture#r...), you'll see that's not uncommon. This regularly leads to having to provide the wide events in addition to the "short" events. This is extra effort and has its own drawbacks. I just want to save the readers the extra work.

reply
stevenalowe
1 month ago
[-]
>Thank you for your kind words!

You're welcome, sorry for the delayed response.

>>...Events do not "trigger" ... >I don't see the difference

Yes, sorry, I'm being pedantic without context - to the uninitiated, the expression "events trigger actions" may be confusing, as it implies that events are active/actors/participants with 1:1 correspondence with reactions, omitting the recipient's agency.

>"two types of..."

...meh; I am/was both, and many other roles, but you are correct in that I hold the problem/solution domain as primary, and prefer to keep the implementation domain out of it as much as possible.

>Doesn't having it in the event payload carry the same information, just in a different place?

Yes, grouping and filtering is absolutely 100% functionally equivalent. But it is not free.

>Why?

Thanks for asking!

BookingUpdated(Reason) looks to me like an unnecessary coupling/corruption of the implementation model and domain model. This may cause additional cognitive load (user confusion/search/explanation) and possibly impact the event-routing mechanisms significantly.

For example:

* a consumer desiring only SeatReserved events will not find that as a topic. Instead, they will have to (unnecessarily) learn something about the implementation model (BookingUpdated:Reason==SeatReserved) in order to find what they want.

Slightly annoying, maybe no big deal w/better topic search or docs, just one example of a tiny unintended consequence.

* where is the selection/filtering performed? broker or consumer? Filtering is probably not free for a single-topic-per-stream implementation; something pays the price for it.

Possibly also no big deal under ordinary circumstances... but here's one way things might go wrong:

* SeatReserved events likely happen more often than the other types due to timeouts, conflicts, and retries. Ordinarily not a problem, but when hot tickets first go on sale the flood of traffic from people and bots competing for the best seats may cause SeatReserved events to increase far out of proportion with the others.

But hey, that's what autoscaling cloud services are for, right? If the broker handles filtering, that cloud bill might be a bit scary. If consumers handle filtering, every service consuming any type of BookingUpdated event will have to scale up too, and that bill might be terrifying. :)

With independent topics/streams/tables for each discrete concept, SeatReserved can scale independently, its traffic cannot directly affect services that do not care about it, and the names of topics and events directly reflect the problem/solution domain.
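
To put toy numbers on that, here is a small in-memory sketch in Python. The Broker class, the consumer names, the BookingCancelled event, and the 1000:10 traffic ratio are all invented for illustration; it only counts deliveries and says nothing about how any real broker bills or filters.

```python
from collections import defaultdict


class Broker:
    """Toy in-memory broker; counts how many messages each consumer receives."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of consumer names
        self.delivered = defaultdict(int)     # consumer name -> messages received

    def subscribe(self, topic, consumer):
        self.subscribers[topic].append(consumer)

    def publish(self, topic, event):
        for consumer in self.subscribers[topic]:
            self.delivered[consumer] += 1


# A spike: 1000 SeatReserved events against 10 of another (hypothetical) type.
traffic = ["SeatReserved"] * 1000 + ["BookingCancelled"] * 10

# Layout A: one BookingUpdated topic, reason in the payload, consumers filter.
single = Broker()
single.subscribe("BookingUpdated", "seat_map")  # only cares about SeatReserved
single.subscribe("BookingUpdated", "refunds")   # only cares about BookingCancelled
for reason in traffic:
    single.publish("BookingUpdated", {"reason": reason})

# Layout B: one topic per domain event type.
per_type = Broker()
per_type.subscribe("SeatReserved", "seat_map")
per_type.subscribe("BookingCancelled", "refunds")
for reason in traffic:
    per_type.publish(reason, {})
```

In layout A the refunds consumer receives all 1010 messages and discards 1000 of them; in layout B it receives only its 10, so the SeatReserved spike never reaches it.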

>EDA resource repo

Excellent collection, thanks for sharing it!

While the need for "wide" events can be a symptom of other design issues, decorating base events is a good solution when they are necessary. If you really need an aggregated BillingUpdated(Reason) event, for example, you can generate it downstream and preserve the independence of the individual event types.
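
One way to read "generate it downstream", as a hedged Python sketch: the to_aggregate helper, the payload fields, and the PaymentReceived event are hypothetical, and the envelope shape is just one plausible choice.

```python
def to_aggregate(event_type, payload, aggregate_name):
    """Wrap a specific domain event in a generic 'Updated(Reason)' envelope."""
    return {"type": aggregate_name, "reason": event_type, **payload}


# Base streams stay independent; each entry is (event type, minimal payload).
base_events = [
    ("SeatReserved", {"booking_id": "b-1", "seat": "12A"}),
    ("PaymentReceived", {"booking_id": "b-1", "amount": 40}),
]

# A downstream process derives the aggregated stream for consumers that want it.
aggregated_stream = [to_aggregate(t, p, "BillingUpdated") for t, p in base_events]
```

The derived stream is an extra view for the consumers that need it; the base events are not modified, and consumers of SeatReserved never see the envelope.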

Offhand, I can only think of one situation where capturing the full state of the source entity in an event would be necessary: when it's ephemeral - but that sounds like a larger discussion (perhaps after enjoying a few videos from your collection).

Thanks again!

reply
stevenalowe
1 month ago
[-]
correction: the frequency/volume of SeatReserved events may suddenly spike
reply