If you must deploy every service because of a library change, you don't have services, you have a distributed monolith. The entire idea of a "shared library" which must be kept updated across your entire service fleet is antithetical to how you need to treat services.
It's likely there's a single source of truth you pull libraries or shared resources from. When team A wants to update the pointer for library-latest to 2.0 but the current reference of library-latest is still 1.0, everyone needs to migrate off of it, otherwise things will break due to backwards compatibility or whatever.
Likewise, if there's a -need- to remove a version for a vulnerability or what have you, then everyone needs to redeploy, sure, but the benefit of centralization likely outweighs the cost and complexity of tracking the patching and deployment process for each and every service separately.
I would say those systems -are- and likely would be classified as micro services but from a cost and ease perspective operate within a shared services environment. I don't think it's fair to consider this style of design decision as a distributed monolith.
By that level of logic, having a singular business entity vs 140 individual business entities for each service would mean it's a distributed monolith.
No, this misses one of the biggest benefits of services; you explicitly don't need everyone to upgrade library-latest to 2.0 at the same time. If you do find yourself in a situation where you can't upgrade a core library like e.g. SQLAlchemy or Spring, or the underlying Python/Java/Go/etc runtime, without requiring updates to every service, you are back in the realm of a distributed monolith.
I was one of the engineers who helped make the decisions around this migration. There is no one size fits all. We believed in that thinking originally, but after observing how things played out, decided to make different trade-offs.
This meant that it was costly to maintain and caused a lot of confusion, especially with internal dependencies (shared libraries): this is the trade-off they did not like and wanted to move away from.
They moved away from this in multiple steps, first one of those being making it a "distributed monolith" (as per your implied definition) by putting services in a monorepo and then making them use the same dependency versions (before finally making them a single service too).
> We no longer had to deploy 140+ services for a change to one of the shared libraries.
Taken in isolation, that is a strong indicator that they were indeed running a distributed monolith.
However, the blog post earlier on said that different microservices were using different versions of the library. If that was actually true, then they would never have to deploy all 140+ of their services in response to a single change in their shared library.
Your runtime version is out of date / end of life. You now need to update and deploy all 140 (or at least all the ones that use the same tech stack).
No matter how you slice it, there are always dependencies across all services because there are standards in the environment in which they operate, and there are always going to be situations where you have to redeploy everything or large swaths of things.
Microservices aren’t a panacea. They just let you delay the inevitable but there is gonna be a point where you’re forced to comply with a standard somewhere that changes in a way that services must be updated. A lot of teams use shared libraries for this functionality.
... otherwise you'd have to do something silly like update every service every time that library changed.
Totally agree. For what it's worth, based on the limited information in the article, I actually do think it was the right decision to pull all of the per-destination services back into one. The shared library problem can go both ways, after all: maybe the solution is to remove the library so your microservices are fully independent, or maybe they really should have never been independent in the first place and the solution is to put them back together.
I don't think either extreme of "every line of code in the company is deployed as one service" or "every function is an independent FaaS" really works in practice, it's all about finding the right balance, which is domain-specific every time.
However, in the world we live in, people choose to point at latest to avoid manual work, trusting that other teams did the proper diligence when updating to the latest version.
You can point to a stable version in the model I described and still be distributed and a micro service, while depending on a shared service or repository.
Can you imagine if Google could only release a new API if all their customers simultaneously updated to that new API? You need loose coupling between services.
OP is correct that you are indeed now in a weird hybrid monolith application where it’s deployed piecemeal but can’t really be deployed that way because of tightly coupled dependencies.
Be ready for a blog post in ten years about how they broke apart the monolith into loosely coupled components because it was too difficult to ship things with a large team and actually have it land in production without getting reverted due to an unrelated issue.
But also, you're conflating code and services. There's a huge difference between libraries that are deployed as part of various binaries and those that are used as remote APIs. If you want to update a utility library that's used by importing code, then you don't need simultaneous deployment, but you would like to update everywhere to get it done with - that's only really possible with a monorepo. If you want to update a remote API without downtime, then you need a multi-phase rollout where you introduce a backward-compatibility mode... but that's true whether you store the code in one place or two.
Yes I understand it’s a shared library but if updating that shared library automatically updates everyone and isn’t backward compatible you’re doing it wrong - that library should be published as a v2 or dependents should pin to a specific version. But having a shared library that has backward incompatible changes that is automatically vendored into all downstream dependencies is insane. You literally wouldn’t be able to keep track of your BOM in version control as it obtains a time component based on when you built the service and the version that was published in the registry.
...but why? You're begging the question.
If you can automatically update everyone including running their tests and making any necessary changes to their code, then persisting two versions forever is a waste of time. If it's because you can't be certain from testing that it's actually a safe change, then fine, but note that that option is still available to you by copy/pasting to a v2/ or adding a feature flag. Going to a monorepo gives you strictly more options in how to deal with changes.
> You literally wouldn’t be able to keep track of your BOM in version control as it obtains a time component based on when you built the service
This is true regardless of deployment pattern. The artifact that you publish needs to have pointers back to all changes that went into it/what commit it was built at. Mono vs. multi-repo doesn't materially change that, although I would argue it's slightly easier with a monorepo since you can look at the single history of the repository, rather than having to go an extra hop to find out what version 1.0.837 of your dependency included.
> the version that was published in the registry
Maybe I'm misunderstanding what you're getting at, but monorepo dependencies typically don't have a registry - you just have the commit history. If a binary is built at commit X, then all commits before X across all dependencies are included. That's kind of the point.
I’m not begging the question. I’m simply stating what loose coupling looks like and the blog post is precisely the problem of tight coupling. If you have multiple teams working on a tightly coupled system you’re asking for trouble. This is why software projects inevitably decompose against team boundaries and you ship your org chart - communication and complexity is really hard to manage as the head count grows which is where loose coupling helps.
But this article isn’t about moving from federated codebases to a single monorepo as you propose. They used that as an intermediary step to then enable making it a single service. But the point is that making a single giant service is well studied and a problem. Had this constantly at Apple when I worked on CoreLocation where locationd was a single service that was responsible for so many things (GPS, time synchronization of Apple Watches, WiFi location, motion, etc) that there was an entire team managing the process of getting everything to work correctly within a single service and even still people constantly stepped on each other’s toes accidentally and caused builds that were not suitable. It was a mess and the team that should have identified it as a bottleneck in need of solving (ie splitting out separate loosely coupled services) instead just kept rearranging deck chairs.
> Maybe I'm misunderstanding what you're getting at, but monorepo dependencies typically don't have a registry - you just have the commit history
I’m not opposed to a monorepo, which I think may be where your confusion is coming from. I’m suggesting slamming a bunch of microservices back together is a poorly thought out idea because you’ll still end up with a launch coordination bottleneck, and rolling back 1 team’s work forces other teams to roll back as well. It’s great the person in charge got to write a ra ra blog post for their promo packet. Come talk to me in 3 years with actual on-the-ground engineers saying they are having no difficulty shipping a large, tightly coupled monolithic service, or that they haven’t had to build out a team to help architect a service where all the different teams can safely and correctly coexist. My point about the registry is that they took one problem - a shared library that multiple services pull from a registry at latest, causing deployment problems - and nuked it from orbit using a monorepo (ok - this is fine and a good solution - I can be a fan of monorepos provided your infrastructure can make it work) and a monolithic service (probably not a good idea, one that only sounds good when you’re looking for things to do).
But it is not! They were updating dependencies and deploying services separately, and this led to every one of 140 services using a different version of "shared-foo". This made it cumbersome, confusing and expensive to keep going (you want a new feature from shared-foo, you have to take all the other features unless you fork and cherry-pick on top, which makes it not shared-foo anymore).
The point is that a true microservice approach will always lead to exactly this situation: a) you do not extract shared functions and live with duplicate implementations, b) you enforce keeping your shared dependencies always very close to latest (which you can do with different strategies; a monorepo is one that enables but does not require it), or c) you end up with a mess of versions being used by each individual service.
The most common middle ground is to insist on backwards compatibility in a shared-lib, but carrying that over 5+ years is... expensive. You can mix it with an "enforce update" approach ("no version older than 2 years can be used"), but all the problems are pretty evident and expected with any approach.
I'd always err on the side of having the capability to upgrade everything at once if needed, while keeping the ability to keep a single service on a pinned version. This is usually not too hard with any approach, though a monorepo makes the first part appear easier (you edit one file, or multiple dep files in a single repo). But unless you can guarantee all services get replaced in a deployment at exactly the same moment (which you rarely can), or can accept short-lived inconsistencies, deployment requires all services to be backwards compatible until they are all updated, with either approach.
I'd also say that this is still not a move to a monolith, but to a Service-Oriented-Architecture that is not microservices (as microservices are also SOA): as usual, the middle ground is the sweet spot.
A dependency on an external software repository does not make a microservice no longer a microservice. It's the deployment configuration around said dependency that matters.
Some of their "solutions" I kind of wonder how they plan on resolving this, like the black box "magic" queue service they subbed back in, or the fault tolerance problem.
That said, I do think if you have a monolith that just needs to scale (single service that has to send to many places), they are possibly taking the correct approach. You can design your code/architecture so that you can deploy "services" separately, in a fault tolerant manner, but out of a mono repo instead of many independent repos.
Internal Google services: *sweating profusely*
(Mostly in jest, it's obviously a different ballgame internal to the monorepo on borg)
This is not true! This is one of the core strengths of protobuf. Non-destructive protobuf changes, such as adding new API methods or new fields, do not require clients to update. On the server-side you do need to handle the case when clients don't send you the new data--plus deal with the annoying "was this int64 actually set to 0 or is it just using the default?" problem--but as a whole you can absolutely independently update a protobuf, implement it on the server, and existing clients can keep on calling and be totally fine.
Now, that doesn't mean you can go crazy, as doing things like deleting fields, changing field numbering or renaming APIs will break clients, but this is just the reality of building distributed systems.
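To make that concrete, here's a minimal sketch of the same tolerant-reader idea using plain JSON and dicts rather than protobuf itself (the field names are made up):

```python
# Hedged sketch, not actual protobuf: the analogous tolerant-reader pattern over JSON.
import json

def handle_request(raw: str) -> dict:
    msg = json.loads(raw)
    # A newer client may send "locale"; older clients won't, so fall back to a
    # default instead of treating the missing field as an error.
    locale = msg.get("locale", "en-US")
    return {"greeting": "hello", "locale": locale, "user_id": msg["user_id"]}

# Old client: doesn't send "locale", and it ignores response fields it doesn't know about.
print(handle_request('{"user_id": 42}'))
# New client: sends the new field.
print(handle_request('{"user_id": 42, "locale": "de-DE"}'))
```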
I mean I suppose you can make breaking changes to any API in any language, but that’s entirely on you.
It's a tradeoff.
Assuming there is an update every week, you can expect all teams to update within two weeks, which means if everything goes well only two versions are active.
Show me a language runtime or core library that will never have a CVE. Otherwise, by your definition, microservices don’t exist and all service oriented architectures are distributed monoliths.
The logical problem you’re running into is exactly why microservices are such a bad idea for most businesses. How many businesses can have entirely independent system components?
Almost all “microservice” systems in production are distributed monoliths. Real microservices are incredibly rare.
A mental model for true microservices is something akin to depending on the APIs of Netflix, Hulu, HBO Max and YouTube. They’ll have their own data models, their own versioning cycles and all that you consume is the public interface.
For instance, say company A uses the GCP logging stack, and company B does the same. GCP updates its product in a way that strongly encourages upgrading within a specific time frame (e.g. the price will drastically increase otherwise), so A and B do it mostly at the same time for the same reason.
Are A and B truly independent under your vision? Or are they a company-spanning monolith?
Mostly? If you can update A one week and B the next week with no breakage in between, that seems pretty independent.
> Over time, the versions of these shared libraries began to diverge across the different destination codebases.
There's at least one employee per micro service so there should be zero problems preventing just bumping the version of the library.
Do you depend on a cloud provider? Not a microservice. Do you depend on an ISP for Internet? Not a microservice. Depend on humans to do something? Not a microservice.
Textbook definitions and reality rarely coincide, rather than taking such a fundamentalist approach that leads nowhere, recognize that for all intents and purposes, what I described is a microservice, not a distributed monolith.
That is fundamentally incorrect. As presented in my other post, you can correctly use the shared repository as a dependency and refer to a stable version rather than a dynamic one, which is where the problem arises.
As long as the microservice owners are free to choose what dependencies to take and when to bump dependency versions, it’s fine - and microservice owners who take dependencies like that know that they are obliged to take security patch releases and need to plan for that. External library dependencies work like that and are absolutely fine for microservices to take.
The problem comes when you have a team in the company that owns a shared library, and where that team needs, in order to get their code into production, to prevail upon the various microservices that consume their code to bump versions and redeploy.
That is the path to a distributed monolith situation and one you want to avoid.
Yes, exactly. The point is not elitism. Microservices are a valuable tool for a very specific problem but what most people refer to as "microservices" are not. Language is important when designing systems. Microservices are not just a bunch of separately deployable things.
The "micro" in "microservice" doesn't refer to how it is deployed, it refers to how the service is "micro" in responsibility. The service has a public interface defined in a contract that other components depend on, and that is it, what happens within the service is irrelevant to the rest of the system and vice verse, the service does not have depend on knowledge of the rest of the system. By virtue of being micro in responsibility, it can be deployed anywhere and anyhow.
If it is not a microservice, it is just a service, and when it is just a service, it is probably a part of a distributed monolith. And that is okay, a distributed monolith can be very valuable. The reason many people bristle at the mention of microservices is that they are often seen as an alternative to a monolith but they are not, it is a radically different architecture.
We must be precise in our language because if you or I build a system made up of "microservices" that aren't microservices, we're taking on all of the costs of microservices without any of the benefits. You can choose to drive to work, or take the bus, but you cannot choose to drive because it is the cheapest mode of transport or walk because it is the fastest. The costs and benefits are not independent.
The worst systems I have ever worked on were "microservices" with shared libraries. All of the costs of microservices (every call now involves a network) and none of the benefits (every service is dependent on the others). The architect of that system had read all about how great microservices are and understood it to mean separately deployable components.
There is no hierarchy of goodness, we are just in pursuit of the right tool or the job. A monolith, distributed monolith or a microservice architecture could be the right tool for one problem and the wrong tool for another.
The "micro" in microservice was a marketing term to distinguish it from the bad taste of particular SOA technology implementations in the 2000s. A similar type of activity as crypto being a "year 3000 technology."
The irony is it was the common state that "services" weren't part of a distributed monolith. Services which were too big were still separately deployable. When services became nothing but an HTTP interface over a database entity, that's when things became complicated via orchestration; orchestration previously done by a service... not done to a service.
I am talking about using a shared software repository as a dependency. Which is valid for a microservice. Taking said dependency does not turn a microservice into a monolith.
It may be a build time dependency that you do in isolation in a completely unrelated microservice for the pure purpose of building and compiling your business microservice. It is still a dependency. You cannot avoid dependencies in software or life. As Carl Sagan said, to bake an apple pie from scratch, you must first invent the universe.
>The worst systems I have ever worked on were "microservices" with shared libraries.
Ok? How is this relevant to my point? I am only referring to the manner in which your microservice is referencing said libraries. Not the pros or cons of implementing or using shared libraries (e.g. mycompany-specific-utils), common libraries (e.g. apache-commons), or any software component for that matter.
>Yes, exactly
So you're agreeing that there is no such thing as a microservice. If that's the case, then the term is pointless other than a description of an aspirational yet unattainable state. Which is my point exactly. For the purposes of the exercise described the software is a microservice.
True. However one of the core tenets of microservices is that they should be independently deployable[1][2].
If taking on such a shared dependency does not interfere with them being independently deployable then all is good and you still have a set of microservices.
However, if that shared dependency couples the services so that if one needs a new version of the shared dependency then all do, well, suddenly those services are no longer microservices but a distributed monolith.
[1]: https://martinfowler.com/microservices/
[2]: https://www.oreilly.com/content/a-quick-and-simple-definitio...
There are categories, and ontologies are real in the world. If you create one thing and call it something else, that doesn’t mean the definition of “something else” should change.
By your definition it is impossible to create a state based on coherent specifications because most states don’t align to the specification.
We know for a fact that’s wrong via functional programming, state machines, and formal verification
For example, a library with a security vulnerability would need to be upgraded everywhere regardless of how well you’ve designed your system.
In that example the monolith is much easier to work with.
The npm supply chain attacks were only an issue if you don't use lock files. In fact they were a great example of why you shouldn't blindly upgrade to the latest packages when they are available.
I'm referring to the all hands on deck nature of responding to security issues not the best practice. For many, the NPM issue was an all hands on deck.
Who doesn't use lockfiles? Aren't they the default everywhere now? I really thought npm uses them by default.
It has been rare over the years but I suspect it's getting less rare as supply chain attacks become more sophisticated (hiding their attack more carefully than at present and waiting longer to spring it).
Everyone has been able to exploit that for ages. It only became a problem when it was discovered and publicised.
If libraries bump minor or major versions, they are imposing work on all the consuming services to accept the version, make compatibility changes, test and deploy.
Decoupling is the first part of microservices. Pass messages. Use json. I shouldn’t need your code to function. Just your API. Then you can be clever and scale out and deploy on saturdays if you want to and it doesn’t disturb the rest of us.
Yes, but there’s likely a lot of common code related to parsing those messages, interpreting them, calling out to other services etc. shared amongst all of them. That’s to be expected. The question is how that common code is structured if everything has to get updated at once if the common code changes.
The best is when you use containers and build against the latest runtimes in your pipelines so as to catch these issues early and always have the most up to date patches. If a service hasn’t been updated or deployed in a long time, you can just run another build and it will pull latest of whatever.
Yes, there are scenarios where you have to deploy everything but when dealing with micro services, you should only be deploying the service you are changing. If updating a field in a domain affects everyone else, you have a distributed monolith and your architecture is questionable at best.
The whole point is I can deploy my services without relying on yours, or touching yours, because it sounds like you might not know what you’re doing. That’s the beautiful effect of a good micro service architecture.
The problem has fundamentally gone away and reduced itself to a simple update problem, which itself is simpler because the update schedule is less frequent.
I use tomcat for all web applications. When tomcat updates I just need to bump the version number on one application and move on to the next. Tomcat does not involve itself in the data that is being transferred in a non-generic way so I can update whenever I want.
Since nothing blocks updates, the updates happen frequently which means no application is running on an ancient tomcat version.
And ideally, your logging library should rarely need to update. If you need unique integrations per service, use a plug-in architecture and keep the plug-ins local to each service.
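For what it's worth, a minimal sketch of what that plug-in idea could look like (all names here are hypothetical): the shared core only knows a small interface, and each service registers its own integrations locally, so the shared piece rarely needs to change.

```python
# Hypothetical plug-in registry: the shared core only knows the LogSink
# interface; each service wires up its own sinks at startup.
from typing import Protocol

class LogSink(Protocol):
    def emit(self, message: str) -> None: ...

class StdoutSink:
    def emit(self, message: str) -> None:
        print(message)

class Logger:
    def __init__(self) -> None:
        self._sinks: list[LogSink] = []

    def register(self, sink: LogSink) -> None:
        self._sinks.append(sink)

    def log(self, message: str) -> None:
        for sink in self._sinks:
            sink.emit(message)

# Each service keeps its plug-ins local; the core Logger never has to change
# when one service adds a new destination.
logger = Logger()
logger.register(StdoutSink())
logger.log("service started")
```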
To use an example from the article, there was this statement: "The split to separate repos allowed us to isolate the destination test suites easily. This isolation allowed the development team to move quickly when maintaining destinations."
This is architecture bleed-through. The format produced by Twilio "should" be the canonical form, which is submitted to the adapter that mangles it into the "destination" form. Great, that transformation is expressible semantically in a language that takes the canonical form and spits out the special form. Changes to the transformation expression should not "bleed through" to other destinations, and changes to the canonical form should be backwards compatible to prevent bleed-through of changes in the source from impacting the destination. At all times, if something worked before, it should continue to work without touching it, because the architecture boundaries are robust.
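As a hedged illustration of that boundary (the canonical event shape and destination names below are invented, not Twilio's actual format), each adapter is a pure transformation from the canonical form to one destination's form, so a change to one adapter cannot bleed into the others:

```python
# Hypothetical canonical event plus per-destination adapters. Adding a field to
# the canonical form is backwards compatible because each adapter only reads
# the fields it cares about.
CANONICAL_EVENT = {"event": "signup", "user_id": "u-123", "ts": 1700000000}

def to_destination_a(event: dict) -> dict:
    return {"type": event["event"], "uid": event["user_id"]}

def to_destination_b(event: dict) -> dict:
    return {"name": event["event"], "timestamp": event["ts"]}

ADAPTERS = {"dest_a": to_destination_a, "dest_b": to_destination_b}

for name, adapt in ADAPTERS.items():
    print(name, adapt(CANONICAL_EVENT))
```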
Being able to work with a team that understood this was common "in the old days" when people were working on an operating system. The operating system would evolve (new features, new devices, new capabilities) but because there was a moat between the OS and applications, people understood that they had to architect things so that the OS changes would not cause applications that currently worked to stop working.
I don't judge Twilio for not doing robust architecture, but I was astonished when I went to work at Google how lazy everyone got when the entire system is under their control (like there are no third party apps running in the fleet). There was a persistent theme of some bright person "deciding" to completely change some interface and Wham! every other group at Google had to stop what they were doing and move their code to the new thing. There was a particularly poor 'mandate' on a new version of their RPC while I was there. As Twilio notes, that can make things untenable.
If you communicate with one another you are serializing and deserializing a shared type. That shared type will break at the communication channels if you do not simultaneously deploy the two services. The irony is to prevent this you have to deploy simultaneously and treat it as a distributed monolith.
This is the fundamental problem of micro services. Under a monorepo it is somewhat more mitigated because now you can have type checking and integration tests across multiple repos.
Make no mistake, the world isn’t just library dependencies. There are communication dependencies that flow through communication channels. A microservice architecture by definition has all its services depend on each other through these communication channels. The logical outcome of this is virtually identical to a distributed monolith. In fact, shared libraries don’t do much damage at all if the versions are off. It is only shared types in the communication channels that break.
There is no way around this unless you have a mechanism for simultaneous merging code and deploying code across different repos which breaks the definition of what it is to be a microservice. Microservices always and I mean always share dependencies with everything they communicate with. All the problems that come from shared libraries are intrinsic to microservices EVEN when you remove shared libraries.
People debate me on this but it’s an invariant.
That means consumers can keep using old API versions (and their types) with a very long deprecation window. This results in loose coupling. Most companies doing microservices do not operate like this, which leads to these lockstep issues.
I'm not saying monoliths are better than microservices.
I'm saying for THIS specific issue, you will not need to even think about API compatibility with monoliths. It's a concept you can throw out the window, because type checkers and integration tests catch this FOR YOU automatically, and the single deployment ensures that the compatibility will never break.
If you choose monoliths you are CHOOSING for this convenience, if you choose microservices you are CHOOSING the possibility for things to break and AWS chose this and chose to introduce a backwards compatibility restriction to deal with this problem.
I use "choose" loosely here. More likely AWS ppl just didn't think about this problem at the time. It's not obvious... or they had other requirements that necessitated microservices... The point is, this problem in essence is a logical consequence of the choice.
This is what I don't get about some comments in this thread. Choosing internal backwards compatibility for services managed by a team of three engineers doesn't make a lot of sense to me. You (should) have the organizational agility to make big changes quickly; not a lot of consensus building should be required.
For the S3 APIs? Sure, maintaining backwards compatibility on those makes sense.
If you’re using backwards compatibility as safety and that prevents you from doing a desired upgrade to an api that’s an entirely different thing. That is backwards compatibility as a restriction and a weakness in the overall paradigm while the other is backwards compatibility as a feature. Completely orthogonal actions imo.
Scale
Both in people, and in "how do we make this service handle the load". A monolith is easy if you have few developers and not a lot of load.
With more developers it gets hard as they start affecting each other across this monolith.
With more load it gets difficult as the usage profile of a backend server becomes very varied and performance issues hard to even find. What looks like a performance loss in one area might just be another unrelated part of the monolith eating your resources.
But everyone should know that microservices are more complex systems and harder to deal with, and that a bunch of safety and correctness issues come with them as well.
The problem here is that not many people know this. Some people think going to microservices makes your code better, when, as I’m clearly saying here, you give up safety and correctness as a result.
This is much less of a problem than it seems.
You can use a serialisation format that allows easy backward compatible additions. The new service that has a new feature adds a field for it. The old client, responsibly coded, gracefully ignores the field it doesn't understand.
You can version the API to allow for breaking changes, and serve old clients old responses, and new clients newer responses. This is a bit of work to start and sometimes overkill, given the first point
If you only need very rare breaking changes, you can deploy new-version-tolerant clients first, then when that's fully done, deploy the new-version service. It's a bit of faff, but if it's very rare and internal, it's often easier than implementing full versioning.
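As a rough sketch of the versioning option above (the handler names and response shapes are hypothetical), old clients keep getting the v1 shape while new clients opt into v2:

```python
# Hypothetical example of serving old and new response shapes side by side.
def get_user_v1(user: dict) -> dict:
    # v1 exposed a single "name" field.
    return {"name": f"{user['first']} {user['last']}"}

def get_user_v2(user: dict) -> dict:
    # v2 split the field; v1 stays available until its clients have migrated.
    return {"first_name": user["first"], "last_name": user["last"]}

HANDLERS = {"v1": get_user_v1, "v2": get_user_v2}

def get_user(version: str, user: dict) -> dict:
    return HANDLERS[version](user)

user = {"first": "Ada", "last": "Lovelace"}
print(get_user("v1", user))
print(get_user("v2", user))
```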
Yeah, it’s a roundabout solution to create something to deploy two things simultaneously. Agreed.
> Putting the code into a monorepo does nothing to fix this.
It helps mitigate the issue somewhat. If it was a polyrepo you suffer from an identical problem with the type checker or the integration test. The checkers basically need all services to be at the same version to do a full and valid check so if you have different teams and different repos the checkers will never know if team A made a breaking change that will effect team B because the integration test and type checker can’t stretch to another repo. Even if it could stretch to another repo you would need to do a “simultaneous” merge… in a sense polyrepos suffer from the same issue as microservices on the CI verification layer.
So if you have micro services and you have a polyrepos you are suffering from a twofold problem. Your static checks and integration tests are never correct and always either failing and preventing you from merging or deliberately crippled so as to not validate things across repos. At the same time your deploys will also guarantee to be broken if a breaking api change is made. You literally give up safety in testing, safety in type checking and working deploys by going microservices and polyrepos.
Like you said, it can be fixed with backward compatibility, but it’s a bad thing to restrict your code that way.
> This is much less of a problem than it seems.
It is not “much less of a problem than it seems”, because big companies have developed methods to do this. See Netflix. If they took the time to develop a solution, it means it’s not a trivial issue.
Additionally, are you aware of any API compatibility issues in communication between modules of your local code in a single app? Do you have any problems with this such that you are aware of it and come up with ways to deal with it? No. In a monolith the problem is nonexistent and it doesn’t even register. You are not aware this problem exists until you move to microservices. That’s the difference here.
> You can use a serialisation format that allows easy backward compatible additions.
Mentioned a dozen times in this thread. Backwards compatibility is a bad thing. It’s a restriction that freezes all technical debt into your code. Imagine python 3 stayed backward compatible with 2 or the current version of macOS was still compatible with binaries from the first Mac.
> You can version the API to allow for breaking changes, and serve old clients old responses, and new clients newer responses. This is a bit of work to start and sometimes overkill, given the first point
Can you honestly tell me this is a good thing? The fact that you have to pay attention to this in microservices, while in a monolith you don’t even need to be aware there’s an issue, tells you all you need to know. You’re just coming up with behavioral workarounds and coping mechanisms to make microservices work in this area. You’re right, it does work. But it’s a worse solution for this problem than monoliths, which don’t need these workarounds because these problems don’t exist in monoliths.
> If you only need very rare breaking changes, you can deploy new-version-tolerant clients first, then when that's fully done, deploy the new-version service. It's a bit of faff, but if it's very rare and internal, it's often easier than implementing full versioning.
It’s only very rare in microservices because the setup is weaker. You deliberately make it rare because of this problem. Is it rare to change a type in a monolith? No, it happens on the regular. See the problem? You’re not realizing it, but everything you’re bringing up is behavioral action to cope with what is fundamentally weaker in microservices.
Let me conclude to say that there are many reasons why microservices are picked over monoliths. But what we are talking about here is definitively worse. Once you go microservices you are giving up safety and correctness and replacing it with work arounds. There is no trade off for this problem it is a logical consequence of using microservices.
Yes, this is absolutely correct. The objects you send over the wire are part of an API which forms a contract the server implementing the API is expected to provide. If the API changes in a way which is not backwards compatible, this will break things.
> That shared type will break at the communication channels if you do not simultaneously deploy the two services.
This is only true if you change the shared type in a way which is not backwards compatible. One of the major tenets of services is that you must not introduce backwards incompatible changes. If you want to make a fundamental change, the process isn't "change APIv1 to APIv2", it's "deploy APIv2 alongside APIv1, mark APIv1 as deprecated, migrate clients to APIv2, remove APIv1 when there's no usage."
This may seem arduous, but the reality is that most monoliths already deal with this limitation! Don't believe me? Think about a typical n-tier architecture with a backend that talks to a database; how do you do a naive, simple rename of a database column in e.g. MySQL in a zero-downtime manner? You can't. You need to have some strategy for dealing with the backwards incompatibility which exists when your code and your database do not match. The strategy might be a simple add new column->migrate code->remove old column, including some thought on how to deal with data added in the interim. It might be to use views. It might be some insane strategy of duplicating the full stack, using change data capture to catch changes and flipping a switch.[0] It doesn't really matter, the point is that even within a monolith, you have two separate services, a database and a backend server, and you cannot deploy them truly simultaneously, so you need to have some strategy for dealing with that; or more generally, you need to be conscious of breaking API changes, in exactly the same way you would with independent services.
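A minimal sketch of that add-new-column -> migrate -> remove-old-column choreography, using sqlite3 purely for illustration (the table and column names are hypothetical; the same sequence applies to MySQL):

```python
# Hypothetical expand/contract migration: "rename" fullname to display_name
# without a breaking in-place rename.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, fullname TEXT)")
conn.execute("INSERT INTO users (fullname) VALUES ('Ada Lovelace')")

# Expand: add the new column. Old code keeps reading and writing "fullname".
conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")

# Migrate: backfill, then deploy code that writes both and reads the new column.
conn.execute("UPDATE users SET display_name = fullname WHERE display_name IS NULL")

# Contract: only after every running deployment uses "display_name", drop the
# old column (DROP COLUMN needs SQLite 3.35+; older versions rebuild the table).
conn.execute("ALTER TABLE users DROP COLUMN fullname")
print(conn.execute("SELECT id, display_name FROM users").fetchall())
```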
> The logical outcome of this is virtually identical to a distributed monolith.
Having seen the logical outcome of this at AWS, Hootsuite, Splunk, among others: no this isn't true at all really. e.g. The RDS team operated services independently of the EC2 team, despite calling out to EC2 in the backend; in no way was it a distributed monolith.
[0] I have seen this done. It was as crazy as it sounds.
Agreed and this is a negative. Backwards compatibility is a restriction made to deal with something fundamentally broken.
Additionally, in any system of services you will eventually have to make a breaking change. Backwards compatibility is a behavioral coping mechanism for a fundamental issue of microservices.
>This may seem arduous, but the reality is that most monoliths already deal with this limitation! Don't believe me? Think about a typical n-tier architecture with a backend that talks to a database; how do you do a naive, simple rename of a database column in e.g. MySQL in a zero-downtime manner? You can't. You need to have some strategy for dealing with the backwards incompatibility.
I believe you and am already aware. It's a limitation that exists intrinsically, so it exists because you have no choice. A database and a monolith need to exist as separate services. The thing I'm addressing here is the microservices vs. monolith debate. If you choose microservices, you are CHOOSING for this additional problem to exist. If you choose a monolith, then within that monolith you are CHOOSING for those problems to not exist.
I am saying regardless of the other issues with either architecture, this one is an invariant in the sense that for this specific thing, monolith is categorically better.
>Having seen the logical outcome of this at AWS, Hootsuite, Splunk, among others: no this isn't true at all really. e.g. The RDS team operated services independently of the EC2 team, despite calling out to EC2 in the backend; in no way was it a distributed monolith.
No you're categorically wrong. If they did this in ANY of the companies you worked at then they are Living with this issue. What I'm saying here isn't an opinion. It is a theorem based consequence that will occur IF all the axioms are satisfied: namely >2 services that communicate with each other and ARE not deployed simultaneously. This is logic.
The only way errors or issues never happened with any of the teams you worked with is if the services they were building NEVER needed to make a breaking change to the communication channel, or they never needed to communicate. Neither of these scenarios is practical.
IMO the fundamental point of disagreement here is that you believe it is effectively impossible to evolve APIs without breaking changes.
I don't know what to tell you other than, I've seen it happen, at scale, in multiple organizations.
I can't say that EC2 will never make a breaking change that causes RDS, Lambda or auto-scaling to break, but if they do, it'll be front page news.
No, it's certainly possible. You can evolve Linux, macOS and Windows forever without any breaking changes and keep all APIs backward compatible for all time. Keep going forever and ever and ever. But you see there's a huge downside to this, right? This downside becomes more and more magnified as time goes on. In the early stages it's fine. And it's not like this growing problem will stop everything in its tracks. I've seen organizations hobble along forever with increasing tech debt that keeps increasing for decades.
The downside won't kill an organization. I'm just saying there is a way that is better.
>I don't know what to tell you other than, I've seen it happen, at scale, in multiple organizations.
I have as well. It doesn't mean it doesn't work and can't be done. For example typescript is better than javascript. But you can still build a huge organization around javascript. What I'm saying here is one is intrinsically better than the other but that doesn't mean you can't build something on technology or architectures that are inferior.
And also I want to say I'm not saying monoliths are better than microservices. I'm saying for this one aspect monoliths are definitively better. There is no tradeoff for this aspect of the debate.
>I can't say that EC2 will never made a breaking change that causes RDS, lambda, auto-scaling to break, but if they do, it'll be front page news.
Didn't a break happen recently? Barring that... there are behavioral ways to mitigate this, right? Like what you mentioned... backward compatible APIs always. But it's better to set up your system such that the problem just doesn't exist, period... rather than to set up ways to deal with the problem.
This is correct.
> Neither of these scenarios is practical.
This is not. When you choose appropriate tools (protobuf being an example), it is extremely easy to make a non-breaking change to the communication channel, and it is also extremely easy to prevent breaking changes from being made ever.
Protobuf works best if you have a monorepo. If each of your services lives within its own repo, then upgrades to one repo can be merged onto the main branch in ways that potentially break things in other repos. Protobuf cannot check for this.
Second, the other safety check protobuf relies on is backwards compatibility. But that's an arbitrary restriction, right? It's better to not even have to worry about backwards compatibility at all than it is to maintain it.
Categorically these problems don't even exist in the monolith world. I'm not taking a side in the monolith vs. microservices debate. All I'm saying is for this aspect monoliths are categorically better.
No. Your shared type is too brittle to be used in microservices. Tools like the venerable protobuf solved this problem decades ago. You have a foundational wire format that does not change. Then you have a schema layer that can change in backwards compatible ways. Every new addition is optional.
Here’s an analogy. Forget microservices. Suppose you have a monolithic app and a SQL database. The situation is just like when you change the schema of the SQL database: of course you have application code that correctly deals with both the previous schema and the new schema during the ALTER TABLE. And the foundational wire format that you use to talk to the SQL database does not change. It’s at a layer below the schema.
This is entirely a solved problem. If you think this is a fundamental problem of microservices, then you do not grok microservices. If you think having microservices means simultaneous deployments, you also do not grok microservices.
1. Protobuf requires a monorepo to work correctly. Shared types must be checked across all repos and services simultaneously. Without a monorepo or some crazy workaround mechanism this won't work. Think about it. These type checkers need everything at the same version to correctly check everything.
2. Even with a monorepo, deployment is a problem. Unless you do simultaneous deploys, if one team upgrades their service and another team doesn't, the shared type is incompatible, simply because you used microservices and polyrepos to allow teams to move async instead of in sync. It's a race condition in distributed systems, and it's provably true. Not solved at all, because it can't be solved by logic and math.
Just kidding. It can be solved but you're going to have to change definitions of your axioms aka of what is currently a microservice, monolith, monorepo and polyrepo. If you allow simultaneous deploys or pushes to microservices and polyrepos these problems can be solved but then can you call those things microservices or polyrepos? They look more like monorepos or monoliths... hmmm maybe I'll call it "distributed monolith".... See we are hitting this problem already.
>Here’s an analogy. Suppose you have a monolithic app and a SQL database. The situation is just like when you change the schema of the SQL database: of course you have application code that correctly deals with the previous schema and the new schema during the ALTER TABLE. And the foundational wire format that you use to talk to the SQL database does not change. It’s at a layer below the schema.
You are just describing the problem I provided. We call "monoliths" monoliths, but technically a monolith must interact with a secondary service called a database. We have no choice in the matter. The monolith vs. microservice debate of course does not refer to that boundary, which SUFFERS from all the same problems as microservices.
>This is entirely a solved problem. If you think this is a fundamental problem of microservices, then you do not grok microservices. If you think having microservices means simultaneous deployments, you also do not grok microservices.
No it's not. Not at all. It's a problem that's lived with. I have two modules in a monolith. ANY change that goes into the mainline branch or deploy is type checked and integration tested to provide maximum safety as integration tests and type checkers can check the two modules simultaneously.
Imagine those two modules as microservices. Because they can be deployed at any time asynchronously, and because they can be merged to the mainline branch at any time asynchronously, they cannot be type checked or integration tested together. Why? If I upgrade A, which requires an upgrade to B, but B is not upgraded yet, how do I type check both A and B at the same time? Axiomatically impossible. Nothing is solved. Just behavioral coping mechanisms to deal with the issue. That's the key phrase: behavioral coping mechanisms, as opposed to automated, statically checked safety based on mathematical proof. Most of the arguments from your side will consist of this: "behavioral coping mechanisms"
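To make the contrast concrete, here is a minimal sketch (the types are hypothetical): in one codebase the producer and consumer of a shared type are checked together, so a breaking rename fails type checking and tests before anything ships, whereas between two independently deployed services the same rename only surfaces at runtime on the wire.

```python
# Hypothetical shared type and its consumer, living in one codebase.
from dataclasses import dataclass

@dataclass
class OrderCreated:
    order_id: str
    amount_cents: int  # rename this and the consumer below fails type checking in CI

def total_in_dollars(event: OrderCreated) -> float:
    # Checked against the producer's definition in the same mypy/test run; split
    # into two independently deployed services, the same mismatch would only
    # show up as a runtime (de)serialization error.
    return event.amount_cents / 100

print(total_in_dollars(OrderCreated(order_id="o-1", amount_cents=1250)))
```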
Also known as the rest of the fucking owl. I am entirely in factual agreement with you, but the number of people who are even aware they maintain an API surface with backwards compatibility as a goal, let alone can actually do it well, is tiny in practice. Especially for internal services, where nobody will even notice violations until it’s urgent, and at such a time, your definitions won’t save you from blame. Maybe they should, though. The best way to stop a bad idea is to follow it rigorously and see where it leads.
I’m very much a skeptic of microservices, because of this added responsibility. Only when the cost of that extra maintenance is outweighed by overwhelming benefits elsewhere, would I consider it. For the same reason I wouldn’t want a toilet with a seatbelt.
Most microservice companies either live with the fact or they have round about ways to deal with it including simultaneous deploys across multiple services and simultaneous merging, CI and type checking across different repos.
This is simply dependency hell exploding with microservices.
Hello engineer. Jira ticket VULN-XXX has been assigned to you as your team's on call engineer.
A critical vulnerability has been found in the netxyz library. Please deploy service $foo after SHA before 2025-12-14 at 12:00 UTC.
Hello engineer. Jira ticket VULN-XXX has been assigned to you as your team's on call engineer.
A critical vulnerability has been found in the netxyz library. Please deploy service $bar after SHA before 2025-12-14 at 12:00 UTC.
...
It's never ending. You get a half dozen of these on each on call rotation.
In both cases we had to come up with clever solutions simply to get by, because communication between services is a problem. It is difficult (not impossible) to keep all the contracts in sync, and deployment sometimes has to be coordinated in a very specific way. The initial speed you get is soon lost further down the path due to added complexities. There was fear-driven development at play. Service ownership is a problem. Far too many meetings are spent on coordination.
In my latest company everything is part of the same monolith. Yes, the code is huge but it is so much easier to work with. We use a lot more unit tests than integration tests. Types make sense. Refactoring is just so easy. All the troubleshooting tools, including specialised AI agents built on top of our own platform, are part of the code-base, which is kind of interesting because I can see how this is turning into a self-improving system. It is fascinating!
We are not planning to break up the monolith unless we grow so much that it is impossible to manage from a single git repository. As far as I can tell this may never happen, as it is obvious that much larger projects are perfectly well maintained in the exact same way.
The only downside is that the build takes longer, but honestly we found ways around that as well in the past, and now with further improvements in the toolchains delivered by the awesome open-source communities around the world, I expect to see at least a 10x improvement in deployment time in 2026.
Overall, in my own assessment, the decision to go for a monolith allowed us to build and scale much faster than if we had used micro services.
I hope this helps.
Given the quality and the structure neither approach really matters much. The root problems are elsewhere.
> I made a passionate case...
My experience is that it is less about passion and more about reason. There's a lot of good research and writing on this topic. This paper, in particular, has been really helpful for my cause: https://dl.acm.org/doi/pdf/10.1145/3593856.3595909
It has a lot going for it: 1) it's from Google, 2) it's easy to read and digest, 3) it makes a really clear case for monoliths.
> You can make enemies easily...
Short term, definitely. In the long tail? If you are right more than you are wrong, then that manifests as respect.
You're not kidding. I had to work with Twilio on a project and it was awful. Any time there was an issue with the API, they'd never delve into why that issue had happened. They'd simply fix the data in their database and close the ticket. We'd have the same issue over and over and over again and they'd never make any effort to fix the cause of the problems.
It immediately triggered my "is this AI?" reflex.
It's amazing how much explanatory power it has, to the point that I can predict at least some traits about a company's codebase during an interview process, without directly asking them about it.
1. "Peter principle": "people in a hierarchy and organizations tend to rise to 'a level of respective incompetence' "
2. "Parkinson's law": "Work expands to fill the available time".
So people are filling all the available time and working tirelessly to reach their personal and organizational levels of incompetency; working hard without stopping to think whether what they are doing should be done at all. And nobody is stopping them, nobody asks why (with a real analysis of positives, negatives, and risks).
Incompetent + driven is the worst combination there can be.
Having 140 services managed by what sounds like one team reinforces another point that I believe should be well known by now: you use SOAs (including microservices) to scale teams, not services.
Eg. if a single team builds a shared library for all the 140 microservices and needs to maintain them, it's going to become very expensive quickly: you'll be using v2.3.1 in one service and v1.0.8 in another, and you won't even know yourself what API is available. Operationally, yes, you'll have to watch over 140 individual "systems" too.
There are ways to mitigate this, but they have their own trade-offs (I've posted them in another comment).
As per Conway's law, software architecture always follows the organizational structure, and this seems to have happened here: a single team is moving away from unneeded complexity to more effectively manage their work and produce better outcomes for the business.
It is not a monolith, but a properly-scoped service (scoped to the team). This is, in my experience, the sweet spot. A single team can run and operate multiple independent services, but with growth in those services, they will look to unify them, so you need to restructure the team if you don't want that to happen. This is why I don't accept "system architect" roles, as those don't give you the tools to really drive the architecture the way it can be driven, and why I really got into "management" :)
I think way too much tooling assumes 1:1 pairings between services and repos (_especially_ CI work). In huge orgs Git/whatever VCS you're using would have problems with everything in one repo, but I do think that there's loads of value in having everything in one spot even if it's all deployed more or less independently.
But so many settings and workflows couple repos together so it's hard to even have a frontend and backend in the same place if both teams manage those differently. So you end up having to mess around with N repos and can't send the one cross-cutting pull request very easily.
I would very much like to see improvements on this front, where one repo could still be split up on the forge side (or the CI side) in interesting ways, so review friction and local dev work friction can go down.
(shorter: github and friends should let me point to a folder and say that this is a different thing, without me having to interact with git submodules. I think this is easier than it used to be _but_)
We used Bazel to maintain the dependency tree, and then triggered builds based on a custom Github Actions hook that would use `bazel query` to find the transitive closure of affected targets. Then, if anything in a directory was affected, we'd trigger the set of tests defined in a config file in that directory (defaulting to :...), each as its own workflow run that would block PR submission. That worked really well, with the only real limiting factor being the ultimate upper limit of a repo in Github, but of course took a fair amount (a few SWE-months) to build all the tooling.
This, and if you want build + deploy that’s faster than doing it manually from your dev machine, you pay $$$ for either something like Depot, or a beefy VM to host CI.
A bit more work on those dependency corner cases, along with an auto-sleeping VM, should let us achieve nirvana. But it’s not like we have a lot of spare time on our small team.
* You can use gazelle to auto-generate Bazel rules across many modules - I think the most up to date usage guide is https://github.com/bazel-contrib/rules_go/blob/master/docs/g....
* In addition, you can make your life a lot easier by just making the whole repo a single Go module. Having done the alternate path - trying to keep go.mod and Bazel build files in sync - I would definitely recommend only one module per repo unless you have a very high pain tolerance or actually need to be able to import pieces of the repo with standard Go tooling.
> a beefy VM to host CI
Unless you really need to self-host, Github Actions or GCP Cloud Build can be set up to reference a shared Bazel cache server, which lets builds be quite snappy since it doesn't have to rebuild any leaves that haven't changed.
I managed a product where a team of 6–8 people handled 200+ microservices. I've also managed other teams at the same time on another product where 80+ people managed a monolith.
What did I learn? Both approaches have pros and cons.
With microservices, it's much easier to push isolated changes with just one or two people. At the same time, global changes become significantly harder.
That's the trade-off, and your mental model needs to align with your business logic. If your software solves a tightly connected business problem, microservices probably aren't the right fit.
On the other hand, if you have a multitude of integrations with different lifecycles but a stable internal protocol, microservices can be a lifesaver.
If someone tries to tell you one approach is universally better, they're being dogmatic/religious rather than rational.
Ultimately, it's not about architecture, it's about how you build abstractions and approach testing and decoupling.
If your software solves a single business problem, it probably belongs in a single (still micro!) service under the theory underlying microservices, in which the "micro" is defined in business terms.
If you are building services at a lower level than that, they aren't microservices (they may be nanoservices.)
These problems are effectively solved on BEAM, the JVM, Rust, Go, etc.
(And I say that as someone who caught themselves doing the same: serverless is really good at hiding this.)
Dividing a system into composable parts is already a very, very difficult problem, and it is simply foolish to introduce further network boundaries between those parts.
The next comeback I see is a move away from React and SPAs as view transitions become more common.
We can have a service with 100 features, but only enable the features relevant to a given "purpose". That way, we can still have "micro services" but they're running the same code: "bla.exe -foo" and "bla.exe -bar".
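A minimal sketch of that pattern in Go; the `-foo`/`-bar` flags and endpoints are invented, but it shows one binary whose deployed "purpose" is chosen by flags.

```go
// bla.go: one binary, many "purposes". Which features are mounted is controlled
// by flags, so "bla -foo" and "bla -bar" run the same code but behave like
// separate small services. Feature names here are hypothetical.
package main

import (
	"flag"
	"log"
	"net/http"
)

func main() {
	foo := flag.Bool("foo", false, "enable the foo feature set")
	bar := flag.Bool("bar", false, "enable the bar feature set")
	addr := flag.String("addr", ":8080", "listen address")
	flag.Parse()

	mux := http.NewServeMux()
	if *foo {
		mux.HandleFunc("/foo", func(w http.ResponseWriter, r *http.Request) {
			w.Write([]byte("foo feature\n"))
		})
	}
	if *bar {
		mux.HandleFunc("/bar", func(w http.ResponseWriter, r *http.Request) {
			w.Write([]byte("bar feature\n"))
		})
	}
	log.Printf("serving on %s (foo=%v bar=%v)", *addr, *foo, *bar)
	log.Fatal(http.ListenAndServe(*addr, mux))
}
```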
> Initially, when the destinations were divided into separate services, all of the code lived in one repo. A huge point of frustration was that a single broken test caused tests to fail across all destinations.
You kept breaking tests in main so you thought the solution was to revamp your entire codebase structure? Seems a bit backward.
We solved it by simply giving every team their own dev environment (before merging to main). So if tests break in feature #1, it doesn't break anything for feature #2 or team #2. It's all confined to their environment. It's just an additional VM + a config in CI/CD. The only downside of this is that if there are conflicts between features they won't be caught immediately (only after one of the teams finally merges to main). But in our case it wasn't a problem because different teams rarely worked on the same parts of the monorepo at the same time due to explicit code ownership.
Microservices as a goal is mostly touted by people who don't know what the heck they're doing - the kind of people who tend to mistakenly believe blind adherence to one philosophy or the other will help them turn their shoddy work into something passable.
Engineer something that makes sense. If, once you're done, whatever you've built fits the description of "monolith" or "microservices", that's fine.
However if you're just following some cult hoping it works out for your particular use-case, it's time to reevaluate whether you've chosen the right profession.
Putting this anywhere near "engineering" is an insult to even the shoddiest, OceanGate-levels of engineering.
By the time the "microservices considered harmful" articles started popping up, microservices had already become a fad. Most of the HN early-startup energy will continue to favor monoliths for team-communication reasons. And I predict that if any of those startups are successful, they will need separate services for engineering reasons. If anything, the historical faddishness of HN shows that hackers pick the new and novel because that's who they are, for better or worse.
This is the problem with the undefined nature of the term `microservices`. In my experience, if you can't develop in a way that allows you to deploy every service independently and without coordination between services, the pattern may not be a good fit for your org's needs.
In the parent's SOA (v2), what they described is a well-known anti-pattern [0]:
Application Silos to SOA Silos
* Doing SOA right is not just about technology. It also requires optimal cross-team communications.
Web Service Sprawl
* Create services only where and when they are needed. Target areas of greatest ROI, and avoid the service sprawl headache.
If you cannot, for technical or political reasons, retain the ability to deploy a service independently (regardless of whether you actually choose to deploy it independently), you will not gain most of the advantages that were the original selling point of microservices, which had more to do with organizational scaling than technical concerns. There are other reasons to consider the pattern, especially given the tooling available, but it is simply not a silver bullet.
And yes, I get that not everyone is going to accept Chris Richardson's definitions[1], but even in more modern versions of this, people always seem to run into the most problems because they try to shove it in a place where the pattern isn't appropriate, or isn't possible.
But kudos to Twilio for doing what every team should be doing: reassessing whether their previous decisions are still valid and moving forward with new choices when they aren't.
[0] https://www.oracle.com/technetwork/topics/entarch/oea-soa-an... [1] https://microservices.io/post/architecture/2022/05/04/micros...
Doing it for organizational scaling can lead to insular vision and a turf-defensive attitude, as teams are rewarded on the individual service's performance and not the complete product's performance. Also, refactoring services now means organizational refactoring, so the friction to refactor is massively increased.
I agree that patterns should be used where most appropriate, instead of blindly.
What pains me is that a term like "Cloud-Native" has been usurped to mean microservices. Did Twilio just stop having a "Cloud-Native" product due to shipping a monolith? According to the CNCF, yes. According to reason, no.
Edit: I'd love to eat the humble pie here. If you have examples of places where monoliths are updated 10-20 times a day by a small (or large) team, post the link. I'll read them all.
I'll assume you're not writing enough bugs that customers are reporting 10-20 new ones per day, but that leaves me confused why you would want to expose customers to that much churn. If we assume an observable issue results in a rollback and you're only rolling back 1-2% of the time (very impressive), once a month or so customers should experience observable issues on consecutive days. That would turn me off making such a service integral to my workflow.
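(Rough back-of-envelope under those assumptions: 15 deploys a day at a 1.5% rollback rate is about 0.2 rollback-worthy issues per day, i.e. roughly a 20% chance of at least one observable issue on any given day. The chance that a given day starts a two-day run of issues is then about 0.2 × 0.2 = 4%, which over a 30-day month works out to a bit more than once a month.)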
Churn is kind of a loaded word, I'd just call it change. Improvements, efficiencies, additions and yes, of course, fixes.
It may be a little unfair to compare monoliths with distributed services when it comes to deployments. I often deploy three services (sometimes more) to implement a new feature, and that wouldn't be the case with a monolith. So yes, there is a lower number of deploys needed in that world (I know, I've been there). Unfortunately, there is also a natural friction that prevents deploying things as soon as they become available. Google called that latency out in DORA for a reason.
Being able to ~instantly obtain a perfect list of all references to all symbols is an extraordinarily powerful capability. The stronger the type system, the more leverage you get. If you have only ever had experience with weak type systems or poor tooling, I could understand how the notion of putting everything into one compilation context seems pointless.
You Want Microservices, But Do You Really Need Them? https://www.docker.com/blog/do-you-really-need-microservices...
Plus, it's ALWAYS easier/better to run v2 of something when you completely re-write v1 from scratch. The article could have just as easily been "Why Segment moved from 100 microservices to 5" or "Why Segment rewrote every microservice". The benefits of hindsight and real-world data shouldn't be undersold.
At the end of the day, write something, get it out there. Make decisions, accept some of them will be wrong. Be willing to correct for those mistakes or at least accept they will be a pain for a while.
In short: No matter what you do the first time around... it's wrong.
Choose one.
TL;DR they have a highly partitioned job database, where a job is a delivery of a specific event to a specific destination, and each partition is acted upon by at-most-one worker at a time, so lock contention is only at the infrastructure level.
In that context, each worker can handle a similar balanced workload between destinations, with a fraction of production traffic, so a monorepo makes all the sense in the world.
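A toy in-process sketch of the at-most-one-worker-per-partition idea described above (the partition count, job shape, and destination names are all invented): each partition is drained by exactly one worker, so jobs never contend on locks and a slow destination only backs up its own partition.

```go
// A toy model of "each partition is acted on by at most one worker": jobs are
// routed to a partition by key, and each partition is drained by a single
// goroutine, so there is no per-job lock contention. Illustration only.
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

type Job struct {
	Destination string // e.g. "google-analytics" (hypothetical)
	Event       string
}

const partitions = 8

func partitionFor(key string) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32()) % partitions
}

func main() {
	queues := make([]chan Job, partitions)
	var wg sync.WaitGroup
	for p := range queues {
		queues[p] = make(chan Job, 16)
		wg.Add(1)
		go func(p int, q <-chan Job) { // the single "worker" for partition p
			defer wg.Done()
			for job := range q {
				fmt.Printf("partition %d delivering %q to %s\n", p, job.Event, job.Destination)
			}
		}(p, queues[p])
	}

	jobs := []Job{
		{"google-analytics", "Signed Up"},
		{"mixpanel", "Signed Up"},
		{"google-analytics", "Checked Out"},
	}
	for _, j := range jobs {
		queues[partitionFor(j.Destination)] <- j
	}
	for _, q := range queues {
		close(q)
	}
	wg.Wait()
}
```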
IMO it speaks to how microservices can be a way to enforce good boundaries between teams... but the drawbacks are significant, and a cross-team review process for API changes and extensions can be equally effective while enabling simplified architectures that sidestep many distributed-system problems at scale.
That said I’m curious if you’re basing this on service degradation you’ve seen since the acquisition. We were thinking of starting to use them - is that a bad move?
But a company that can't stand on its own isn't a success in my opinion. Similar things can be said about companies that continue to need round after round of funding without an IPO.
My comment is of the "(2018)" variety. Old news that didn't age well like the people jumping on the "Uber: why we switched to MySQL from Postgres" post. (How many people would choose that decision today?)
People tend to divorce the actual results of a lot of these companies from the gripes of the developers of the tech blogs.
Last time it was an AWS engineer who worked on Route 53, and they dismissed microservices for startups by claiming that at AWS they ran a monolith (as in the Route 53 DNS).
Everything is a monolith if you zoom in enough and ignore everything else. Which I guess you can do when you work at a big company and are in charge of a very specific role.
"Small teams run on shared context. That is their superpower. Everyone can reason end-to-end. Everyone can change anything. Microservices vaporize that advantage on contact. They replace shared understanding with distributed ignorance. No one owns the whole anymore. Everyone owns a shard. The system becomes something that merely happens to the team, rather than something the team actively understands. This isn’t sophistication. It’s abdication.
Then comes the operational farce. Each service demands its own pipeline, secrets, alerts, metrics, dashboards, permissions, backups, and rituals of appeasement. You don’t “deploy” anymore—you synchronize a fleet. One bug now requires a multi-service autopsy. A feature release becomes a coordination exercise across artificial borders you invented for no reason. You didn’t simplify your system. You shattered it and called the debris “architecture.”
Microservices also lock incompetence in amber. You are forced to define APIs before you understand your own business. Guesses become contracts. Bad ideas become permanent dependencies. Every early mistake metastasizes through the network. In a monolith, wrong thinking is corrected with a refactor. In microservices, wrong thinking becomes infrastructure. You don’t just regret it—you host it, version it, and monitor it.
The claim that monoliths don’t scale is one of the dumbest lies in modern engineering folklore. What doesn’t scale is chaos. What doesn’t scale is process cosplay. What doesn’t scale is pretending you’re Netflix while shipping a glorified CRUD app. Monoliths scale just fine when teams have discipline, tests, and restraint. But restraint isn’t fashionable, and boring doesn’t make conference talks.
Microservices for small teams is not a technical mistake—it is a philosophical failure. It announces, loudly, that the team does not trust itself to understand its own system. It replaces accountability with protocol and momentum with middleware. You don’t get “future proofing.” You get permanent drag. And by the time you finally earn the scale that might justify this circus, your speed, your clarity, and your product instincts will already be gone."
-DHH
The shitty arch is not a point against (micro)services. SendGrid, another Twilio property, uses (micro)services to great effect. Services there were fully independently deployable.
Then you can implement a service in Java, Python, Rust, C++, etc, and it doesn't matter.
Coupling your Postgres DB to your Elasticsearch cluster via a hard library dependency would be impossibly heavy. The same insight applies to your bespoke services.
Maintaining and testing a codebase containing many external integrations ("Destinations") was one of the drivers behind the earlier decision to shatter it into many repos: the aim was to isolate the impact of Destination-specific test suite failures that arose because some tests were actually exercising integrations with external 3rd-party services.
One way to think about that situation is in terms of packages: their dependency structure, how those dependencies are expressed (e.g. decoupled via versioned artefact releases, or directly coupled via monorepo-style source checkout), their rates of change, and the quality of their automated test suites (high quality meaning the suite runs really fast, tests only the thing it is meant to test, and has low rates of false negatives and false positives; low quality meaning the opposite).
Their initial situation was one that rapidly becomes unworkable: a shared library package undergoing a high rate of change depended on by many Destination packages, each with low quality test suites, where the dependencies were expressed in a directly-coupled way by virtue of everything existing in a single repo.
There's a general principle here: multiple packages in a single repo with directly-coupled dependencies, where those packages have test suites of wildly varying quality, quickly become a nightmare to maintain. Packages with low-quality test suites that depend on a rapidly changing shared package generate spurious test failures that need to be triaged and slow down development, and their maintainers, lacking test suites able to detect regressions, may find their packages frequently get broken without anyone realising in time.
Their initial move solved this problem by shattering the single repo and trading directly-coupled dependencies for decoupled, versioned dependencies, which decoupled the rate of change of the shared package from the per-Destination packages. That was an incremental improvement, but it added the complexity and overhead of maintaining multiple versions of the "shared" library and per-repo boilerplate, which grew over time as more Destinations were added or more changes were made to the shared library while the work to upgrade and retest Destinations against newer versions was deferred.
Their later move was to reverse this: go back to directly-coupled dependencies, but improve the quality of their per-Destination test suites, particularly by introducing record/replay-style testing of Destinations. Great move. It means the test suite of each Destination measures "is the Destination package adhering to its contract in how it integrates with the 3rd-party API and with the shared package?" without being conflated with testing things outside the control of code in the repo (is the 3rd-party service even up, etc).
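For illustration, a hand-rolled sketch of what replay-style Destination testing can look like; the `SendEvent` client, the `/track` endpoint, and the canned response are hypothetical, and real setups usually record the fixture from a live interaction first rather than hard-coding it.

```go
// destination_replay_test.go: replay-style test for a hypothetical Destination
// client. Instead of calling the real third-party API, the test serves a
// previously recorded response and asserts on the request the Destination
// builds, so flakiness of the external service is removed from the suite.
package destination

import (
	"bytes"
	"encoding/json"
	"io"
	"net/http"
	"net/http/httptest"
	"testing"
)

// SendEvent is a stand-in for a Destination's integration code: it posts an
// event to the third-party API and returns the status code.
func SendEvent(baseURL, apiKey, event string) (int, error) {
	payload, err := json.Marshal(map[string]string{"event": event})
	if err != nil {
		return 0, err
	}
	req, err := http.NewRequest("POST", baseURL+"/track", bytes.NewReader(payload))
	if err != nil {
		return 0, err
	}
	req.Header.Set("Authorization", "Bearer "+apiKey)
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

// recordedResponse is what the real API returned when the interaction was
// recorded; in a real setup it would be loaded from a fixture file.
const recordedResponse = `{"accepted": true}`

func TestSendEventReplaysRecordedInteraction(t *testing.T) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Assert on the Destination's side of the contract.
		if r.URL.Path != "/track" || r.Header.Get("Content-Type") != "application/json" {
			t.Errorf("unexpected request: %s %s", r.Method, r.URL.Path)
		}
		body, _ := io.ReadAll(r.Body)
		if !bytes.Contains(body, []byte(`"Signed Up"`)) {
			t.Errorf("event missing from payload: %s", body)
		}
		// Replay the canned third-party response.
		w.Write([]byte(recordedResponse))
	}))
	defer srv.Close()

	status, err := SendEvent(srv.URL, "test-key", "Signed Up")
	if err != nil || status != http.StatusOK {
		t.Fatalf("SendEvent: status=%d err=%v", status, err)
	}
}
```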
Gonna stop you right there.
Microservices have nothing to do with the hosting or operating architecture.
Per Martin Fowler, who formalized the term, microservices are:
“In short, the microservice architectural style is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API. These services are built around business capabilities and independently deployable by fully automated deployment machinery”
You can have an entirely local application built on the “microservice architectural style.”
Saying they are "often an HTTP resource API" is beside the point.
The problem Twilio actually describes is that they messed up service granularity and distributed-systems engineering processes.
Twilio's experience was not a failure of the microservice architectural style. This was a failure to correctly define service boundaries based on business capabilities.
Their struggles with serialization, network hops, and complex queueing were symptoms of building a distributed monolith, which they finally made explicit with this move. They accidentally built a system with the overhead of distribution but the tight coupling of a single application. Now they are making their architectural foundations fit what they built, likely because they planned it poorly.
The true lesson is that correctly applying microservices requires insanely hard domain modeling, iteration, and meticulous attention to the "Distributed Systems Premium."
Just because he says something does not mean Fowler "formalized the term". Martin wrote about every topic under the sun, and he loved renaming and/or redefining things to fit his world view, incidentally driving people not just to his blog but also to his consultancy, Thoughtworks.
P.S. The "single application" line shows how dated Fowler's views were then and certainly are today.
Anyone who has ever developed in a Java codebase with "Service" and "ServiceImpl" classes everywhere can see the lineage of that model. Services were supposed to be the API, with the implementation provided in a separate process container. Microservices signalled a time when SOA without Java as a prerequisite had been successful in large tech companies. Those companies had reached the point of needing even more granular breakout and a reduced reliance on Java, and HTTP interfaces were an enabler of that. 2010s-era microservices people never understood these basics, and many don't even know what they're criticizing.
There are infinite permutations in architecture, and we've collectively narrowed them down to things that are cheap to deploy, scale automatically at low cost, and are easily replicable with a simple script.
We should be talking about how AI knows those scripts too and can synthesize adjustments. Dedicated Site Reliability Engineers and DevOps teams are great for maintaining convoluted legacy setups, but irrelevant for doing the same thing from scratch nowadays.
A network call. Because nothing could be better for your code than putting the INTERNET into the middle of your application.
--
The "micro" of microservices has always been ridiculous.
If it can run on one machine then do it. Otherwise you have to deal with networking. Only do networking when you have to. Not as a hobby, unless your program really is a hobby.
yes, networking is the bottleneck between the processes, while one machine is the bottleneck to end users
You can run your monolith on multiple machines and round-robin end-user requests between them. Your state is in the DB anyway.
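A minimal sketch of that setup (the backend addresses are placeholders): a tiny round-robin reverse proxy in front of identical monolith instances, with all state assumed to live in the database so any instance can serve any request.

```go
// rr.go: round-robin HTTP requests across identical copies of a monolith.
// Backend addresses are placeholders; session state lives in the DB, so any
// instance can serve any request.
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"sync/atomic"
)

func main() {
	backendAddrs := []string{"http://10.0.0.1:8080", "http://10.0.0.2:8080", "http://10.0.0.3:8080"}
	proxies := make([]*httputil.ReverseProxy, len(backendAddrs))
	for i, addr := range backendAddrs {
		u, err := url.Parse(addr)
		if err != nil {
			log.Fatal(err)
		}
		proxies[i] = httputil.NewSingleHostReverseProxy(u)
	}

	var next uint64
	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Pick the next backend in rotation and hand the request to it.
		i := atomic.AddUint64(&next, 1) % uint64(len(proxies))
		proxies[i].ServeHTTP(w, r)
	})
	log.Fatal(http.ListenAndServe(":8080", handler))
}
```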
Local function calls are infinitely more reliable. The main operational downside with a binary monolith is that a bug in one part of the program will crash the whole thing. Honestly, I still think Erlang got it right here with supervisor trees. Use "microservices". But let them all live on the same computer, in the same process. And add tooling to the runtime environment to allow individual "services" to fail or get replaced without taking down the rest of the system.
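Erlang's supervision doesn't map one-to-one onto other runtimes, but here is a rough Go analogue of the idea, with made-up service names: each "service" runs under a tiny supervisor that recovers from a panic and restarts only that service, so the rest of the process keeps running.

```go
// supervisor.go: a crude, in-process take on the supervisor-tree idea. Each
// "service" is a function; if it panics, only that service is restarted, and
// the rest of the process keeps serving. Names and behavior are illustrative.
package main

import (
	"log"
	"time"
)

// supervise restarts fn whenever it panics or returns, with a small backoff,
// roughly like a one-for-one supervisor restart strategy.
func supervise(name string, fn func()) {
	go func() {
		for {
			func() {
				defer func() {
					if r := recover(); r != nil {
						log.Printf("%s crashed: %v; restarting", name, r)
					}
				}()
				fn()
			}()
			time.Sleep(time.Second) // backoff before restart
		}
	}()
}

func main() {
	supervise("billing", func() {
		time.Sleep(2 * time.Second)
		panic("simulated billing bug") // crashes only this "service"
	})
	supervise("emails", func() {
		for {
			log.Println("emails service still healthy")
			time.Sleep(3 * time.Second)
		}
	})
	select {} // keep the process alive
}
```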
If you have a company that writes software, please ask a professional software/systems architect to review your plans before you build. The initial decisions here would be a huge red flag to any experienced architect, and the subsequent decisions are full of hidden traps, and are setting them up for more failure. If you don't already have a very skilled architect on staff (99% chance you don't) you need to find one and consult with them. Otherwise your business will suffer from being trapped in unnecessary time-consuming expensive rework, or worse, the whole thing collapsing.