> Moving to a monorepo didn't change much, and what minor changes it made have been positive.
I'm not sure that this statement in the summary jibes with this statement from the next section:
> In the previous, separate repository world, this would've been four separate pull requests in four separate repositories, and with comments linking them together for posterity.
>
> Now, it is a single one. Easy to review, easy to merge, easy to revert.
IMO, this is a huge quality of life improvement and prevents a lot of the mistakes that come from not having the right revision synced down across different repos. This alone is a HUGE improvement: a dev can no longer accidentally end up with one repo on one branch while forgetting to pull another repo at the matching branch, and then hit weird issues from that basic hassle. When I've encountered this, we've had to use another repo to keep scripts that managed the synchronization. But that was also sometimes problematic, because each developer's setup had to be identical on their local file system (for the script to work), or we each had to create a config file pointing to where each repo lived.
This also impacts tracking down bugs and regression analysis; this is much easier to manage in a mono-repo setup because you can get everything at the same revision instead of managing synchronization of multiple repos to figure out where something broke.
What instead, then? Explicit GitOps. Explicit, reified release specifications (think k8s resource manifests, or Erlang .relup files), one per component with its own deploy cadence. If you have a monorepo, these also live as a directory in the monorepo. CD happens only when these files change.
With this approach, a single PR can atomically merge code and update one or more release specifications (triggering CD for those components), if and when that is a sensible thing to do. But there can also be separate PRs for updating the code vs. "integrating and deploying changes" to a component, if-and-when that is sensible.
How do you avoid the chicken-and-egg problem? Like if the k8s manifest contains a container tag, and the tag is created by CI when the PR is merged to main, it would seem you can’t add code and deploy that code in the same PR.
- PR creation (or any update to the PR branch by a human) would trigger a CI workflow to build+push the container;
- if this succeeds, the workflow would push a commit to the PR feature branch that pins the container ref;
- and the base branch would have a branch protection rule that makes the success of this CI workflow a prerequisite for PR mergeability.
In the explicit-GitOps-in-monorepo case, you'd probably want to do this by having the PR author modify the release spec file of each component they want to build a release for, replacing the old static ref with a temporary symbolic ref meaning "hey CI system, calculate this." Then the CI's added commit would rewrite those temporary symbolic refs into new static ones.
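To make that concrete, here's a minimal sketch of such a CI rewrite step in Python, assuming a hypothetical placeholder convention of `image: BUILD_ME` inside `release/*.yaml` specs; the placeholder, paths, and registry are all made up for illustration:

    #!/usr/bin/env python3
    # Hypothetical CI step: build+push images for specs that carry the
    # temporary symbolic ref, then rewrite that ref to the pushed digest.
    import pathlib
    import subprocess

    PLACEHOLDER = "image: BUILD_ME"                 # illustrative symbolic ref
    REGISTRY = "registry.example.com/acme"          # illustrative registry prefix

    def build_and_push(component: str) -> str:
        """Build the component's container, push it, return the digest-pinned ref."""
        tag = f"{REGISTRY}/{component}:ci"
        subprocess.run(["docker", "build", "-t", tag, f"services/{component}"], check=True)
        subprocess.run(["docker", "push", tag], check=True)
        # RepoDigests is populated after the push and contains the immutable ref.
        return subprocess.run(
            ["docker", "inspect", "--format", "{{index .RepoDigests 0}}", tag],
            check=True, capture_output=True, text=True,
        ).stdout.strip()

    changed = False
    for spec in pathlib.Path("release").glob("*.yaml"):
        text = spec.read_text()
        if PLACEHOLDER in text:
            pinned = build_and_push(spec.stem)      # spec filename doubles as component name
            spec.write_text(text.replace(PLACEHOLDER, f"image: {pinned}"))
            changed = True

    if changed:
        subprocess.run(["git", "commit", "-am", "ci: pin image digests"], check=True)
        subprocess.run(["git", "push"], check=True)

The commit this produces is what the branch protection rule would then require before merge.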
---
Although, that's all assuming that you even want to do CI builds in the first place. If you're developing containerized server software — rather than an operating system, a web browser, etc — then it shouldn't matter where you're building, and there isn't much impetus for deterministic, auditable builds, either. So why bottleneck builds by shoving them all onto some wimpy CI system—when every engineer already has a powerful workstation that can do builds often 4x faster, sitting there idle?
Here's what I call "local-first explicit GitOps":
1. you give your engineers the ability to push refs to the container registry (but you never use symbolic refs, always static manifest SHAs, so a malicious engineer can't do anything dangerous with this).
2. you add a script to your monorepo, that an engineer can run against their feature branch, that'll notice their current branch's local symbolic refs, and build+push containers locally in order to rewrite those into static refs.
3. On success, the script can use `gh` (or equivalent) to trigger PR creation.
Now every PR will come into existence already pinning valid new build artifacts by their static refs!
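A rough sketch of what that script's tail end might look like, assuming the ref-rewriting from the earlier sketch lives in a hypothetical `tools/pin_release_refs.py` and that the `gh` CLI is installed and authenticated:

    #!/usr/bin/env python3
    # Local-first flow: pin refs from locally built images, then open the PR.
    import subprocess

    def run(*cmd: str) -> None:
        subprocess.run(cmd, check=True)

    # 1. Build + push containers locally and rewrite symbolic refs (hypothetical helper).
    run("python", "tools/pin_release_refs.py")

    # 2. Commit the pinned specs to the feature branch and push it.
    run("git", "add", "release")
    run("git", "commit", "-m", "pin release refs to locally built digests")
    run("git", "push", "--set-upstream", "origin", "HEAD")

    # 3. Open the PR; it already references valid, pushed artifacts.
    run("gh", "pr", "create", "--fill")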
The same semantic version string is in pyproject.toml and values.yaml. These are enforced to be the same via pre-commit hook. Environment-specific versions are in env-specific values files like values-prod.yaml.
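A minimal sketch of what such a pre-commit check could look like, assuming the version sits in a `version = "..."` line in pyproject.toml and in an image `tag:` key in values.yaml (the exact keys are an assumption about this setup):

    #!/usr/bin/env python3
    # Pre-commit check: fail the commit if the two version strings diverge.
    import pathlib
    import re
    import sys

    def read_version(path: str, pattern: str) -> str:
        match = re.search(pattern, pathlib.Path(path).read_text(), re.MULTILINE)
        if not match:
            sys.exit(f"could not find a version in {path}")
        return match.group(1)

    py_version = read_version("pyproject.toml", r'^version\s*=\s*"([^"]+)"')
    helm_version = read_version("values.yaml", r'^\s*tag:\s*"?([\w.\-]+)"?')  # assumed key

    if py_version != helm_version:
        sys.exit(f"version mismatch: pyproject.toml={py_version}, values.yaml={helm_version}")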
I have a build workflow which triggers on PRs that change certain files, computes either a pre- or post-release tag by appending the git hash to the server tag, and pushes the images and model repo using that tag. This will re-trigger as the PR is updated.
Users deploy these pre/post-releases in a few clicks with a manual GitHub Action which updates the branch and tag in our dev cluster's Argo.
The PR can’t be merged unless the version is bumped.
When the PR is merged, a “promote” workflow runs which re-tags the latest image and model repo with the release version (pre-release without git hash suffix). This is fast, which means there’s no risk of image backoff loop from Argo trying to sync before artifacts are available.
Most PRs don’t touch the version in values-prod.yaml, so merges to main auto-deploy to staging while prod requires another PR updating only values-prod.yaml.
Folks complain a bit about manually bumping the version, and needing 2 PRs to take a change to prod, but it’s pretty smooth otherwise!
For example: build the code and image, then update the git reference and version number in the Application when the build is done.
You then get atomic updates of both code and configuration.
This brings us to the elephant in the room: monorepo strategies are just naive and futile attempts at fixing the problems created by an absolute lack of support for versioning.
If you support versioning, you don't care where and how you store your code. You just push the changes you need to push, and make sure you bump versions to reflect the nature of your changes. Consumers are left with the responsibility of picking which to consume, and when to consume it.
You can't always allow support for V1 of some API to continue because it's just "wrong" in the light of your work. It can turn out that your "micro" architecture just didn't really work because factors X or Y tie certain groups of services together so much that a change to one forces changes to the others.
So redesign right? Well, the contractors are all gone and your small team can barely manage the bugs and minor features, let alone redesigns which tend to force huge changes in the entire system because it wasn't thought out that well to start with. i.e. the mountain to climb to fix a little problem is as high as fixing all the problems.
So you might have to update everything that references your changing API to use the new version....but your company has tons of repos and you don't know which ones reference your API, and they're owned by other teams and there's no central set of tests which have been run against the combination of the head of all these repos to ensure that they do in fact work together at exactly the state the combination is at when you are working.
All deployments are a guess - which repos do you need to deploy for change X? Some other team deploys their repo without realising that it now depends on a new version of yours. Oh dear. We should have created a versioning mechanism of some kind and made their deployment depend on ours......but no time to do that kind of thing and could you get other people to agree to use it and how does it work with your CI system and.........well....you just muddle on, force yourself to remember the details, make mistakes etc etc. Busy work.
IMO monorepos make a good place to put a set of whole-system end to end tests that everyone can run before they merge. That's going to sort out many conflicts before they turn into a bug hunt.
That's how versioning works.
- You allow legacy applications to consume the legacy interface,
- you provide a new interface to the world so that they have a path forward to migrate out of consuming the legacy service through the legacy interface,
- the consumers migrate in a time frame that makes sense to them.
- in the end you sunset the legacy API, and pull the plug.
This scenario is not eliminated by monorepos.
What monorepos do is force consumers to migrate to the new interface as an urgent task that's part of the critical path. If you have resources to do that migration, it does not matter how you store your code.
IMO versioning is something you do when you cannot do the work of upgrading yourself - it's a response you take which is non-optimal and you do it when it's unavoidable.
It doesn't. It just does not force everyone to rush both clients and services to prod without having a fallback plan. Also dubbed as competent engineering.
> We have to be able to make changes on the clients of our code because the teams responsible for them are busy achieving their goals.
Yes. That's what you want, isn't it? Otherwise as a service maintainer you'd be blocked for no reason at all.
And let's not even touch the topic of rolling back changes.
There is no way around it. Once you start to think about any of the rationale behind this monorepo nonsense, you soon realize it's a huge mess: lots of easily avoidable problems created by yourself for yourself, that otherwise would be easy to avoid.
To put the opposite view: Versioning could be done with individual functions in the code for example but we don't do that - we just update all the places that call it.
We usually start to do versioning where there are boundaries - such as with components that we bring in from external sources or with projects that are maintained by separate teams.
A version is usually a response to a situation where you cannot update the clients that use the API yourself.
So monorepos can be seen as a way of just saying "actually I can update a lot of the points-of-use of this API and I should instead of creating a new version and waiting for someone else to finally use it."
I think you're a bit confused. There are code repositories, and there are units of deployment. Those are not the same things.
Storing multiple projects in a single repository does not magically make them a single unit of deployment, even when you deploy them all with a single pipeline. When you have multiple projects, you always have multiple units of deployment.
In your example, you focused on a scenario that does not apply: individual functions that are a part of the same unit of deployment. Your example breaks down when you pack your hypothetical function in separate modules that you deploy and consume independently. Even if you do not explicitly tag a version ID to a package, implicitly your module has different releases with different versions of your code delivered at different points in time. If one of these deliveries has a breaking change then your code breaks. Explicitly specifying a version ID, such as adding git submodules pointing at a specific commit, is a technique to preserve compatibility.
Where it is very obvious your example fails is when you look at scenarios involving distributed applications with independent units of deployment. This means things like a SPA consuming a backend service, or even a set of producers and consumers. Even if they are all deployed by the same pipeline, you either have forced downtime or you will always have instances of different versions running in parallel.
The SPA+backend is a rather obvious example: even if you do a deployment of a backend and frontend at the exact same time, as an atomic transaction that ensures both are available at the precise same tick, don't you still have users with their browsers loaded with instances of the old SPA? They will continue to have it open until they hit F5, won't they? If you released a breaking change to the backend, what do you think will happen to users still using the old SPA? Things will break, won't they?
Atomic deployments do not exist, however. Looking at services, you have no way to ensure you can deploy new versions of services at precisely the same tick. This means even with monorepos you will always have different deployments of those services running in parallel. Monorepo proponents fool themselves into believing this is not a problem because they count on experiencing problems at best only briefly during deployments, and that the system eventually reaches steady state. This means things like erratic responses, distributed transactions failing, perhaps only a few corrupt records going into the db, etc. If everyone pretends these problems are normal then there is no problem to fix.
Except this negates each and every hypothetical advantage of a monorepo, and rejects any argument supporting it. Once you realize it, you're not actually eliminating problems: you're only buying forced downtime, no matter how small, and taking that hit out of willful ignorance. As a tradeoff, you're buying yourself operational problems and a lack of flexibility due to the misuse of revision control systems.
And all for what? Because someone heard Google uses monorepos?
It makes it easier to turn them into the same unit of deployment. There's nothing you cannot do some other way of course.
You're right about atomic deployments being difficult and sometimes one can control that risk by the order in which you change things. In a monorepo it's slightly easier to record some kind of script, makefile, dependency system that says "deploy this before that".
With browsers - for sure your user-level API has to be stable and cannot change in a sudden incompatible way. When people have seen fit to have layers of APIs underneath it though, one can still have a lot of change that's theoretically hidden from users but still changes lots of APIs.
They are not the same unit of deployment. That's an impossibility.
This critical mistake is at the core of this monorepo nonsense. It's a cargo cult, where people believe that storing code for multiple projects in the same source code revision control system somehow magically turns distributed systems into a monolith and solves deployment issues. It does not.
> You're right about atomic deployments being difficult and sometimes one can control that risk by the order in which you change things.
No. That is false. Atomicity in a distributed transaction is not achieved by shuffling operations around. Especially not those you cannot control.
Not really.
From the producer side it only requires that a) you do not break your contract, b) when you need to break a contract, you bump your version number, c) when no one uses older versions, remove the code.
From the consumer, it's even easier: you just need to point to the version you're consuming.
What exactly do you think is the problem?
That pretty much treats all internal APIs and code like public APIs/products, which adds huge costs in terms of initial API design and assumption validation, because you can't make any mistakes at all. And later down the road it fixes you into a place where, if new requirements arise, and they will, you won't be able to pivot or refactor anything.
With exactly this versioning setup and changing requirements, I've seen refactors that would take a week in a monorepo either estimated to take a year-plus or deemed simply impossible because of so many assumptions baked into those contracts. And there's such a nasty spaghetti of dependencies at different versions that no one can actually understand or hold it in their head, and basically, to build certain new features, they would have to start from scratch.
No. Different versions are not different products. They are the exact same product, provided in a way that delivers new features without breaking clients.
> (...) adds a huge costs in terms of initial API designs (...)
Obviously not. From the producer side, all it takes to support versioning is picking your versioning strategy and sticking with it. This can mean providing an endpoint through a different path, reading a request header, or even getting the version through a query parameter.
Does it take you much design work to add a "v1" in a URL path? How about reading a request header? Does passing a query parameter require meetings to review designs?
Some frameworks even explicitly support versioning as a high-level concept and make it a part of their routing strategies. Copy the source file of a controller to another folder, change its namespace to avoid conflicts, add a tag to the class, done. You just released a new version of your API.
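For illustration, path-based versioning really is just another route; a tiny sketch (Flask picked only as a convenient example, and the field names are made up):

    # Two versions of the same endpoint living side by side.
    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.get("/v1/users/<int:user_id>")
    def get_user_v1(user_id: int):
        # Legacy contract: flat name field, left untouched for existing consumers.
        return jsonify({"id": user_id, "name": "Ada Lovelace"})

    @app.get("/v2/users/<int:user_id>")
    def get_user_v2(user_id: int):
        # New contract: structured name. Old clients keep calling /v1 until they migrate.
        return jsonify({"id": user_id, "name": {"first": "Ada", "last": "Lovelace"}})

    if __name__ == "__main__":
        app.run()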
> (...) assumption validations, because you can't make any mistakes at all.
You somehow managed to get it entirely backwards. With versioning, you can afford to make all kinds of mistakes all the time. Breaking interfaces is no longer a concern. You release your changes to an upstream dependency/producer while it still has zero consumers, downstream dependencies/consumers can gradually switch as their test plans allow, and if any problem is detected then it can be worked out before either consumers go live or producers receive production traffic.
More importantly, you have your old stable versions ready to roll back.
Knowing when no one uses older versions is also hard (especially when you can no longer grep for them, unlike a monorepo) so some people get into defensive mode and never delete or fix any interfaces because "someone might still be using it".
No, you just pin your app to v3.2.4 until a working update is made (either to the library or to your own app). I'm not sure why that suddenly becomes an impossible task once the creators of the app and the library work for the same parent company.
> Knowing when no one uses older versions is also hard
That's the thing though - you don't have to care. If you delete an interface, just bump the version number accordingly. People who still need the interface will stay on the old version. (And - bonus - they are now placed under implicit pressure to refactor their code away from using that old interface, now that they know it's deprecated and will no longer be maintained. By contrast, if you have a practice of keeping code around just because people are still using it, that code tends to just stick around forever.)
External libraries have to be versioned because their code is not part of your monorepo. If the whole world were pushing to the same monorepo, while not breaking any functionality, we would not need version numbers.
I don't see what point you tried to make. So A breaks when someone breaks B. That's ok. You spot the error the moment you try to update A, well before you commit any code. You file a low-priority ticket and go on with your life because A is still good in production.
> Knowing when no one uses older versions is also hard (...)
Not really, it's the easiest part of the whole thing and a non-issue. If you manage a package this does not even register as a problem. If it's a service you have request metrics.
What is the problem, really?
What inevitably happens is that A hasn't been updating their version of B in weeks/months, but some recently landed bug fix in B is now needed RightAway(tm). And now if some accidental breakage happened between now and weeks/months ago, it'll be much more annoying to triage.
> Not really, it's the easiest part of the whole thing and a non-issue. If you manage a package this does not even register as a problem.
Well, where I'm at, we semi-regularly have teams we have never heard of using our libraries. How can I know what things such people are using?
A monorepo gives you less control. If you're unable to enforce control through other means, sure use a monorepo. But it's limiting.
So, like never?
That's the real point of a monorepo: you can treat the whole thing as internal, break it to your heart's content. You only need to make sure you don't break the user-facing part, for some subjective value of break.
There is no dependency hell. Your consumers allocate work effort to migrate to the latest version. This is work you would still have to do with a monorepo.
In my experience, with separate repositories, packages and teams, I think it's a huge challenge to get e.g. 8 different teams together and then to agree to migrate, since everyone's roadmaps are probably full for the following 4 quarters already.
Even with a few services, we saw some pretty crunchy issues with people not understanding that service A had version 1.3.1234 of a module and Service B had version 1.3.1245 and that was enough skew to cause problems.
Distinct repos tend to have distinct builds, and sooner or later one of the ten you're building will glitch out and have to be run twice, or the trigger will fail and it won't build at all until the subsequent merge, and having numbers that are close results in a false sense of confidence.
Of course k8s is not required, you can do it in straight IaC etc. (i.e. deploy a whole parallel system and switch).
It's still "mixed fleet" in terms of any shared external resources (queues, db state, etc) but you can change service interfaces etc with impunity and not worry about compatibility / versioning between services.
Throwing temporary compute at the problem can save a lot of busywork and/or thinking about integration problems.
This stops being practical if you get _very_ big but at that point you presumably have more money and engineers to throw at the problem.
That's overall a bad idea, and negates the whole point of blue-green deployments. This is particularly bad if you have in place any form of distributed transaction.
There are very good reasons why deployment strategies such as rolling deployments and one-box deployments were developed. You need to be able to gradually roll out a change to prevent problems from escalating and causing downtime. If you use all that infrastructure to then flip global switches, you're building up all this infrastructure only to negate it with practices that invariably cause problems.
And this is being done just because someone thinks it's a good idea to keep all code in a single repo?
From my perspective I prefer it to rolling updates, it just costs a lot of resources. It gives you a chance to verify all the parts of your system together during release.
You deploy an entire new parallel copy of your system and test it. This can be manual or automated. Once you have confidence, you can flip traffic. Alternatively you can siphon off a slice of production traffic and monitor metrics as you gradually shift traffic over to the new release. Note that you can also nearly "instantly" roll back at any point, unlike with a rolling update. Again, this is only "sort of" a mixed fleet situation because you can maintain the invariant that service A at v1.23 only talks to service B at v1.23. This means you can atomically refactor the interface between two services and deploy them without thinking too hard about what happens when versions are mixed.
Distributed transactions I'm not so sure about, we would have to talk specific situations.
> And this is being done just because someone thinks it's a good idea to keep all code in a single repo?
It's more like, if you have a monorepo this is one option that becomes simpler for you to do, should you happen to want to do it.
Having a centralized configuration library (a shared Makefile that we can pull down into our repo and include into the local Makefile) helps, until you have to make a backwards-incompatible change to that Makefile and then post PRs to every branch of every repo that uses that Makefile.
Now we have almost the entirety of our projects back into one repository and everything is simpler; one PR per release branch, three PRs (typically) for any change that needs backporting. Vastly simpler process and much less room for error.
As soon as you split 1 repo into 2 repos you need to start building tooling to support your 2 repos. If your infrastructure is sufficiently robust with 2 repos then you might as well have 3 or 4 or 10. If it's built to _only_ support 2 repos (or 3 or 4) then it's brittle out of the gate.
The value of a monorepo is that you completely eliminate certain classes of problems and take on other classes of problems. Classic trade off. Folks that prefer monorepos take the position that multirepo problems are much harder than monorepo problems most of the time.
No, not really.
If you're talking about projects for modules and components, all you need is a versioning strategy and to release consumable packages of your projects.
If you're talking about services, all you need to do is support versioned APIs and preserve your contracts.
No tooling required. For projects you can even make do with git submodules. For services, all you need to do is update clients of your downstream dependencies.
What problems are you facing, exactly?
If you aren't using a monorepo, you need some versioning process, as well as procedural systems in place to ensure that everyone's dependencies stay reasonably up to date. Otherwise, you end up deferring pain in really unwanted ways, and require sudden, unwanted upgrades through API incompatibility due to external pressure.
This also has the downside of allowing api-owning teams to make changes willy-nilly and break backwards compatibility because they can just do it behind SemVer, and then clients of the api need to own the process of migrating to the new version.
A monorepo fixes both of these: you cannot get out of sync, so it is the api-owning team's responsibility to upgrade clients, since they can't break the API otherwise. Similarly, you get a versioning process for free, and clients can never be using out of date or out of support versions of a dependency.
Services work approximately the same either way, since you can't assume synchronous upgrades across service/rpc boundaries anyway.
You can have as many build targets in a monorepo as you like. You can also have large monoliths in a monorepo.
The fundamental mismatch is that feature/app code will have longer testing times, stuff like Puppeteer and creating-infra-per-MR that just fundamentally takes a long time to run. But in ops, you need configuration to roll out quickly; maybe run an autoformatter and a linter or two beforehand, and that's it. When you want to roll back, you don't need to wait for all your tests to run.
Yes you need to deal with versioning. You can just use automatic timestamps. You can write easy automation to push new timestamps to the ops/config repository when there are new releases in the app repository. The ops repository has minimal configuration to pull the latest version of the app repository and apply it, where the app repository includes infra and deployment scripts themselves.
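Something along these lines, with the repo layout and file names as placeholders (it assumes the ops repo is checked out next to the app repo):

    #!/usr/bin/env python3
    # Tag the app repo with a timestamp and pin that tag in the ops repo.
    import datetime
    import pathlib
    import subprocess

    tag = "release-" + datetime.datetime.now(datetime.timezone.utc).strftime("%Y%m%d%H%M%S")

    # Tag the app repository (assumed to be the current working directory).
    subprocess.run(["git", "tag", tag], check=True)
    subprocess.run(["git", "push", "origin", tag], check=True)

    # Record the new version in the ops repo; its deploy job applies whatever is pinned here.
    ops = pathlib.Path("../ops-config")               # placeholder checkout location
    (ops / "app-version.txt").write_text(tag + "\n")  # placeholder pin file
    subprocess.run(["git", "-C", str(ops), "add", "app-version.txt"], check=True)
    subprocess.run(["git", "-C", str(ops), "commit", "-m", f"deploy {tag}"], check=True)
    subprocess.run(["git", "-C", str(ops), "push"], check=True)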
What are the solutions? You would need multiple merge queues.
If you try to come up with fancy rules like forbidding changes to infra, config, and code in the same commit, then you're just cramming a polyrepo design into a monorepo, with much more fragility, because commits get rebased and squashed and etc.
Wrote this some years ago: https://dev.to/kgunnerud/our-experience-monorepo-with-java-m...
Nobody in our team wants to go back to non-monorepo now, although everyone was sceptical initially.
Primary monorepo – single versioned packages for libraries and services that are deployed as a whole.
Secondary monorepo – individually versioned packages for shared libraries and independent (micro)services.
You only need to coordinate this sort of change if you have absolutely no versioning strategy in place and accept breaking contracts for no reason at all.
If you have any form of versioning in place, you do not need to push multi-project commits. You update upstream packages and services, roll them out, and proceed to update the projects that consume them to gradually point them to the newly released versions.
If you're consuming a module from a git repo as a submodule, yes.
This scenario only happens if you're working with a distributed monolith, you care nothing about breaking existing APIs, and you own all producers and consumers.
When this happens, your problem is obviously not how you maintain projects independently in separate repos. Your problem is how you're failing to preserve API contracts, and instead you go to the extreme of insisting on breaking all contracts at once.
It's also a huge red flag when you talk about "accidentally end up with one repo in this branch". Don't you know what you are pushing to production?
As always, monorepo talks are just a thin veil of optimization placed over a huge mess of major operational and technical problems.
Preserving contracts has a significant cost. Why would you try to preserve contracts if you're in fact in control of all producers and consumers?
Not really. You just need to break out a new version when you put out a breaking change. This can mean anything from duplicating the code that implements your interface and then applying the change, to checking a version flag when invoking an operation to determine which version to run.
Do you struggle to not break your code? To me it's like the main responsibility of any developer, isn't it?
> Why would you try to preserve contracts if you're in fact in control of all producers and consumers?
Because you clearly aren't in control, if you feel compelled to dump anything and everything in the same repository to try to prevent things from breaking.
Also, as others have pointed out, storing code in the same repository gives you no assurance that all your services will be deployed at the same exact moment in time. Consider for example multi-region deployments, or even deployments to multiple availability zones. Those aren't atomic. Why does anyone believe that storing all code in the same repository is enough to make these operations atomic?
Not isolated to this particular issue. People really like to pretend that problems are much simpler than they actually are and that inherent complexity is just some edge cases to be ironed out.
On one end you have Netflix-like architectures, on the other end you have intranet apps used twice monthly by two users. There's a wide range of needs and expectations in between.
No one is saying that. But there are lots of types of shared code that can be updated atomically, so why not update them atomically? And there's nothing inherent in being a monorepo that prevents you from being careful about the non-atomic updates.
It's really easy to accidentally deploy multiple repos in a way that gets the deployment order wrong. You have to be careful with non-atomic deployments whatever type of repo structure you have.
Yes, but my time is limited and there's the opportunity cost.
It also means more code, more tests for future. More code, ceteris paribus, represents more maintenance costs for the future.
> Because you clearly aren't in control, if you feel compelled to dump anything and everything in the same repository to try to prevent things from breaking.
I can prevent things from breaking in multi-repo, multi-versioned architecture, it's just more expensive to do so.
Of course I’m “breaking all contracts at once”, for most of them were never a contract to begin with, and the monorepo effectively discourages people from assuming random behaviors are in fact contracts, and that preserves my sanity and the sanity of the code base that now doesn’t suffer from the combinatorial explosion in complexity that happens when people want everything to be an API contract.
Yes, they are. They are as free as adding a new endpoint.
> and for things that were not decided to be an API and or a contract an absolute waste of time.
You're talking about integrating multiple projects. There are always contracts. You simply chose to be ignorant of that fact and fail to interpret breaking those contracts as the root cause of all your problems.
You point out anyone lauding monorepos as some kind of solution to any problem and I assure you that we can immediately isolate one or more problems caused by versioning and breaking contracts.
The only reason we see this nonsense paraded in web services is that they are loosely coupled. When people worked mainly with modules and libraries, compiler and linking errors made this sort of error very obvious. Now people choose to unlearn facts and want to call that operational improvements.
This reminds me a lot of the dynamic vs. static typing discussion. Even at a function level, there's always a contract, no other option. The only decision you can make is whether it should be well documented, compiler verifiable and explicit, or not.
That is, if you’re not tied to using just Github of course.
Big monorepos and multiple repo solutions require some tooling to deal with scaling issues.
What surprises me is the attitude that monorepos are the right solution to these challenges. For some projects it makes sense yes, but it’s clear to me that we should have a solution that allows repositories to be composed/combined in elegant ways. Multi-repository pull requests should be a first class feature of any serious source code management system. If you start two projects separately and then later find out you need to combine their history and work with them as if they were one repository, you shouldn’t be forced to restructure the repositories.
> Multi-repository pull requests should be a first class feature of any serious source code management system.
But it's currently not?
> If you start two projects separately and then later find out you need to combine their history and work with them as if they were one repository, you shouldn’t be forced to restructure the repositories.
It's called a directory copy. Cut + paste. I'd add a tag with a comment pointing to the old repo (if needed). But probably after a few weeks, no one is going to look at the old repo.

Not in my experience. I use "git blame" all the time, and routinely read through commits from many years ago in order to understand why a particular method works the way it does.
Luckily, there are many tools for merging git repos into each other while preserving history. It's not as simple as copy and paste, but it's worth the extra effort.
Do you have examples of source code management systems that provide this feature, and do you have experience with them? The repo-centric approach of GitHub often feels limiting.
Now, though, it seems the pendulum has swung in the opposite direction, from “everything in its own repo” to “everything in one repo.” This, too, will create its own set of problems, which also can be comedic, but frustrating to experience. For instance, what happens when someone accidentally pushes a certificate or API key and you need to force an update upstream? Coordinating that with 50 developers spread across 8 projects, all in a single repo.
Instead, we could just face the problems we currently face and start out with a balanced approach. Start with one repository, or split frontend and backend if needed. For data pipelines that share models with the API, keep them in the same repository, creating a single source of truth for the data model. This method has often led to other developers telling me about the supposed benefits of “everything in its own repo.” Just as I pushed back then, I feel the need to push back now against the monorepo trend.
The same can be said for monoliths and microservices, where the middle ground is often overlooked in discussions about best practices.
They all reminded me of the concept of “no silver bullet”[0]. Any decision will face its own unique challenges. But a silver-bullet solution can create artificial challenges that are wasteful, painful, and most of all unnecessary.
The correct approach here is typically to invalidate the certificate or API key. A force push usually doesn't work.
If you're using GitHub, the dangerous commit lives on effectively forever in an awkward "not in a repository" state. Even if you're not on GitHub and your system actually garbage collects, the repo has been cloned onto enough build machines and dev machines that you're better off just treating the key or cert as compromised than trying to track down all the places where it might have been copied.
You’re correct about keys/certs: once uploaded, they should be treated as compromised, especially when the repository isn’t self-hosted. However, replacing API keys and certificates can take time, and within a large corporation, it could take months.
Force pushes occur for various reasons. Sensitive data includes customer and employee personal info.
Other cases involve logs, DB backups, cache, PDFs, configs, and binary files.
Maintenance and performance needs form another category.
Team dynamics and varying git expertise can lead to accidental or uninformed force pushes, especially challenging in a monorepo with 50+ contributors.
In summary, reasons range from personal data (GDPR compliance), security requirements for logs/configs, to resolving merge conflicts and performance issues.
Regarding your question about the need to communicate between 50 or more devs: there was no need, but the monorepo idea forces unnecessary communication and team effort when none would be needed if there were more repositories.
Your software artifacts will have loose coupling if you divided them well enough at their creation. Once they are created, you can't do much to change that, except by joining or splitting them.
We went from monorepo to multi-repo at work, because it's what our contractors recommended, and it's been a huge setback and disappointment for the devs.
I've asked for a code deploy and everything, and it's failed in prod due to a missing check-in.
> ...because it's what our contractors recommended
It's sad when this happens instead of taking input from the team on how to actually improve productivity/quality.

A startup I joined started with a multi-repo because the senior team came from a FAANG where it was common practice to have multiple services and a repo for each service.
Problem was that it was a startup with one team of 6 devs and each of the pieces was connected by REST APIs. So now any change to one service required deploying that service and pulling down the OpenAPI spec to regenerate client bindings. It was so clumsy and easy to make simple mistakes.
I refactored the whole thing in one weekend into a monorepo, collapsed the handful of services into one service, and we never looked back.
That refactoring and a later paper out of Google actually inspired me to write this article as a practical guide to building a "modular monolith": https://chrlschn.dev/blog/2024/01/a-practical-guide-to-modul...
If you just have 20-30 devs and everyone is pretty silo'd (e.g. frontend or backend, data or API, etc) having 75 repos for your stuff is just silly.
As someone who worked in the dev tooling org, getting teams to keep their deps up to date was a nightmare.
Then you are adapting your project to the properties of code repository. I don't see that as a benefit.
A github search for 'operator' in the openshift org has 178 results:
https://github.com/orgs/openshift/repositories?language=&q=o...
Not all are repos hosting one or more microservices, but most appear to be. Best of luck ensuring consistency and quality across so many repos.
Enforcing standards means actually having org-level mandates around acceptable development standards, and it's enforced using tools. Those tools should be just as easy to run on one monorepo as on 50+ distributed repositories, nay?
If you wanted to, sure, applying enough time and money could make it work. I like to think that those resources might be better spent, though.
Were they working with multiple services in a multi-repo? Seems like a cross-product explosion of repos. Did that configuration inhibit releases, or was the process cumbersome but just smooth because it was so rote?
So yeah, cumbersome, but established, and huge downside risk to messing with the status quo. It was basically Git applied on top of an existing “copy the source” release process.
The difference is that past a certain level of complexity, the org will most certainly need specialized tooling to support massive codebases to make CI/CD (build, test, deploy, etc.) times sane.
On the other hand, multi-repos may work for massive orgs, but they are always going to add friction for small orgs.
Eeh... Might be quite meta-, but really doesn't feel very mono-.
The point here is to understand what problems are being solved, understand if they are similar to yours, and make a decision based on whether the tradeoffs are a good fit for you.
Not necessarily disagreeing with you, but I just feel the pendulum on this statement has swung too far to the other side now.
Yes it is. You said so yourself:
> The point here is to understand what problems are being solved, understand if they are similar to yours, and make a decision based on whether the tradeoffs are a good fit for you.
That's you agreeing: “understand and make a decision” is pretty much the exact opposite of “adopt anything by default”, which is precisely what the takeaway says you shouldn't do, isn't it? So you're agreeing with the general takeaway, not arguing against it.
I think this pendulum has stopped swinging and come to a rest. And pendulums (pendula? penduli?) always stop pointing straight down, midway between the extremes: You shouldn't mindlessly reject something just because it comes from a big company, but certainly not blindly adopt it just because it does, either. But, just as you said, understand if it fits for you and then decide based on that.
And hey, on this particular issue, it seems some big companies do it one way, and others the opposite — so maybe the (non-existent) “average big company” is halfway between the extremes, too. (Whatever that means; one-and-a-half repository? Or half an infinity of repositories?)
Why? Is your framework heavily tied to client bindings? APIs I consume occasionally get new fields added for data I don't need. My code just ignores them. We also have a policy that you cannot add a new mandatory field to an API without a version bump. So maybe the REST API would have a new field, but I didn't send it and it happily didn't care.
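That tolerant-reader behaviour costs almost nothing on the client side; a tiny illustration with made-up field names:

    from dataclasses import dataclass

    @dataclass
    class User:
        id: int
        name: str

    def parse_user(payload: dict) -> User:
        # Only the fields we need are read; unknown keys are simply ignored.
        return User(id=payload["id"], name=payload["name"])

    # Works the same before and after the API starts returning extra fields.
    parse_user({"id": 1, "name": "Ada"})
    parse_user({"id": 1, "name": "Ada", "avatar_url": "https://example.com/a.png"})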
Same idea if you use some kind of versioning and release system. It's still a bit of a pain with all the PRs and coordination involved, but at every step every branch is consistent and buildable, you just check it out and hit build.
This is obviously more difficult if you have a more loosely coupled architecture like microservices. But that's self-inflicted pain.
Some benefits include:
* Separate repos evolve better than a monorepo. Particularly in early stage companies where pivots are frequent.
* They keep tools just doing one thing well and prevent tying everything in the codebase together.
* Dependencies are more flexible, this is particularly useful with ML libraries.
* A lot of repo tooling is simpler with medium sized single purpose projects.
* Stronger sense of ownership boundaries.
Tooling issue. Bitbucket will happily link PRs if instructed so.
> prevents a lot of mistakes from not having the right revision synced down across different repos.
Tooling issue. Depending on setup, "tooling" could be few lines in your favorite scripting language.
---
Monorepo steps around some tooling issues, but sacrifices access control and checkout speed.
Separate repos for each team means that when two teams own components that need to interact, they have to expose a "public" interface to the other team--which is the kind of disciplined engineering work that we should be striving for. The monorepo-alternative is that you solve it in the DMZ where it feels less like engineering and more like some kind of multiparty political endeavor where PR reviewers of dubious stakeholder status are using the exercise to further agendas which are unrelated to the feature except that it somehow proves them right about whatever architectural point is recently contentious.
Plus, it's always harder to remove something from the DMZ than to add it, so it's always growing and there's this sort of gravitational attractor which, eventually starts warping time such that PR's take longer to merge the closer they are to it.
Better to just do the "hard" work of maintaining versioned interfaces with documented compatibility (backed by tests). You can always decide to collapse your codebase into a black hole later--but once you start on that path you may never escape.
Neither a monorepo nor separate repos will result in people being disciplined. If you already have the discipline to do separate repositories correctly then you'll be fine with a monorepo.
So I guess it's six of one, half a dozen of the other.
Devs *like* to feel ownership of reasonably sized chunks of code. We like to arrange it in ways that are pleasing for us to work on later down the road. And once we've made those investments, we like to see them pay off by making quick, easy changes that make users happy. Sharing a small codebase with three or four other people and finding ways to make each other's lives easier while supporting it is *fun*, and it makes for better code too.
But it only stays fun if you have enough autonomy that you can really own it--you and your small team. Footguns introduced need to be pointed at your feet. Automation introduced needs to save you time. If you've got the preferences of 50 other people to consider, and you know that whatever you do you're going to piss off some 10 of them or another... the fun goes away.
This is simple:
> we own this whole repo and only this 10% of it (the public interface) needs to make external stakeholders happy, otherwise we just care about making each other happy.
...and it has no space in it for there to be any code which is not clearly owned by somebody. In a monorepo, there are plenty of places for that.
> If you aren't going to use those gatekeeping mechanisms in a mono repo I doubt that the quality of your separate repo codebases are going to be any better
Those are pretty different competencies, right? Getting the monorepo in good shape requires a broad understanding of what is where and who is who (and the participation of your neighbors). Making your repo a nice place for visitors to contribute to... you can learn to do that by emulating repos found anywhere.
I just have to say this has not been my experience at all. A lot of developers want nothing more than to come in and write whatever lines of code they're told to and get paid. Ownership is often the last thing they want.
It's a feeling that's much harder to kindle in a monorepo, too much red tape and politics, nothing at all like a sandbox.
And I can't be the only one who feels this way. Otherwise why would there be a big pile of free open source software?
The meat is in the detail, I find.
This pattern has its own set of problems. Strict ownerships separation creates strong dependencies and capacity constraints on the project organization. A single team is rarely able to deliver a complete feature, because it will mean changes in a couple of services. If you go with a model where teams are allowed to build features in "foreign" services, you will still come to the situation that the owning team doesn't feel that responsible for something they haven't built / don't really understand. You can tweak it, involve the owning team more etc. but it has trade-offs / friction.
The worst anti-pattern coming from this is "we have dependency on service X / team Y, but their code reviews are usually very long and pedantic, threatening our delivery. Can we do it locally in our service instead?" which results in people putting stuff where it doesn't belong, for organisational reasons.
I just don't want there to be a no-man's-land. And if there must be an every-man's-land, let it be explicitly so, not some pile of stuff that didn't fit anywhere else.
> I just don't want there to be a no-man's-land.
A different model I've seen is that the ownership of the whole system is shared across the trusted senior members (every team has one, but you need to get an approval from an external one). One thing this avoids is the bottleneck on specific teams (only team X can approve the PR, but they're now super busy with their own features).
Haven't seen it happen in practice. For some reason a separate repo induces more of a "not my backyard" feeling than a separate folder.
Or do you just stop looking for a CONTRIBUTING.md when you're at work?
People see two pieces of duplicate, or even similar, code in a monorepo, and they feel the urge to create some sort of shared directory to de-duplicate them. The problem is this introduces more edges to your graph, which over time increases the monorepo complexity exponentially. And who is going to maintain that shared directory?
Even in a mono-repo, you should probably start off by copy and pasting code around, until you figure out how to design and find ownership for an actual well thought through public api. And maybe for some code, that never happens.
Also, like many things, people cargo cult big tech ideas without understanding why they work. In Google, the monorepo has two key features: 1) every directory has an owner by default 2) the owner can control build target visibility lists.
That means that the ownership team controls who depends on them. By default, everything is private.
Basically, every directory should either be a project owned by a team, or some common library code owned by the library maintainers.
The other thing is, library maintainers need to be discerning with what APIs and code they support. Again, a common refrain I hear is “when in doubt, it’s better to put it in the company’s common library to promote code reuse”. That’s the exact wrong way to think about it. Library teams should be very careful about what APIs they support because they will be on the hook maintaining them forever.
This is all to say, I can see how monorepos fail. Because people cherry-pick the ideas they like, but ignore the important things which make them work at Google: 1) default code owners 2) visibility lists that default to private 3) discerning library teams that aggressively reject code additions 4) don’t be too dry, if it can’t be exposed in a public api, just copy and paste, just like you would in a microservice.
How do you write a test for the rollbackability of a commit, if that test must be in the commit which it asserts can be rolled back? Does the test clone a second copy of the monorepo at the older version so that it has access to the "before" state? The hard parts of backwards compatibility have to do with accumulated state over time, they don't go away because you've bound some versions together.
I once worked on a codebase that had had a directory that no one was allowed to touch because it was "about to be decommissioned". When I looked into it more closely, it had been "about to be decommissioned" for 7 years and it was holding so many things up.
That's where I've been for a few months: The work of prior gatekeepers now run through the middle of what we're responsible for. It feels like we bought a house online and when we showed up the kitchen is in one country and the bathroom is in another so we have to clear customs several times a day and we have to hold a summit with the friends of the previous owner if we want to change anything--even things in the interior. The architect of the reorg would never have done this if the natural boundaries had been a bit more apparent, i.e. as a list of repos to be assigned to the new teams.
I'd prefer large scale shifts to come by replacing an old repo with a new one (or one with two, or two with one, or by open sourcing one which you no longer care to maintain). Maybe that slows down the organizational rate of change, but if the alternative is pretending to have changed in a way which isn't actually sustainable, then maybe slowing down is better.
More recent commentators have noted a corollary - for software projects with a long lifetime of code reuse, such as Microsoft Windows, the structure of the code mirrors not only the communication structure of the organization which created the most recent release, but also the communication structures of every previous team which worked on that code.
You can end up in this situation with or without a monorepo.
I have solved more bugs looking at diffs in GitHub than I have in my debugger simply by having everything in one happy scrolly view. Being able to flick my mouse wheel a few clicks and confirm that the schema does indeed align with the new DTO model props has saved me countless hours. Confirming stuff like this across multiple repos & commits can encourage a more lackadaisical approach. This also dramatically simplifies things like ORM migrations, especially if you require that all branches rebase & pass tests before merging.
I agree with most of the hypothetical caveats, but if you can overcome them even with some mild degree of suffering, I don't see why you wouldn't fight for it.
This way a commit hash contains even the exact third party code involved.
One of the teams vendored npm and go packages which are robust. Code always ran inside Docker. Being able to just clone and run simplified their flow.
There are definitely huge benefits to monorepos but I don't see how this is one.
1) Create a PR in the submodule with your changes and create a PR in the main repo with the submodule hash replaced with the one in the 1st PR
2) Merge the submodule PR (now the main repo is pointing at an out of date hash in the submodule)
3) Go back to the main repo PR, update the hash, rerun tests, merge it (assuming tests pass)
It often feels burdensome, particularly when you need to change a single line of code in the submodule. It invites race conditions when multiple people work on the submodule simultaneously (so in step 3 above your tests might not pass because somebody else merged something into the submodule repo in between 1 and 2). It also creates weirdly ambiguous situations like what if someone updates the documentation in the submodule - do you update the main repo, or do you simply allow the main repo to lag behind the submodule's master branch?
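For what it's worth, step 3 is straightforward to script; a minimal sketch, with the submodule path and branch as placeholders and assuming the `gh` CLI for the PR:

    #!/usr/bin/env python3
    # Bump the submodule pointer in the main repo to the submodule's latest default branch.
    import subprocess

    SUBMODULE = "libs/shared"   # placeholder submodule path
    BRANCH = "main"             # placeholder default branch of the submodule

    def run(*cmd: str) -> None:
        subprocess.run(cmd, check=True)

    run("git", "checkout", "-b", f"bump-{SUBMODULE.replace('/', '-')}")
    run("git", "-C", SUBMODULE, "fetch", "origin", BRANCH)
    run("git", "-C", SUBMODULE, "checkout", f"origin/{BRANCH}")
    run("git", "add", SUBMODULE)
    run("git", "commit", "-m", f"bump {SUBMODULE} to latest {BRANCH}")
    run("git", "push", "--set-upstream", "origin", "HEAD")
    run("gh", "pr", "create", "--fill")

It doesn't remove the race condition, but it makes retrying the bump cheap.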
You can change the code locally on-disk for testing, then submit your changes to upstream. Once you've got an upstream ref (preferably after merge), then you can update the submodule to match.
This does slow everything down though, unless you're cleverer about CI and deployment than we were (very possible).
In particular, many build pipelines involving private sources or artifacts become drastically more complicated than those of their publicly available counterparts.
Do you want hard boundaries between teams — clear responsibilities with formal ceremony across boundaries, but at the expense of living with inflexibility?
Do you want fluidity in engineering, with no fixed silos and a flat org structure that encourages anyone to take on anything that’s important to the business right now, but with the much bigger overhead of needing strong people leaders capable of herding the chaos?
I’m sure there are dozens of other examples of org structures and how they are reflected in code layout, repo layout, shared directories, dropboxes, chat channels, and email groups etc.
The Git team should really invest in tooling for very large repos. Our repo is around 10M files and 100M lines of code, and no amount of hacks on top of Git (cache, sparse checkout, etc.) is really solving the core problem.
Meta and Google have really solved this problem internally but there is no real open source solution that works for everyone out there.
Microsoft released Scalar (https://github.com/microsoft/scalar). It's not a complete stack yet, but they are already planning on releasing the backend components eventually.
Have you tried Sapling? It has EdenFS baked in so it'll only materialize the files you touch and operations are fast because it has a filesystem watcher for activity so it doesn't need to do a lot of work to maintain a view of what has been invalidated.
I don't think so. Last I checked there are still server components that are too tied into Facebook's infrastructure to open source.
> Scalar
As far as I understand it Scalar is no longer active because most of the features have been merged into mainline Git.
Perforce is pretty brutal, and the code review tools are awful - but it's still the undisputed king of mixed text and binary assets in a huge monorepo.
I tried to bring the best of Critique to GitHub with https://codeapprove.com but you’re right there’s a lot that just doesn’t work on top of git.
https://engineering.fb.com/2023/06/27/developer-tools/meta-d...
But if you do it right, the advantage you get is that you get to pick which versions of your dependencies you use; while quite often you just want to use the latest, being able to pin is also very useful.
For C++ in particular, you need to express your dependencies in terms of source versions, and ensure all of the build artifacts you link together were built against the same source version of every transitive dependency and with the same flags. Failure to do that results in undefined behaviour, and indeed I have seen large organizations with unreliable builds as a manner of routine because of that.
The best way to achieve that is to just build the whole thing from source, with a content-addressable-store shared with the whole organization to transparently avoid building redundant things. Whether your source is in a single repo or spread over several doesn't matter so long as your tooling manages that for you and knows where to get things, but ultimately the right way to do modular is simply to synthesize the equivalent monorepo and build that. Sometimes there is the requirement that specific sources should have restricted access, which is often a reason why people avoid building from source, but that's easy to work around by building on remote agents.
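As one concrete (and purely illustrative) shape of that: with Bazel-style tooling, the shared content-addressable store and the remote build agents are just flags on the build command. The endpoints below are made up, and the commenter does not name a specific tool:

# Build everything from source, but reuse any artifact already in the shared cache
bazel build //... --remote_cache=grpcs://build-cache.example.internal
# Or run the compilations themselves on remote agents that are allowed to see restricted sources
bazel build //... --remote_executor=grpcs://build-farm.example.internal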
Now for some reason there is no good open-source build system for C++, while Rust mostly got it right on the first try. Maybe it's because there are some C++ users still attached to the notion of manually managing ABI.
So in the end this is a yaml vs json type of argument mostly, and if you’re thinking about rioting over this there is a very good chance you could find a better hill to die on.
Fair enough, it’s just that if you’re my coworker or employee I’m wondering if you don’t have something more important to worry about =D
In my experience the article is right when it tells you there isn’t that big of a difference. We have all sorts of repositories, some of which are basically mono-repositories for their business domain. We tend to separate where it “makes sense” which for us means that it’s when what we put into repositories is completely separate from everything else. We used to have a lot of micro-repositories and it wasn’t that different to be honest. We grouped more of them together to make it easier for us to be DORA compliant in terms of the bureaucracy it adds to your documentation burden. Technically I hardly notice.
The basic way most work is to run full CI on every change. This quickly becomes a huge speedbump to deployment velocity until a solution for "only run what is affected" is found.
I don’t want to be rude, but a lot of these tools have rather accessible documentation on how to get up and running as well as extensive documentation for more complex challenges available in their official docs. Which is probably the, only, place you’ll find good ways of working with it because a lot of the search engine and LLM “solutions” will range from horrible to outdated.
It can be both slower and faster than micro-repositories in my experience, however, you’re right that it can indeed be a Cthulhu level speed bump if you do it wrong.
I'm not against only run what is affected, it is a good answer. It just has failings that you need to be aware of.
For example, in GitHub[0]:
name: ".NET - PR Unit Test"
on:
## Only execute these unit tests when a file in this directory changes.
pull_request:
branches: [main]
paths: [src/services/publishing/**.cs, src/tests/unit/**.cs]
So we set up different workflows that kick off based on the sets of files that change.
[0] https://docs.github.com/en/actions/writing-workflows/workflo...
That is, the green check mark means "the changes in this commit did not break anything that was not already broken", as opposed to the more useful "the repository, as of this commit, passes all tests".
If you split something into multiple repositories then surely you wouldn't mark PRs on one of them as red just because tests are failing in a different one?
The meaning we give to "the commit is green" is not "this PR can be merged" but "this can be deployed to production", and it is used for the purpose of selecting a release candidate several times a week. It is a statement about the entire state of the project as of that commit, rather than just the changes introduced in that commit.
I can understand the frustration of creating a PR from a red commit on the main branch, and having that PR be red as well as a result. I can't say this has happened very often, though: red commits on the main branch are very rare, and new branches tend to be started right after a deployment, so it's overwhelmingly likely that the PR will be rooted at a green commit. When it does happen, the time it takes to push a fix (or a revert) to the main branch is usually much shorter than the time for a review of the PR, which means it is possible to rebase the PR on top of a green commit as part of the normal PR acceptance timeline.
If you do one big monolithic deploy, one big monorepo is ideal. (Also, to be clear, this is separate from microservice vs monolithic app: your monolithic deploy can be made up of as many different applications/services/lambdas/databases as makes sense). You don't have to worry about cross-compatibility between parts of your code, because there's never a state where you can deploy something incompatible, because it all deploys at once. A single PR makes all the changes in one shot.
The other rule I have is that if you want to have individual repos with individual deployments, they must be both forward- and backwards-compatible for long enough that you never need to do a coordinated deploy (deploying two at once, where everything is broken in between). If you have to do coordinated deploys, you really have a monolith that's just masquerading as something more sophisticated, and you've given up the biggest benefits of both models (simplicity of mono, independence of multi).
Consider what happens with a monorepo with parts of it being deployed individually. You can't checkout any specific commit and mirror what's in production. You could make multiple copies of the repo, checkout a different commit on each one, then try to keep in mind which part of which commit is where -- but this is utterly confusing. If you have 5 deployments, you now have 4 copies of any given line of code on your system that are potentially wrong. It becomes very hard to not accidentally break compatibility.
TL;DR: Figure out your deployment strategy, then make your repository structure mirror that.
You can have a mono-repo and deploy different parts of the repo as different services.
You can have a mono-repo with a React SPA and a backend service in Go. If you fix some UI bug with a button in the React SPA, why would you also deploy the backend?
The gains provided by moving from polyrepo to monorepo are immense.
Developer access control is the only thing I can think to justify polyrepo.
I'm curious if and how others who see the advantages of monorepo have justified polyrepo in spite of that.
Why would you bother to spend the time figuring out whether or not it needs to get deployed? Why would you spend time training other (and new) people to be able to figure that out? Why even take on the risk of someone making a mistake?
If you make your deployment fast and seamless, who cares? Deploy everything every time. It eliminates a whole category of potential mistakes and troubleshooting paths, and it exercises the deployment process (so when you need it, you know it works).
> If you make your deployment fast and seamless, who cares?
It still costs CI/CD time and, depending on the target platform, there are foundational limitations on how fast you can deploy. Fixing a button in a React SPA and deploying that to S3 and CloudFront is fast. Building and deploying a Go backend container -- that didn't even change -- to ECS is at least a 4-5 minute affair.
To me, in most cases, 4 or even 10 minutes is just not a big deal. It's fire-and-forget, and there should be a bunch of things in place to prevent bad deploys and ultimately let the team know if something gets messed up.
When there's an outage, multiple people get involved and it can easily hit 10+ hours of time spent. If adding a few minutes to the deploy time prevents an outage or two, that's worth it IMHO.
All you need to know is "does changing this code affect that code".
In the example I've given -- a React SPA and Go backend -- let's assume that there's a gRPC binding originating from the backend. How do we know that we also need to deploy the SPA? Updating the schema would cause generation of a new client + model in the SPA. Now you know that you need to deploy both and this can be done simply by detecting roots for modified files.
You can scale this. If that gRPC change affected some other web extension project, apply the same basic principle: detect that a file changed under this root -> trigger the workflow that rebuilds, tests, and deploys from this root.
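A rough sketch of that root detection in shell, assuming two top-level roots named spa/ and backend/ with hypothetical deploy scripts (none of these names come from the thread):

# List the top-level directories containing files changed relative to main
changed_roots=$(git diff --name-only origin/main...HEAD | cut -d/ -f1 | sort -u)

for root in $changed_roots; do
  case "$root" in
    spa)     ./deploy-spa.sh ;;       # e.g. rebuild the React app, sync to S3/CloudFront
    backend) ./deploy-backend.sh ;;   # e.g. rebuild the Go container, roll out to ECS
  esac
done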
As always, complexity merely moves around when squeezed, and making commits/PRs easier means something else, somewhere else gets less easy.
It is something that can be made better of course, having your CI and CD be a bit smarter and more modular means you can now do selective builds based on what was actually changed, and selective releases based on what you actually want to release (not merely what was in the repo at a commit, or whatever was built).
But all of that needs to be constructed too, just merging some repos into one doesn't do that.
I linked an example below. Most CI/CD, like GitHub Actions[0], can easily be configured to trigger on changes for files in a specific path.
As a very basic starting point, you only need to set up simple rules to detect which monorepo roots changed.
[0] https://docs.github.com/en/actions/writing-workflows/workflo...
That's where it immediately gets problematic. Say you have a client, a server, and a shared library that does your protocol or client/server code for you. If you want to bump the server part and check whether it's backwards compatible with, say, the clients you have already deployed, your basic "match this stuff" pattern doesn't work: it either matches nothing (because it builds just the library and not the client or the server), or it matches all of it, because you didn't make a distinction. And when you just want the new server with the new library and nothing else, you somehow have to build and emit the new library first, reference that specific version in your code, rebuild your code, and meanwhile not affect the client. That also means that when someone else is busy working on the client or on the older library, you essentially all have to be on different branches that cannot integrate with each other, because they would overwrite or mess up the work of the people that are busy with their own part.
You could go monorail and do something where every side effect of the thing you touch automatically also becomes your responsibility to fix, but most companies aren't at a scale to pay for that. That also applies to things like Bazel. You could use it to track dependent builds and also tag specific inter-commit dependencies, so your server might build with the HEAD source library while your client would still be on an older commit. But at that point you have just done version/reference pinning, with extra steps. You also can't see that at the same time in your editor, unless you open two views into the same source, at which point you might as well open two views, one for the library and one for the server.
If your project, organisation of people (teams) and dependencies are simple enough, and you only deploy one version at a time, of everything, then yes, doing multiple directories in a single repo or doing multiple repos is not much of a change. But you also gain practically nothing.
There are probably some more options as I don't doubt it is more of a gradient; the Elastic sources seem to do this somewhat well, but they all release everything at once in lockstep. I suppose one way to take separate versions out of that is to always export the builds and have a completely separate CD configuration and workflow elsewhere. This appears to be what they have done as it's not public.
Of course the multirepo approach means you do this dance a lot more:
- Create a change with backwards compatibility and tombstones (e.g. logs for when backward compatibility is used)
- Update upstream systems to the new change
- Remove backwards compatibility and pray you don't have a low frequency upstream service interaction you didn't know about
While the dance can be a pain - it does follow a more iterative approach with reduced blast radiuses (albeit many more of them). But, all in all, an acceptable tradeoff.
Maybe if I had more familiarity in mature tooling around monorepos I might be more interested in them. But alas not a bridge I have crossed, or am pushed to do so just at the moment.
I love the "activation energy" metaphor here. But I don't agree that "technically it is the same." At my current job, we have more than 100 minirepos and I am unable to confidently refactor the system like I normally would in a monorepo. It's not merely the psychological barrier. It's that I am unable to find all the call sites of any "published" function. Minirepos create too many "published" functions in the form of Nuget packages. Microservices create too many "published" functions in the form of API endpoints. In either case, "Find All References" no longer works; you have to grep, but most names are not unique enough.
For this reason, the kind of refactoring that keeps a codebase healthy happens at a much lower rate than all the other projects I've worked on.
While in some cases the complete context is helpful for the job, in other cases (and I realize this may be pure paranoia) you may not want to share the complete picture.
I think it's a natural fear but the reality is that a) most people don't leak source code, and b) access to source code isn't really that valuable. Most source code is too custom to be useful to most other people, and most competitors (outside China at least) wouldn't want to steal code anyway.
Actually I did find this answer on how Google does it and apparently they do support some ACLs for directories in their monorepo. Microsoft uses Git though so I'm not sure what they do.
https://www.quora.com/If-Google-has-1-big-monorepo-how-do-th...
This is a very important lesson.
Once you learn that The Moat is more about the customers & trust, you stop worrying so much about every last possible security vector into your text files.
Treating a repository like a SCIF will put a lot of friction on getting things done. If you simply refrain from placing production keys/certs/secrets in your source code, nothing bad will likely occur with a broad access policy.
The chances that your business has source code with any intrinsic market value is close to zero. That is how much money you should spend on defending it.
Your tooling must be different for it to work.
So using git for it will not have a positive result.
I can think of only 2 ways that the multiple-branch monorepo is worse:
1. If the monorepo is large, everyone has to deal with a fat .git folder even if they have only checked out a branch with a few files.
2. Today, everyone expects different branches in a repo to contain "different versions of the same thing", not "a bunch of different things". But this is purely convention.
The only real benefit that I can see of making a separate repo (over adding a new project directory to a "classic" monorepo) is the lower barrier to getting underway -- you can just immediately start doing whatever you want; the pain of syncing repos comes later. But this is also true when starting work under a new branch in the branch-per-project style monorepo: you can just create a branch from the initial commit, and away you go -- and if you need to atomically make changes across projects, just merge their branches first!
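For what it's worth, a sketch of that branch-per-project style in plain git (the branch names are hypothetical):

# Start a new project from the repo's initial commit
root_commit=$(git rev-list --max-parents=0 HEAD)
git checkout -b project-b "$root_commit"
# ...commit project B's files on this branch...

# Atomic change across projects: merge the project branches, then work on the result
git checkout project-a
git merge project-b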
What are the downsides I'm not seeing?
That's not a monorepo.
What would you call it?
1) Comparing a photo storage app to the Linux kernel doesn't make much sense. Just because a much bigger project in an entirely different (and more complex) domain uses monorepos, doesn't mean you should too.
2) What the hell is a monorepo? I feel dumb for asking the question, and I feel like I missed the boat on understanding it, because no one defines it anymore. Yet I feel like every mention of monorepo is highly dependent on the context the word is used in. Does it just mean a single version-controlled repository of code?
3) Can these issues with sync'ing repos be solved with better use of `git submodule`? It seems to be designed exactly for this purpose. The author says "submodules are irritating" a couple times, but doesn't explain what exactly is wrong with them. They seem like a great solution to me, but I also only recently started using them in a side project
It’s all one kernel really.
The Kernel is a monolith, but that doesn’t make its repo a mono repo.
FreeBSD, on the other hand, is a mono-repo. It has the kernel, all the user mode tools, basically everything in a single repo.
That is very different from the Linux ecosystem as whole.
Linux is not a mono repo.
Git submodules have some places where you can surprisingly lose branches/stashed changes.
This concerns me, as git generally behaves as a leak-proof abstraction in my experience. Can you elaborate or share where I can learn more about this issue?
> The other main caveat that many people run into involves switching from subdirectories to submodules. If you've been tracking files in your project and you want to move them out into a submodule, you must be careful or Git will get angry at you.
Though apparently newer versions of git are better about not losing submodule branches, so my concerns were outdated.
Yeah - the idea is that all of your projects share a common repo. This has advantages and drawbacks. Google is most famous for this approach, although I think they technically have three now: one for Google, one for Android, and one for Chrome.
> They seem like a great solution to me
They don't work in a team context because they're extra steps that people don't do, basically. And for some reason a lot of people find them confusing.
If they spin out an open-source project, they either (1) continue development internally and (maybe) do periodic releases by exporting that directory from the monorepo; or (2) allow development to occur externally and periodically import changes when upgrading the version used by the monorepo.
Either way, the point is that to build any Google service, you checkout the monorepo and type whatever their equivalent of 'make' is. No external dependencies.
In my mind, a mono repo is one company, one (or a very small number of) source code repository. When I started working at Yahoo, everything was in CVS on one hostname (backed by a NetApp Filer); that was a mono repo. When you got into the weeds, there were actually a couple of separate repos: Yahoo Japan had a separate repo, DNS and prod ops had a separate repo, and a couple more, but mostly everything was in one, just organized by directories so most people only checked out part of the repo, because not many people needed to look at all the code (or had the disk space for it, either). That evolved into separate SVN repos for each group that wanted to move to SVN. I assume they moved to git at some point after I left.
Same deal when I was at WhatsApp. When I started, we had one SVN repo that everyone shared --- that was a mono repo; when we moved to git, each client had their own repo, the server had a repo, and there was a common repo for docs and other things that needed sharing. Facebook had a lot of repos, but one repo was very large and required 'monorepo' style tools. As a first step for monorepo-style tools in a large company with a large git repo, you need something to sequence commits, because otherwise everyone gets stuck in the git pull; git push loop until they manage to win the race. This wasn't an issue with a large CVS repo, because commits are file based, and while you might have conflicts within your team, you didn't need a global lock; I don't remember having issues with it in SVN either, but my memory is fuzzy and the whole-company SVN repo I had was a lot smaller than the whole-company CVS repo.
Maybe, I'd say a monorepo is a large repo where the majority of users/developers aren't going to need or want most of the tree.
> Can these issues with sync'ing repos be solved with better use of `git submodule`? It seems to be designed exactly for this purpose. The author says "submodules are irritating" a couple times, but doesn't explain what exactly is wrong with them. They seem like a great solution to me, but I also only recently started using them in a side project
I don't use submodules often, and I'm not sure if some of the irritations have been fixed, but in my use I run into two things: a) git clone requires additional work to get a full checkout of all the submodules; b) git pull requires additional work to update the submodules. I'm sure there's some other issues with some git features; but I was actually fine with CVS and don't really care about git features :P
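Concretely, the extra steps are the standard submodule flags (these are stock git commands, nothing project-specific):

git clone --recurse-submodules <url>                  # instead of a plain clone
git pull && git submodule update --init --recursive   # instead of a plain pull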
Neither one has an actual edge. Yet you can find countless articles from people talking about their experience. Take those as a hint about what kind of tooling you need, not about their comparative qualities.
In this article almost everything makes sense to me (because that's what I have been doing most of my career), but they put their OTP app inside, which suddenly makes no sense. And you can see the problem in the CI: they have dedicated files just for this app and probably very little code in common with the rest.
IMO you should have one monorepo per project (api, frontend, backend, mobile, etc. as long as it's the same project) and if needed a dedicated repo for a shared library.
that's not a monorepo!
Unless the singular "project" is "stuff our company ships", the problem you have is an impedance mismatch between the projects, which is the problem that an actual monorepo solves. For SWEs on individual projects who will never have the problem of having to ship a commit on all the repos at the "same" time, yeah, that seems fine, and for them it is. The problem comes as a distributed systems engineer where, for whatever reason, many or all of the repos need to be shipped at the ~same time. Or worse: A needs to ship before B, which needs to ship before C, but that needs to ship before A, and you have to unwind that before actually being able to ship the change.
I'm not convinced that making completely different teams work on the same repo is making things better. In the case of cascading dependencies what usually works better than a convoluted technical solution is communication.
Sure it is! It's just not the ideal use case for a monorepo which is why people say they don't like monorepos.
They are literally saying that multiple repos should be used, including for sharing the code; this is not a monorepo, these are different repos.
This is google3. It was absolutely loved by the majority of devs. If you change a line of code, all dependencies are type checked and tested and also a bunch of other things. It keeps so many versioning issues out.
One of the big reasons why the JS ecosystem is so fragmented compared to Go or even Rust is the leftpad-sized packages with 10 config files that are out of date. Not to mention our friend peerDependencies, who needs no introduction.
And yet here we are with monorepos, doing the big-ball of mud approach.
I've worked on several multi-repo systems and several monorepos. I have a weak preference for monorepos for some of the reasons given, especially the spread of pull requests, but that's almost a 'code smell' in some respects.
Monorepos that I've contributed to that have worked well: mostly one language (but not always), a single top-most build command that builds everything, complete coverage by tests in all dimensions, and the repo has tooling around it (ensure code coverage on check in, and so on).
Monorepos that I've contributed to that haven't: opposites of the previous points.
Multi-repos that have worked well: well abstracted and isolated, some sort of artefact repository (nexus, jfrog, whatever) as the layer between repos, clear separation of concerns.
Multi-repos that have not worked well: again, opposites of the previous, including git submodules (please, just don't), code duplication, fragile dependencies where changing any repo meant all had to change.
Some core stuff into separate libraries, consumed as nuget packages by other projects. Those libraries and other standalone projects in separate repos.
Then a "monorepo" for our main product, where individual projects for integrations etc will reference non-nuget libraries directly.
That is, tightly coupled code goes into the monorepo, the rest in separate repos.
Haven't taken the plunge just yet tho, so not sure how well it'll actually work out.
So updating them at the same time shouldn't be a huge deal, we just make the change in the library, publish the nuget package, and then bump the version number in the downstream projects that need the change.
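For reference, that publish-and-bump step in .NET CLI terms might look roughly like this (the package name, version, and feed URL are placeholders, not from the thread):

dotnet pack src/CoreLib -c Release
dotnet nuget push src/CoreLib/bin/Release/CoreLib.1.2.4.nupkg --source https://nuget.example.com/v3/index.json
# then, in each downstream project that needs the change:
dotnet add package CoreLib --version 1.2.4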
Ideally changes to these libraries should be relatively limited.
For things that are intertwined, like an API client alongside the API provider and more project-specific libraries, we'll keep those together in the same repo.
If this is what you're thinking of, I'd be interested in hearing more about your negative experiences with such a setup.
SVN worked well for us, being a relatively small team until recently, so no real need to change until now.
The primary driver for the move to Git has been compliance. We just can't have on-prem servers with critical things like code anymore, and there's effectively just one cloud-based SVN offering that had ISO27001 etc.
So incentive to move to Git just got a lot stronger.
But it can also breed instability, as you can upgrade other people's stuff without them being aware.
There are ways around this, which involve having a local module store, and building with named versions. Very similar to a bunch of disparate repos, but without getting lost in github (github's discoverability was always far inferior to gitlab)
However it has its drawbacks, namely that people can hold out on versions older than you want to support.
This is why Google embraced the principle that if somebody breaks your code without breaking your tests, it's your fault for not writing better tests. (This is sometimes known as the Beyonce rule: if you liked it, you should have put a test on it.)
You need the ability to upgrade dependencies in a hands-off way even if you don't have a monorepo, though, because you need to be able to apply security updates without scheduling dev work every time. You shouldn't need a careful informed eye to tell if upgrades broke your code. You should be able to trust your tests.
However, that only really works if people are willing to put effort into tests. It also presumes that people are able to accurately and easily navigate dependencies.
The issue is that monorepos make it trivial to add dependencies, to the point where if I use a library to get access to our S3-like object storage system, it ends up pulling a massive chain of deps culminating in building caffe binaries (yes, as in the ML framework.)
I cannot possibly verify that massive dependency chain, so putting a test on some part which fails is an exercise in madness.
It requires a culture of engineering discipline that I have yet to see at scale.
Tradeoffs for mono are drivers of micro and vice versa.
Looking at the GitHub insights, it becomes pretty clear there are about two key devs that commit or merge PRs into main. I'm guessing this is also who does the code reviews, etc. Comparing itself to Linux, where the number of recurring contributors is larger by orders of magnitude, just reeks of inexperience. I'm being tough with my words because at face value the monorepo argument works, but it ends in code spaghetti and heartache when things like developer succession, corporate strategy, and market conditions throw wrenches in the gears.
Not for nothing I think a monorepo is perfectly fine when you can hold the dependency graph (that you have influence over) in your head.
Maybe there's a bit of /rant in this because I'm tired of hearing the same problem with solutions that are spun as novel ideas when it's really just: "Pre-optimization is the root of all evil."
You don't need to justify using a monorepo if you are small or close to single threaded in sending stuff into main. It's like a dev telling me: "I didn't add any tests to this and let me explain why..."
The explanation is the admission in my mind but maybe I'm reading into it too much.
Article is nicely written and an enjoyable read but the arguments don't have enough strength to justify. You are using a monorepo, that's okay. Until it's not, that's okay too.
It's pretty refreshing to see an experience report whose conclusion is "not much has changed", even though in practice that's the most common result for any kind of process change.
As soon as you add a second separate product that uses a different subset of any code in the repo, you should consider breaking up the monorepo. If the code is "a bunch of libraries" and "one or more end user products", it becomes even more imperative to consider breaking things down.
Having worked on monorepos with 30+ artifacts and multiple ongoing projects that each pull the monorepo toward different, incompatible versions, all of which have their own lifetime and their own release cycle - monorepo is the antithesis of a good idea.
I really don't see why anything you describe would be an issue at all for a monorepo.
We release everything weekly, and some things much more frequently.
If your testing is good enough, I don't see what the issue is?
Your testing isn't good enough. I don't know who you are, what you are working on, or how much testing you do, but I will state with confidence it isn't good enough.
It might be acceptable for your current needs, but you will have bugs that escape testing - often intentional as you can't stop forever to fix all known bugs. In turn that means if anything changes in your current needs you will run into issues.
> We release everything weekly, and some things much more frequently.
This is a negative to users. When you think "we'll release again soon, so who cares about bugs", it means your users see more bugs. Sure, it is nice that you don't have to break open years-old code anymore, but if the new stuff doesn't have anything the user wants, is this really a good thing?
Yes, that is true. No amount of testing can prevent bugs in a complex enough project.
But this is no different in a monorepo or multirepo.
I apologise but I don't think I've understood your point.
I work on software that is safety critical, as such we require all releases to go through several weeks of manual testing - this is after a very large automated test suite. Realistically it is months of manual testing, but if programmers were perfect it could be done in weeks.