Monorepo – Our Experience
123 points
11 hours ago
| 19 comments
| ente.io
| HN
__MatrixMan__
2 hours ago
[-]
Every monorepo I've ever met (n=3) has some kind of radioactive DMZ that everybody is afraid to touch: it's not clear who owns it, but it is clear from its quality that you don't want to be the last person who touched it, because then maybe somebody will think that you own it. It's usually called "core" or some such.

Separate repos for each team means that when two teams own components that need to interact, they have to expose a "public" interface to the other team--which is the kind of disciplined engineering work that we should be striving for. The monorepo alternative is that you solve it in the DMZ, where it feels less like engineering and more like some kind of multiparty political endeavor, where PR reviewers of dubious stakeholder status use the exercise to further agendas which are unrelated to the feature, except that it somehow proves them right about whatever architectural point is currently contentious.

Plus, it's always harder to remove something from the DMZ than to add to it, so it's always growing, and there's this sort of gravitational attractor which eventually starts warping time such that PRs take longer to merge the closer they are to it.

Better to just do the "hard" work of maintaining versioned interfaces with documented compatibility (backed by tests). You can always decide to collapse your codebase into a black hole later--but once you start on that path you may never escape.

reply
zaphar
2 hours ago
[-]
Since we are indulging in generalizations from our past: with separate repos you end up with 10 "cores" that are radioactive DMZs everybody is afraid to touch. And those "disciplined" public APIs will be universally hated by everyone who consumes them.

Neither a monorepo nor separate repos will result in people being disciplined. If you already have the discipline to do separate repositories correctly then you'll be fine with a monorepo.

So I guess it's six of one, half a dozen of the other.

reply
__MatrixMan__
1 hour ago
[-]
No, I think there's a clear difference. I've seen this several times: somebody changes teams and is no longer responsible for a bit of code, but then they learn that it is broken in some way, and now they're sneaking in commits that--on paper--should be handled by somebody else.

Devs *like* to feel ownership of reasonably sized chunks of code. We like to arrange it in ways that are pleasing for us to work on later down the road. And once we've made those investments, we like to see them pay off by making quick, easy changes that make users happy. Sharing a small codebase with three or four other people and finding ways to make each other's lives easier while supporting it is *fun*, and it makes for better code too.

But it only stays fun if you have enough autonomy that you can really own it--you and your small team. Footguns introduced need to be pointed at your feet. Automation introduced needs to save you time. If you've got the preferences of 50 other people to consider, and you know that whatever you do you're going to piss off some 10 of them or another... the fun goes away.

This is simple:

> we own this whole repo and only this 10% of it (the public interface) needs to make external stakeholders happy, otherwise we just care about making each other happy.

...and it has no space in it for there to be any code which is not clearly owned by somebody. In a monorepo, there are plenty of places for that.

reply
gorgoiler
1 hour ago
[-]
I would advise annotating your experience anecdotes, which are surely valuable, with some information about team size, company size, corporate structure (tech team in a non-tech corp vs. pure tech), age, etc.

The meat is in the detail, I find.

reply
Boxxed
1 hour ago
[-]
You can have well-defined interfaces without splitting your codebase into many repos, just like you can have well-defined interfaces without splitting it into microservices. In fact, I've seen enough garbage to know that forcing the issue via these mechanisms just makes bad software worse.
reply
CharlieDigital
9 hours ago
[-]

    > Moving to a monorepo didn't change much, and what minor changes it made have been positive.
I'm not sure that this statement in the summary jibes with this statement from the next section:

    > In the previous, separate repository world, this would've been four separate pull requests in four separate repositories, and with comments linking them together for posterity.
    > 
    > Now, it is a single one. Easy to review, easy to merge, easy to revert.
IMO, this is a huge quality-of-life improvement and prevents a lot of mistakes caused by not having the right revision synced across different repos. This alone is a HUGE improvement: a dev doesn't accidentally end up with one repo on this branch while forgetting to pull another repo at the same branch, and then hit weird issues due to this basic hassle.

When I've encountered this, we've had to use another repo to keep scripts that managed this. But this was also sometimes problematic because each developer's setup had to be identical on their local file system (for the script to work) or we had to each create a config file pointing to where each repo lived.

This also impacts tracking down bugs and regression analysis; this is much easier to manage in a mono-repo setup because you can get everything at the same revision instead of managing synchronization of multiple repos to figure out where something broke.

reply
danudey
6 hours ago
[-]
I prefer microservices/microrepos _conceptually_, but we had the same experience as your quoted text - making changes to four repos, and backporting those changes to the previous two release branches, means twelve separate PRs to make a change.

Having a centralized configuration library (a shared Makefile that we can pull down into our repo and include into the local Makefile) helps, until you have to make a backwards-incompatible change to that Makefile and then post PRs to every branch of every repo that uses that Makefile.

Now we have almost the entirety of our projects back into one repository and everything is simpler; one PR per release branch, three PRs (typically) for any change that needs backporting. Vastly simpler process and much less room for error.

reply
esperent
11 minutes ago
[-]
Isn't there some middle ground between microservices and monorepos? For example, a few large clearly defined projects that talk to each other via versioned APIs?
reply
taeric
7 hours ago
[-]
My only counterargument here is when those 4 things deploy independently. Sometimes people will get tricked into thinking a code change is atomic because it is in one commit, when it will lead to a mixed fleet because of deployment realities. In that world, having them separate is easier to work with, as you may have to revert one of the deployments separately from the others.
reply
derefr
4 hours ago
[-]
That's just an argument for not doing "implicit GitOps", treating the tip of your monorepo's main branch as the source-of-truth on the correct deployment state of your entire system. ("Implicit GitOps" sorta-kinda works when you have a 1:1 correspondence between repos and deployable components — though not always! — but it isn't tenable for a monorepo.)

What instead, then? Explicit GitOps. Explicit, reified release specifications (think k8s resource manifests, or Erlang .relup files), one per separately-deploy-cadenced component. If you have a monorepo, then these also live as a dir in the monorepo. CD happens only when these files change.
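
A minimal sketch of what one such release spec might look like (format and names invented for illustration):

    # releases/billing-api.yaml -- one spec per separately-deployed component
    component: billing-api
    # static ref: CD deploys exactly this image whenever this file changes
    image: registry.example.com/billing-api@sha256:9f86d081884c7d65
    replicas: 3
    config:
      LOG_LEVEL: info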

With this approach, a single PR can atomically merge code and update one or more release specifications (triggering CD for those components), if and when that is a sensible thing to do. But there can also be separate PRs for updating the code vs. "integrating and deploying changes" to a component, if-and-when that is sensible.

reply
ramchip
2 hours ago
[-]
> With this approach, a single PR can atomically merge code and update one or more release specifications (triggering CD for those components), if and when that is a sensible thing to do.

How do you avoid the chicken-and-egg problem? Like if the k8s manifest contains a container tag, and the tag is created by CI when the PR is merged to main, it would seem you can’t add code and deploy that code in the same PR.

reply
derefr
2 hours ago
[-]
I haven't seen this approach in action personally, but I assume you'd set it up similarly to how Homebrew builds + pins bottles for formulae:

- PR creation (or any update to the PR branch by a human) would trigger a CI workflow to build+push the container;

- if this succeeds, the workflow would push a commit to the PR feature branch that pins the container ref;

- and the base branch would have a branch protection rule that makes the success of this CI workflow a prerequisite for PR mergeability.

In the explicit-GitOps-in-monorepo case, you'd probably want to do this by having the PR author modify the release spec file of each component they want to build a release for, replacing the old static ref with a temporary symbolic ref meaning "hey CI system, calculate this." Then the CI's added commit would rewrite those temporary symbolic refs into new static ones.
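
Sketched against the same kind of spec file (syntax invented):

    # the PR author's commit: a temporary symbolic ref asking CI to build
    image: "@build-from-source"

    # CI's follow-up commit: rewritten to a pinned static ref
    image: registry.example.com/billing-api@sha256:4ab1c7730de2a911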

---

Although, that's all assuming that you even want to do CI builds in the first place. If you're developing containerized server software — rather than an operating system, a web browser, etc — then it shouldn't matter where you're building, and there isn't much impetus for deterministic, auditable builds, either. So why bottleneck builds by shoving them all onto some wimpy CI system—when every engineer already has a powerful workstation that can do builds often 4x faster, sitting there idle?

Here's what I call "local-first explicit GitOps":

1. you give your engineers the ability to push refs to the container registry (but you never use symbolic refs, always static manifest SHAs, so a malicious engineer can't do anything dangerous with this).

2. you add a script to your monorepo, that an engineer can run against their feature branch, that'll notice their current branch's local symbolic refs, and build+push containers locally in order to rewrite those into static refs.

3. On success, the script can use `gh` (or equivalent) to trigger PR creation.

Now every PR will come into existence already pinning valid new build artifacts by their static refs!
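
A rough sketch of such a script (names and paths all invented; assumes the registry records a digest on push):

    #!/usr/bin/env bash
    set -euo pipefail
    # local-release.sh: build + push + pin, run on the engineer's workstation
    for spec in $(grep -rl '@build-from-source' releases/); do
      component=$(basename "$spec" .yaml)
      tag="registry.example.com/$component:$(git rev-parse --short HEAD)"
      docker build -t "$tag" "services/$component"   # step 2: build locally
      docker push "$tag"
      # resolve the pushed tag to its immutable digest, then pin it in the spec
      digest=$(docker inspect --format '{{index .RepoDigests 0}}' "$tag")
      sed -i "s|@build-from-source|$digest|" "$spec"
    done
    git commit -am "Pin locally built artifacts"
    gh pr create --fill   # step 3: open the PR with static refs already in place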

reply
someone654
1 hour ago
[-]
If you use helm charts or similar, the version of the image would be a variable that is updated outside the repo. ArgoCD can do this with the Application resource.

For example: build the code and image, then update the git reference and version number in the Application when the build is done.

You then get atomic updates of both code and configuration.
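
Roughly, assuming a Helm-based Application (all values invented):

    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: billing-api
    spec:
      project: default
      source:
        repoURL: https://github.com/example/monorepo
        targetRevision: 3f9c2ab      # git ref, bumped when the build finishes
        path: charts/billing-api
        helm:
          parameters:
            - name: image.tag        # image version, updated in the same step
              value: "1.42.0"
      destination:
        server: https://kubernetes.default.svc
        namespace: billing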

reply
taeric
3 hours ago
[-]
I mean... sure? Yes, if you add extra structure on top of your code that is there to model the deployments, then you get a bit closer to modeling your deployments. Isn't that the exact argument for why you might want multiple repositories, as well?
reply
scubbo
3 hours ago
[-]
...I can't believe I'd never thought about the fact that a "Deployment Repo" can, in fact, just be a directory within the Code Repo. Interesting thought - thanks!
reply
hinkley
2 hours ago
[-]
If the version numbers of all services built from the PR are identical, you at least have a pretty clear trail to figuring out WTF happened.

Even with a few services, we saw some pretty crunchy issues with people not understanding that Service A had version 1.3.1234 of a module and Service B had version 1.3.1245, and that that was enough skew to cause problems.

Distinct repos tend to have distinct builds, and sooner or later one of the ten you're building will glitch out and have to be run twice, or the trigger will fail and it won't build at all until the subsequent merge, and having numbers that are close results in a false sense of confidence.

reply
lmz
4 hours ago
[-]
Isn't a mixed fleet always the case once you have more than one server and do rolling updates?
reply
xyzzy123
2 hours ago
[-]
Sort of; at medium scale you can blue/green your whole system out of the monorepo (even if it's, say, 20 services) in k8s and flip the ingresses to cut over during release.

Of course k8s is not required; you can do it in straight IaC etc. (i.e. deploy a whole parallel system and switch).
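
One common way to express the cut-over is a label flip on the Service that fronts the system; a sketch with invented names:

    # one Service fronting whichever color is currently live
    apiVersion: v1
    kind: Service
    metadata:
      name: api
    spec:
      selector:
        app: api
        track: blue    # change to "green" to cut the whole system over
      ports:
        - port: 80
          targetPort: 8080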

It's still "mixed fleet" in terms of any shared external resources (queues, db state, etc) but you can change service interfaces etc with impunity and not worry about compatibility / versioning between services.

Throwing temporary compute at the problem can save a lot of busywork and/or thinking about integration problems.

This stops being practical if you get _very_ big but at that point you presumably have more money and engineers to throw at the problem.

reply
taeric
3 hours ago
[-]
Yes. And if you structure your code to explicitly do this, it is a lot easier to reason about.
reply
eikenberry
3 hours ago
[-]
I thought one of the whole points behind separate (non-mono) repos was to help enforce loose coupling, and if you came to a point where a single feature change required PRs on 4 separate repos, then that was an indicator that your project needed refactoring, as it was becoming too tightly coupled. The example in the article could have been interpreted to mean that they should refactor the functionality for interacting with the ML model into its own repo so it could encapsulate this aspect of the project. Instead they doubled down on the tighter coupling by putting them in a monorepo (which itself encourages tighter coupling).
reply
marcosdumay
2 hours ago
[-]
The issue is that you can't "enforce" loose coupling. The causality is reversed here.

Your software artifacts will have loose coupling if you divided them well enough at their creation. As soon as they are created, there's nothing else you can do to change it, except joining or splitting them.

reply
eikenberry
1 hour ago
[-]
It's not about enforcement, it's about encouragement and the path of least resistance. In monorepos the path of least resistance is tight coupling (unless discouraged in other ways). In 'microrepos' (?) there is added resistance to tight coupling, as was presented in the article. This encourages people down the correct path but cannot enforce it (again, as was presented in the article).
reply
ericyd
4 hours ago
[-]
I felt the same: the author seemed to downplay the success, while every effect listed in the article felt like a huge improvement.
reply
wongarsu
6 hours ago
[-]
It's not as much of a pain if your tooling supports git repos as dependencies. For example, a typical multi-repo PR for us with Rust is: 1) PR against the library; 2) PR against the application that points the dependency at the PR's branch and makes the changes; 3) PR review; 4) PR 1 is approved and merged; 5) PR 2 is updated to point at the new master commit; 6) PR 2 is approved and merged.
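
In Cargo terms, steps 2 and 5 are one-line dependency edits (names invented):

    # step 2: point the app's dependency at the library PR's branch
    cargo add mylib --git https://github.com/example/mylib --branch my-feature
    # step 5: once PR 1 merges, point back at master; Cargo.lock pins the commit
    cargo add mylib --git https://github.com/example/mylib --branch master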

Same idea if you use some kind of versioning and release system. It's still a bit of a pain with all the PRs and coordination involved, but at every step every branch is consistent and buildable, you just check it out and hit build.

This is obviously more difficult if you have a more loosely coupled architecture like microservices. But that's self-inflicted pain.

reply
audunw
7 hours ago
[-]
There's nothing preventing you from having a single pull request for merging branches over multiple repos. There's nothing preventing you from having a parent repo with a lock file that gives you a single linear set of commits tracking the state of multiple repos.

That is, if you’re not tied to using just Github of course.

Big monorepos and multiple repo solutions require some tooling to deal with scaling issues.

What surprises me is the attitude that monorepos are the right solution to these challenges. For some projects it makes sense, yes, but it's clear to me that we should have a solution that allows repositories to be composed/combined in elegant ways. Multi-repository pull requests should be a first class feature of any serious source code management system. If you start two projects separately and then later find out you need to combine their history and work with them as if they were one repository, you shouldn't be forced to restructure the repositories.

reply
CharlieDigital
7 hours ago
[-]

    > Multi-repository pull requests should be a first class feature of any serious source code management system. 
But it's currently not?

    > If you start two projects separately and then later find out you need to combine their history and work with them as if they were one repository, you shouldn’t be forced to restructure the repositories.
It's called a directory copy. Cut + paste. I'd add a tag with a comment pointing to the old repo (if needed). But probably after a few weeks, no one is going to look at the old repo.
reply
dmazzoni
3 hours ago
[-]
> It's called a directory copy. Cut + paste. I'd add a tag with a comment pointing to the old repo (if needed). But probably after a few weeks, no one is going to look at the old repo.

Not in my experience. I use "git blame" all the time, and routinely read through commits from many years ago in order to understand why a particular method works the way it does.

Luckily, there are many tools for merging git repos into each other while preserving history. It's not as simple as copy and paste, but it's worth the extra effort.
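
For example, with plain git plus git-filter-repo (paths invented):

    # rewrite old-repo so its files live under a subdirectory...
    cd old-repo && git filter-repo --to-subdirectory-filter services/old-repo
    # ...then merge its history into the monorepo
    cd ../monorepo
    git remote add old ../old-repo && git fetch old
    git merge --allow-unrelated-histories old/main   # old commits stay blame-able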

reply
pelletier
7 hours ago
[-]
> Multi-repository pull requests should be a first class feature of any serious source code management system.

Do you have examples of source code management systems that provide this feature, and do you have experience with them? The repo-centric approach of GitHub often feels limiting.

reply
jamesfinlayson
22 minutes ago
[-]
At a company I used to be at, they used GitHub Enterprise, and some repos definitely seemed to have linked repos or linked commits (I don't remember exactly, but there was some way of linking repos that depended on each other).
reply
jvolkman
7 hours ago
[-]
Apparently Gerrit supports this with topics: https://gerrit-review.googlesource.com/Documentation/cross-r...
reply
notwhereyouare
9 hours ago
[-]
Ironically, I was gonna come and comment on that same second block of text.

We went from monorepo to multi-repo at work and it's been a huge set back and disappointment with the devs because it's what our contractors recommended.

I've asked for a code deploy and everything, and it's failed in prod due to a missing check-in.

reply
CharlieDigital
8 hours ago
[-]

    > ...because it's what our contractors recommended
It's sad when this happens instead of taking input from the team on how to actually improve productivity/quality.

A startup I joined started with a multi-repo because the senior team came from a FAANG where this was common practice to have multiple services and a repo for each service.

Problem was that it was a startup with one team of 6 devs and each of the pieces was connected by REST APIs. So now any change to one service required deploying that service and pulling down the OpenAPI spec to regenerate client bindings. It was so clumsy and easy to make simple mistakes.

I refactored the whole thing in one weekend into a monorepo, collapsed the handful of services into one service, and we never looked back.

That refactoring and a later paper out of Google actually inspired me to write this article as a practical guide to building a "modular monolith": https://chrlschn.dev/blog/2024/01/a-practical-guide-to-modul...

reply
eddd-ddde
8 hours ago
[-]
At least Google and Meta are heavy into monorepos; I'm really curious what company is using a _repo per service_. That's insane.
reply
jgtrosh
7 hours ago
[-]
My team implemented (and reimplemented!) a project using one repo per module. I think the main benefit was ensuring enough separation of concerns, due to the burden of changing multiple parts together. I managed to reduce something like 10 repos down to 3... Work in progress.
reply
tpm
6 hours ago
[-]
> burden of changing multiple parts together

Then you are adapting your project to the properties of the code repository. I don't see that as a benefit.

reply
pc86
7 hours ago
[-]
It can make sense when you have a huge team of devs and different teams responsible for everything, where you may be on multiple teams, and nobody is exactly responsible for the same set of services you are. Depending on the security/access-provisioning culture of the org, "taking half a day to manually grant access to the repos so-and-so needs" may actually be an easier sell than "give everyone access to all our code."

If you just have 20-30 devs and everyone is pretty silo'd (e.g. frontend or backend, data or API, etc) having 75 repos for your stuff is just silly.

reply
marcosdumay
2 hours ago
[-]
Compared to all the issues with keeping a service running, pushing code to a different repo is trivial. If you think that's insane, the insanity is definitively not on the repository separation.
reply
psoundy
6 hours ago
[-]
Have you heard of OpenShift 4? Self-hosted Kubernetes by Red Hat. Every little piece of the control plane is its own 'operator' (basically a microservice) and every operator is developed in its own repo.

A github search for 'operator' in the openshift org has 178 results:

https://github.com/orgs/openshift/repositories?language=&q=o...

Not all are repos hosting one or more microservices, but most appear to be. Best of luck ensuring consistency and quality across so many repos.

reply
adra
4 hours ago
[-]
It's just as easy? When you have a monorepo with 5 million lines of code, you're only going to focus on the part of the code you care about and forget the rest. Same with 50 repos of 100,000 LOC each.

Enforcing standards means actually having org-level mandates around acceptable development standards, and it's enforced using tools. Those tools should be just as easy to run on one monorepo as on 50+ distributed repositories, nay?

reply
psoundy
3 hours ago
[-]
Even in the best case of what you are describing, how are these tools configured and their configuration maintained, except via PRs to the repos in question? For every such change, N PRs have to be proposed, reviewed and merged. And all this without considering the common need (in a healthy project at least) to make cross-cutting changes, with similar friction around landing a change across repos.

If you wanted to, sure, applying enough time and money could make it work. I like to think that those resources might be better spent, though.

reply
bobnamob
7 hours ago
[-]
Amazon uses "repo per service" and it is semi insane, but Brazil (the big ol' internal build system) and Coral (the internal service framework) make it "workable".

As someone who worked in the dev tooling org, getting teams to keep their deps up to date was a nightmare.

reply
bluGill
7 hours ago
[-]
Monorepo and multi repo both have their own need for teams to work on dev tooling when the project gets large.
reply
dewey
8 hours ago
[-]
It's almost never a good idea to get inspired by what Google / Meta / Huge Company is doing, as most of the time you don't have their problems, and they have custom tooling and teams making everything work at that scale.
reply
CharlieDigital
7 hours ago
[-]
In this case, I'd say it's the opposite: monorepo as an approach works amazingly well for small teams all the way up to huge orgs (with the right tooling to support it).

The difference is that past a certain level of complexity, the org will most certainly need specialized tooling to support massive codebases to make CI/CD (build, test, deploy, etc.) times sane.

On the other hand, multi-repos may work for massive orgs, but they are always going to add friction for small orgs.

reply
dewey
7 hours ago
[-]
In this case I wasn't even referring to monorepo or not, but more to the idea of taking inspiration from very large companies for your own not-large-company problems.
reply
influx
7 hours ago
[-]
I’ve used one of the Meta monorepos (yeah there’s not just one!) and it’s super painful at that scale.
reply
aleksiy123
3 hours ago
[-]
I feel like this has been repeated so much now that people's takeaway is that you shouldn't adopt anything from large companies as a small company, by default. And that's simply not true.

The point here is to understand the problems that are being solved, understand if they are similar to yours, and make a decision based on whether the tradeoffs are a good fit for you.

Not necessarily disagreeing with you, but I just feel the pendulum on this statement has swung too far to the other side now.

reply
wrs
6 hours ago
[-]
I worked at a Fortune 1 company that used one repo per release for a certain major software component.
reply
seadan83
4 hours ago
[-]
Did that work out well at all? Any silver lining? My first thought is: "branches" & "tags" - wow... Would branches/tags have just been easier to work with?

Were they working with multiple services in a multi-repo? Seems like a cross-product explosion of repos. Did that configuration inhibit releases, or was the process cumbersome but still smooth because it was so rote?

reply
wrs
3 hours ago
[-]
It was a venerable on-prem application done in classic three-tier architecture (VB.NET client, app server, and database). It was deployed on a regular basis to thousands of locations (one deploy per location) and was critical to a business with 11-digit revenue.

So yeah, cumbersome, but established, and huge downside risk to messing with the status quo. It was basically Git applied on top of an existing “copy the source” release process.

reply
biorach
5 hours ago
[-]
was that as insane as it sounds?
reply
stackskipton
6 hours ago
[-]
>So now any change to one service required deploying that service and pulling down the OpenAPI spec to regenerate client bindings. It was so clumsy and easy to make simple mistakes.

Why? Is your framework heavily tied to client bindings? APIs I consume occasionally get new fields added for data I don't need; my code just ignores them. We also have a policy that you cannot add a new mandatory field to an API without a version bump. So maybe the REST API would have a new field, but I didn't send it and the API happily didn't care.

reply
jayd16
7 hours ago
[-]
If prod went down because of a missing check in, there are other problems.
reply
notwhereyouare
4 hours ago
[-]
Did I say prod went down? I just said it failed in prod. It was a logging change and only half the logging went out. To me, that's a failure.
reply
Attummm
3 hours ago
[-]
The issue you faced stemmed from the previous best practice of "everything in its own repository." This approach caused major issues, such as the versioning challenges and data model inconsistencies you mentioned. The situations it could lead to would fit in comedy sketches, but it's a real pain, especially when you're part of a team struggling with these problems. And it's almost impossible to convince a team to change direction once they've committed to it.

Now, though, it seems the pendulum has swung in the opposite direction, from "everything in its own repo" to "everything in one repo." This, too, will create its own set of problems, which can also be comedic, but frustrating to experience. For instance, what happens when someone accidentally pushes a certificate or API key and you need to force an update upstream? Imagine coordinating that with 50 developers spread across 8 projects, all in a single repo.

Instead, we could sidestep the problems we currently face and start out with a balanced approach: start with one repository, or split frontend and backend if needed. For data pipelines that share models with the API, keep them in the same repository, creating a single source of truth for the data model. This approach has often led to other developers telling me about the supposed benefits of "everything in its own repo." Just as I pushed back then, I feel the need to push back now against the monorepo trend.

The same can be said for monoliths and microservices, where the middle ground is often overlooked in discussions about best practices.

It all reminds me of the concept of "no silver bullet"[0]. Any decision will face its own unique challenges, but silver-bullet solutions can create artificial challenges that are wasteful, painful, and most of all unnecessary.

[0] https://en.m.wikipedia.org/wiki/No_Silver_Bullet

reply
lolinder
2 hours ago
[-]
> what happens when someone accidentally pushes a certificate or API key and you need to force an update upstream

The correct approach here is typically to invalidate the certificate or API key. A force push usually doesn't work.

If you're using GitHub, the dangerous commit lives on effectively forever in an awkward "not in a repository" state. Even if you're not on GitHub and your system actually garbage collects, the repo has been cloned onto enough build machines and dev machines that you're better off just treating the key or cert as compromised than trying to track down all the places where it might have been copied.

reply
Attummm
1 hour ago
[-]
The example was just to illustrate a point about a forced push.

You’re correct about keys/certs once uploaded, they should be treated as compromised, especially when the repository isn’t self-hosted. However, replacing API keys and certificates can take time, and within a large corporation, it could take months.

reply
someone654
29 minutes ago
[-]
Can you find other examples of when force-pushing a shared branch is acceptable? I have trouble finding plausible examples for communicating a force push to 50 colleagues.
reply
gorgoiler
1 hour ago
[-]
Repository boundaries are affected far more by the social structure of your organisation than anything technical.

Do you want hard boundaries between teams — clear responsibilities with formal ceremony across boundaries, but at the expense of living with inflexibility?

Do you want fluidity in engineering, without fixed silos and a flat org structure that encourages anyone to take on anything that’s important to the business right now, but with the much bigger overhead of needing strong people leaders capable of herding the chaos?

I’m sure there are dozens of other examples of org structures and how they are reflected in code layout, repo layout, shared directories, dropboxes, chat channels, and email groups etc.

reply
msoad
4 hours ago
[-]
I love monorepos, but I'm not sure Git is the right tool beyond a certain scale. Where I work, doing a simple `git status` takes seconds due to the size of the repo. There have been various attempts to solve Git performance, but so far nothing comes close to what I experienced at Google.

The Git team should really invest in tooling for very large repos. Our repo is around 10M files and 100M lines of code, and no amount of hacks on top of Git (cache, sparse checkout, etc.) really solves the core problem.
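
For reference, the mitigations in question look something like this; they help, but they don't change the fundamentals:

    git clone --filter=blob:none https://github.com/example/big   # partial clone
    git sparse-checkout set --cone services/billing services/shared
    git config core.fsmonitor true        # fs watcher speeds up "git status"
    git config core.untrackedCache true
    git maintenance start                 # background commit-graph, prefetch, gc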

Meta and Google have really solved this problem internally but there is no real open source solution that works for everyone out there.

reply
dijit
3 hours ago
[-]
I'm secretly hoping that Google releases Piper (and Mondrian); the gaming industry would go wild.

Perforce is pretty brutal, and the code review tools are awful - but it's still the undisputed king of mixed text and binary assets in a huge monorepo.

reply
vlovich123
1 hour ago
[-]
Meta open-sourced their complete stack: https://github.com/facebook/sapling

Microsoft released Scalar (https://github.com/microsoft/scalar); it's not a complete stack yet, but Microsoft is planning on releasing the backend components eventually.

Have you tried Sapling? It has EdenFS baked in so it'll only materialize the files you touch and operations are fast because it has a filesystem watcher for activity so it doesn't need to do a lot of work to maintain a view of what has been invalidated.

reply
xyzzy_plugh
9 hours ago
[-]
Without indicating my personal feelings on monorepo vs polyrepo, or expressing any thoughts about the experience shared here, I would like to point out that open-source projects have different and sometimes conflicting needs compared to proprietary closed-source projects. The best solution for one is sometimes the extreme opposite for the other.

In particular, many build pipelines involving private sources or artifacts become drastically more complicated than those of their publicly available counterparts.

reply
bunderbunder
4 hours ago
[-]
I've also seen this with branching strategies. IMO the best branching strategy for open source projects is generally the worst one for commercial projects, and vice versa.
reply
b5hi
2 hours ago
[-]
this should be the top comment
reply
mgaunard
6 hours ago
[-]
Doing modular right is harder than doing monolithic right.

But if you do it right, the advantage you get is that you get to pick which versions of your dependencies you use; while quite often you just want to use the latest, being able to pin is also very useful.

reply
lukewink
6 hours ago
[-]
You can still publish packages and pull them down as (pinned) dependencies all within a monorepo.
reply
mgaunard
4 hours ago
[-]
that's a terrible and arguably broken-by-design workflow which entirely defeats the point of the monorepo, which is to have a unified build of everything together, rather than building things piecemeal in ways that could be incompatible.

For C++ in particular, you need to express your dependencies in terms of source versions, and ensure all of the build artifacts you link together were built against the same source version of every transitive dependency and with the same flags. Failure to do that results in undefined behaviour, and indeed I have seen large organizations with unreliable builds as a matter of routine because of that.

The best way to achieve that is to just build the whole thing from source, with a content-addressable-store shared with the whole organization to transparently avoid building redundant things. Whether your source is in a single repo or spread over several doesn't matter so long as your tooling manages that for you and knows where to get things, but ultimately the right way to do modular is simply to synthesize the equivalent monorepo and build that. Sometimes there is the requirement that specific sources should have restricted access, which is often a reason why people avoid building from source, but that's easy to work around by building on remote agents.

Now for some reason there is no good open-source build system for C++, while Rust mostly got it right on the first try. Maybe it's because there are some C++ users still attached to the notion of manually managing ABI.

reply
alphazard
1 hour ago
[-]
The classic micro/multi-repo mistake is reaching for more repos when you really need better tooling and permissions on a single repo. People have probably wasted millions of engineer-hours across the industry with multiple repos, all because GitHub doesn't have expressive path-level permissions.
reply
gregmac
8 hours ago
[-]
To me, monorepo vs multi-repo is not about the code organization, but about the deployment strategy. My rule is that there should be a 1:1 relation between a repository and a release/deployment.

If you do one big monolithic deploy, one big monorepo is ideal. (Also, to be clear, this is separate from microservice vs monolithic app: your monolithic deploy can be made up of as many different applications/services/lambdas/databases as makes sense). You don't have to worry about cross-compatibility between parts of your code, because there's never a state where you can deploy something incompatible, because it all deploys at once. A single PR makes all the changes in one shot.

The other rule I have is that if you want to have individual repos with individual deployments, they must be both forward- and backwards-compatible for long enough that you never need to do a coordinated deploy (deploying two at once, where everything is broken in between). If you have to do coordinated deploys, you really have a monolith that's just masquerading as something more sophisticated, and you've given up the biggest benefits of both models (simplicity of mono, independence of multi).

Consider what happens with a monorepo with parts of it being deployed individually. You can't check out any specific commit and mirror what's in production. You could make multiple copies of the repo, check out a different commit on each one, then try to keep in mind which part of which commit is where -- but this is utterly confusing. If you have 5 deployments, you now have 4 copies of any given line of code on your system that are potentially wrong. It becomes very hard not to accidentally break compatibility.

TL;DR: Figure out your deployment strategy, then make your repository structure mirror that.

reply
CharlieDigital
8 hours ago
[-]
It doesn't have to be that way.

You can have a mono-repo and deploy different parts of the repo as different services.

You can have a mono-repo with a React SPA and a backend service in Go. If you fix some UI bug with a button in the React SPA, why would you also deploy the backend?

reply
Falimonda
7 hours ago
[-]
This is spot on. A monorepo can still include granular and standardized CI configuration across code paths. Nothing about a monorepo forces you to perform a singular deployment.

The gains provided by moving from polyrepo to monorepo are immense.

Developer access control is the only thing I can think to justify polyrepo.

I'm curious if and how others who see the advantages of monorepo have justified polyrepo in spite of that.

reply
oneplane
7 hours ago
[-]
You wouldn't, but making a repo collection into a mono-repo means your mono-deploy needs to be split into a multi-maybe-deploy.

As always, complexity merely moves around when squeezed, and making commits/PRs easier means something else, somewhere else gets less easy.

It is something that can be made better, of course: having your CI and CD be a bit smarter and more modular means you can now do selective builds based on what was actually changed, and selective releases based on what you actually want to release (not merely what was in the repo at a commit, or whatever was built).

But all of that needs to be constructed too, just merging some repos into one doesn't do that.

reply
CharlieDigital
7 hours ago
[-]
This is not very complex at all.

I linked an example below. Most CI/CD, like GitHub Actions[0], can easily be configured to trigger on changes for files in a specific path.

As a very basic starting point, you only need to set up simple rules to detect which monorepo roots changed.

[0] https://docs.github.com/en/actions/writing-workflows/workflo...

reply
bryanlarsen
7 hours ago
[-]
If you don't deploy in tandem, you need to test forwards and backwards compatibility. That's tough with either a monorepo or separate repos, but arguably it'd be simpler with separate repos.
reply
CharlieDigital
7 hours ago
[-]
It doesn't have to be that complicated.

All you need to know is "does changing this code affect that code".

In the example I've given -- a React SPA and Go backend -- let's assume that there's a gRPC binding originating from the backend. How do we know that we also need to deploy the SPA? Updating the schema would cause generation of a new client + model in the SPA. Now you know that you need to deploy both and this can be done simply by detecting roots for modified files.

You can scale this. If that gRPC change affected some other web extension project, apply the same basic principle: detect that a file changed under this root -> trigger the workflow that rebuilds, tests, and deploys from this root.
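
The detection itself can be a one-liner, assuming each deployable lives under its own top-level root (a sketch):

    # list the top-level roots touched by this PR; each maps to a build/deploy job
    git diff --name-only origin/main...HEAD | cut -d/ -f1 | sort -u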

reply
aswerty
8 hours ago
[-]
This mirrors my own experience in the SaaS world. Anytime things move towards multiple artifacts/pipelines in one repo, trying to understand what change existed where and when always seems to become very difficult.

Of course the multirepo approach means you do this dance a lot more:

- Create a change with backwards compatibility and tombstones (e.g. logs for when backward compatibility is used)

- Update upstream systems to the new change

- Remove backwards compatibility and pray you don't have a low-frequency upstream service interaction you didn't know about

While the dance can be a pain, it does follow a more iterative approach with reduced blast radii (albeit many more of them). But, all in all, an acceptable tradeoff.

Maybe if I had more familiarity with mature tooling around monorepos I might be more interested in them. But alas, it's not a bridge I have crossed, or am pushed to cross, just at the moment.

reply
siva7
9 hours ago
[-]
Ok, but the more interesting part - how did you solve the CI/CD part and how does it compare to a multirepo?
reply
devjab
8 hours ago
[-]
I don't think CI/CD should really be a big worry as far as mono-repositories go, as you can set up different pipelines and different flows with different configurations. That's something you're probably already doing if you have multiple repos.

In my experience the article is right when it tells you there isn't that big of a difference. We have all sorts of repositories, some of which are basically mono-repositories for their business domain. We tend to separate where it "makes sense", which for us means when what we put into repositories is completely separate from everything else. We used to have a lot of micro-repositories, and it wasn't that different to be honest. We grouped more of them together to make it easier for us to be DORA compliant in terms of the bureaucracy it adds to your documentation burden. Technically I hardly notice the difference.

reply
JamesSwift
8 hours ago
[-]
In my limited-but-not-nothing experience working with mono vs multi repo versions of the same projects, CI/CD definitely was one of the harder pieces to solve. It's highly dependent on your frameworks and CI provider just how straightforward it is going to be, and most of them are "not very straightforward".

The basic way most work is to run full CI on every change. This quickly becomes a huge speedbump to deployment velocity until a solution for "only run what is affected" is found.

reply
devjab
7 hours ago
[-]
Which CI/CD pipelines have you had issues with? Because that isn't my experience at all. With both GitHub (also Azure DevOps) and GitLab you can separate your pipelines with configurations like .gitlab-ci.yml. I guess it can be non-trivial to set up proper parallelisation when you have a lot of build stages, if this isn't something you're familiar with. With a lot of other, more self-hosted tools like Gradle, RushJS and many others, you can set up configurations which do X if Y and make sure only to run the things which are necessary.

I don't want to be rude, but a lot of these tools have rather accessible documentation on how to get up and running, as well as extensive documentation for more complex challenges in their official docs. Which is probably the only place you'll find good ways of working with it, because a lot of the search engine and LLM "solutions" will range from horrible to outdated.

It can be both slower and faster than micro-repositories in my experience, however, you’re right that it can indeed be a Cthulhu level speed bump if you do it wrong.

reply
JamesSwift
6 hours ago
[-]
I implied but didn't explicitly mention that I'm talking from the context of moving _from_ an existing polyrepo _to_ a monorepo. The tooling is out there to walk a more happy-path experience if you jump in on day 1 (or early in the product lifecycle). But it's much harder to migrate to it and not have to redo a bunch of CI-related tooling.
reply
bluGill
7 hours ago
[-]
The problem with "only run what is affected" is that it is really easy to have something that is affected but doesn't seem like it should be (that is, whatever tools you have to detect whether it is affected say it isn't). So if you have such a system, you must have regular rebuild-everything jobs as well, to verify you didn't break something unexpected.

I'm not against only run what is affected, it is a good answer. It just has failings that you need to be aware of.

reply
JamesSwift
6 hours ago
[-]
Yeah, that's a good point. Especially for an overly-dynamic runtime like Ruby/Rails, there's just not usually a clean way to cordon off sections of code. On the other hand, using Nx in an Angular project was pretty amazing.
reply
bluGill
5 hours ago
[-]
Even in something like C++ you often have configuration, startup scripts (I'm in embedded, maybe this isn't a thing elsewhere), database schemas, and other such things that the code depends on, where it isn't obvious to the build system that the dependency exists.
reply
CharlieDigital
9 hours ago
[-]
Most CI/CD platforms will allow specification of targeted triggers.

For example, in GitHub[0]:

    name: ".NET - PR Unit Test"
    
    on:
      ## Only execute these unit tests when a file in this directory changes.
      pull_request:
        branches: [main]
        paths: [src/services/publishing/**.cs, src/tests/unit/**.cs]
So we set up different workflows that kick off based on the sets of files that change.

[0] https://docs.github.com/en/actions/writing-workflows/workflo...

reply
victorNicollet
8 hours ago
[-]
I'm not familiar with GitHub Actions, but we reverted our migration to Bitbucket Pipelines because of a nasty side-effect of conditional execution: if a commit triggers test suite T1 but not T2, and T1 is successful, Bitbucket displays that commit with a green "everything is fine" check mark, regardless of the status of T2 on any ancestors of that commit.

That is, the green check mark means "the changes in this commit did not break anything that was not already broken", as opposed to the more useful "the repository, as of this commit, passes all tests".

reply
plorkyeran
7 hours ago
[-]
I would find it extremely confusing and unhelpful if tests which failed on the parent commit, but weren't rerun for a PR because nothing relevant was touched, marked the PR as red. Why would you even want that? It's not something relevant to evaluating the PR, and it would get you in the habit of ignoring failures.

If you split something into multiple repositories then surely you wouldn't mark PRs on one of them as red just because tests are failing in a different one?

reply
victorNicollet
4 hours ago
[-]
I suppose our development process is a bit unusual.

The meaning we give to "the commit is green" is not "this PR can be merged" but "this can be deployed to production", and it is used for the purpose of selecting a release candidate several times a week. It is a statement about the entire state of the project as of that commit, rather than just the changes introduced in that commit.

I can understand the frustration of creating a PR from a red commit on the main branch, and having that PR be red as well as a result. I can't say this has happened very often, though: red commits on the main branch are very rare, and new branches tend to be started right after a deployment, so it's overwhelmingly likely that the PR will be rooted at a green commit. When it does happen, the time it takes to push a fix (or a revert) to the main branch is usually much shorter than the time for a review of the PR, which means it is possible to rebase the PR on top of a green commit as part of the normal PR acceptance timeline.

reply
plorkyeran
3 hours ago
[-]
Going off the PR status to determine if the end result is deployable is not reliable. A non-FF merge can have both the base commit and the PR be green but the merged result fail. You need to run your full test suite on the merged result at some point before deployment; either via a commit queue or post-merge testing.
reply
victorNicollet
3 hours ago
[-]
I agree! We use the commit status instead of the PR status. A non-FF merge commit, being a commit, would have its own status separate from the status of its parents.
reply
ants_everywhere
8 hours ago
[-]
Isn't that generally what you want? The check mark tells you the commit didn't break anything. If something was already broken, it should have blocked the commit that broke it earlier, or else there's a flake somewhere that you can only locate by periodically running tests independent of any PR activity.
reply
daelon
7 hours ago
[-]
Is it a side effect if it's also the primary effect?
reply
hk1337
7 hours ago
[-]
Even AWS CodeBuild (or CodePipeline) allows you to do this now. It didn't before but it's a fairly recent update.
reply
CharlieDigital
3 hours ago
[-]
As a prior user of AWS Code*, I can appreciate that you qualified that with "Even" LMAO
reply
victorNicollet
8 hours ago
[-]
Wouldn't CI be easier with a monorepo? Testing integration across multiple repositories (triggered by changes in any of them) seems more complex than just adding another test suite to a single repo.
reply
bluGill
7 hours ago
[-]
Pros and cons. Both can be used successfully, but there are different problems with each. If you have a large project, you will have a tooling team to deal with the problems of your solution.
reply
stackskipton
6 hours ago
[-]
As a DevOps/SRE type person that occasionally gets stuck with builds: monorepos work well if the company will invest in the build process. However, many companies don't do well in this area, and the monorepo blast radius becomes much bigger, so individual repos it is. Also, depending on the language, a private package repo is easy enough to set up to keep all common libraries in.
reply
drbojingle
1 hour ago
[-]
A lot of comments here seem to think that monorepo has to mean something about deployment. I just don't want to have to run git fetch in 5 different repos to get everything I need, and that's a good enough reason for me to use one.
reply
h1fra
8 hours ago
[-]
I think the big issue around monorepo is when a company puts completely different projects together inside a single repo.

In this article almost everything makes sense to me (because that's what I have been doing most of my career), but they put their OTP app inside, which suddenly makes no sense. And you can see the problem in the CI: they have dedicated files just for this app and probably very little common code with the rest.

IMO you should have one monorepo per project (api, frontend, backend, mobile, etc. as long as it's the same project) and if needed a dedicated repo for a shared library.

reply
fragmede
8 hours ago
[-]
> you should have one monorepo per project (api, frontend, backend, mobile, etc. as long as it's the same project)

that's not a monorepo!

Unless the singular "project" is the stuff our company ships, the problem you have is impedance mismatch between the projects, which is the problem that an actual monorepo solves. For SWEs on individual projects who will never have the problem of having to ship a commit to all the repos at the "same" time, yeah, that seems fine, and for them it is. The problem comes as a distributed systems engineer where, for whatever reason, many or all the repos need to ship at the ~same time. Or worse: A needs to ship before B, which needs to ship before C, but that needs to ship before A, and you have to unwind that before actually being able to ship the change.

reply
h1fra
2 hours ago
[-]
My implicit point was that most people don't want a monorepo; when they talk about a monorepo, they're talking about consolidating a project together, which can span many different repos and technologies.

I'm not convinced that making completely different teams work on the same repo is making things better. In the case of cascading dependencies what usually works better than a convoluted technical solution is communication.

reply
hk1337
7 hours ago
[-]
> that's not a monorepo!

Sure it is! It's just not the ideal use case for a monorepo which is why people say they don't like monorepos.

reply
vander_elst
6 hours ago
[-]
"one monorepo per project (api, frontend, backend, mobile, etc. as long as it's the same project) and if needed a dedicated repo for a shared library."

They are literally saying that multiple repos should be used, also for sharing code; this is not a monorepo, these are different repos.

reply
KaiserPro
4 hours ago
[-]
Monorepos have their advantages, as pointed out, one place to review, one place to merge.

But it can also breed instability, as you can upgrade other people's stuff without them being aware.

There are ways around this, which involve having a local module store and building with named versions. Very similar to a bunch of disparate repos, but without getting lost in GitHub (GitHub's discoverability was always far inferior to GitLab's).

However it has its drawbacks, namely that people can hold out on older versions than you want to support.

reply
dkarl
4 hours ago
[-]
> But it can also breed instability, as you can upgrade other people's stuff without them being aware

This is why Google embraced the principle that if somebody breaks your code without breaking your tests, it's your fault for not writing better tests. (This is sometimes known as the Beyonce rule: if you liked it, you should have put a test on it.)

You need the ability to upgrade dependencies in a hands-off way even if you don't have a monorepo, though, because you need to be able to apply security updates without scheduling dev work every time. You shouldn't need a careful informed eye to tell if upgrades broke your code. You should be able to trust your tests.

reply
paxys
3 hours ago
[-]
All the pitfalls of a monorepo can disappear with some good tooling and regular maintenance, so much so that devs may not even realize that they are using one. The actual meat of the discussion is – should you deploy the entire monorepo as one unit or as multiple (micro)services?
reply
marcosdumay
2 hours ago
[-]
That's the thing. All the pitfalls of multi-repos also disappear with good tooling and regular maintenance.

Neither one has an actual edge. Yet you can find countless articles from people talking about their experience. Take those as a hint about what kind of tooling you need, not about their comparative qualities.

reply
bobim
3 hours ago
[-]
Started to use a monorepo + worktrees to keep related but separate developments all together with different checkouts. Anybody else on the same path?
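
For reference, the setup is just a few commands (paths invented):

    # one clone, several simultaneous checkouts of different branches
    git worktree add ../mono-feature-a feature/a
    git worktree add ../mono-hotfix hotfix/1.2
    git worktree list    # main checkout plus the two linked ones
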
reply
magicalhippo
9 hours ago
[-]
We're transitioning from an SVN monorepo to Git. We've considered doing a kind of best-of-both-worlds approach.

Some core stuff goes into separate libraries, consumed as NuGet packages by other projects. Those libraries and other standalone projects live in separate repos.

Then a "monorepo" for our main product, where individual projects for integrations etc will reference non-nuget libraries directly.

That is, tightly coupled code goes into the monorepo, the rest in separate repos.

Haven't taken the plunge just yet tho, so not sure how well it'll actually work out.

reply
dezgeg
6 hours ago
[-]
In my experience this turns to nightmare when (not if, when) there is need to make changes to the libraries and app at the same time. Especially with libraries it's often necessary to create a client for an API at the same time to really know that the interface is any good.
reply
magicalhippo
3 hours ago
[-]
The idea is that the libraries we put in NuGet are really non-project-specific. We'll use NuGet to manage library versions rather than git submodules, so hopefully they can live fine in a separate repo.

So updating them at the same time shouldn't be a huge deal: we just make the change in the library, publish the NuGet package, and then bump the version number in the downstream projects that need the change.
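
Concretely, something like this (feed URL and names invented):

    # in the library repo: pack and publish the new version
    dotnet pack src/CoreLib -c Release -o out
    dotnet nuget push out/CoreLib.1.2.4.nupkg --source https://nuget.example.com/v3/index.json
    # in each downstream project: bump the pinned version
    dotnet add package CoreLib --version 1.2.4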

Ideally changes to these libraries should be relatively limited.

For things that are intertwined, like an API client alongside the API provider and more project-specific libraries, we'll keep those together in the same repo.

If this is what you're thinking of, I'd be interested in hearing more about your negative experiences with such a setup.

reply
memsom
8 hours ago
[-]
Monorepos are appropriate for a single project with many sub-parts but one or two artifacts on any given release build. But they fall apart when you have multiple products in the monorepo, each with different release schedules.

As soon as you add a second separate product that uses a different subset of any code in the repo, you should consider breaking up the monorepo. If the code is "a bunch of libraries" and "one or more end user products", it becomes even more imperative to consider breaking things down.

Having worked on monorepos where there are 30+ artifacts and multiple ongoing projects that each pull the monorepo into different incompatible versions, all of which have their own lifetime and their own release cycle - the monorepo is the antithesis of a good idea.

reply
vander_elst
6 hours ago
[-]
Working on a monorepo where we have hundreds (possibly thousands) of projects, each with a different version and release schedule. It actually works quite well: the dependencies are always in a good state, and it's easy to see the ramifications of a change and to reuse common components.
reply
memsom
6 hours ago
[-]
Good for you. For us it just doesn't work: we have multiple projects going on, pulling the code in different ways; code that runs on embedded, code that runs in the cloud, desktop apps (real ones written in C++ and .NET, not glorified web apps), code that is customer facing, and code used by third parties for integrating our products. The embedded side shares a core with other levels, and we support multiple embedded platforms (bare metal) and OSes (Windows, Linux, Android, iOS), and also have stuff that runs in the Amazon/Azure cloud platforms. You might be fine, but when you hit critical mass and you have very complicated commercial concerns, it doesn't work well.
reply
tomtheelder
4 hours ago
[-]
I mean it works for Google. Not saying that's a reason to go monorepo, but it at least suggests that it can work for a very large org with very diverse software.

I really don't see why anything you describe would be an issue at all for a monorepo.

reply
munksbeer
8 hours ago
[-]
No offense but I think you're doing monorepos wrong. We have more than 100 applications living in our monorepo. They share common core code, some common signals, common utility libs, and all of them share the same build.

We release everything weekly, and some things much more frequently.

If your testing is good enough, I don't see what the issue is?

reply
bluGill
7 hours ago
[-]
> If your testing is good enough, I don't see what the issue is?

Your testing isn't good enough. I don't know who you are, what you are working on, or how much testing you do, but I will state with confidence it isn't good enough.

It might be acceptable for your current needs, but you will have bugs that escape testing - often intentional as you can't stop forever to fix all known bugs. In turn that means if anything changes in your current needs you will run into issues.

> We release everything weekly, and some things much more frequently.

This is a negative for users. When you think you will release again next week, so who cares about bugs, it means your users see more bugs. Sure, it is nice that you don't have to break open years-old code anymore, but if the new stuff doesn't have anything the user wants, is this really a good thing?

reply
memsom
6 hours ago
[-]
No offence, but you might be a little confused about how complex your actual delivery is. That sounds simple. That sounds like it has a clear roadmap. When you don't have one, and you have very agile development that pivots quickly and demands a lot of concurrent change for releases with very different goals, it is not possible to make all your ducks sit in a row. Monorepos suck in that situation. The dependency graph is so complex it will make your head hurt. And all the streams need to converge into the main dev branch at some point, which causes huge bottlenecks.
reply
tomtheelder
4 hours ago
[-]
The dependency graph is no different for a monorepo vs a polyrepo. It's just a question of how those dependencies get resolved.
reply
stillbourne
5 hours ago
[-]
I like to use the monorepo tools without the monorepo repo, if that makes any god damn sense. I use Nx at my job, and the monorepo was getting out of hand: 6-hour pipeline builds, 2 hours of testing, etc. So I broke the repo into smaller pieces. This wouldn't have been possible if I wasn't already using the monorepo tools universally throughout the project, but it ended up working well.
reply
syndicatedjelly
8 hours ago
[-]
Some thoughts:

1) Comparing a photo storage app to the Linux kernel doesn't make much sense. Just because a much bigger project in an entirely different (and more complex) domain uses monorepos, doesn't mean you should too.

2) What the hell is a monorepo? I feel dumb for asking the question, and I feel like I missed the boat on understanding it, because no one defines it anymore. Yet I feel like every mention of monorepo is highly dependent on the context the word is used in. Does it just mean a single version-controlled repository of code?

3) Can these issues with syncing repos be solved with better use of `git submodule`? It seems to be designed exactly for this purpose. The author says "submodules are irritating" a couple of times, but doesn't explain what exactly is wrong with them. They seem like a great solution to me, but I also only recently started using them in a side project.
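
For concreteness, the submodule flow I mean is (URLs invented):

    # pin repo2 at a specific commit inside repo1
    git submodule add https://github.com/my_org/repo2 libs/repo2
    git submodule update --init --recursive   # every fresh clone must run this
    # bumping the pinned commit is itself a commit in repo1
    git -C libs/repo2 pull origin main
    git commit -am "Bump repo2 to latest main"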

reply
datadrivenangel
7 hours ago
[-]
Monorepo is just a single repo. Yup.

Git submodules have some places where you can surprisingly lose branches/stashed changes.

reply
syndicatedjelly
7 hours ago
[-]
One of my repos has a dependency on another repo (that I also own). I initialized it as a git submodule (e.g. my_org/repo1 has a submodule of my_org/repo2).

    Git submodules have some places where you can surprisingly lose branches/stashed changes.
This concerns me, as git generally behaves as a leak-proof abstraction in my experience. Can you elaborate or share where I can learn more about this issue?
reply
klooney
7 hours ago
[-]
> Does it just mean a single version-controlled repository of code?

Yeah - the idea is that all of your projects share a common repo. This has advantages and drawbacks. Google is most famous for this approach, although I think they technically have three now: one for Google, one for Android, and one for Chrome.

> They seem like a great solution to me

They don't work in a team context because they're extra steps that people don't do, basically. And for some reason a lot of people find them confusing.

reply
nonameiguess
7 hours ago
[-]
https://github.com/google/ contains 2700+ repositories. I don't necessarily know how many of these are read-only clones from an internal monorepo versus how many are separate projects that have actually been open-sourced, but the latter is more than zero.
reply