Maybe the problem with CI is that it's over there. As soon as it stops being something that I could set up and run quickly on my laptop over and over, the frog is already boiled.
The comparison to build systems is apt. I can and occasionally do build the database that I work on locally on my laptop without any remote caching. It takes a very long time, but not too long, and it doesn't fail with the error "people who maintain this system haven't tried this."
The CI system, forget it.
Part of the problem, maybe the whole problem, is that we could get it all working and portable and optimized for non-blessed environments, but it still will only be expected to work over there, and so the frog keeps boiling.
I bet it's not an easy problem to solve. Today's grand unified solution might be tomorrow's legacy tar pit. But that's just software.
./build.sh
If you want to ship containers somewhere, do it in your build script where you check to see if you’re running in “CI”. No fancy pants workflow yamls to vendor lock yourself into whatever CI platform you’re using today, or tomorrow. Just checkout, build w/ params, point your coverage checker at it.

This is also the same for onboarding new hires. They should be able to check out and build, with no issues or caveats beyond local environment setup. This ensures they are ready to PR by end of the day.
(Fmr Director of DevOps for a Fortune 500)
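A minimal sketch of that shape of build script (the make targets, registry name, and GIT_COMMIT variable are hypothetical; the only CI awareness is an environment-variable check, since most platforms export CI=true):

    #!/usr/bin/env bash
    set -euo pipefail

    CONFIG="${1:-Debug}"    # e.g. ./build.sh Release

    make build CONFIG="$CONFIG"
    make test

    # Only push containers when running under CI; most CI platforms set CI=true.
    if [ "${CI:-}" = "true" ]; then
        docker build -t "registry.example.com/app:${GIT_COMMIT:-dev}" .
        docker push "registry.example.com/app:${GIT_COMMIT:-dev}"
    fi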
Before long, you need another script that will output the train of options to your `build.sh`.
(If Fortune 500 companies can do a one-line build with zero parameters, I suspect I'd be very bored there.)
If you want to debug, use docker compose, or add logs and metrics and see what you find.
And if you want your CI jobs to do things like report cute little statuses, integrate with your source forge's static analysis results viewer, or block PRs, you have to integrate with the forge at a deeper level.
There aren't good tools today for translating between the environment variables or other things that various CI platforms expose, managing secrets (if you use CI to deploy things) that are exposed in platform-specific ways, etc.
If all you're doing with CI is spitting out some binaries, sure, I guess. But if you actually ask developers what they want out of CI, it's typically more than that.
Of course, as you mention, if you want to do things like comment on PRs or report detailed status information, you have to dig deeper.
My team offers integrations of static analysis tools and inventorying tools (SBOM generation + CVE scanning) to other teams at my organization, primarily for appsec purposes. Our organization's departments have a high degree of autonomy, and tooling varies a lot. We have code hosted in GitLab, GitHub, Azure DevOps, and in distant corners my team has not yet worked with, elsewhere. Teams we've worked with run their CI in GitLab, GitHub, Azure DevOps, AWS CodeBuild, and Jenkins. Actual runners teams use may be SaaS-provided by the CI platform, or self-hosted on AWS or Azure. In addition to running in CI, we provide the same tools locally, for use on macOS as well as Linux via WSL.
The tools my team uses for these scans are common open-source tools, and we distribute them via Nix (and sometimes Docker). That saves us a lot of headaches. But every team has their own workflow preferences and UI needs, and we have to meet them on the platforms they already use. For now we manage it ourselves, and it's not too terrible. But if there were something that actually abstracted away boring but occasionally messy differences like what the various environment variables mean in different CI systems, that would be really valuable for us. (The same goes even for comment bots and PR management tools. GitHub and GitLab are popular, but Azure DevOps is deservedly marginal, so even general-purpose tools rarely support both Azure DevOps and other forges.)
If your concern is that one day, a few years from now, you'll need to migrate from one forge to another, maybe you can say "my bash script handles all the real build logic" and get away with writing off all the things it doesn't cover. Maybe you spend a few days or even a few weeks rewriting some platform-specific logic when that time comes and forget about it. But when you're actually contending with many such systems at once, you end up wishing for sane abstractions or crafting them yourself.
over multiple machines? I'm not sure that a sh script can do that with github
But it's kind of cheating, because the Nix daemon actually handles per-machine scheduling and cross-machine orchestration for you.
Just set up some self-hosted runners with Nix and an appropriate remote-builders configuration to get started.
If you really want to, you can graduate after that to a Kubernetes cluster where Nix is available on the nodes. Pass the Nix daemon socket through to your rootless containers, and you'll get caching in the Nix store for free even with your ephemeral containers. But you probably don't need all that anyway. Just buy or rent a big build server. Nix will use as many cores as you have by default. It will be a long time before you can't easily buy or rent a build server big enough.
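A rough sketch of what that remote-builders piece can look like, with hypothetical host names and key paths (the machine-spec fields are URI, platform, SSH key, max jobs, speed factor, and supported features):

    # Register two hypothetical remote builders with the local Nix daemon.
    # Assumes SSH access as user "nix" and Nix installed on both machines.
    sudo tee -a /etc/nix/nix.conf <<'EOF'
    builders = ssh://nix@builder1 x86_64-linux /etc/nix/id_builder 16 1 big-parallel ; ssh://nix@builder2 aarch64-linux /etc/nix/id_builder 8 1 big-parallel
    builders-use-substitutes = true
    EOF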
In my experience, when it comes time to actually build the thing, a one-liner (with args if you need them) is the best approach. If you really REALLY need to, you can have more than one script for doing it - depending on what path down the pipeline you take. Maybe it's
1) ./build.sh -config Release
2) ./deploy.sh -docker -registry=<$REGISTRY> --kick
Just try not to go too crazy. The larger the org, the larger this wrangling task can be. Look at Google and gclient/gn. Not saying it's bad, just saying it's complicated for a reason. You don't need that (you'll know if you do).

The point I made is that I hate when I see 42 lines in a build workflow yaml that isn't syntax highlighted because it's been |'d in there. I think the yamls of your pipelines, etc, should be configuration for the pipeline, and the actual execution should be outsourced to a script you provide.
There are too many scripts like that which start, ask for sudo, and then it's off to implementing someone's "great idea" about your system's network interfaces.
If something you need requires sudo, it’s pre-build environment setup on your machine. On the host. Or wherever. It’s not part of the build. If you need credentials, get them from secrets or environment variables.
The single job pipeline doesn’t tell you what failed. It doesn’t parallelize unit and integration test suites while dealing with the combinatorial matrix of build type, target device, etc.
At some point, a few CI runners become more powerful than a developer’s workstation. Parallelization can really matter for reducing CI times.
I’d argue the root of the problem is that we are stuck on using “make” and scripts for local build automation.
We need something descriptive enough to describe a meaningful CI pipeline but also allow local execution.
Sure, one can develop a bespoke solution, but reinventing the wheel each time gets tiring and eventually becomes a sizable time sink.
In principle, we should be able to execute pieces of .gitlab-ci.yml locally, but even that becomes non-trivial with all the nonstandard YAML behaviors GitLab adds, not to mention the varied executor types.
Instead we have a CI workflow and a local workflow and hope the two are manually kept in sync.
In some sense, the current CI-only automation tools shouldn’t even need to exist (gitlab, Jenkins, etc) — why didn’t we just use a cron job running “build.sh” ?
I argue these tools should mainly have to focus on the “reporting/artifacts” side, with the pipeline execution parts handled elsewhere (or also locally for a developer).
Shame on you GitLab!
That's kinda what hermetic means, though, isn't it? Whether that's painful or not, that's pretty much exactly what GGP was asking for!
> Once one thing was outside of nix, everything exploded and writing a workaround was miserable because the nix configuration did not make it easy.
Nix doesn't make it easy to have Nix builds depend on non-Nix things (this is required for hermeticity), but the other way around is usually less troublesome.
Still, I know what you mean. What languages were you working in?
And like, I'm getting to the point of being old enough that I've "seen this before"; I feel like I've seen other projects that went "this really hard problem will be solved once we just re-implement everything inside our new system" and it rarely works; you really need a degree of pragmatism to interact with the real world. Systemd and Kubernetes are examples of things that do a lot of re-implementation but are mostly better than what came before.
I feel the same way about systemd, and I'll take your word for it with respect to Kubernetes. :)
> "this really hard problem will be solved once we just re-implement everything inside our new system" [...] rarely works
Yes. 100%. And this is definitely characteristic of Nix's ambition in some ways as well as some of the most painful experiences users have with it.
> you really need a degree of pragmatism to interact with the real world
Nix is in fact founded on a huge pragmatic compromise: it doesn't begin with a new operating system, or a new executable format with a new linker, or even a new basic build system (a la autotools or make). Instead of doing any of those things, Nix's design manages to bring insights and features from programming language design (various functional programming principles and, crucially, memoization and garbage collection) to build systems and package management tools, on top of existing (even aging) operating systems and toolchains.
I would also contend that the Nixpkgs codebase is a treasure, encoding how to build, run, and manage an astonishing number of apps (over 120,000 packages now) and services (I'd guess at least 1,000; there are some 20,000 configuration options built into NixOS). I think this does to some extent demonstrate the viability of getting a wide variety of software to play nice with Nix's commitments.
Finally, and it seems you might not be aware of this, but there are ways within Nix to relax the normal constraints! And of course you can also use Nix in various ways without letting Nix run the show.[0] (I'm happy to chat about this. My team, for instance, uses Nix to power Python development environments for AWS Lambdas without putting Nix in charge of the entire build process.)
However:
- fully leveraging Nix's benefits requires fitting within certain constraints
- the Nix community, culturally, does not show much interest in relaxing those constraints even when possible[1], but there is more and more work going on in this area in recent years[2][3] and some high-profile examples/guides of successful gradual adoption[4]
- the Node ecosystem's habit of expecting arbitrary network access at build time goes against one of the main constraints that Nix commits to by default, and *this indeed often makes packaging Node projects "properly" with Nix very painful*
- Python packaging is a mess and Nix does help IME, but getting there can be painful
Maybe if you decide to play with Nix again, or you encounter it on a future personal or professional project, you can remember this and look for ways to embrace the "heretical" approach. It's more viable and more popular than ever :)

--
0: https://zimbatm.com/notes/nix-packaging-the-heretic-way ; see also the community discussion of the post here: https://discourse.nixos.org/t/nix-packaging-the-heretic-way/...
1: See Graham Christensen's 2022 NixCon talk about this here. One such constraint he discusses relaxing, build-time sandboxing, is especially useful for troublesome cases like some Node projects: https://av.tib.eu/media/61011
2: See also Tom Bereknyei's NixCon talk from the same year; the last segment of it is representative of increasing interest among technical leaders in the Nix community on better enabling and guiding gradual adoption: https://youtu.be/2iugHjtWqIY?t=830
3: Towards enabling gradual adoption for the most all-or-nothing part of the Nix ecosystem, NixOS, a talk by Pierre Penninckx from 2024: https://youtu.be/CP0hR6w1csc
4: One good example of this is Mitchell Hashimoto's blog posts on using Nix with Dockerfiles, as opposed to the purist's approach of packaging your whole environment via Nix and then streaming the Nix packages to a Docker image using a Nix library like `dockerTools` from Nixpkgs: https://mitchellh.com/writing/nix-with-dockerfiles
Everything becomes a container, so why not use the container engine for it? If you know how layers work…
A good makefile is really nice to use. Not nice to read or trace unfortunately though.
build.bash <debug|release>
and that's it (and that can even trigger a container build).

I've spent far too much time debugging CI builds that work differently to a local build, and it's always because of extra nonsense added to the CI server somehow. I've yet to find a build in my industry that doesn't yield to this 'pattern'.
Your environment setup should work equally on a local machine or a CI/CD server, or your devops team has set it up identically on bare metal using Ansible or something.
It’s possible to do all of this with a pure shell script, but then you’re probably reimplementing some or all of the list above.
I was making a general comment that your build should be a single 'command'. Personally, I don't care what the command is, only that it should be a) one command, and b) 100% runnable on a dev box or a server. If you use make, you'll soon end up writing... shell scripts, so just use a shell script.
In an ideal world your topmost command would be a build tool:
./gradlew build
bazel build //...
make debug
cmake --workflow --preset
Unfortunately, the second you do that ^^^, someone edits your CI/CD to add a step before the build starts. It's what people do :(

All the cruft that ends up *in CI config* should be under version control, and inside your single command, so you can debug locally.
#!/bin/sh
step-I-added-to-shell-rather-than-CI-yaml
make debug # or cmake, bazel
This is better so you can run the whole thing locally, and on different CI providers.

In general, a CI is not a DAG, and not completely parallel -- but it often contains DAGs
Many years ago, I wrote 3 makefiles from scratch as an exploration of this (and I still use them). I described the issues here: https://lobste.rs/s/yd7mzj/developing_our_position_on_ai#c_s...
---
The better style is in a sibling reply -- invoke Make from shell, WHEN you have a problem that fits Make.
That is, the "main" should be shell, not Make. (And it's easy to write a dispatcher to different shell functions, with "$@", sometimes called a "task file" )
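A minimal sketch of such a task file, with hypothetical task bodies:

    #!/usr/bin/env bash
    # Each shell function is a task; "$@" dispatches to one of them.
    set -euo pipefail

    build()     { ninja -C out; }
    unit_test() { build && ./out/run_tests; }
    lint()      { ./tools/lint.sh; }
    all()       { build; unit_test; lint; }

    "$@"    # usage: ./task.sh unit_test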
In general, a project's CI does not fit entirely into Make. For example, the CI for https://oils.pub/ is 4K lines of shell, and minimal YAML (portable to Github Actions and sourcehut).
https://oils.pub/release/latest/pub/metrics.wwz/line-counts/...
It invokes Make in a couple places, but I plan to get rid of all the Make in favor of Python/Ninja.
I hate committing makefiles directly if it can be helped.
You can still call make in the script after generating the makefile, and even pass the make target as an argument to the bash script if you want. That being said, if you’re passing more than 2-3 arguments to the build.sh you’re probably doing it wrong.
My contention is that a build script should ideally be:
    sha-bang
    clone && cd $cloned_folder
    ${generate_makefile_with_tool}
    make $1
Anything much longer than that can (and usually will) quickly spiral out of control.
Make is great. Unless you're code-golfing, your makefile will be longer than a few lines and a bunch of well-intentioned-gremlins will pop in and bugger the whole thing up. Just seen it too many times.
Edit: in the jenkins case, in a jenkins build shell the clone happens outside build.sh:

(in jenkins shell):

    clone && cd clone
    ./build.sh $(0-1 args)

(inside build.sh):

    $(generate_makefile_with_tool)
    make $1
The problem is that shell commands are very painful to specify in a Makefile with weird syntactical rules. Esp when you need them to run in one shell - a lot of horror quoting needed.
Overall I'm not a fan of wrapping things; if there are flags or options on the top-level build tool, I'd rather my devs explore those and get used to what they are and can do, rather than being reliant on a project-specific script or make target to just magically do the thing.
Anyway, other than calling the build tool, CI config can have other steps in it, but it should be mostly consumed with CI-specific add-ons, like auth (OIDC handshake), capturing logs, uploading artifacts, sending a slack notification, whatever it is.
Our wrapping is 'minimal', in that you can still run
bazel build //...
or cmake ...
and get the same build artefacts as running: build.bash release
My current company is fanatical about read-only for just about every system we have (a bit like Nix, I suppose), and that includes CI/CD. Once the build is defined to run debug or release, rights are removed so the only thing you can edit are the build scripts you have under your control in your repo. This works extremely well for us.

I expect this is largely a concession to the reality that most autotools projects still expect an in-source build, not to mention Python wanting to spray pyc files and build/dist directories all over the place.
The reason it didn't catch on? Everyone else was running local builds in a proprietary IDE, so to them the local build was never the same anyway.
That keeps the cli interface easy, predictable and guessable.
Build the software inside of containers (or VMs, I guess): a fresh environment for every build, any caches or previous build artefacts explicitly mounted.
Then, have something like this, so those builds can also be done locally: https://docs.drone.io/quickstart/cli/
Then you can stack as many turtles as you need - such as having build scripts that get executed as a part of your container build, having Maven or whatever else you need inside of there.
It can be surprisingly sane: your CI server doing the equivalent of "docker build -t my_image ..." and then doing something with it, whereas during build time there's just a build.sh script inside.
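Sketched out with hypothetical image and path names, the CI-side part really can stay that small:

    # What the CI server runs; everything interesting lives in build.sh
    # inside the build context.
    docker build -t my_image:ci .

    # Pull build artifacts out of the image without running it.
    id=$(docker create my_image:ci)
    docker cp "$id":/src/dist ./dist
    docker rm "$id"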
[0]: https://imgur.com/gallery/eve-online-learning-curve-jj16ThL
As long as communications have bounded speed (speed of light or whatever else) there will be event horizons.
The point of a database is to track changes and therefore time centrally. Not because we want to, but because everything else has failed miserably. Even conflicting CRDT change merges and git merges can get really hairy really quickly.
People reinvent databases about every 10 years. Hardware gets faster. Just enjoy the show.
What I got from Hickey's talk is that he wanted to design a system that resisted the urge to encode everything in a stored procedure and run it on the database server.
Oh the DSL doesn't support what I need it to do.
Can I just have some templating or a little bit of places to put in custom scripts?
Congratulations! You now have a turing complete system. And yes, per the article that means you can cryptocurrency mine.
Ansible, Terraform, Maven, Gradle.
The unfortunate fact is that these IT domains (builds and CI) sit at the junction of two famously slippery slopes.
1) configuration
2) workflows
These two slippery slopes are famous for their demos of how clean and simple they are, and how easy it is to do anything you need them to do.
In the demo.
And sure it might stay like that for a little bit.
But inevitably.... Script soup
Then the CI just becomes a bit of yaml that runs my script.
In my experience these are the bits that fail all the time, and are the most important parts of CI once you go beyond it taking 20/30 seconds to build.
A clean build in an ephemeral VM of my project would take about 6 hours on a 16 core machine with 64GB RAM.
It sounds like you've got hundreds of millions of lines of code! (Maybe a billion!?) How do you manage that?
Our cached builds on CI are 20 minutes from submit to running on steam which is ok. We also build with MSVC so none of the normal ccache stuff works for us, which is super frustrating
I am fortunate in that the only things I want to reuse is package manager caches.
The complicated part comes when you have job A that builds and job B that deploys - they run on two different machine specs so you’re not paying for a 16-core machine to sit waiting 5 minutes for helm apply - and they need somewhere secure to shuffle that artifact around. Their access to that service is likely different to your local access to that service, so you run your build locally and it’s fine, but then the build machine doesn’t have write access to the new path you’ve just tested and it fails.
90% of the time these are where I see CI failures
At my place, we have ~400 wall hours of testing, and my run begins by figuring out what tests should be running and what can be skipped. This depends on many factors, and the calculation of the plan already involves talking to many external systems. Once we have figured out a plan for the tests, we can understand the plan for the build. Only then we can build, and test afterwards. I haven't been able to express all of that in "a bit of yaml" so far.
Need AWS, Azure or GCP deployment? Ever thought about putting it on bare metal yourself? If not, why not? Because it's not best practice? Nonsense. The answer with these things is: it depends, and if your app has not that many users, you can get away with it, especially if it's a B2B or internal app.
It's also too US centric. The idea of scalability applies less to most other countries.
It gets worse when you need more servers, because your OCR process of course needs CPU, so on a beefy machine you can handle maybe 50 high-page-count documents. But how do you talk to other machines, etc.?
Also, humans cost way more money than cloud stuff. If the cloud stuff can be managed in like 1 day per month you don't need a real person; if you have real hardware that day is not enough, and you soon need a dedicated person keeping everything up to date, etc.
In my experience, I have observed the opposite: companies with on-site infrastructure have been able to manage it in the spare time of a relatively small team (especially since hardware is pretty powerful and reliable nowadays), while those with cloud infrastructure have a large team focused on just maintaining the system, because cloud pushes you into far more complex setups.
I know SaaS businesses that don't as they operate in a single country, within a single timezone and the availability needs to be during business days and business hours.
> easy rollbacks
Yea, I haven't seen exceptions at all on this. So yea.
> server fault tolerance
That really depends. Many B2B or internal apps are fine with a few hours, or even a day, of downtime.
> service isolation
Many companies just have one app and if it's a monolith, then perhaps not.
> Hand-rolling even one of those things
Wow, I see what you're trying to say and I agree. But it really comes across as "if you don't use something like Kubernetes you need to handroll these things yourself." And that's definitely not true. But yea, I don't think that's what you meant to say.
Again, it depends
For context, I work in exactly that kind of "everyone in one time zone" situation and none of our customers would be losing thousands by the minute if something went down for a few hours or even a day. But I still like all the benefits of a "modern devops" approach because they don't really cost much at all and it means if I screw something up, I don't have to spend too much time unscrewing it. It took a bit more time to set up compared to a basic debian server, but then again, I was only learning it at the time and I've seen friends spin up fully production-grade Kubernetes clusters in minutes. The compute costs are also negligible in the grand scheme of things.
Features aren't pokemon; you don't have to catch them all.
Back when stackoverflow was cool and they talked about their infrastructure, they were running the whole site at 5 9s on 10-20 boxes. For a setup like that k8s would have A) required more hardware, B) required a complete rewrite of their system to k8sify it, and C) delivered no additional value.
k8s does good things if you have multiple datacenters worth of hardware to manage, for everyone else it adds overhead for features you don't really need.
B) Why on earth would you need to do that? K8s is, at its core, just a thing that runs containers. Take your existing app, stick it in a container and write a little yaml explaining which other containers it connects to. It can do many other things, but just...don't use them?
C) The value is in not having to develop orchestration in house. They already had it so yea, I wouldn't say "throw it out and go to k8s", but if you're starting from scratch and considering between "write and maintain a bunch of bespoke deployment scripts" and "just spin up Talos, write a few yaml files and call it a day" I think the latter is quite compelling.
This is a bad road to go down. Management will understand the implication that it's okay to reduce reliability requirements because "we'll just do the dangerous things on the weekends!"
After some time, developers are scheduled every other weekend and when something breaks during daytime, it's not going to be a smooth process to get it up again, because the process has always been exercised with 48 hours to spare.
Then at some point it's "Can we deploy the new version this weekend?" "No, our $important_customer have their yearly reporting next week, and then we have that important sales demo, so we'll hold off another month on the deployment." You get further and further away from continuous integration.
What costs are you talking about? Packaging your app in a container is already quite common so if you already do that all you need to do is replace your existing yaml with a slightly different yaml.
If you don't do that already, it's not really that difficult. Just copy-paste your install script or rewrite your Ansible playbooks into a Dockerfile. Enjoy the free security boost as well.
What are the other costs? Maintaining something like Talos is actually less work than a normal Linux distro. You already hopefully have a git repo and CI for testing and QA, so adding a "build and push a container" step is a simple one-time change. What am I missing here?
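That extra step is typically a couple of lines (the registry, image name, and GIT_SHA variable here are hypothetical):

    # The whole "build and push a container" CI step, give or take registry auth.
    docker build -t registry.example.com/myapp:"$GIT_SHA" .
    docker push registry.example.com/myapp:"$GIT_SHA"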
All the stuff Erlang does.
Static linking and chroot.
The problems and the concepts and solutions have been around for a long time.
Piles and piles of untold complexity, missing injectivity on data in the name of (leaky) abstractions, and cargo-culting have been with us on the human side of things for even longer.
And as always: technical and social problems may not always benefit from the same solutions.
Congrats, you've just rewritten half of kubernetes in bash. This isn't reducing complexity, it's NIH syndrome. You've recreated it, but in a way that nobody else can understand or maintain.
Cannot edit anymore so amending here:
Static linking and chroot (not as The One True Solution (TM)) but as basically Docker without Linux network namespaces.
Linux/Docker actually wound up improving things here. And they got to spend all the money on convincing the people that like advertisements.
And static linking mainly only becomes relevant (and then irrelevant again) in C because of boundaries between compilation units. SQLite throws all of this out. They call it an amalgamation (which also sounds better than a "unity build").
The tools are there. They are just overused. Look at enterprise Hello World in Java for a good laugh.
————
If your data lives in a database on the other end of a unix or TCP socket, then I still don't see "NIH". The new binary self-tests and the old binary waits for a shutdown command record and drains its connections.
Kernels and databases clock in at over 5M lines of code. NIH seems like missing the point there.
And most services neither need nor have nine nines of uptime. That is usually too expensive. And always bespoke. Must be tailored to the available hardware.
Code is less portable than people believe.
Ten #ifdef directives and you are often dead on arrival.
Seriously, take a look at their pinned repo: https://github.com/actions/starter-workflows
> Thank you for your interest in this GitHub repo, however, right now we are not taking contributions.
> We continue to focus our resources on strategic areas that help our customers be successful while making developers' lives easier. While GitHub Actions remains a key part of this vision, we are allocating resources towards other areas of Actions and are not taking contributions to this repository at this time.
"Github Actions might be over, so not worth engaging" was not on my bingo card.
We used to have to be able to communicate with other humans to build something. It seems to me that's what they're trying to take out of the loop by doing the things that humans do: talk to other humans and give them what they're asking for.
I too am not a fan of the dystopias we're ending up in.
Orrrrr… just keep that YAML as the sole configuration input in the first place. Use AI to write it if you wish, but then leave it alone.
I would hope that this comes with major changes to GHA’s permissions system, but I’m not holding my breath for that.
So basically either the whole CI pipeline is just a single command invoking my build system, or the CI pipeline can be run locally. Any other arrangement is self-inflicted suffering.
There are 12 yml keywords in total that cover everything.
Other cool things are the ability to ssh in a build if it failed(for debugging), and to run a one-time build with a custom yml without committing it(for testing).
I believe it can check out any repository, not just one in sourcehut that triggers a build, and it also has a GraphQL API
This caused me to default back to Jenkins several times already, now I'm in a position to never wander off to another yaml-based tool.
Both of them provide VMs where you can run anything, and bash is of course there on every image.
We do that for https://oils.pub/
sourcehut yaml: https://github.com/oils-for-unix/oils/tree/master/.builds
github yaml: https://github.com/oils-for-unix/oils/tree/master/.github/wo...
They both call the same shell scripts. The differences are:
* We use Github's API to merge on green; right now we don't have the same for sourcehut (since Github is the primary repo)
* Github Actions provides way more resources. They are kind of "locking projects in" by being free.
This post on NixOS gives a hint of that
https://blog.erethon.com/blog/2025/07/31/how-nixos-is-built/
The monthly cost for all the actions in July of 2025 came out to a bit over 14500 USD which GitHub covers in its entirety.
So I think many projects are gradually sucked into Github because it is indeed quite generous (including us, which annoys me -- we run more tasks on Github than sourcehut, even though in theory we could run all on sourcehut)
---
BUT I think it is a good idea to gradually consolidate your logic into shell, so you can move off Github in the future. Open source projects tend to last longer than cloud services.
This already happened to us -- we started using Travis CI in 2018 or so, and by 2021, it was acquired and the free tier was removed
Previously I've argued that CI/CD systems need two things, the ability to run bash and secrets management. Today I'd add: The ability to spin up an isolated environment for running the bash script.
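In the smallest case, all three can be approximated with a container runtime (a sketch only; the image name and secrets file are hypothetical):

    # Throwaway, isolated environment; secrets injected as environment variables.
    docker run --rm \
      --env-file ci-secrets.env \
      -v "$PWD:/src" -w /src \
      build-image:latest \
      bash ./build.sh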
Putting too much responsibility in the CI environment makes life as a developer (or anyone responsible for maintaining the CI process) more difficult. It’s far superior to have a consistent use of the build system that can be executed the same way on your local machine as in your CI environment. I suppose this is the mess you find yourself in when you have other teams building your pipelines for you?
I'm still rocking my good old jenkins machine, which to be fair took me a long time to set up, but it has been rock solid ever since, will never cost me much, and will never be shut down.
But I can definitely see the appeal of github actions, etc…
At $dayjob they recently set up git runners. The effort I’m currently working on has the OS dictated to us, long story don’t ask. The OS is centos 7.
The runners do not support this. There is an effort to move to Ubuntu 22.04. The runners also don’t support this.
I’m setting up a Jenkins instance.
god help you, and don’t even bother with the local emulators / mocks.
Are there any Jenkins Gurus out there who can give some tips?
I've never been a fan of GitHub Actions (too locked-in/proprietary for my taste), so no idea if it lives up to expectations.
This times a million.
Use a real programming language with a debugger. YAML is awful and Starlark isn’t much better.
I was with you until you said "Starlark". Starlark is a million times better than YAML in my experience; why do you think it isn't?
When they went commercial, GitHub Actions became the obvious choice, but it's just married to so much weirdness and unpredictability.
Whole thing with Drone opened my eyes at least, I'll never sign a CLA again
Don't get me wrong, it's a fantastic primitive.
But eventually you need to conditionally run some tests (to save compute).
For some benchmarks you might have limited hardware, so you need to coalesce jobs, and only run every 5 or 10 commits. You might want to keep the hardware hot, but also the queue small. So ideally you want to coalesce dynamically.
You also want result reporting, comparisons to previous results. Oh, and since you're coalescing some jobs and doing others conditionally you'll need ways to manually trigger skipped jobs later, maybe bisect too.
It's when you need to economize your compute that CI can get really complex. Especially if you have fragile benchmarks or flaky tests.
Yes, in theory you can enforce a culture that removes flaky tests, but doing so often requires tooling support -- statistics, etc.
We ended up wrapping everything in a Docker container and went back to just running a bash script. Drone had to be used because the architects that be had decided that Drone was the answer to some question that no one apparently asked.
That's the ideal. It's not doing anything you didn't explicitly tell it to.
> We ended up wrapping everything in a Docker container and back to just running a bash script.
That's literally what drone is for
I recently spent a day trying to get a GH Actions build going but got frustrated and just wrote my own console app to do it. Polling git, tracking a commit hash and running dotnet build is not rocket science. Putting this agent on the actual deployment target skips about 3 boss fights.
Also, Windows has a consistent user-mode API surface (unlike Linux), so a .NET app that runs on a desktop will run on server almost always.
The same cannot be said for someone developing on a "UNIX-like" system such as MacOS and then trying to run it on Ubuntu... or RedHat. Alpine? Shit...
Where I differ a bit is on the "two DAGs" criticism. In practice the granularity isn’t the same: the build system encodes how to compile and test, while the CI level is more about orchestration, cloning the repo, invoking the build system, publishing artifacts. That separation is useful, though we do lose the benefits of a single unified DAG for efficiency and troubleshooting.
The bigger pain points I hear from developers are less about abstractions and more about day-to-day experience: slow performance, flakiness, lack of visibility, and painful troubleshooting. For example, GitHub Actions doesn’t let you test or debug pipelines locally, you have to push every change to the remote. The hosted runners are also underpowered, and while self-hosting sounds attractive, it quickly becomes a time sink to manage reliably at scale.
This frustration is what led me to start working on Shipfox.io. Not a new CI platform, but an attempt to fix these issues on top of GitHub Actions. We’re focused on faster runners and better visibility, aggregating CI logs, test logs, CPU and memory profiles to make failures and performance problems easier to debug.
edit: all the docs are about "agents"; I don't want AI agents, is this for me at all?
For example, there is a quick start, so I skip that and click on "core concepts". That just redirects to quick start. There's no obvious reference or background theory.
If I was going to trust something like this I want to know the underlying theory and what guarantees it is trying to make. For example, what is included in a cache key, so that I know which changes will cause a new invocation and which ones will not.
Thanks so much for taking a look and sharing your feedback! We've heard this feedback in the past and are working on a big docs change that should make this whole experience a lot better for folks that are new to dagger.
https://devel.docs.dagger.io/getting-started/concepts
This should land in the coming weeks.
And I want it all as a SaaS!
This is useful because you get a single source of truth for "does that commit break the build" and eliminate implicit dependencies that might make builds work on one machine but not another.
But specifying dependencies between your build targets and/or source files is turning that runner into a bad, incomplete reimplementation of make, which is what this post is complaining about AFAICT.
To make things simple: make is a build system, running make in a cron task is CI.
There is nothing special about tests, it is just a step in the build process that you may or may not have.
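Taken literally, that's about one line of crontab (paths and addresses hypothetical):

    # Poll every 15 minutes; mail the log to the team if the build breaks.
    */15 * * * *  cd /srv/checkouts/myproject && git pull -q && make ci >/var/log/myproject-ci.log 2>&1 || mail -s "build broke" dev@example.com < /var/log/myproject-ci.log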
No. "Continuous Integration" is the practice of frequently merging changes to main. In this sense, "integration" means to take my changes and combine them with other recent changes.
A build and test system like those described in this article is a way to make CI safe and fast. It's not CI itself, it's just the enabling automation: the pre-merge checks and the post-merge artefact creation.
CI, being a framework, is easy to get locked into -- preventing local-first dev.
I find justfiles can help unify commands, making it easier to prevent logic from accruing in CI.
> So here's a thought experiment: if I define a build system in Bazel and then define a server-side Git push hook so the remote server triggers Bazel to build, run tests, and post the results somewhere, is that a CI system? I think it is! A crude one. But I think that qualifies as a CI system.
Yes the composition of hooks, build, and result posting can be thought as a CI system. But then the author goes on to say
> Because build systems are more generic than CI systems (I think a sufficiently advanced build system can do a superset of the things that a sufficiently complex CI system can do)
Which is ignoring the thing that makes CI useful, the continuous part of continuous integration. Build systems are explicitly invoked to do something; CI systems continuously observe events and trigger actions.
In the conclusion section author mentions this for their idealized system:
> Throw a polished web UI for platform interaction, result reporting, etc on top.
I believe that platform integrations, result management, etc should be pretty central for CI system, and not a side-note that is just thrown on top.
Why do you even have a "build system"? Why not just a shell script that runs 'cc -o foo foo.c' ? Because there are more complicated things you want to do, and it would be annoying to write out a long shell script to do them all. So you have a program ('build system') that does the complicated things for you. That program then needs a config file so you can tell the program what to do.
But you want to run that 'build system' remotely when someone does a git-push. That requires a daemon on a hosted server, authentication/authorization, a git server that triggers the job when it receives a push, it needs to store secrets and pass them to the job, it needs to run it all in a container for reliability, it needs to run the job multiple times at once for parallelism, it needs to cache to speed up the jobs, it needs to store artifacts and let you browse the results or be notified of them. So you take all that complexity, put it in its own little system ('CI system'). And you make a config file so you can tell the 'CI system' how to do all that.
Could you shove both separate sets of complex features into one tool? Sure you can. But it would make it harder to develop and maintain them, change them, replace them. Much simpler to use individual smaller components to compose a larger system, than to try to build one big, complex, perfect, all-in-one-system.
Don't believe me? There's a reason most living creatures aren't 6-foot-tall amoebas. We're systems-on-systems-on-systems-on-systems (many of which have similar features) and it works pretty well. Our biggest problem is often that our individual parts aren't composeable/replaceable enough.
I wish the author gave more concrete examples about what kinds of workflows they want to dynamically construct and remotely execute (and why a separate step of registering the workflow up front with the service before running it is such a dealbreaker), and what a sufficiently generic and unopinionated definition schema for workflows and tasks would look like as opposed to what a service like GitHub Actions defines.
Generally, registering a workflow with the service (putting it in your repo, in the case of GHA) makes sense because you're running the same workflows over and over. In terms of task definitions, GHA is workflows -> jobs -> steps -> actions, where jobs are tied to runners and can have dependencies defined between them. If you want to use those primitives to do something generic like run some scripts, you can do that in a very bare-bones way. When I look at the Taskcluster task definition they linked, I see pretty much the same thing.
Something that comes up for me a lot at my work: running custom slices of the test suite. The full test suite probably takes CPU-days to run, and if I'm only interested in the results of something that takes 5 CPU-minutes to run, then I shouldn't have to run all the tests.
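A crude version of that slicing can live in the build scripts themselves (a sketch; the src/<module> to tests/<module> layout is an assumed convention):

    # Run only the test directories whose corresponding source modules changed.
    changed_modules=$(git diff --name-only origin/main...HEAD \
                      | grep '^src/' | cut -d/ -f2 | sort -u)
    for m in $changed_modules; do
      if [ -d "tests/$m" ]; then python -m pytest "tests/$m"; fi
    done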
We’ve paired this with buildkite which allows uploading pipeline steps at any point during the run, so our CI pipeline is one step, that generates the rest of the pipeline and uploads that.
I’m working on open sourcing this meta-build tool, as I think it is a niche that has no current implementation and it is not our core business.
It can build a dependency graph across many systems (terraform, go, python, nix) by parsing from those systems what they depend on. Smashes them all together, so you can have a terraform module that depends on a go binary that embeds some python; and if you change any of it then each parts can have tasks that are run (go test/build, tf plan, pytest, and etc)
Many people have the idea they can make things simpler. Which is really easy, because the basic problems are not that hard. Then someone needs "just one more small feature" which seems easy enough, and it is - but the combination of everyone's small feature is complex.
Both systems end up having full programming languages because someone really needs that complexity for something weird - likely someone in your project. However don't abuse that power. 99% of what you need from both should be done in a declarative style that lets the system work and is simple. Just because you can do CI in the build system, or the build system's job with the CI system doesn't mean you should. Make sure you separate them.
Your CI system should be a small set of entry points. "./do everything" should be your default. But maybe you need a "build", then "test part-a" and "test part-b" as separate. However those are all entry points that your CI system calls to your build system and they are things you can do locally. Can do locally doesn't mean you do - most of the time locally you should do an incremental build. Nothing should be allowed past CI without doing a full build from scratch just to make sure that works (this isn't saying your CI shouldn't do incremental builds for speed - just that it needs to do full rebuilds as well, and if the full rebuild breaks you stop everyone until the full rebuild is fixed).
This is becoming the standard refrain for all software.
Take a look at Prefect - https://www.prefect.io/ - as far as I can see, it ticks a lot of the boxes that the author mentions (if you can live with the fact that the API is a Python SDK; albeit a very good one that gives you all the scripting power of Python). Don't be scared away by the buzzwords on the landing page, browsing the extensive documentation is totally worthwhile to learn about all of the features Prefect offers. Execution can either happen on their paid cloud offering or self-hosted on your own physical or cloud premises at no extra cost. The Python SDK is open source.
Disclaimer: I am not affiliated with Prefect in any way.
This is because yes, it is very complex. I have tried Jenkins before and Gitlab CI.
Something that most build tools and CIs should learn from Meson build system is that sometimes it is better to just keep it simple than adding features on top. If you need them, script them in some way but keep configuration as data-driven (and I mean purely data-driven, not half a language).
My build system is literally: a build matrix, where you can specify filters of what to keep or skip. This gets all combined.
A series of steps with a name that can be executed or not depending on a filter. Nothing else. Every step calls the build system or whatever.
After that it sends mail reports and integrates with Gerrit to send builds, and Gerrit can also call it.
No fancy plugins or the like. Just this small toml file I have, and run normal scripts or command lines without 300 layers on top. There are already enough things that can break without adding opaque layers on top. Just use the tools we all know: ssh, bash, Python etc.
Everyone knows how to call that. If a step is too complex, just make a script.
https://linci.tp23.org
CIs are too complicated and are basically about lock-in. But what you (should) do is run cli commands on dedicated boxes in remote locations.
In Linci, everything done remotely is the same as locally. Just pick a box for the job.
There is almost no code, and what there is could be rewritten in any language if you prefer. Storage is git/VCS + filesystem.
Filesystems aren't fashionable because they are a problem for the big boys, but not for you or I. Filesystem storage makes things easy and hackable.
That is unix bread and butter. Microsoft needs a CI in yaml. Linux does not.
Been using it for a while on a small scale and it's never made me want anything else.
Scripting: bash. Remoting: ssh. Auth: pam. Notification: irc/II (or mail, stomp, etc). Scheduling: crond. Webhooks: not needed if the repo is on the same container; use bash for most hooks, and a nodejs server that calls the cli for github.
Each and every plug-in is a bash script and some env variables.
I've read about other similar setups hacked up with make, but I don't like the env vars handling and syntax of make. Bash is great if what you do is simple, and as the original article points out so clearly, if your ci is complicated you should probably rethink it.
CI debugging at my day job is literally impossible. Read logs, try the whole flow again from the beginning.
With Linci, I can fix any stage in the flow if I want to, or check in and run again if I am 99% sure it will work.
For each task t in topological order:
Promise.all(all in-edges to t).then(t)
Want to run tasks on remote machines? Simply *waves hands* make a task that runs ssh.

The purpose of Continuous Integration is to produce the One Canonical Latest Build for a given system. Well... no surprise that there's a ton of overlap between these systems and Bazel etc. "build systems".
But GitLab pipelines etc. are also Continuous Deployment systems. You don't always need fancy ArgoCD pull-based deployments (their precursors, Chef/Puppet, were also pull-based deployments for VMs). You can just have GitLab run a deployment script that calls kubectl apply, or Capistrano, or scp and ssh systemctl restart, or whatever deploys the software for you. That's not something that makes sense as part of your build system.
https://blog.mitchjlee.com/2020/your-writing-style-is-costly
One workaround that I have briefly played with but haven't tried in anger: Gitlab lets you dynamically create its `.gitlab-ci.yaml` file: https://docs.gitlab.com/ci/pipelines/downstream_pipelines/#d...
So you can have your build system construct its DAG and then convert that into a `.gitlab-ci.yaml` to run the actual commands (which may be on different platforms, machines, etc.). Haven't tried it though.
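A sketch of what the generating job can run (the targets are hypothetical, and the trigger job in `.gitlab-ci.yaml` that consumes the generated file is assumed):

    #!/usr/bin/env bash
    # Walk the build graph (faked here with a static list) and emit a child pipeline.
    set -euo pipefail
    : > generated-pipeline.yml
    for target in lib app docs; do
      printf 'build-%s:\n  script:\n    - ./build.sh %s\n' "$target" "$target" \
        >> generated-pipeline.yml
    done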
FWIW Github also allows creating CI definitions dynamically.
(Of course, this is only possible because I can build software in a bash shell. Basically: if you're using bash already, you don't need a foreign CI service - you just need to replace yourself with a bash script.)
I've got one for updating repos and dealing with issues, I've got one for setting up resources and assets required prior to builds, I've got one for doing the build - then another one for packaging, another for signing and notarization, and finally one more for delivering the signed, packaged, built software to the right places for testing purposes, as well as running automated tests, reporting issues, logging the results, and informing the right folks through the PM system.
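Strung together, the top-level driver ends up being little more than this (the script names are hypothetical stand-ins for those stages):

    #!/usr/bin/env bash
    set -euo pipefail
    ./update-sources.sh
    ./fetch-assets.sh
    ./build.sh "${1:-Release}"
    ./package.sh
    ./sign-and-notarize.sh
    ./deliver.sh    # copy to the share, run tests, report to the PM system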
And this all integrates with our project management software (some projects use Jira, some use Redmine), since CLI interfaces to the PM systems are easily attainable and set up. If a dev wants to ignore one stage in the build pipeline, they can - all of this can be wrapped up very nicely into a Makefile/CMakeLists.txt rig, or even just a 'build-dev.sh vs. build-prod.sh' mentality.
And the build server will always run the build/integration workflow according to the modules, and we can always be sure we'll have the latest and greatest builds available to us whenever a dev goes on vacation or whatever.
And all this with cross-platform, multiple-architecture targets - the same bash scripts, incidentally, run on Linux, MacOS and Windows, and all produce the same artefacts for the relevant platform: MacOS=.pkg, Windows=.exe, Linux=.deb(.tar)
It's a truly wonderful thing to onboard a developer, and they don't need a Jenkins login or to set up Github accounts to monitor actions, and so on. They just use the same build scripts, which are a key part of the repo already, and then they can just push to the repo when they're ready and let the build servers spit out the product on a network share for distribution within the group.
This works with both Debug and Release configs, and each dev can have their own configuration (by modifying the bash scripts, or rather the env.sh module..) and build target settings - even if they use an IDE for their front-end to development. (Edit: /bin/hostname is your friend, devs. Use it to identify yourself properly!)
Of course, this all lives on well-maintained and secure hardware - not the cloud, although theoretically it could be moved to the cloud, there's just no need for it.
I'm convinced that the CI industry is mostly snake-oil being sold to technically incompetent managers. Of course, I feel that way about a lot of software services these days - but really, to do CI properly you have to have some tooling and methodology that just doesn't seem to be being taught any more, these days. Proper tooling seems to have been replaced with the ideal of 'just pay someone else to solve the problem and leave management alone'.
But, with adequate methods, you can probably build your own CI system and be very productive with it, without much fuss - and I say this with a view on a wide vista of different stacks in mind. The key thing is to force yourself to have a 'developer workstation + build server' mentality from the very beginning - and NEVER let yourself ship software from your dev machine.
(EDIT: call me a grey-beard, but get off my lawn: if you're shipping your code off to someone else [github actions, grrr...] to build artefacts for your end users, you probably haven't read Ken Thompson's "Reflections On Trusting Trust" deeply or seriously enough. Pin it to your forehead until you do!)
I believe github actions does all of this? I use the first two features
However, with time, you can have a very good feel of these CI systems, their strong and weak points, and basically learn how to use them in the simplest way possible in a given situation. Many problems I saw IRL are just a result of an overly complex design.
I have never seen a system with documentation as awful as Jenkins, with plugins as broken as Jenkins, with behaviors as broken as Jenkins. Groovy is a cancer, and the pipelines are half assed, unfinished and incompatible with most things.
It would probably be more constructive if you elaborated what your issues specifically were. For example, what have you found pipelines to be incompatible with? I've literally never seen anything they don't work with, so I can't really agree with your assessment without specifics. Similarly, I have zero problem with Groovy. If it's just not to your taste then fine, taste is subjective, but I can't see any substantive reason to call it "a cancer".
Continuous: do it often, daily or more often
Integration: merging changes to main
He's talking about build tools, which are a _support system_ for actual CI, but are not a substitute for it. These systems allow you to Continuously integrate, quickly and safely. But they aren't the thing itself. Using them without frequent merges to main is common, but isn't CI. It's branch maintenance.
Yes, semantic drift is a thing, but you won't get the actual benefits of the actual practice if you do something else.
If you want to talk "misdirected CI", start there.
A tale as old as time I suppose…
The same goes for other tools: build tools (ant, maven, gradle, npm, etc.); Configuration systems (puppet, ansible, salt, etc.); Infrastructure provisioning (cloudformation, terraform, etc.); other containerization and packaging tools (packer, docker, etc.).
Stick to what they are good at. Don't overload them with crap outside the scope of what they do (boiling oceans, lots of conditional logic, etc.). And consider whether you need them at all. Write scripts for all the rest. My default is a simple bash script. Replacing a 2 line script with 100+ lines of yaml is a clear sign that something is wrong with what you are doing.
A consideration lately is not just automated builds but having agentic coding tools be able to work with your software. I just spent an afternoon nudging codex along to vibe code me a new little library. Mostly it's nailing it and I'm iterating with it on features, tests, documentation etc. It of course needs to be able to run tests to validate what it's doing. And it needs to be able to figure out how. The more complicated that is, the less likely it is to be useful.
CI and agentic coding have similar needs: simplicity and uniformity. If you have that, everything gets easier.
Anything custom and wonky needs to be isolated and removed from the critical path. Or removed completely. Devops work is drudgery that needs to be minimized and automated. If it becomes most of what you do, you're doing it wrong. If an agentic coding system can figure out how to build and run your stuff, getting it to set up CI and deployment scripts is not that much of a leap in complexity.
After a few decades with this stuff, I have a low threshold for devops bullshit. I've seen that go sideways and escalate into months long projects to do god knows what a few times too often. Life is too short to deal with that endlessly. The point of automating stuff is so you can move on and do more valuable things. If automating it takes up all your time, something is very wrong.
My conclusion was that this is near 100% a design taste and business model problem. That is, to make progress here will require a Steve Jobs of build systems. There's no technical breakthroughs required but a lot of stuff has to gel together in a way that really makes people fall in love with it. Nothing else can break through the inertia of existing practice.
Here are some of the technical problems. They're all solvable.
• Unifying local/remote execution is hard. Local execution is super fast. The bandwidth, latency and CPU speed issues are real. Users have a machine on their desk that compared to a cloud offers vastly higher bandwidth, lower latency to storage, lower latency to input devices and if they're Mac users, the fastest single-threaded performance on the market by far. It's dedicated hardware with no other users and offers totally consistent execution times. RCE can easily slow down a build instead of speeding it up and simulation is tough due to constantly varying conditions.
• As Gregory observes, you can't just do RCE as a service. CI is expected to run tasks devs aren't trusted to do, which means there has to be a way to prove that a set of tasks executed in a certain way even if the local tool driving the remote execution is untrusted, along with a way to prove that to others. As Gregory explores the problem he ends up concluding there's no way to get rid of CI and the best you can do is reduce the overlap a bit, which is hardly a compelling enough value prop. I think you can get rid of conventional CI entirely with a cleverly designed build system, but it's not easy.
• In some big ecosystems like JS/Python there aren't really build systems, just a pile of ad-hoc scripts that run linters, unit tests and Docker builds. Such devs are often happy with existing CI because the task DAG just isn't complex enough to be worth automating to begin with.
• In others like Java the ecosystem depends heavily on a constellation of build system plugins, which yields huge levels of lock-in.
• A build system task can traditionally do anything. Making tasks safe to execute remotely is therefore quite hard. Tasks may depend on platform specific tooling that doesn't exist on Linux, or that only exists on Linux. Installed programs don't helpfully offer their dependency graphs up to you, and containerizing everything is slow/resource intensive (also doesn't help for non-Linux stuff). Bazel has a sandbox that makes it easier to iterate on mapping out dependency graphs, but Bazel comes from Blaze which was designed for a Linux-only world inside Google, not the real world where many devs run on Windows or macOS, and kernel sandboxing is a mess everywhere. Plus a sandbox doesn't solve the problem, only offers better errors as you try to solve it. LLMs might do a good job here.
But the business model problems are much harder to solve. Developers don't buy tools, only SaaS, but they also want to be able to do development fully locally. Because throwing a CI system up on top of a cloud is so easy, it's a competitive space and the possible margins involved just don't seem that big. Plus, there is no way to market to devs that has a reasonable cost. They block ads, don't take sales calls, and some just hate the idea of running proprietary software locally on principle (none hate it in the cloud), so the only thing that works is making clients open source, then trying to saturate the open source space with free credits in the hope of gaining attention for a SaaS. But giving compute away for free comes at a staggering cost that can eat all your margins. The whole dev tools market has this problem far worse than other markets do, so why would you write software for devs at all? If you want to sell software to artists or accountants it's much easier.
> So going beyond the section title: CI systems aren't too complex: they shouldn't need to exist. Your CI functionality should be an extension of the build system.
True. In the sense that if you are running a test/build, you probably want to start local first (dockerize) and then run that container remotely. However, the need for CI stems from the fact that you need certain variables (i.e. you might want to run this when someone commits that, or opens a pull request, etc.). In a sense, a CI system goes beyond the state of your code to the state of your repo and stuff connected to your repo (i.e. slack)
> There is a GitHub Actions API that allows you to interact with the service. But the critical feature it doesn't let me do is define ad-hoc units of work: the actual remote execute as a service. Rather, the only way to define units of work is via workflow YAML files checked into your repository. That's so constraining!
I agree. Which is why most people will try to use the container or build system to do these complex tasks.
> Taskcluster's model and capabilities are vastly beyond anything in GitHub Actions or GitLab Pipelines today. There's a lot of great ideas worth copying.
You still need to run these tasks as containers. So, say if you want to compare two variables, that's a lot of compute for a relatively simple task. Which is why the status quo has settled with GitHub Actions.
> it should offer something like YAML configuration files like CI systems do today. That's fine: many (most?) users will stick to using the simplified YAML interface.
It should offer a basic programming/interpreted language like JavaScript.
This is an area where WebAssembly can be useful. At its core, WASM is a unit of execution. It is small, universal, cheap and has a very fast startup time compared to a full OS container. You can also run arbitrarily complex code in WASM while ensuring isolation.
My idea here is that CI becomes a collection of executable tasks that the CI architect can orchestrate while the build/test systems remain a simple build/test command that run on a traditional container.
> Take Mozilla's Taskcluster and its best-in-class specialized remote execute as a service platform.
That would be a mistake, in my opinion. There is a reason Taskcluster has failed to get any traction. Most people are not interested in engineering their CI but in getting tasks executed on certain conditions. Most companies don't have people/teams dedicated for this and it is something developers do alongside their build/test process.
> Will this dream become a reality any time soon? Probably not. But I can dream. And maybe I'll have convinced a reader to pursue it.
I am :) I do agree with your previous statement that it is a hard market to crack.