Modifying other people's software
80 points
4 days ago
| 11 comments
| natkr.com
| HN
skydhash
16 hours ago
[-]
Maybe I can't understand what TFA is describing, but from what I know a patch is usually tied to a specific commit, so a very specific point in time in the upstream lifetime. It does not make sense to have it lingering longer than that. Even in the case when you want to maintain a set of patches (package building, ...) you usually revise it for every new version of the software. In this case, the intent is much more important than the how (which quickly becomes history).
reply
lmm
14 hours ago
[-]
The point is to maintain your set (perhaps stack) of patches as a set of patches on top of upstream for the long term. Yes, you will probably have to revise them as upstream changes, but this will let you maintain their identity as you do so. Is that something you will find useful? Maybe, maybe not.
reply
doix
16 hours ago
[-]
Yes, I don't quite get it. When I need to maintain a fork, I just add an extra remote to git. Then I fetch upstream (what I call my remote) and rebase my changes against whatever branch I'm following. At any point in time I can generate a patch file that works for whatever version I have rebased against.

Seems easy enough, I read the article multiple times and I don't get why what they are describing is needed.
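
A minimal sketch of the flow described above, run against throwaway repositories so it is safe to paste. The directory names, file names and commit messages are made up for illustration; only the git subcommands are real.

```shell
set -eu
# Hypothetical identity so the commits work on a machine with no git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
work=$(mktemp -d)

# A stand-in "upstream" project with one commit.
git init -q "$work/upstream"
cd "$work/upstream"
echo "v1" > core.txt
git add core.txt && git commit -q -m "upstream: v1"
git branch -M main

# The fork: clone it, register upstream as an extra remote, add a local change.
git clone -q "$work/upstream" "$work/fork"
cd "$work/fork"
git remote add upstream "$work/upstream"
echo "local tweak" > tweak.txt
git add tweak.txt && git commit -q -m "fork: local tweak"

# Upstream moves on.
cd "$work/upstream"
echo "v2" > core.txt
git commit -q -am "upstream: v2"

# Fetch upstream and replay the fork's commits on top of it.
cd "$work/fork"
git fetch -q upstream
git rebase -q upstream/main

# Export the carried changes as patch files against the new base.
git format-patch -q -o "$work/patches" upstream/main
ls "$work/patches"
```

After the rebase the fork's branch has a new commit hash for the local tweak, which is the part the rest of the thread argues about.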

reply
Nullabillity
15 hours ago
[-]
(Author here.)

The difference is that git rebasing is a destructive operation, you lose track of the old version when you do it. (Yes, there's technically the reflog.. but it's much less friendly to browse, and there's no way to share it across a team.)

Maybe that's an okay tradeoff for something you use by yourself, but it gets completely untenable when you're multiple people maintaining it together, because constantly rebasing branches completely breaks Git's collaboration model.

reply
doix
14 hours ago
[-]
I worked at a place that was allergic to contributing patches upstream. We maintained a lot of internal forks for things and had no problem collaborating.

You don't need to push the rebased branch to the same branch on your remote, if that's an issue (although I don't see how it is).

Maybe this is a case of "Dropbox is just rsync", but I feel like just learning git and using it is easier than learning a new tool.

reply
NotPractical
5 hours ago
[-]
> I feel like just learning git and using it is easier than learning a new tool

I would agree if this "new tool" we're talking about wasn't just a simple wrapper over existing git commands. You can learn it in its entirety, including how it works (not just how to use it), in a matter of a half hour or less.

reply
nicoburns
12 hours ago
[-]
We do this for some of the components that are shared between Servo and Firefox. Firefox is upstream, and on the Servo side we have automated and manual syncing. The automated syncing mirrors the upstream `main` branch to our `upstream` without changes daily. The manual syncing rebases our changes on top of a new upstream version through a manual rebase process. This happens monthly and each sync is pushed to a new branch to maintain history.

Between monthly syncs we push our own changes to our latest monthly branch (which also get manually sent upstream when we get a chance).

reply
cobbzilla
15 hours ago
[-]
I see — you’re doing more than “here’s a few patches to keep working across revisions”, you’re doing separate-path feature work on a different, actively-developed project.

To me that sounds like not a great idea, but if you must do it, I could see some usefulness to this.

reply
Nullabillity
15 hours ago
[-]
Yeah. For reference, this is a typical patchset for the project that motivated it.[0] Some of the patches are "routine" dependency upgrades, some of them are bugfix backports, some of them are original work that we were planning to upstream but hadn't got around to yet. Some are worth keeping when upgrading to a new upstream version, some aren't.

I agree that it's not ideal, but... there are always tradeoffs to manage.

[0]: https://github.com/stackabletech/docker-images/tree/e30798ac...

reply
random3
13 hours ago
[-]
You’re thinking a patch is text, but should think of it as a logical change. Unless the logic becomes part of upstream the patch is not tied to a specific point in “time”. There’s a cost to it, as you have to constantly rebase. This is the case with any non-vanilla distribution (e.g. Linux), although it’s also at a package level so you do this both for each package as well as across every package. For well-written code there’s reasonably low coupling so it’s less work to maintain.
reply
cobbzilla
15 hours ago
[-]
Agreed. If you want your change and don’t want to bother the maintainers with a patch they are unlikely to accept, or can’t because it’s proprietary: fork the repo (at whatever tag makes sense), then periodically sync with the latest code for that version.

The likelihood of conflicts is minimal, and often if you see conflicts it’s a good indication your issue may have been resolved. Or if not, you can see if it’s still needed, or how to adjust it.

reply
Nullabillity
15 hours ago
[-]
(Author here.)

> fork the repo (at whatever tag makes sense), then periodically sync with the latest code for that version.

Yeah, this is the workflow that Lappverk is trying to enable.

The problem is that neither of Git's collaboration models works well for this problem. Rebasing breaks collaboration (and history for the patchset itself), and merging quickly loses track of individual patches. Lappverk is an attempt to provide a safer way to collaborate over the rebase workflow.

reply
skydhash
8 hours ago
[-]
But you can always create a new branch before rebasing if you want to store the old revision metadata, or do a git format-patch if you don’t want a bunch of branches lying around. So what are the ways to be safer than this?
reply
what
16 hours ago
[-]
A patch just encapsulates what was added and removed in a particular change, it doesn’t care about any commits.
reply
shmerl
15 hours ago
[-]
For example, wine-staging (run by the Wine developers themselves) hosts patches for the Wine project, and they revise / rebase them with each Wine version, which is often not a trivial task. I don't see how you can avoid that really. But Wine staging itself is a git repository that holds patches (and their history) if that helps, which indeed can stay there for years.

Same happens with patches that Debian applies on top of fixed versions of packages. They are stored in Debian's Salsa git.

reply
praptak
11 hours ago
[-]
You may have a look at Quilt. It doesn't solve the problem the author described but may help you once you accept there is no easy solution in sight.

Quilt is automation for the "bag of patches" model. I used it once when I needed to upgrade the internal bag of patches at $big_corp so as to apply them to a newer version of $public_app. It was predictably complex but somehow still manageable.

If you squint a bit then the [bag of patches] + [automated application in order] is a poor man's Git. If you keep this in a git repo then you're basically versioning repos (poor man's ones) in a repo. It almost sounds like the solution to the author's problem :)
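
To make the "bag of patches applied in order" idea concrete, here is a hand-rolled stand-in. It is not Quilt itself: it only borrows the idea of a `series` file that lists patches in application order, and it uses `git format-patch` and `git apply` to do the work. All paths and file names are invented for the example.

```shell
set -eu
# Hypothetical identity so the commits work on a machine with no git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
work=$(mktemp -d)
mkdir "$work/bag" "$work/build"

# Produce a release plus two patches so there is something to apply.
git init -q "$work/src"
cd "$work/src"
echo one > app.txt
git add app.txt && git commit -q -m "upstream release"
git tag v1
echo two >> app.txt && git commit -q -am "patch a"
echo three >> app.txt && git commit -q -am "patch b"
git format-patch -q -o "$work/bag" v1

# The series file lists the patches in the order they must be applied.
(cd "$work/bag" && ls *.patch > series)

# Unpack a pristine copy of the release and apply the bag in order.
git archive v1 | tar -x -C "$work/build"
cd "$work/build"
while read -r p; do
  git apply "$work/bag/$p"
done < "$work/bag/series"
cat app.txt
```

Committing the `bag` directory to its own repository gives the "repos in a repo" arrangement described above.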

reply
Nullabillity
3 hours ago
[-]
Yeah, I mentioned Quilt in the post! Lappverk is effectively an exercise in "What if Quilt, but you could interact with it using any Git tooling, rather than Quilt's half-baked custom VCS?".
reply
graynk
5 hours ago
[-]
It's mentioned in the article
reply
actionfromafar
10 hours ago
[-]
quilt is really cool
reply
noirscape
7 hours ago
[-]
I had issues with similar things for a couple years too. The reality is that there's remarkably little existing advice for maintaining a soft fork that doesn't intend to upstream patches. (For reference, probably the most notable patch fork that can't/doesn't upstream anything, GNU IceCat, uses a bash file from hell to apply all of its changes to the Firefox source code - it is not a scalable solution.)

Ultimately the solution I ended up using was git rebase; it just works the nicest out of all of them:

* Your patches are always kept on top at the git log.

* It's absolutely trivial to drop an unnecessary patch, add a new one in the chain or to merge two patches that do the same thing. (Use git rebase -i for that.) Fixing typos in patches is trivial too.

* Your history isn't so important for a patch fork; the patches are what matters, so don't fret too much about commit hashes changing. I promise you, it'll be fine.

* Git will complain if you try to do a rebase that doesn't work out of the box, using similar tools as resolving merge conflicts. You can instantly do a git pull from another upstream that rebases with git pull --rebase upstream master . This does assume you've added the upstream as a second remote to git under the name upstream and that they push the code you want to patch onto the master branch.
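
The points above can be sketched end to end in throwaway repositories. Everything except the git subcommands is hypothetical (paths, file names, commit messages). Note that `git pull` takes the remote and the branch as separate arguments, and that `GIT_SEQUENCE_EDITOR` is used here only to script what you would normally do by hand in the `git rebase -i` editor.

```shell
set -eu
# Hypothetical identity so the commits work on a machine with no git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
work=$(mktemp -d)

# Stand-in upstream that publishes on master.
git init -q "$work/upstream"
cd "$work/upstream"
echo v1 > core.txt
git add core.txt && git commit -q -m "upstream: v1"
git branch -M master

# The patch fork, carrying two patches on top.
git clone -q "$work/upstream" "$work/fork"
cd "$work/fork"
git remote add upstream "$work/upstream"
echo a > a.txt && git add a.txt && git commit -q -m "patch: a"
echo b > b.txt && git add b.txt && git commit -q -m "patch: b"

# Upstream releases something new.
cd "$work/upstream"
echo v2 > core.txt
git commit -q -am "upstream: v2"

# Pull and rebase the carried patches in one step.
cd "$work/fork"
git pull -q --rebase upstream master

# Drop the oldest carried patch by rewriting the first todo line.
GIT_SEQUENCE_EDITOR="sed -i.bak '1s/^pick/drop/'" git rebase -q -i upstream/master

# Whatever is left between upstream and HEAD is the patch set.
git log --oneline upstream/master..HEAD
```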

As for drawbacks, I only wound up with two:

* CI tools and git server UIs often aren't prepared to handle a heavily rebased master branch - it leads to lots of builds that are linked to dangling commit hashes. GitHub also for some reason insists on displaying the date of the last rebase, rather than the date of when the patch was committed. Not sure why.

* Pushing your fork means heavy use of force pushes, which feels instinctively wrong.

Neither drawback is large enough for me to mind it in practice.

Opted to use rebase for this sort of fork after reading a bit about non-merge related git flows and wondering what'd happen if I did a rebase-based workflow but just... never sent any patches. Turns out it works really well.

reply
chrismorgan
6 hours ago
[-]
Yeah, using the real repository and rebasing atop the release commit has always seemed fine to me, provided the project uses Git. And if you want to keep track of the patches on old versions, just tag them—if upstream has tag 1.2.3, tag 1.2.3+chrismorgan or similar. This occasionally messes with build scripts—but then, not tagging sometimes does too.

> GitHub also for some reason insists on displaying the date of the last rebase, rather than the date of when the patch was committed. Not sure why.

Sounds like you’re running into the difference between author and committer, which Git models distinctly.
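
A small demonstration of that distinction, in a scratch repository with invented file names and a made-up 2020 timestamp. `--force-rebase` is used only so that git replays the commit even though it is already on top of its base.

```shell
set -eu
# Hypothetical identity so the commits work on a machine with no git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
repo=$(mktemp -d)
cd "$repo"
git init -q .

echo base > base.txt
git add base.txt && git commit -q -m "base"

# A patch written (and first committed) back in 2020.
echo patch > patch.txt
git add patch.txt
GIT_AUTHOR_DATE="2020-01-01 12:00:00 +0000" \
GIT_COMMITTER_DATE="2020-01-01 12:00:00 +0000" \
  git commit -q -m "patch"

# Replaying the commit keeps the author date but stamps a new committer date.
git rebase -q --force-rebase HEAD~1
git log -1 --date=short --format='author:    %ad%ncommitter: %cd'
```

The author line still shows the 2020 date, while the committer line shows the day the rebase ran.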

reply
userbinator
17 hours ago
[-]
Many times I've just patched the binary even if source is available, because trying to reproduce the binary you currently have, with only the changes you want and everything else the same, can be an even more difficult exercise than simply changing a string or constant.
reply
PhilipRoman
13 hours ago
[-]
Lol I remember doing this when I was younger with the `man` command to remove a 5 second exit delay for the browser output.

    radare2 -qq -w -c "wx 01 @ 0xb407" /usr/bin/man
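
For anyone without radare2, the same kind of single-byte write can be sketched with `dd` on a scratch file. This assumes the address is a plain file offset, which may not be how radare2 interprets it for a real executable, and the 16-byte file and offset 7 are made up. Don't point this at /usr/bin/man without a backup.

```shell
set -eu
f=$(mktemp)

# 16 zero bytes stand in for the binary.
dd if=/dev/zero of="$f" bs=16 count=1 2>/dev/null

# Overwrite one byte at the chosen offset; conv=notrunc keeps the rest intact.
offset=7
printf '\001' | dd of="$f" bs=1 seek="$offset" conv=notrunc 2>/dev/null

# Dump the bytes to confirm only that one changed.
od -An -tx1 "$f"
```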
reply
taneq
15 hours ago
[-]
Especially if you make a habit of patching the binary instead of rebuilding from source! ;)
reply
anilakar
14 hours ago
[-]
I once wrote a small C++ wrapper for POSIX dlfcn.h. Someone sent a pull request that would have turned it into a Windows-only library.
reply
blueflow
8 hours ago
[-]
I once worked with other people on a github project that was a fork of another project. Upstream had long been dead. One day some other person created a pull request from our project's master branch into upstream.

Result: Our project activity resulted in a "your pull request has been updated" email spam about a pull request we had no control over.

reply
yjftsjthsd-h
14 hours ago
[-]
Like... Intentionally, or because they unthinkingly did something non-portable?
reply
anilakar
9 hours ago
[-]
Didn't ask. My original code was not tied to any specific hosted environment (assuming that Microsoft's POSIX C standard library implementation is more or less correct) but I never made it clear I tested and intended it to be used on Linux only.
reply
thwarted
17 hours ago
[-]
The process described reminded me of "pristine source" and RPM spec files that take the upstream pristine source and patch it during the build process. Maintaining that is always a little bit of a headache if you don't do it regularly, especially having to maintain (generate and apply) a separate set of patch files for the changes and express/apply the patches in the spec file. This looks to make light work of that.
reply
Nullabillity
2 hours ago
[-]
Yep, that's the goal! ^^
reply
attila-lendvai
12 hours ago
[-]
whenever i rebase longstanding commits in my fork, i keep the previous branch by appending the date to its name.

reading the readme didn't make it clear to me how this app would make my life any easier (also considering the added complexity of a new tool).

reply
attila-lendvai
12 hours ago
[-]
don't get me wrong, it's a PITA... but how would it hurt less using this tool?

i rarely, if ever, need to look at the history of this.

reply
datadrivenangel
17 hours ago
[-]
Modifying source code like this is one method. For web software, bookmarklets are another great way to do that.
reply
bartread
17 hours ago
[-]
I’m a big fan of Greasemonkey scripts for this, although these days I prefer Violentmonkey because it has several capabilities that the OG doesn’t.
reply
vlovich123
14 hours ago
[-]
Honestly, I found it a better strategy to name branches after the fork point and the date you started the fork. So you’d have main-2025-03-07 for a fork of main started 03-07, and another, main-2025-05-08, for a rebase. The patch set above that is just what you carry. I’m not sure maintaining them as literal patches is that helpful vs just keeping them as explicit patches to apply in git. Maybe this is the right strategy once your fork gets complicated, but at that point you should be hard forking rather than soft forking IMO.
reply
cyberax
18 hours ago
[-]
This is supercool. One of my constant problems with self-hosting is that I often need to modify just a couple of files here and there, but then I'm stuck with a forked repo or a dirty work copy.

I'm going to try to make a frontend UI for it.

reply
darkwater
11 hours ago
[-]
Are you talking about personal or professional self-host? Why are you constantly patching software you self-host? Not enough configurability? Using software not made for self-host? Holding it wrong? I ask because it seems...strange that you have these issues so often.
reply
cyberax
1 hour ago
[-]
Personal self-hosting.

First, because I have a special setup. Second, I often want some small changes in the software.

reply
ngcc_hk
5 hours ago
[-]
The title is very general. There are at least two kinds of modification: one minimizes the change and only alters behaviour, and the other really changes the program.

I worked for a decade in mainframe technical support, mostly installing fixes. Because of that, when I recently spent 3 months on a hobby project changing the turbo bridge to take an external bridge card, I injected code or hacked the code, like a JES2 exit, modifying the host program's behaviour without touching much of the host program itself.

This is very different from my colleagues who are application programmers and can totally change a CICS module, even changing the DB2 schema.

Which kind of modification is meant in this title … I wonder.

reply
Nullabillity
3 hours ago
[-]
The former. If you intend to hard-fork then Git's model is already fine. If you're soft-forking and want to model your divergence explicitly then Lappverk might be for you.
reply