FilterHN

Modifying other people's software

80 points

by todsacerdoti

4 days ago

| 11 comments

| natkr.com

skydhash

16 hours ago

Maybe I don't understand what TFA is describing, but from what I know a patch is usually tied to a specific commit, so a very specific point of time in the upstream lifetime. It does not make sense to have it lingering longer than that. Even in the case when you want to maintain a set of patches (package building, ...) you usually revise it with every new version of the software. In this case, the intent is much more important than the how (which quickly becomes history).

lmm

14 hours ago

The point is to maintain your set (perhaps stack) of patches as a set of patches on top of upstream for the long term. Yes, you will probably have to revise them as upstream changes, but this will let you maintain their identity as you do so. Is that something you will find useful? Maybe, maybe not.

doix

16 hours ago

Yes, I don't quite get it. When I need to maintain a fork, I just add an extra remote to git. Then I fetch upstream (what I call my remote) and rebase my changes against whatever branch I'm following. At any point in time I can generate a patch file that works for whatever version I have rebased against.

Seems easy enough, I read the article multiple times and I don't get why what they are describing is needed.
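For readers following along, here is a minimal sketch of the extra-remote workflow described in this comment. It assumes git is installed and builds two throwaway repositories in a temporary directory; the directory names, file names and commit messages are made up for illustration, and the remote is named upstream only by convention.

```shell
#!/bin/sh
set -e
# Throwaway identity so commits work without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)

# Stand-in for the upstream project (hypothetical content).
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/main
echo "v1" > app.txt
git add app.txt
git commit -q -m "upstream: v1"

# The fork: clone with the remote named "upstream".
# In a real fork, origin would be your own copy and upstream an extra remote.
git clone -q -o upstream "$work/upstream" "$work/fork"
cd "$work/fork"
echo "local tweak" > local.txt
git add local.txt
git commit -q -m "fork: local tweak"

# Upstream moves on.
cd "$work/upstream"
echo "v2" > app.txt
git commit -q -am "upstream: v2"

# Fetch upstream and replay the local changes on top of it.
cd "$work/fork"
git fetch -q upstream
git rebase -q upstream/main

# At any point, export the local changes as patch files.
git format-patch -q -o "$work/patches" upstream/main
patch_count=$(ls "$work/patches" | wc -l | tr -d ' ')
echo "patches exported: $patch_count"
```

The exported patch files apply to whatever upstream version the branch was last rebased against, which is the property the comment relies on.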

Nullabillity

15 hours ago

(Author here.)

The difference is that git rebasing is a destructive operation: you lose track of the old version when you do it. (Yes, there's technically the reflog... but it's much less friendly to browse, and there's no way to share it across a team.)

Maybe that's an okay tradeoff for something you use by yourself, but it gets completely untenable when you're multiple people maintaining it together, because constantly rebasing branches completely breaks Git's collaboration model.

doix

14 hours ago

I worked at a place that was allergic to contributing patches upstream. We maintained a lot of internal forks for things and had no problem collaborating.

You don't need to push the rebased branch to the same branch on your remote, if that's an issue (although I don't see how it is).

Maybe this is a case of "Dropbox is just rsync", but I feel like just learning git and using it is easier than learning a new tool.

NotPractical

5 hours ago

> I feel like just learning git and using it is easier than learning a new tool

I would agree if this "new tool" we're talking about wasn't just a simple wrapper over existing git commands. You can learn it in its entirety, including how it works (not just how to use it), in a matter of a half hour or less.

nicoburns

12 hours ago

We do this for some of the components that are shared between Servo and Firefox. Firefox is upstream, and on the Servo side we have automated and manual syncing. The automated syncing mirrors the upstream `main` branch to our `upstream` without changes daily. The manual syncing rebases our changes on top of a new upstream version through a manual rebase process. This happens monthly and each sync is pushed to a new branch to maintain history.

Between monthly syncs we push our own changes to our latest monthly branch (which also get manually sent upstream when we get a chance).

cobbzilla

15 hours ago

I see — you’re doing more than “here’s a few patches to keep working across revisions”, you’re doing separate-path feature work on a different, actively-developed project.

To me that sounds like not a great idea, but if you must do it, I could see some usefulness to this.

Nullabillity

15 hours ago

Yeah. For reference, this is a typical patchset for the project that motivated it.[0] Some of the patches are "routine" dependency upgrades, some of them are bugfix backports, some of them are original work that we were planning to upstream but hadn't got around to yet. Some are worth keeping when upgrading to a new upstream version, some aren't.

I agree that it's not ideal, but... there are always tradeoffs to manage.

[0]: https://github.com/stackabletech/docker-images/tree/e30798ac...

random3

13 hours ago

You’re thinking a patch is text, but you should think of it as a logical change. Unless the logic becomes part of upstream, the patch is not tied to a specific point in “time”. There’s a cost to it, as you have to constantly rebase. This is the case with any non-vanilla distribution (e.g. Linux), although it’s also at a package level, so you do this both for each package as well as across every package. For well-written code there’s reasonably low coupling, so it’s less work to maintain.

cobbzilla

15 hours ago

Agreed. If you want your change and don’t want to bother the maintainers with a patch they are unlikely to accept, or can’t because it’s proprietary: fork the repo (at whatever tag makes sense), then periodically sync with the latest code for that version.

The likelihood of conflicts is minimal, and often if you see conflicts it’s a good indication your issue may have been resolved. Or if not, you can see if it’s still needed, or how to adjust it.

Nullabillity

15 hours ago

(Author here.)

> fork the repo (at whatever tag makes sense), then periodically sync with the latest code for that version.

Yeah, this is the workflow that Lappverk is trying to enable.

The problem is that neither of Git's collaboration models works well for this problem. Rebasing breaks collaboration (and history for the patchset itself), and merging quickly loses track of individual patches. Lappverk is an attempt to provide a safer way to collaborate over the rebase workflow.

skydhash

8 hours ago

But you can always create a new branch before rebasing if you want to store the old revision metadata, or run git format-patch if you don’t want a bunch of branches lying around. So what are the ways to be safer than this?
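As a rough illustration of the two options mentioned here, the sketch below keeps the pre-rebase state both as a dated backup branch and as exported patch files, then rebases. It assumes git is installed; the repository, the branch names (main, patches, the backup name) and the file contents are hypothetical.

```shell
#!/bin/sh
set -e
# Throwaway identity so commits work without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)
out=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main

# "main" plays the role of upstream in this sketch.
echo base > file.txt
git add file.txt
git commit -q -m "upstream: base"

# One carried patch on its own branch.
git checkout -q -b patches
echo fix > fix.txt
git add fix.txt
git commit -q -m "patch: fix"

# Upstream advances.
git checkout -q main
echo next >> file.txt
git commit -q -am "upstream: next"

git checkout -q patches
old_tip=$(git rev-parse HEAD)

# Option 1: keep the old revision reachable under a dated branch name.
backup="patches-before-rebase-$(date +%Y-%m-%d)"
git branch "$backup"

# Option 2: export the old patch series as files instead.
git format-patch -q -o "$out" main..HEAD

# Now the rebase can rewrite "patches" without losing the old state.
git rebase -q main
new_tip=$(git rev-parse HEAD)
echo "old: $old_tip"
echo "new: $new_tip"
```

Both options preserve the old series; neither records which old patch corresponds to which new one, which appears to be the gap the author is pointing at.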

what

16 hours ago

A patch just encapsulates what was added and removed in a particular change, it doesn’t care about any commits.

shmerl

15 hours ago

For example, wine-staging (run by the Wine developers themselves) hosts patches for the Wine project, and they revise / rebase them with each Wine version, which is often not a trivial task. I don't see how you can avoid that really. But Wine staging itself is a git repository that holds patches (and their history) if that helps, which indeed can stay there for years.

Same happens with patches that Debian applies on top of fixed versions of packages. They are stored in Debian's Salsa git.

praptak

11 hours ago

You may have a look at Quilt. It doesn't solve the problem the author described but may help you once you accept there is no easy solution in sight.

Quilt is automation for the "bag of patches" model. I used it once when I needed to upgrade the internal bag of patches at $big_corp so as to apply them to a newer version of $public_app. It was predictably complex but somehow still manageable.

If you squint a bit then the [bag of patches] + [automated application in order] is a poor man's Git. If you keep this in a git repo then you're basically versioning repos (poor man's ones) in a repo. It almost sounds like the solution to the author's problem :)

Nullabillity

3 hours ago

Yeah, I mentioned Quilt in the post! Lappverk is effectively an exercise in "What if Quilt, but you could interact with it using any Git tooling, rather than Quilt's half-baked custom VCS?".

graynk

5 hours ago

It's mentioned in the article

actionfromafar

10 hours ago

quilt is really cool

noirscape

7 hours ago

I had issues with similar things for a couple years too. The reality is that there's remarkably little existing advice for maintaining a soft fork that doesn't intend to upstream patches. (For reference, probably the most notable patch fork that can't/doesn't upstream anything, GNU IceCat, uses a bash file from hell to apply all of its changes to the Firefox source code - it is not a scalable solution.)

Ultimately the solution I ended up using was git rebase; it just works the nicest out of all of them:

* Your patches are always kept on top at the git log.

* It's absolutely trivial to drop an unnecessary patch, add a new one in the chain or to merge two patches that do the same thing. (Use git rebase -i for that.) Fixing typos in patches is trivial too.

* Your history isn't so important for a patch fork; the patches are what matters, so don't fret too much about commit hashes changing. I promise you, it'll be fine.

* Git will complain if you try to do a rebase that doesn't work out of the box, using similar tools as resolving merge conflicts. You can fetch from upstream and rebase in one step with git pull --rebase upstream master (this assumes you've added the upstream as a second remote under the name upstream and that they push the code you want to patch onto the master branch).
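The points in the list above can be sketched roughly as follows, assuming git is installed. The repositories, file names and the two patches are invented for the demo. GIT_SEQUENCE_EDITOR is used only so the interactive rebase can run unattended here; in normal use git rebase -i opens your editor and you change pick to drop by hand.

```shell
#!/bin/sh
set -e
# Throwaway identity so commits work without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)

# Stand-in upstream that publishes on "master" (hypothetical).
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/master
echo one > core.txt
git add core.txt
git commit -q -m "upstream: one"

# Patch fork with the remote named "upstream" and two local patches.
git clone -q -o upstream "$work/upstream" "$work/fork"
cd "$work/fork"
echo a > patch-a.txt
git add patch-a.txt
git commit -q -m "patch A"
echo b > patch-b.txt
git add patch-b.txt
git commit -q -m "patch B"

# Upstream releases something new.
cd "$work/upstream"
echo two >> core.txt
git commit -q -am "upstream: two"

# Fetch and rebase in one step; the patches end up on top of the log.
cd "$work/fork"
git pull -q --rebase upstream master

# Drop a patch that is no longer needed: mark the first todo line as "drop".
GIT_SEQUENCE_EDITOR="sed -i.bak '1s/^pick/drop/'" \
    git rebase -q -i upstream/master

remaining=$(git log --format=%s upstream/master..HEAD)
echo "patches still carried: $remaining"
```

Publishing the result means a force push, as the drawbacks below note.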

As for drawbacks, I only wound up with two:

* CI tools and git server UIs often aren't prepared to handle a heavily rebased master branch - it leads to lots of builds that are linked to dangling commit hashes. GitHub also for some reason insists on displaying the date of the last rebase, rather than the date of when the patch was committed. Not sure why.

* Pushing your fork means heavy use of force pushes, which feels instinctively wrong.

Neither drawback is large enough for me to mind it in practice.

Opted to use rebase for this sort of fork after reading a bit about non-merge related git flows and wondering what'd happen if I did a rebase-based workflow but just... never sent any patches. Turns out it works really well.

chrismorgan

6 hours ago

Yeah, using the real repository and rebasing atop the release commit has always seemed fine to me, provided the project uses Git. And if you want to keep track of the patches on old versions, just tag them—if upstream has tag 1.2.3, tag 1.2.3+chrismorgan or similar. This occasionally messes with build scripts—but then, not tagging sometimes does too.

> GitHub also for some reason insists on displaying the date of the last rebase, rather than the date of when the patch was committed. Not sure why.

Sounds like you’re running into the difference between author and committer, which Git models distinctly.
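A small sketch of the author/committer distinction, assuming git is installed: a commit is created with a fixed 2020 date, then rebased. The author date survives the rebase while the committer date is reset to the time of the rebase, which is consistent with the date shown after a rebase. Repository contents and branch names are hypothetical.

```shell
#!/bin/sh
set -e
# Throwaway identity so commits work without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
echo base > base.txt
git add base.txt
git commit -q -m "upstream: base"

# A patch written (and committed) at a fixed date in 2020.
git checkout -q -b patches
echo fix > fix.txt
git add fix.txt
GIT_AUTHOR_DATE="2020-01-01 12:00:00 +0000" \
GIT_COMMITTER_DATE="2020-01-01 12:00:00 +0000" \
    git commit -q -m "patch: fix"

# Upstream advances, then the patch is rebased today.
git checkout -q main
echo more >> base.txt
git commit -q -am "upstream: more"
git checkout -q patches
git rebase -q main

# %ad is the author date, %cd the committer date.
author_date=$(git log -1 --format=%ad --date=short)
committer_date=$(git log -1 --format=%cd --date=short)
echo "author date:    $author_date"
echo "committer date: $committer_date"
```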

userbinator

17 hours ago

Many times I've just patched the binary even if source is available, because trying to reproduce the binary you currently have, with only the changes you want and everything else the same, can be an even more difficult exercise than simply changing a string or constant.

PhilipRoman

13 hours ago

Lol I remember doing this when I was younger with the `man` command to remove a 5 second exit delay for the browser output.

    radare2 -qq -w -c "wx 01 @ 0xb407" /usr/bin/man

taneq

15 hours ago

Especially if you make a habit of patching the binary instead of rebuilding from source! ;)

anilakar

14 hours ago

I once wrote a small C++ wrapper for POSIX dlfcn.h. Someone sent a pull request that would have turned it into a Windows-only library.

blueflow

8 hours ago

I once worked with other people on a github project that was a fork of another project. Upstream had long been dead. One day some other person created a pull request from our project's master branch into upstream.

Result: Our project activity resulted in a "your pull request has been updated" email spam about a pull request we had no control over.

yjftsjthsd-h

14 hours ago

Like... Intentionally, or because they unthinkingly did something non-portable?

anilakar

9 hours ago

Didn't ask. My original code was not tied to any specific hosted environment (assuming that Microsoft's POSIX C standard library implementation is more or less correct) but I never made it clear I tested and intended it to be used on Linux only.

thwarted

17 hours ago

The process described reminded me of "pristine source" and RPM spec files that take the upstream pristine source and patch it during the build process. Maintaining that is always a little bit of a headache if you don't do it regularly, especially having to maintain (generate and apply) a separate set of patch files for the changes and express/apply the patches in the spec file. This looks to make light work of that.

Nullabillity

2 hours ago

Yep, that's the goal! ^^

attila-lendvai

12 hours ago

whenever i rebase longstanding commits in my fork, i keep the previous branch by appending the date to its name.

reading the readme didn't make it clear to me how this app would make my life any easier (also considering the added complexity of a new tool).

attila-lendvai

12 hours ago

don't get me wrong, it's a PITA... but how would it hurt less using this tool?

i rarely, if ever, need to look at the history of this.

datadrivenangel

17 hours ago

Modifying source code like this is one method. For web software, bookmarklets are another great way to do that.

bartread

17 hours ago

I’m a big fan of Greasemonkey scripts for this, although these days I prefer Violentmonkey because it has several capabilities that the OG doesn’t.

vlovich123

14 hours ago

Honestly, I found it a better strategy to name branches after the fork point and the date you started the fork. So you’d have main-2025-03-07 for a fork of main started 03-07 and another, main-2025-05-08, for a rebase. The patch set above that is just what you carry. I’m not sure maintaining them as literal patches is that helpful vs just keeping it as explicit patches to apply in git. Maybe this is the right strategy once your fork gets complicated, but at that point you should be hard forking rather than soft forking IMO.

cyberax

18 hours ago

This is supercool. One constant problem of mine with self-hosting is that I often need to modify just a couple of files here and there, but then I'm stuck with a forked repo or a dirty work copy.

I'm going to try to make a frontend UI for it.

darkwater

11 hours ago

Are you talking about personal or professional self-host? Why are you constantly patching software you self-host? Not enough configurability? Using software not made for self-host? Holding it wrong? I ask because it seems...strange that you have these issues so often.

cyberax

1 hour ago

Personal self-hosting.

First, because I have a special setup. Second, I often want some small changes in the software.

ngcc_hk

5 hours ago

The title is very general. There are at least two kinds of modification: one minimizes the change and just alters behaviour, and the other really changes the program.

I worked for a decade in mainframe technical support, mostly installing fixes. Because of that, when I recently spent 3 months as a hobby changing the turbo bridge to take an external bridge card, I injected code or hacked the code, like a JES2 exit, and modified the host program's behaviour without touching the host program much.

This is very different from my colleagues who are application programmers, who can totally change a CICS module, even changing the DB2 schema.

What kind of modification is meant in this title … I wonder.

Nullabillity

3 hours ago

The former. If you intend to hard-fork then Git's model is already fine. If you're soft-forking and want to model your divergence explicitly then Lappverk might be for you.