I want to add one other note: in any large organization, some developers will use tools in ways nobody can predict. This includes Git. Don't try to force any particular workflow, including mandatory or automatically-enabled hooks.
Instead, put what you want in an optional pre-push hook and also put it into an early CI/CD step for your pull request checker. You'll get the same end result but your fussiest developers will be happier.
And with git, you can't even make anything that happens on the dev machines mandatory.
Anything you want to be mandatory needs to go into your CI. Pre-commit and pre-push hooks are just there to lower CI churn, not to guarantee anything.
(With the exception of people accidentally pushing secrets. The CI is too late for that, and a pre-push hook is a good idea.)
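Something like this sketch works as a pre-push secret scan; I'm assuming gitleaks here, but any scanner that accepts a commit range fits the same shape:

```sh
#!/bin/sh
# .git/hooks/pre-push -- minimal sketch, assumes gitleaks is on PATH.
# Git feeds "<local ref> <local sha> <remote ref> <remote sha>" per ref on stdin.
zero=0000000000000000000000000000000000000000
while read -r local_ref local_sha remote_ref remote_sha; do
  [ "$local_sha" = "$zero" ] && continue      # deleting a remote ref: nothing to scan
  if [ "$remote_sha" = "$zero" ]; then
    range="$local_sha"                        # new branch: scan everything reachable
  else
    range="$remote_sha..$local_sha"           # only the commits about to be published
  fi
  if ! gitleaks detect --log-opts="$range" --no-banner; then
    echo "pre-push: possible secret in $local_ref, aborting push" >&2
    exit 1
  fi
done
```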
One key requirement in my setup is that every hook is hermetic and idempotent. I don’t use Rust in production, so I can’t comment on it in depth, but for most other languages—from clang-format to swift-format—I always download precompiled binaries from trusted sources (for example, the team’s S3 storage). This ensures that the tools run in a controlled environment and consistently produce the same results.
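In concrete terms it's little more than this sketch; the bucket URL, version, and checksum are placeholders for the real pinned values:

```sh
#!/bin/sh
# Fetch a pinned formatter binary from the team's storage, verify it, cache it.
TOOL=clang-format
VERSION=17.0.6                                # placeholder version
SHA256="replace-with-pinned-sha256"           # placeholder checksum
DEST="$HOME/.cache/hooks/$TOOL-$VERSION"

if [ ! -x "$DEST" ]; then
  mkdir -p "${DEST%/*}"
  curl -fsSL "https://team-tools.example.s3.amazonaws.com/$TOOL-$VERSION" -o "$DEST.tmp"
  echo "$SHA256  $DEST.tmp" | sha256sum -c - || { echo "checksum mismatch" >&2; exit 1; }
  chmod +x "$DEST.tmp" && mv "$DEST.tmp" "$DEST"
fi
exec "$DEST" "$@"                             # same binary, same output, every machine
```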
It's fairly common to consider working and PR branches to be "unpublished" from a mutability point of view: if I base my work on someone else's PR, I'm going to have to rebase when they rebase. Merging to `main` publishes the commit, at which point it's immutable.
Working with JJ, the default behaviour is to consider the parents of a branch that's not owned by you to be immutable.
If I or someone else bases something off anything but master, it's on them to rebase and keep up to date.
But until that PR is open? Totally with you. There is no obligation to "preserve history" up until that point.
I regularly work with Github, Bitbucket, and Gitlab. Everything I said applies except for the fact that I said "PR" instead of "MR". But yes, you're right. I'm highlighting a specific, albeit extremely popular, workflow.
In any case, my comment just reflects the fact that you had a series of patches that you could not squash or rebase. That stuck with me.
And the fact that I see many people use the abbreviation "PR" for something that is merely a patch or diff. For example, you might send a diff to the tech@ mailing list, but you should not refer to it as a PR.
And sometimes I just want to commit work in progress so I can more easily backtrack my changes. These checks are better on pre-push, and definitely should be on the PR pipeline, otherwise they can and will be skipped.
Anyway, thanks for giving me some ammo to make this case.
You will save your org a lot of pain if you do force it, the same as when you force a formatting style rather than letting everyone do what they please.
You can discuss changing it if some parts don't work, but consistency lowers failures, every time.
We're a game studio with less-technical staff using git (art and design), so we use hooks to break some commands that folks usually mess up.
Surprisingly most developers don’t know git well either and this saves them some pain too.
The few power users who know what they’re doing just disable these hooks.
I can accept (but still often skip, with `git push --no-verify`) a time-consuming pre-push hook, but a time-consuming and flaky pre-commit hook is totally unacceptable to my workflows and I will always find a way to work around it. Like everyone else is saying, if you want to enforce some rule on the codebase then do it in CI and block merges on it.
If you're using `pre-commit` the tool, not merely the hook, you can also use something like https://github.com/andrewaylett/pre-commit-action to run the tool in CI. It's a really good way to share check definitions between local development and CI, meaning you've shifted your checks to earlier in the pipeline.
I use Jujutsu day-to-day, which doesn't even support pre-commit hooks. But the tooling is still really useful, and making sure we run it in CI means that we're not relying on every developer having the hooks set up. And I have JJ aliases that help pre-commit be really useful in a JJ workflow: https://github.com/andrewaylett/dotfiles/blob/7a79cf166d1e7b...
I've been much much happier just having a little project specific script I run when I want to do formatting/linting.
The key thing (that several folk have pointed out) is that CI runs the canonical checks. Using something like pre-commit (the tool) makes it easier to at least vaguely standardise making sure that you can run the same checks that CI will run. Having it run from the pre-commit hook fits nicely into many workflows, my own pre-JJ workflow included.
As I'm saying this, I'm realizing I should just wire up Emacs to call `git add {file_being_saved} && git commit -am "autocheckpoint"` every time I save a file. (I will have to figure out how to check if I'm in the middle of some operation like a merge or rebase to not mess those up.)
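The in-progress-operation check turns out to be doable by peeking at the .git directory; a rough sketch of what the editor's save hook could call (everything here is hypothetical):

```sh
#!/bin/sh
# autocheckpoint.sh <saved-file> -- commit a checkpoint unless git is busy.
git_dir=$(git rev-parse --git-dir 2>/dev/null) || exit 0   # not in a repo: do nothing
# Bail out mid-merge, mid-rebase, or mid-cherry-pick so we don't corrupt them.
for marker in MERGE_HEAD CHERRY_PICK_HEAD REBASE_HEAD rebase-merge rebase-apply; do
  [ -e "$git_dir/$marker" ] && exit 0
done
git add -- "$1" && git commit -q -m "autocheckpoint: $1"
```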
I'm perfectly happy to have the CI fail if I forget to run the CI locally, which is rare but does happen. In that case I lose 5 minutes or whatever because I have to go find the branch and fix the CI failure and re-push it. The flip side of that is I rarely lose hours of work, or end up painting myself in a corner because commit is too expensive and slows me down and I'm subconsciously avoiding it.
For my part, I find the “local history” feature of the JetBrains IDEs gives me automatic checkpoints I can roll back to without needing to involve git. On my Linux machines I layer in ZFS snapshots (Time Machine probably works just as well for Macs). This gives me the confidence to work throughout the day without needing to compulsively commit. These have the added advantage of tracking files I haven’t yet added to the git repo.
So when I open a PR, I'll have a branch with a gajillion useless commits, and then curate them down to a logical set of commits with appropriate commit messages. Usually this is a single commit, but if I want to highlight some specific pieces as being separable for a reviewer, it'll be multiple commits.
The key point here is that none of those commits exist until just before I make my final push prior to a PR.
But, if you're really worried about losing 15 minutes of work, I think we have better tools at our disposal, including those that will clean up after themselves over time. Now that I've been using ZFS with automatic snapshots, I feel hamstrung working on any Linux system just using ext4 without LVM. I'm aware this isn't a common setup, but I wish it were. It's amazing how liberating it is to edit code, update a config file, or install a new package when you know you can roll back the entire system with one simple command (or restore a single file if you need that granularity). And it works for files you haven't yet added to the git repo.
I guess my point is: I think we have better tools than git for automatic backups and I believe there's a lot of opportunity in developer tooling to help guard against common failure scenarios.
Having 25 meaningless “wip” commits does not help with that. It’s fine when something is indeed a work in progress. But once it’s ready for review it should be presented as a series of cleaned up changes.
If it is indeed one giant ball of mud, then it should be presented as such. But more often than not, that just shows a lack of discipline on the part of the creator. Variable renames, whitespace changes, and other cosmetic things can be skipped over to focus on the meat of the PR.
From my own experience, people who work in open source and have been on the review side of large PRs understand this the best.
Really the goal is to make things as easy as possible for the reviewer. The simpler the review process, the less reviewer time you're wasting.
I've been on a maintenance team for years and it's also been a massive help here, in our svn repos where squashing isn't possible. Those intermediate commits with good messages are the only context you get years down the line when the original developers are gone or don't remember reasons for something, and have been a massive help so many times.
I'm fine with manual squashing to clean up those WIP commits, but a blind squash-merge should never be done. It throws away too much for no good reason.
For one quick example, code linting/formatting should always be a separate commit. A couple times I've seen those introduce bugs, and since it wasn't squashed it was trivial to see what should have happened.
CI also doesn't necessarily help here - you have to have tests for all possible edge cases committed from day one for it to prevent these situations. It may be a month or a year or several years later before you hit one of the weird cases no one thought about.
Calling svn part of the problem is also kind of backwards - it has no bearing on the code quality itself, but I brought it up because it was otherwise forcing good practice because it doesn't allow you to erase context that may be useful later.
Over the time I've been here we've migrated from Bugzilla to Fogbugz to Jira, from an internal wiki to ReadTheDocs to Confluence, and some of these hundreds of repos we manage started in cvs, not svn, and are now slowly being migrated to git. Guess what? The cvs->svn->git migrations are the only ones that didn't lose any data. None of the Bugzilla cases still exist and only a very small number were migrated from FogBugz to Jira. Some of the internal wiki was migrated directly to Confluence (and lost all formatting and internal links in the process), but ReadTheDocs are all gone. Commit messages are really the only thing you can actually rely on.
But this would require hand curation? No development proceeds that way, or if it does, then I would question whether the person is spending 80% of their day curating PRs unnecessarily.
I think you must be kind of senior and you can get away with just insisting that other people be less efficient and work in a weird way so you can feel more comfortable?
It's not really hand curation if you're deliberate about it from the get-go. It's certainly not eating up 80% of anyone's time.
Structuring code and writing useful commits is a skill to develop, just like writing meaningful tests. As a first step, use `git add -p` instead of `git add .` or `git commit -a`. As an analogy, many junior devs will just test everything, even stuff that doesn't make a lot of sense, and then jumble it all together. It takes practice to learn how to better structure that stuff, and it isn't done by writing a ton of tests and then curating them after the fact.
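For instance, assuming a hypothetical change that mixes a bug fix with an unrelated rename:

```sh
# Stage only the hunks that belong to the current logical change.
git add -p src/parser.c      # y = stage this hunk, n = skip, s = split it further
git commit -m "parser: reject unterminated string literals"

# The rename hunks you skipped become their own commit.
git add -p src/parser.c
git commit -m "parser: rename tok to token for clarity"
```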
> I think you must be kind of senior and you can get away with just insisting that other people be less efficient and work in a weird way so you can feel more comfortable?
Your personal productivity should only be one consideration. The long-term health of the project (i.e., maintenance) and the impact on other people's efficiency also must be considered. And efficiency isn't limited to how quickly features ship. Someone who ships fast but makes it much harder to debug issues isn't a top performer. At least, in my experience. I'd imagine it's team, company, and segment-dependent. For OSS projects with many part-time contributors, that history becomes really important because you may not have the future ability to ask someone why they did something a particular way.
If you’re working on something and a piece of it is clearly self contained, you commit it and move on.
> I think you must be kind of senior and you can get away with just insisting that other people be less efficient and work in a weird way so you can feel more comfortable?
You can work however you like. But when it’s time to ask someone else to review your work, the onus is on you to clean it up to simplify review. Otherwise you’re saying your time is more valuable than the reviewer’s.
I do this. Also I do not spend 80% of my time doing it. It's not hard, nor is it time consuming.
I'm surprised by how confident you are that things simply aren't done this way considering the number of high-profile users of workflows where the commit history is expected to tell a story of how the software evolved over time.
I'm surprised by how confident you are when you can only name projects you've never worked on. I wanted to find a commit of yours to prove my point, but I can't find a line of code you've written.
Presumably, a branch is a logical segment of work. Otherwise, just push directly to master/trunk/HEAD. It's what people did for a long time with CVS, and it arguably worked to some extent. Using merge commits is pretty common and, as such, that branch will get merged into the trunk. Being able to understand that branch in isolation is something I've found helpful in understanding the software as a whole.
> Caring about the history of a branch is weird, I think your approach is just not compatible with how people work.
I get you disagree with me, but you could be less dismissive about it. Work however you want -- I'm certainly not stopping you. I just don't want your productivity to come at the expense of mine. And, I offered up other potential (and IMHO, superior) solutions from both developer and system tools.
I suppose what type of project you're working on matters. The "treat git like a versioned zip file" approach using squashed merges works reasonably well for SaaS applications because you rarely need to roll anything back. However, I've found a logically structured history has been indispensable when working on long-lived projects, particularly in open source. It's how I'm able to dig into a 25-year-old OSS tool and be reasonably productive with it.
To the point I think you're making: sure, I care what changed, and I can do that with `diff`. But, more often if I'm looking at SCM history I'm trying to learn why a change was made. Some of that can be inferred by seeing what other changes were made at the same time. That context can be explicitly provided with commit messages that explain why a change was made.
Calling it incompatible with how people work is a pretty bold claim, given the practice of squash merging loads of mini commits is a pretty recent development. Maybe that's how your team works and if it works for you, great. But, having logically separate commits isn't some niche development practice. Optimizing for writes could be useful for a startup. A lot of real world software requires being easy to maintain and a good SCM history shines there.
All of that is rather orthogonal to the point I was trying to add to the discussion. We have better tools at our disposal than running `git commit` every 15 minutes.
Don't do that, just don't.
I too was about to become a war criminal.
My git rebase workflow often involves running `git rebase -x "cargo clippy -- --deny=warnings"`. This needs a full checkout to work, not just a single file as input.
The intended future solution is `jj run` (https://docs.jj-vcs.dev/latest/design/run/), which applies similar ideas to more general commands.
https://github.com/andrewaylett/dotfiles/blob/7a79cf166d1e7b...
What I really want is some way within jj to keep track of which commits have been checked and which are currently failing, so I can template it into log lines.
I almost always have a "this CI/CD must pass to merge" job that includes linting etc., and then use squash commits exclusively when merging.
My coworker did that the other day and I'm deciding how to respond.
I'm particular about formatting, and it doesn't always match group norms. So I'll reformat things to my preferred style while working locally, and then reformat back before pushing. However, I may have several commits locally that then get curated out of existence prior to pushing.
Lefthook with glob+stage_fixed for formatters makes one of the issues raised a complete non-issue.
I'll write an in-depth post about it maybe within the next week or so; I've been diving into these in my hobby projects for a year or so.
The pre-commit script (https://github.com/ThomasHabets/rustradio/blob/main/extra/pr...) triggers my executor, which sets up the pre-commit environment like so: https://github.com/ThomasHabets/rustradio/blob/main/tickbox/...
I run this on every commit. Sure, I have probably gone overboard, but it has prevented problems, and I may be too picky about not having a broken HEAD. But if you want to contribute, you don't have to run any pre-commit. It'll run on every PR too.
I don't send myself PRs, so this works for me.
Of course I always welcome suggestions and critique on how to improve my workflow.
At least nothing is stateful (well, it caches build artefacts), and aside from "cargo deny" there are no external deps.
(One very nice thing about AI-assisted programming: Claude is not offended by duplicating the same code over and over and using utterly awful APIs, and Claude feels no particular compulsion to get distracted thinking about how the APIs I’m targeting could be made to be less awful. Try asking the Home Assistant docs how an integration is supposed to handle a hot-added entity after an integration finishes setup: you will not get an answer. Ask Claude and it will cheerfully copy the ludicrous and obviously inappropriate solution used by other integrations. Sometimes semi-blindly doing something that works is the right solution when writing piles of glue code.)
What if I've only staged one part of a file, but the pre-commit hook fails on the unstaged portions, which should be fine since I'm not committing or pushing those changes?
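(The classic workaround is the `--keep-index` stash dance sketched below, but it has sharp edges of its own: if a hook rewrites files, the pop can conflict, which rather proves the point.)

```sh
#!/bin/sh
# Sketch of a pre-commit hook that checks only the staged content.
# run-checks.sh is a stand-in for whatever the real checks are.
old_stash=$(git rev-parse -q --verify refs/stash)
git stash push -q --keep-index --include-untracked
new_stash=$(git rev-parse -q --verify refs/stash)

./run-checks.sh
status=$?

# Only pop if the push above actually created a stash entry,
# otherwise we'd pop unrelated stashed work.
[ "$new_stash" != "$old_stash" ] && git stash pop -q
exit $status
```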
But if I intend to squash and merge, then who cares about intermediate state.
This is a really interesting perspective. Personally I commit code that will fail the build multiple times per day. I only care that something builds at the point it gets merged to master.
(I'm assuming you are not squashing when merging; otherwise it's pretty much the same workflow.)
I AM squashing before merging. Pre-commit hooks run on any commit on any branch, AFAIK. In any serious repo I'd never be committing to master directly.
What do you do when you are working on something and are forced to switch to working on something else in the middle of it?
I usually put it on a branch, even if this project otherwise does all its development on the main branch. And I commit it without running precommits, and with a commit message prefix "WIP: ". If it's on a branch you can even push it to not lose work if your local machine breaks/is stolen.
When it's time to get it into the main branch I rebase to squash commits into working ones.
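In command form it's roughly this (the branch name is made up):

```sh
git switch -c wip/retry-logic
git commit --all --no-verify -m "WIP: half-finished retry logic"
git push -u origin wip/retry-logic      # survives a broken or stolen laptop
# ...later, when it's time to tidy up:
git rebase -i main                      # squash the WIPs into working commits
```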
Now, does my final commit history of, say, 3 commits actually build at each commit? For personal projects, no. Diminishing returns. But in a collaborative environment: how fun will it be for future you, or your teammates, to run bisect if half the commits don't even build?
I have this workflow because it's so easy to add a feature, breaking 3 tests, to be fixed later. And formatting is bad. And now I add another change, and I just keep digging and one can end up in a "oh no, how did I end up here?" state where different binaries in the tree need to be synced to different commits to even build.
> I feel like insisting on atomic commits in your local checkout defeats the entire purpose of using a tool like git.
WIP commits are hardly the only benefit of git or other DVCSes over things like Subversion.
`git stash` is always an option :) but even if you want to commit it, you can always undo (or `--amend`) the commit when you get back to working. I personally am also a big fan of `git rebase -i` and all the things it allows me to fix up in the history before merging (rebasing) in to the main branch.
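One trick worth knowing in that family: if you already know which commit a late fix belongs to, `--fixup` plus `--autosquash` files it away for you:

```sh
git commit --fixup=abc1234       # abc1234 is a placeholder for the target commit
git rebase -i --autosquash main  # the fixup is moved and squashed automatically
```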
Local wip commits didn't come to mind at all
Hell, even most WIP commits will pass the tests (e.g. tests are not yet added for the new code), so I'd run them then too.
... but even that one took several rounds of fiddling and complexifying to get it to behave correctly (e.g. merging commits with already-committed bypassed binaries should be allowed. how do you detect that? did you know to check for that scenario when building the hook? someone's gonna get bitten by it, and there are dozens of these cases).
So yea. Agreed. Do not use pre-commit hooks, they're far more trouble than they seem, and the failure modes / surprises are awful and can be quite hard to figure out to laymen who are experiencing it.
(much of this is true for all git hooks imo)
Make all test cases green locally before pushing, but not in a way that interferes with pushing code, and checks shouldn't be tied to a particular (D)VCS. Allow uploading as many separate proposed PRs as you want in a proposed "for review" state. After a PR is signed off and sent for merging, it goes into a linearizing single source of truth backed by an automated testing/smoke-testing process before it lands "auto-fast-forwarded" in a mostly uncontrolled manner that doesn't allow editing the history directly. Standardization and simplicity are good, and so is requiring peer review of code before it's accepted into existing, production, big systems.
Disallow editing trunk/main/master and whenever there's merge conflict between PRs, manual rebasing of one or the other is required. Not a huge deal.
Also, have structured OWNERS files that include people and/or distribution list(s) of people who own/support stuff. Furthermore, have a USERS file that keeps lists of people who would be affected by restarting/interrupting/changing a particular codebase/service for notification purposes too. In general, monorepo and allowing submitting code for any area by anyone are roughly good approaches.
Also, if most developers are using one editor, configure that editor to run format and auto-fix lint errors. That probably cleans up the majority of unexpected CI failures.
Otherwise, I agree: your project cannot rely on any checks running on the dev machine with git.
I prefer to be able to push instantly and get feedback async, because by the time I've decided I'm done with a change, I've already run the tests for it. And like I said, my editor is applying formatting and lints, so those fail more rarely.
But, if your pre-push checks are fast (rather than ~minutes), I can see the utility! It sucks to get an async failure for feedback that can be delivered quickly.
I'm a fan of pre-commit/push hooks, but they have to be fast. At <dayjob> our suite of pre-commit hooks are <50ms and pre-push hooks are <5s. They get regularly reviewed for performance, and if anything can't be made faster, slow pre-commit hooks will get ejected to pre-push, and slow pre-push hooks will get ejected to the regular CI suite.
... and because most people using git will have to take a second if the hook comes back with "hey, your third commit is incorrect, you forgot the ticket number".
e.g. `prek uninstall; g rbc; prek install` (using the typical aliases).

And in the feature branches/merge requests, I don't merge, only rebase. Rebasing should be the default workflow. Merging adds so many problems for no good reason.
There are use cases for merging, but not as the normal workflow.
With rebasing, there could be a million times the branch was rebased and you would have no idea when and where something got broken by hasty conflict resolution.
When conflicts happen, rebasing is equivalent to merging, just at the commit level instead of at branch level, so in the worst case, developers are met with conflict after conflict, which ends up being a confusing mental burden on less experienced devs and certainly a “trust the process” kind of workflow for experienced ones as well.
If you want to keep track of what commits belongs to a certain pr, you can still have an empty merge commit at the end of the rebase. Gitlab will add that for you automatically.
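Outside GitLab you can get the same record by hand: after the rebase, a no-fast-forward merge creates the grouping commit even though it introduces no changes of its own (branch name made up):

```sh
git switch main
git merge --no-ff feature/retry-logic -m "Merge branch 'feature/retry-logic'"
```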
The “hasty conflict resolution” makes a broken merge waaaay harder to fix than a broken rebase.
And rebasing makes you take care of each conflict one commit at a time, which makes it orders of magnitude easier to get them right, compared to trying to resolve them all in a single merge commit.
Having a “fix broken merge” commit makes it explicit that there was an issue that was fixed.
Rebase sometimes seems like an attempt at saving face.
Even if you do it properly, the workflow is erasing history of that conflict existing and needing to be resolved. It leaves no trace of what has been worked on, when, and by whom.
The overall project history though, the clarity of changes made, and that bisecting reliably works are important to me.
Or another way; the important unit is whatever your unit of code review is. If you're not reviewing and checking individual commits, they're just noise in the history; the commit messages are not clear and I cannot reliably bisect on them (since nobody is checking that things build).
feature/{first initial} {last initial} DONOTMERGE {yyyy-MM-dd-hh-mm-ss}
Yes, the branch name literally says do not merge.
I commit anything and everything. Build fails? I still commit. If there is a stopping point and I feel like I might want to come back to this point, I commit.
I am violently against any pre-commit hook that runs on all branches. What I do on my machine on my personal branch is none of your business.
I create new branches early and often. I take upstream changes as they land on the trunk.
Anyway, this long winded tale was to explain why I rebase. My commits aren't worth anything more than stopping points.
At the end, I create a nice branch name and there is usually only one commit before code review.
Rebasing is kind of a short hand for cherry-picking, fixing up, rewording, squashing, dropping, etc. because these things don't make sense in isolation.
But even if I wasn't using Gerrit, sometimes it's the easiest way to fix branches that are broken or to restructure your work in a clearer way.
Too often, merging is only understood as bringing the changes from there to here. It may be useful, especially if you have release-candidate branches and hotfixes and you want to keep a trace of that process. But I much prefer rebasing and/or squashing PRs onto the main branch.
... for code, honestly no idea
Thanks - this is the first example of a pre-commit hook that I can see value in.
To put it more bluntly, pre-commit hooks are pre-commit hooks, exactly what it says on the tin. Not linting hooks or checking hooks or content filters. Depending on what exactly you want to do, they may or may not be the best tool for the job.
To put it even more bluntly, if you are trying to enforce proper formatting, pre-commit hooks are absolutely the wrong tool for the job, as hooks are trivially bypassable, and not shared when cloning a repo, by design.
The `prepare-commit-msg` hook is a better place to do that as it gives the hook some context about the commit (is the user amending an existing commit etc.)
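For example, a sketch of a prepare-commit-msg hook that prefixes a ticket ID taken from the branch name, but leaves amends, merges, and squashes alone (the JIRA-style pattern is my assumption):

```sh
#!/bin/sh
# .git/hooks/prepare-commit-msg <msg-file> <source> [<sha>]
msg_file=$1
source=$2    # empty, or one of: message, template, merge, squash, commit
case "$source" in
  commit|merge|squash) exit 0 ;;   # amend/merge/squash: leave the message alone
esac
ticket=$(git symbolic-ref --short -q HEAD | grep -oE '[A-Z]+-[0-9]+' | head -n1)
if [ -n "$ticket" ] && ! grep -q "$ticket" "$msg_file"; then
  printf '%s: %s' "$ticket" "$(cat "$msg_file")" > "$msg_file"
fi
```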
> To put it even more bluntly, if you are trying to enforce proper formatting, pre-commit hooks are absolutely the wrong tool for the job, as hooks are trivially bypassable, and not shared when cloning a repo, by design.
They aren't a substitute for server post-receive hooks but they do help avoid having pushes rejected by the server.
> They tell me I need to have "proper formatting" and "use consistent style". How rude.
> Maybe I can write a pre-commit hook that checks that for me?
A git filter is made for that. It works. There are still caveats (it will format the whole file, so you might end up committing formatting fixes to code that isn't your own).
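Wiring that up looks roughly like this; the filter name is arbitrary and clang-format just stands in for any formatter that reads stdin and writes stdout:

```sh
# The clean filter rewrites content on its way into the index, so whatever
# gets committed is formatted, regardless of the state of the working tree.
git config filter.format.clean 'clang-format --assume-filename=%f'
printf '%s\n' '*.c filter=format' '*.h filter=format' >> .gitattributes
```

Note that the config half lives in each clone, so like hooks it isn't shared automatically; only the .gitattributes half travels with the repo.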
Pre-commit is not for formatting your code. It's for checking whether the commit is correct: whether the content has a ticket ID, or whether the files pass even basic syntax validation.
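Strictly speaking, the ticket-ID part fits best in the commit-msg hook, which gets the message file as its argument; a minimal sketch, with the pattern as an assumption:

```sh
#!/bin/sh
# .git/hooks/commit-msg <msg-file>
if ! grep -qE '\b[A-Z]+-[0-9]+\b' "$1"; then
  echo "commit-msg: no ticket ID found (expected something like ABC-123)" >&2
  exit 1
fi
```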
> Only add checks that are fast and reliable. Checks that touch the network should never go in a hook. Checks that are slow and require an up-to-date build cache should never go in a hook. Checks that require credentials or a running local service should never go in a hook.
If you can do that, great! If you can't (say it's something like a CI/CD repo with a bunch of different languages involved, and not every dev has a setup to check everything locally), having to override it the couple of times a year it misfires is still preferable to committing non-working code. We run local checks for the stuff that makes sense (checking YAML correctness, or decoding encrypted YAMLs with the user's key so they also get checked), but the ones that don't go remote. It's faster: a few ms of RTT doesn't matter when you can leverage a big server CPU to run the checks faster.
Bonus points: it makes the pain point, interactive rebases, faster, because you can cache the output for a given file hash globally, so existing commits take milliseconds at most to check during a rebase.
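Roughly like this; `run-check` stands in for the real per-file checker:

```sh
#!/bin/sh
# Cache check results keyed by blob hash: identical content is never re-checked,
# which is what makes re-running hooks across a rebase nearly free.
cache_dir="$HOME/.cache/hook-results"
mkdir -p "$cache_dir"
for file in "$@"; do
  key=$(git hash-object "$file")
  if [ ! -e "$cache_dir/$key" ]; then
    run-check "$file" || exit 1
    touch "$cache_dir/$key"     # record the pass for this exact content
  fi
done
```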
> Don't set the hook up automatically. Whatever tool you use that promises to make this reliable is wrong. There is not a way to do this reliably, and the number of times it's broken on me is more than I can count. Please just add docs for how to set it up manually, prominently featured in your CONTRIBUTING docs. (You do have contributing docs, right?)
DO set it up automatically (or as much as possible; we have a script that adds the hooks and sets the repo defaults we use). You don't want a new developer to have to spend half a day setting up some git nonsense only to get it wrong. And once you change it, just rerun the script.
Pre-push might address some of the pain points, but it doesn't address the biggest: it puts the developer in a "git hole" if they have something wrong in a commit. While pre-commit will just cancel the commit until the dev fixes it, with pre-push they now need to dig out knowledge of how to edit or undo existing commits.
This knowledge is a crucial part of effective use of git every day, so if some junior dev has to learn it quick it's doing them a favor.
> hooks shouldn’t be run during a rebase
The pre-commit framework doesn't run hooks during a rebase.
> hooks should be fast and reliable
The pre-commit framework does its best to make hooks faster (by running them in parallel if possible) and more reliable (by allowing the hook author to define an independent environment the hook runs in), however it's of course still important that the hooks themselves are properly implemented. Ultimately that's something the hook author has to solve, not the framework which runs them.
> hooks should never change the index
As I read it, the author says hooks shouldn't change the working tree but the index instead, and that's what the pre-commit framework does if hooks modify files.
Personally I prefer configuring hooks so they just print a diff of what they would've changed and abort the commit, instead of letting them modify files during a commit.
Correct. I'm saying that hook authors almost never do this right, and I'd rather they didn't even try and moved their checks to a pre-push hook instead.