It's a bit like the "Contributors" tab on GitHub that shows you how many commits each contributor has made, but much faster and with many more options.
If you get a chance to try it out, please let me know. I'd love to hear feedback and suggestions. Thank you!
Here is my personal wishlist after a short test-drive.
- Blame-based stats. While it is nice to see an overview of the historical contributions of Bob and Alice, this is not something that I would use on a daily basis. What would be more useful is to present the same tables based on the blame lines of a tree-ish. This would show the de facto "owner(s)" of modules/files, which comes in handy when asking for help with something or even when assigning reviews. One could also run this iteratively over the history and get a nice timeline graph.
- Support for pattern-based inclusions/exclusions. E.g. I am not interested in seeing stats on the JSON files used by tests, or on any kind of auto-generated files (e.g. Django migrations).
- Support for a configuration file, to store your preferred settings in your git repo. Something TOML-based perhaps.
- Better packaging (nit). E.g. the Linux tarball for v0.6 contains some Apple-related "junk", and GNU tar complains about archive format incompatibilities.
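A minimal sketch of the blame-based stats idea, assuming Python 3.9+ and that it runs inside a git checkout. It tallies how many lines of a file each author currently "owns" according to `git blame --line-porcelain`, which repeats the full commit header (including an `author <name>` line) before every source line. The function names are hypothetical and this is not how git-who works; covering a whole tree-ish would mean running it over each file listed by `git ls-tree -r --name-only <tree-ish>` and adding the counters together.

```python
import subprocess
from collections import Counter


def count_blame_authors(porcelain: str) -> Counter:
    """Tally surviving lines per author from `git blame --line-porcelain` output."""
    owners: Counter = Counter()
    for line in porcelain.splitlines():
        # Source lines are prefixed with a TAB; skip them so file contents
        # that happen to start with "author " are never counted.
        if line.startswith("\t"):
            continue
        # Headers such as "author-mail" start with "author-", not "author ".
        if line.startswith("author "):
            owners[line[len("author "):]] += 1
    return owners


def blame_file(path: str, rev: str = "HEAD") -> Counter:
    """Blame one file at `rev`; must be run inside the repository."""
    out = subprocess.run(
        ["git", "blame", "--line-porcelain", rev, "--", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return count_blame_authors(out)


def ownership_table(owners: Counter) -> list[tuple[str, int, float]]:
    """Return (author, lines, percent) rows, largest owner first."""
    total = sum(owners.values()) or 1
    return [(name, n, 100.0 * n / total) for name, n in owners.most_common()]
```

Running `ownership_table(blame_file("some/module.py"))` would list the de facto owners of that file, which is the kind of view the wishlist asks for.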
Not sure what's happening with the tarball. Will take a look at that.
If a patch or pull request is sent in but an internal contributor (the only internal contributor, in Vim's case, for a long time) reformats it before merging, then the same work is either double-counted (if the full history is kept) or attributed only to the reformatter (if history is squashed during or before the merge).
This doesn't make the result wrong, of course; the tool is doing exactly what it says on the tin. But it does mean that, without reviewing the contribution process (current and past), you might need to be less definite[1] when stating a meaning derived from the result, because your interpretation of the result might not be quite right given the input data available.
----
[0] https://sinclairtarget.com/blog/2025/03/who-will-maintain-vi... in case you skipped past the link when looking at git-who's readme.
[1] perhaps just by giving caveats to make sure that the reader has sufficient context regarding the limit of the process
'git blame' is named after the subversion and CVS blame features that do the same thing. Subversion docs are clear that it's a snarky name and that 'svn praise' and 'svn annotate' are neutral synonyms.
Perhaps someone familiar with CVS can comment on its history there, since CVS seems to be the first version control system to add it.
EDIT: and one of the main reasons it's a useful feature is it tells you who to talk to to understand a piece of code, or to coordinate a roll back, or to do any other sort of communication. It probably matters more in a big company where code is changing frequently and you're unlikely to know everyone and what they're working on.
A few years ago, some Atlassian developer changed "Blame" in the Bitbucket UI to "Annotate". I remember a lot of people being frustrated because they couldn't find "blame" anywhere, and the change was never officially announced. It just happened one day.
Someone opened a ticket with Bitbucket about it, which drew a lot of attention from frustrated users who couldn't find "blame"; their Google searches for it led them to the ticket. Atlassian eventually responded that they had made the change because "blame" sounds bad and can somehow hurt people's feelings (with no examples given, of course). Ironically, the dev who made the change certainly had hurt feelings after the upset masses had some choice words for the short-sighted decision. Atlassian doubled down and, I believe, closed the ticket without reverting the change, so as far as I know the confusion remains.
I don't think they ever mentioned the Subversion/CVS precedent behind the name, so it was really confusing why "Annotate" was chosen. This comment sheds some light on that ancient incident.
Nomenclature matters. Do not reinvent terms just for fun.
The dev probably became the public face of a decision made by someone else (e.g. the product owner, TL, or whoever that is in Atlassian's business structure).
The hurt-feelings thing is real. Unfortunately, I'm the kind of person who gets butthurt, but it's a phrasing thing: "why did you do this?" versus something more neutral like "hey, this has this side effect, are you aware?"
Anyway unfortunately in my case too we're not allowed to write tests so it really is an exercise in omniscience.
You should also write tests. They ensure that your code works as intended. Some teammates might not understand that untested code can end up costing more development time, since broken features have to be fixed in production. Highlighting the bugs that had to be fixed, and writing tests that cover as many cases as possible, should shine some light for those who still don't understand their value.
Sometimes it's easier to adopt better practices when moving to another team or project altogether
It's actually a pretty awful feature because it misses so much context. I've been blamed for changes that were technically my fault: my code was to spec, but some unrelated part of the code it interacted with was not (iirc it was some multi-threaded nonsense like a race condition).
It was a super-stressful week of constantly having to defend my design decisions and white-boarding my thought process (think of the "am I taking crazy pills?!?" scene in Zoolander) as my senior coworker tried to gaslight and throw me under the bus.
Maybe I've had a uniquely bad experience with it, but I've vowed to never use it (as a way to attribute `blame`). Code should be holistically understood and it's your job as a technical leader to know how the parts move, resolve issues without drama, and make sure your whole team is on the same page: this is a cohesive team, not an adversarial dick-measuring contest.
For instance, I may want to change some basic behavior. Easy enough: spend some time implementing and testing, and then run into a downstream consequence of that change while implementing. Now I need to make a decision. Reviewing the history of the relevant sections of code with git blame can help me uncover the context and the ways in which the code I'm puzzling over has changed before. This can be incredibly valuable and can speed up, or even obviate, a good deal of discussion around the potential change.
Aviation is the shining example of this, combining high traceability (you should be able to track each part back to the factory and to all the technicians who have worked on it) with accident inquiries that are focused around finding cause and avoiding future risks rather than assigning blame.
> Code should be holistically understood and it's your job as a technical leader to know how the parts move
That's true when we design something. Once the design is done, and is broken, we have to tear it back apart to understand WHAT is broken. That's when blame is useful.
I love using git blame. I love it even more when it comes back with something I made, because then I get to learn. When something I thought was safe turned out to break something, that's an invaluable chance to understand the system better.
That being said, I've totally used the blame output to end a series of excuses from a junior about how "his code was definitely right, but everything else was garbage" because I really do not care. If it worked before, but doesn't work now, that's a problem. Part of the modern process of "fail fast" is also to build up taste about which parts of a working system are spooky.
I find that some people take the "blameless" culture too far and use it as an excuse not to reflect on outcomes. They just ruffle the code whenever there's a bug and don't think critically about why that bug appeared, or what that tells us about the system we're making.
Ofc, as my org has gotten bigger, we've lost a lot of the discipline around writing good commit messages, so now it's just a mess of large code changes with one-line "bugfix" explanations :(
I have a battle at the moment to try and get the team I am in (5 devs) to take their git commit messages and history seriously, but the "TL" has said that he "doesn't care that much about commits/history/etc"
That bit us right on the ass when debugging someone else's branch recently, because the state we needed to fix was spread across three seemingly unconnected commits, so checking out one commit and fixing it still had to be tested against the other two.
Obligatory T-Shirt link
https://www.amazon.com/Blame-Ruining-Friendships-Since-T-Shi...
And the committer and author don’t even need to be the same!
But the point, as I read it, is: what matters is the context, i.e. if a line is faulty, what did things look like when it wasn't faulty? The commit's content is more often than not more important than the committer, although the committer is useful because you can ask them if they're still around.
Linking a line of code back to the commit is useful even if you can't ask the author about it. It tells you what other lines of code are involved and what the overall purpose is. It's significantly more useful if you can link it into documentation outside the code: ticketing systems, requirements docs, etc.
The main limit to svn blame in that situation was that quite often it would hit commit 1, when the codebase had been imported from Visual SourceSafe.
I really want to find or build a tool that can automatically traverse history this way, like git-evolve-log.
(Work hardening is a metalworking term where metal bent back and forth (or hammered) too much becomes brittle; an analogous effect shows up in code, where a piece of code that has been bugfixed a couple of times will probably need more fixes; there was a published result a decade or so back about using this to focus QA efforts...)
Example output based on Linux kernel @ "Cregit-Linux: how code gets into the kernel": https://cregit.linuxsources.org/
I learned of Cregit recently--just submitted it to HN after seeing multiple recent HN comments discussing issues related to line-based "blame" annotation granularity:
"Cregit-Linux: how code gets into the kernel": https://news.ycombinator.com/item?id=43451654
git blame is about which author most recently touched each line (in what commit); i.e. is to "blame" for that line having its current content.
You're right in that git blame is most useful for finding which commit touched a line. What was done in the commit is more important than who did it.
git blame is very useful even in a solo project where you already know that you wrote every commit.
There is such a concept as a brand new project not requiring version control until it hits a certain stage: you know it when you get there.
Edit: Now that I think about it, a large part of the reason is that I don't work off real tickets, just bugs I notice or things that get mentioned on the current solo work project. In a team I can dissect the ticket and I'm forced to do only that ticket on the branch, whereas solo I'm jumping all over the place. Sometimes I'll do thing X partway, start considering options and in the meantime do thing Y, so it's a mess, but the tasks get done. For context, the project is a year old and I developed it from scratch: essentially an internal log parsing and analysis tool for a couple of formats. Nothing particularly complex.
The manpage explains what the command does. How and why it's used is up to the user.
"git blame" and similar tools often show my name, even though I didn't write the code.
[0] https://git-scm.com/docs/git-blame#Documentation/git-blame.t...
Of course, in your situation I guess such a tool would only help if other people use it. :D
Particularly so if I can see that someone wrote bad code, so I can review the rest of their code.
Git log gives the author (et al) given a commit.
Git blame gives the commit given the line and file.
Worth noting that annotate and praise were added to address the semantics, regardless of whether they were the original intent or not
e.g., you realize that something broke A/B test logic on Friday. Sure, there are Jira tickets, but that's slow and annoying to dig through. There are commit messages, but things get squashed, etc. Plus, if you work in a monorepo with about 60 PRs a day, it's hard to know if it was your code or an associated library someone touched.
That's exactly when git bisect helps. It quickly narrows down which commit introduced the bug when you don't know where to start looking. Once bisect identifies the problematic commit, you can then use git blame (if needed) to see who made those specific changes.
Edit: Cleaned up what I was saying to hopefully avoid confusion.
If you want to know which commit actually caused a problem you would use bisect. That may be what you're saying, but it sounded a bit like you are saying blame is better for tracking the culprit commit.
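To make the bisect point concrete, here is a toy version of the binary search `git bisect` performs. It assumes a linear history listed oldest first, a newest commit known to be bad, and a bug that stays present once introduced. Real `git bisect` also copes with merges, skipped commits and automated checks via `git bisect run <script>`; `first_bad` and `is_bad` are hypothetical names.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_bad() is true.

    Assumes commits are ordered oldest to newest and that badness is monotonic:
    every commit after the first bad one is also bad.
    """
    if not commits or not is_bad(commits[-1]):
        raise ValueError("the newest commit must be bad")
    lo, hi = 0, len(commits) - 1  # invariant: commits[hi] is bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid          # culprit is mid or earlier
        else:
            lo = mid + 1      # culprit is strictly after mid
    return commits[lo]
```

With 100 commits this needs at most 8 calls to `is_bad` (one check of the newest commit plus at most 7 halvings), which is why bisect is the better first tool when you have no idea where to look, with blame used afterwards on the lines the culprit commit touched.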
This works even without the alias, by the way: by default `git whatever` will search your path for `git-whatever` and execute it.
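The fallback described above can be approximated with the standard library: for a subcommand git doesn't know, it looks for an executable named `git-<name>`. This sketch only checks the directories in PATH (or an explicit path string); real git also searches its own exec-path (see `git --exec-path`) and resolves builtins and aliases by its own rules. `find_git_subcommand` is a hypothetical helper.

```python
import shutil
from typing import Optional


def find_git_subcommand(name: str, path: Optional[str] = None) -> Optional[str]:
    """Return the executable that `git <name>` would fall back to, or None.

    `path` uses the same os.pathsep-separated format as $PATH; None means
    the current process's PATH.
    """
    return shutil.which(f"git-{name}", path=path)
```

On a machine where git-who is installed, `find_git_subcommand("who")` should return the full path of the `git-who` binary, which is exactly what makes `git who` work without an alias.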
FWIW, both brew and kubectl also have adopted this behavior (of $(basename)-plugin style verb extensions) so I find it unlikely they'd all do it if it was a straight-up facepalm
My "blame script" has been slowing down as the repo size increases. I was just about to add caching, like you have.
Have you thought about adding the ability to limit the stats based on a set of file patterns? Perhaps like this, where the file follows gitignore conventions?
git-who table -include-file <fname>
git-who table -ignore-file <fname>
I tried to quickly add this functionality but unfortunately I don't know Go.
If you have a shell that supports extended globbing, you could do something like:
$ git who table */**/*.go
That works for me using Bash. I believe all that's happening here is that Bash is expanding the globs and passing a long list of individual filepaths as arguments to git who. Git who then passes them to git log so that it only tallies the commits you'd get by running: $ git log */**/*.go
Details here:
https://github.com/Aider-AI/aider/blob/main/scripts/blame.py
Again, nice work on your tool. I’ll spend some more time trying to harness it for my need.
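A sketch of what the proposed `-include-file`/`-ignore-file` options could do before handing paths to git who. Those flags are the commenter's suggestion, not existing git-who options, and the matching only approximates gitignore rules: patterns without a `/` match a file's basename at any depth, other patterns match the whole root-relative path, and blank lines and `#` comments are skipped. Unlike gitignore, `*` here also matches across `/`, and `!` negation and directory-only patterns are not supported. All names are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import Iterable


def load_patterns(text: str) -> list[str]:
    """Read one pattern per line, skipping blank lines and # comments."""
    return [s for s in (ln.strip() for ln in text.splitlines())
            if s and not s.startswith("#")]


def matches(path: str, pattern: str) -> bool:
    if "/" not in pattern:
        # Like .gitignore: a bare pattern matches the basename at any depth.
        return fnmatchcase(PurePosixPath(path).name, pattern)
    # A leading "/" anchors to the repo root in .gitignore; paths here are root-relative.
    return fnmatchcase(path, pattern.lstrip("/"))


def filter_paths(paths: Iterable[str], include: Iterable[str] = (),
                 ignore: Iterable[str] = ()) -> list[str]:
    """Keep paths that match some include pattern (if any are given) and no ignore pattern."""
    include, ignore = list(include), list(ignore)
    return [p for p in paths
            if (not include or any(matches(p, pat) for pat in include))
            and not any(matches(p, pat) for pat in ignore)]
```

The surviving paths (for example from `git ls-files`) could then be passed as arguments, the same way `git who table */**/*.go` passes a glob expansion.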
I might have my globbing syntax wrong, but I think that `*/**/*.go` is the same as `**/*.go` unless you have `*.go` files in the working directory.
Or maybe someone has wrote a bot/Git hook for that?
(crontab -l 2>/dev/null; echo '15 22 * * * /usr/bin/git -C /path/to/repo blame --line-porcelain abc123.. -- path/to/file.txt | awk "/^author-mail/ {print \$2}" | sort -u | /usr/bin/mail -s "Authors" user@example.com') | crontab -
- who deleted this line (which one?)
- who is the owner of this method (someone refactored or reformatted it, but who is the REAL owner, or what was the history of this method?)
It doesn't work perfectly, but with magit you can jump to the revision before the refactor/reformat, then do another blame from there. I chased a line of code through several layers of refactors that way before and while the original author was long gone it did help explain why things were initially done that way.
I heavily depend on git-blame to understand code. It's one reason why I generally dislike "cleanup" changes that just change formatting/naming for the sake of it.
That is what I use git for each and every day.
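A small sketch of the "jump past the refactor and blame again" workflow as plain git commands built in Python. `-w` ignores whitespace-only changes, `-M`/`-C` try to follow moved or copied lines, and `--ignore-revs-file` skips the commits listed in a file (by convention `.git-blame-ignore-revs`, one full hash per line; `git config blame.ignoreRevsFile .git-blame-ignore-revs` makes it the default). That is the usual way to keep mass reformatting commits out of blame. The helper names are hypothetical.

```python
import subprocess
from typing import Optional


def blame_argv(path: str, rev: str = "HEAD", lines: Optional[tuple[int, int]] = None,
               ignore_revs_file: Optional[str] = None) -> list[str]:
    """Build a `git blame` command that tries to see through cosmetic commits."""
    argv = ["git", "blame", "-w", "-M", "-C"]
    if ignore_revs_file:
        argv += ["--ignore-revs-file", ignore_revs_file]
    if lines:
        start, end = lines
        argv += ["-L", f"{start},{end}"]
    return argv + [rev, "--", path]


def blame_before(commit: str, path: str, lines: Optional[tuple[int, int]] = None) -> list[str]:
    """Re-blame starting from the parent of `commit` (e.g. a refactor you want to skip)."""
    return blame_argv(path, rev=f"{commit}^", lines=lines)


def run(argv: list[str]) -> str:
    """Execute a built command inside the repository and return its output."""
    return subprocess.run(argv, capture_output=True, text=True, check=True).stdout
```

Note that `-L` line numbers refer to the file as it exists at `rev`, so after stepping back past a refactor the range usually has to be adjusted by hand; `git log -L <start>,<end>:<file>` is another way to follow a range through history.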
git help mailmap
Amazing work and excited to dig into this more thoroughly
For a rails codebase that is ~18 years old, has 1695 committers and more than 220,000 commits:
time git who
...
real 0m2.885s
user 0m2.711s
sys 0m0.767s
https://code.visualstudio.com/updates/v1_97#_git-blame-infor...
Like the sibling comment, I didn't want to run all of GitLens just for it, but now that it's a built-in I've also been finding it quite useful.
I uninstalled it, I seem to recall it impacting the speed of VS Code a good bit.
this is a great tool!
I made a similar PowerShell script recently, but it works in reverse: it searches from a filename to find the authors by commit.
I've been using a git alias for quite some time
`lead = shortlog -s -n --all --no-merges`
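For reference, a sketch that reproduces roughly what that alias (`git shortlog -s -n`) prints, given the output of `git log --all --no-merges --format=%aN`: one author name per commit, with `%aN` applying .mailmap as shortlog does. The helper names and the alphabetical tie-break are illustrative choices, not shortlog's documented behavior.

```python
from collections import Counter


def shortlog_summary(author_lines: str) -> list[tuple[int, str]]:
    """Count commits per author, most commits first (ties broken alphabetically)."""
    counts = Counter(line for line in author_lines.splitlines() if line)
    return sorted(((n, name) for name, n in counts.items()),
                  key=lambda row: (-row[0], row[1]))


def format_summary(rows: list[tuple[int, str]]) -> str:
    # Right-aligned count, a tab, then the name, similar to shortlog's -s layout.
    return "\n".join(f"{n:6}\t{name}" for n, name in rows)
```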
Run on log from GNU Bison. We anonymize names so that search engines don't index this comment to those names:
$ ./gwho git-log-stat.txt
NAME LAST-SEEN FILES LINES+ LINES- COMMITS
A___ D_______ Tue Sep 20 08:19:02 2022 +0200 17083 356066 255931 4440
P___ E_____ Mon Mar 17 17:46:43 2025 -0700 4496 61898 71486 1123
J___ E_ D____ Sun Aug 21 17:35:26 2011 -0400 3922 75517 50121 612
R_____ A_____ Thu May 2 16:43:00 2002 +0000 101 7631 4522 23
J____ T____ Sun Jan 21 16:43:58 2001 +0000 200 8308 3205 60
P___ H________ Tue Feb 26 16:28:36 2013 -0800 122 5057 2864 26
A___ R_______ Wed Jan 5 15:47:25 2011 +0200 124 5297 2101 30
T________ R______ Tue Nov 13 10:38:49 2012 +0000 229 3841 2744 94
V_______ T_____ Wed Nov 11 18:55:15 2020 +0100 67 4739 1128 17
J__ M_______ Sat Jan 18 20:52:21 2020 -0800 337 1569 3894 45
J___ M_____ G_______ Mon May 12 00:58:38 2008 +0000 91 2570 2060 49
R______ M_ S_______ Mon Jan 5 00:25:39 1998 +0000 134 2978 1155 64
P____ B______ Tue Nov 11 13:37:36 2008 +0100 90 2991 786 17
A____ V___ Mon Sep 19 19:09:20 2022 +0200 130 2335 943 46
M___ A_____ Sun Jan 20 15:59:34 2002 +0000 201 1797 1291 76
D____ J__ Sun Dec 7 21:54:45 2008 -0800 57 2142 816 15
P_____ B___ Fri Oct 19 11:03:50 2001 +0000 67 1771 1182 19
E___ B____ Thu Aug 27 10:56:53 2009 -0600 61 2172 590 22
V_____ S_____ Fri Jun 29 16:23:42 2012 +0200 64 865 918 9
D____ M________ Thu Nov 10 22:34:22 1994 +0000 26 1526 84 18
D_____ H_________ Thu Jun 13 10:08:19 2013 +0200 6 1142 54 1
V______ I______ Sat Jan 23 13:25:18 2021 -0500 47 824 360 16
A_____ V___________ Thu Feb 27 09:52:03 2020 +0100 25 524 189 13
V_____ M______ C______ Fri Feb 14 18:41:55 2020 +0100 16 284 364 1
J_____ W___ Tue Feb 16 08:00:28 2021 -0600 18 306 33 3
W_______ P____ Thu Feb 21 17:08:18 2008 +0000 13 195 76 9
J___ S____ Wed Jul 26 00:30:05 2017 -0400 67 88 88 63
E___ S_ R______ Wed Feb 13 10:39:54 2019 -0500 2 133 33 1
Y_______ K_____ Mon Nov 11 08:57:15 2019 +0900 5 125 13 2
K______ K_______ Sun Jan 27 06:58:17 2019 +0100 3 85 24 1
H_ S_ T___ Fri Mar 1 06:16:54 2019 +0100 1 45 51 1
T__ L__________ Tue Mar 27 19:28:02 2012 +0000 9 81 9 5
T__ V__ H_____ Fri Jan 11 15:32:06 2002 +0000 7 55 34 3
A________ D_________ Wed May 14 18:41:48 2003 +0000 3 66 16 1
L_____ V_____ Fri Aug 9 14:24:14 2019 +0200 8 71 2 2
M______ D_ B_________ Thu Jul 30 20:53:35 2020 +0200 10 30 29 3
J______ Tue Nov 20 22:02:20 2018 +0100 3 30 28 3
J_______ N_____ Tue Dec 15 22:03:18 2009 -0600 10 42 13 3
N___ F_______ Mon Sep 6 19:51:09 1993 +0000 7 36 18 7
K__ K______ Tue Oct 13 15:39:41 2020 -0700 5 26 28 2
E_____ S________ Mon Dec 10 15:18:37 2018 +0200 10 25 25 1
M_____ R____ Wed Nov 18 09:10:01 2020 +0100 10 26 13 2
J___ B_____ Mon Oct 2 20:04:58 2000 +0000 3 26 8 1
T_____ P________ Tue May 19 22:05:22 2020 +0200 4 29 2 1
A_____ B_______ Sun Mar 6 22:19:18 2011 -0500 6 24 2 2
N___ G_____ Tue Oct 27 06:12:27 2020 +0000 4 14 11 3
B____ K_____ Sat Feb 19 19:24:07 2011 -0500 6 22 2 2
A___ S______ Wed Oct 31 14:01:31 2018 +0000 1 12 9 1
S_____ T______ Mon Nov 24 15:27:49 2008 +0100 1 11 6 1
F______ K____ Tue May 14 00:25:23 2002 +0000 3 11 6 2
A______ H______ Fri Apr 29 04:08:35 2022 -0400 2 8 6 1
B____ H_____ Sat Dec 18 18:45:46 2021 +0100 4 3 9 2
T___ C_ M_____ Tue Nov 10 07:36:11 2020 +0100 2 8 3 1
E______ S_________ Fri Nov 4 11:50:32 2022 -0700 3 5 5 1
k_____ y Mon Nov 11 23:27:37 2019 +0900 3 3 3 3
S______ L________ Sat Jul 21 17:24:23 2012 +0200 3 3 3 2
A____ D_______ Sat Feb 15 10:49:14 2020 +0100 2 2 2 2
J_____ L_ Fri Aug 24 17:35:32 2018 +0000 1 1 1 1
D_____ H______ Wed Nov 29 01:26:22 1995 +0000 1 1 1 1
A______ S_____ Sat Sep 28 00:00:34 2013 +0200 1 1 1 1
A_____ R___ Mon Jun 14 21:54:40 2021 +0000 1 1 1 1
Code:
#!/usr/bin/env txr
@(do
(defstruct author ()
name
e-mail
last-seen
(files 0)
(lines+ 0)
(lines- 0)
(commits 0))
(defvarl ah (hash)))
@(repeat)
Author: @name <@addr>
Date: @date
@(skip)
@files file@nil changed, @ins insertion@nil, @del deletion@nil
@ (set name @(flow name ;; anonymize name
(spl " ")
(map (op map (do if (plusp @2) #\_ @1) @1 0))
(join-with " ")))
@ (do (let ((a (or [ah name]
(new author
name name
e-mail addr
last-seen date))))
(inc a.commits)
(inc a.files (tointz files))
(inc a.lines+ (tointz ins))
(inc a.lines- (tointz del))
(set [ah name] a)))
@(end)
@(do
(flow (hash-values ah)
(csort @1 > [callf + .lines+ .lines-])
(cons (new author
name "NAME" last-seen "LAST-SEEN" files "FILES"
lines+ "LINES+" lines- "LINES-" commits "COMMITS"))
(each ((a @1))
(put-line `@{a.name 24} @{a.last-seen 30} @{a.files -5} \ \
@{a.lines+ -8} @{a.lines- -8} @{a.commits -7}`))))
> This requires that you have Go, Ruby, and the rake Ruby gem installed.
That doesn't cut it for me. git - once built - depends on C libraries and Perl. If you want to add something onto git (that is not specifically targeting Go, or Ruby etc.) - it should not IMNSHO depend on other things.
That doesn't mean you can't write your tool in some modern fashionable language, but eventually you need to bring it down to earth (or rather earth + Perl).