-- Update: just realized why it wouldn't make sense: `git push` would send only the delta from the previous commit, and the previous commit is... non-existent locally (we only know its ID), so we'd be back to square one (sending everything).
I started that comment as a reply to you but I realised that a) it may just have been a bug that might already be fixed and b) it looks like the Stack Overflow answer was speculative and not tested!
This is similar to the 'grafts' feature. Indeed 'git log' says 'grafted'.
You can test this using "git cat-file -p" with the commit that got retrieved, to print the raw object.
> git clone --depth 1 https://github.com/git/git
> git log

commit 388218fac77d0405a5083cd4b4ee20f6694609c3 (grafted, HEAD -> master, origin/master, origin/HEAD)
Author: Junio C Hamano <gitster@pobox.com>
Date:   Mon Feb 10 10:18:17 2025 -0800

    The ninth batch

    Signed-off-by: Junio C Hamano <gitster@pobox.com>

> git cat-file -p 388218fac77d0405a5083cd4b4ee20f6694609c3

tree fc620998515e75437810cb1ba80e9b5173458d1c
parent 50e1821529fd0a096fe03f137eab143b31e8ef55
author Junio C Hamano <gitster@pobox.com> 1739211497 -0800
committer Junio C Hamano <gitster@pobox.com> 1739211512 -0800

The ninth batch

Signed-off-by: Junio C Hamano <gitster@pobox.com>
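If you want to poke at this without touching the network, here's a sketch using a throwaway local repo (the `file://` URL matters: plain local-path clones ignore `--depth`). The graft boundary commits are recorded in `.git/shallow`:

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"

# Build a small upstream repo with three commits.
git init -q -b main upstream
git -C upstream config user.email you@example.com
git -C upstream config user.name You
for i in 1 2 3; do
  echo "rev $i" > upstream/file.txt
  git -C upstream add file.txt
  git -C upstream commit -q -m "commit $i"
done

# Depth-1 clone; file:// forces the smart transport so --depth is honored.
git clone -q --depth 1 "file://$tmp/upstream" shallow

# The single fetched commit is listed as the shallow/graft boundary...
cat shallow/.git/shallow
# ...and shows up with a "grafted" decoration in the log.
git -C shallow log --oneline --decorate
```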
I can't reproduce the problem pushing to Bitbucket, using the most recent Git for Windows (2.47.1.windows.2). It only sent 3 objects (which would be the blob of the new file, the tree object containing the new file, and the commit object describing the tree), not the 6000+ in the repository I tested it on.
It may be that there was a bug that has now been fixed. Or it may be something that only happens/happened with GitHub (i.e. a bug at the receiving end, not the sending one!)
I note that the Stack Overflow user who wrote the answer left a comment underneath saying
"worth noting: I haven't tested this; it's just some simple applied math. One clone-and-push will tell you if I was right. :-)"
Except for performance, is there any downside to this?
In other words: When you store data in an application that only reads and writes data occasionally, is it a good idea to use the git approach and store it in files?
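A minimal sketch of that "git approach" (content-addressed, write-once files; assumes bash and a Linux-style `sha1sum`): hash the content, write it to a temp file, then rename into place, so readers never observe a half-written object:

```shell
# store() reads content from stdin, writes it under objects/<hash>,
# and prints the hash. rename is atomic on POSIX filesystems, so a
# concurrent reader either sees the whole object or nothing.
store() {
  local tmp hash
  tmp=$(mktemp objects/tmp.XXXXXX)
  cat > "$tmp"
  hash=$(sha1sum "$tmp" | cut -d' ' -f1)
  mv "$tmp" "objects/$hash"
  echo "$hash"
}

mkdir -p objects
id=$(echo 'hello' | store)
cat "objects/$id"   # → hello
```

Because objects are immutable once written, there's nothing to lock on the read path; only the files that name the "current version" (like git's refs) ever change.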
You can use a jj repo concurrently, e.g., over Dropbox with coworkers, and all it requires is a minor modification on top of the existing Git data model.
For example here on HN (which afaik also stores the data in files) you can change a comment you wrote. But that type of mutability does not call for transactions, right?
And does git not need crash durability?
Think maildir vs. mbox/PST/etc. for message storage. I stopped counting the number of times that I have seen Outlook mangle its message database and require a rebuild.
Generally it is not so popular, in part because OSes like Windows and macOS have somewhat lacking filesystem implementations. Git also has performance issues with large repositories on Windows, which need to be worked around by various methods.
Transactions are another limitation as mentioned by the other reply (they are possible to implement on top of "normal" filesystems, but not in a portable way).
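Git itself serializes ref updates with `*.lock` files; here's a shell sketch of the same pattern (the `refs_head` file name is made up), using noclobber as a stand-in for open(2) with O_CREAT|O_EXCL:

```shell
# A noclobber redirection fails if the lock file already exists,
# so only one process can take the lock at a time.
ref=refs_head                # hypothetical ref file
echo "old-value" > "$ref"

if ( set -o noclobber; echo $$ > "$ref.lock" ) 2>/dev/null; then
  echo "new-value" > "$ref.lock"   # write the new value into the lock file
  mv "$ref.lock" "$ref"            # atomic rename commits the update
else
  echo "another process holds the lock" >&2
fi
cat "$ref"   # → new-value
```

This gives you single-file atomicity; what plain filesystems don't give you portably is a multi-file transaction (update several refs all-or-nothing), which is where the limitation bites.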
Anyone try it out yet?
(Not that I don't trust it, but I usually fetch the full history locally anyway)
I've tried with a new file both having content ('Test shallow clone push'), and again with an empty file. In both cases it pushed 3 objects, and in the empty file case it reused one (it turns out my repo already has some empty files in it).
It's always possible that this is (or was) a GitHub bug - I haven't tried it there.
I found this[2] very enlightening.
[1] https://github.blog/open-source/git/get-up-to-speed-with-par...
[2] https://www.howtogeek.com/devops/how-to-use-git-shallow-clon...
So the "where to attach it to the tree" info is effectively lost.
https://github.blog/open-source/git/get-up-to-speed-with-par...
When you have a blobless/treeless clone locally, it will fetch missing blobs on demand during ordinary `git` operations, at the least expected moments, and it does this very inefficiently, one by one, which is extremely slow. (I also didn't find a way to convert a blobless/treeless clone back to a normal clone, i.e. to force-fetch all the missing blobs efficiently.)
It's especially tricky when those `git` operations happen in background e.g. from a GUI (IDE extension etc.) and you don't have any feedback what is happening.
Some further reading:
- https://github.blog/open-source/git/get-up-to-speed-with-par...
- https://github.blog/open-source/git/git-clone-a-data-driven-...
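You can watch the on-demand behaviour against a purely local repo; a sketch (assumes a Git new enough to honour GIT_NO_LAZY_FETCH, roughly 2.46+; older versions will simply lazy-fetch during the first check):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"

# Upstream repo that permits partial clones.
git init -q -b main upstream
git -C upstream config uploadpack.allowfilter true
git -C upstream config user.email you@example.com
git -C upstream config user.name You
echo v1 > upstream/file.txt
git -C upstream add file.txt && git -C upstream commit -q -m one
echo v2 > upstream/file.txt
git -C upstream commit -qam two

# Blobless clone: commits and trees come down, blobs only on demand.
git clone -q --filter=blob:none "file://$tmp/upstream" partial

# Trees are local, so resolving a path to a blob ID needs no fetch.
old_blob=$(git -C partial rev-parse 'HEAD~1:file.txt')

# With lazy fetching disabled, the old blob is demonstrably absent...
GIT_NO_LAZY_FETCH=1 git -C partial cat-file -e "$old_blob" || echo "not here yet"

# ...and checking out the old commit triggers an on-demand fetch for it.
git -C partial checkout -q HEAD~1
GIT_NO_LAZY_FETCH=1 git -C partial cat-file -e "$old_blob" && echo "fetched on demand"
```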
From my side: when you have a non-trivially sized repo, on a local machine one should use either a shallow or a full clone (or, if possible, a sparse checkout, but that requires the repo to be compatible).
There is a Moore's law analogue buried in there somewhere, in how fast repos grow relative to network and computing resources (an increase in which, of course, also makes repos grow faster).
git clone --depth=1 --filter tree:0