FilterHN

I made my own Git

222 points

by TonyStr

6 hours ago

| past

| 25 comments

| tonystr.net

| HN

▲

nasretdinov

3 hours ago

[-]

Nice work! On a complete tangent, Git is the only SCM known to me that supports recursive merge strategy [1] (instead of the regular 3-way merge), which essentially always remembers resolved conflicts without you needing to do anything. This is a very underrated feature of Git and somehow people still manage to choose rebase over it. If you ever get to implementing merges, please make sure you have a mechanism for remembering the conflict resolution history :).

[1] https://stackoverflow.com/questions/55998614/merge-made-by-r...

▲

arunix

2 hours ago

[-]

I remember in a previous job having to enable git rerere, otherwise it wouldn't remember previously resolved conflicts.

https://git-scm.com/book/en/v2/Git-Tools-Rerere

▲

direwolf20

9 minutes ago

[-]

The recursive merge is about merging branches that already have merges in them, while rerere is about repeating the same merge several times.

▲

nasretdinov

2 hours ago

[-]

I believe rerere is a local cache, so you'd still have to resolve the conflicts again on another machine. The recursive merge doesn't have this issue — the conflict resolution inside the merge commits is effectively remembered (although due to how Git operates it actually never even considers it a conflict to be remembered — just a snapshot of the closest state to the merged branches)

▲

mkleczek

2 hours ago

[-]

Much more principled (and hence less of a foot-gun) way of handling conflicts is making them first class objects in the repository, like https://pijul.org does.

▲

jcgl

2 hours ago

[-]

Jujutsu too[0]:

> Jujutsu keeps track of conflicts as first-class objects in its model; they are first-class in the same way commits are, while alternatives like Git simply think of conflicts as textual diffs. While not as rigorous as systems like Darcs (which is based on a formalized theory of patches, as opposed to snapshots), the effect is that many forms of conflict resolution can be performed and propagated automatically.

[0] https://github.com/jj-vcs/jj

▲

p0w3n3d

1 hour ago

[-]

That's something new to me (using git for 10 years, always rebased)

▲

teiferer

4 hours ago

[-]

If you ever wonder how coding agents know how to plan things etc, this is the kind of article they get this training from.

Ends up being circular if the author used LLM help for this writeup though there are no obvious signs of that.

▲

TonyStr

3 hours ago

[-]

Interestingly, I looked at github insights and found that this repo had 49 clones, and 28 unique cloners, before I published this article. I definitely did not clone it 49 times, and certainly not with 28 unique users. It's unlikely that the handful of friends who follow me on github all cloned the repo. So I can only speculate that there are bots scraping new public github repos and training on everything.

Maybe that's obvious to most people, but it was a bit surprising to see it myself. It feels weird to think that LLMs are being trained on my code, especially when I'm painfully aware of every corner I'm cutting.

The article doesn't contain any LLM output. I use LLMs to ask for advice on coding conventions (especially in rust, since I'm bad at it), and sometimes as part of research (zstd was suggested by chatgpt along with comparisons to similar algorithms).

▲

Phelinofist

3 hours ago

[-]

I selfhost Gitea. The instance is crawled by AI crawlers (checked the IPs). They never cloned, they just browse and take it directly from there.

▲

Phelinofist

43 minutes ago

[-]

For reference, this is how I do it in my Caddyfile:

   (block_ai) {
       @ai_bots {
           header_regexp User-Agent (?i)(anthropic-ai|ClaudeBot|Claude-Web|Claude-SearchBot|GPTBot|ChatGPT-User|Google-Extended|CCBot|PerplexityBot|ImagesiftBot)
       }

       abort @ai_bots
   }

Then, in a specific app block include it via

   import block_ai

▲

Zambyte

2 hours ago

[-]

i run a cgit server on an r720 in my apartment with my code on it and that puppy screams whenever sam wants his code

blocking openai ips did wonders for the ambient noise levels in my apartment. they're not the only ones obviously, but they're they only ones i had to block to stay sane

▲

MarsIronPI

2 hours ago

[-]

Have you considered putting it behind Anubis or an equivalent?

▲

Zambyte

1 hour ago

[-]

Yes, but I haven't and would prefer not to

▲

nerdponx

3 hours ago

[-]

Time to start including deliberate bugs. The correct version is in a private repository.

▲

teiferer

2 hours ago

[-]

And what purpose would this serve, exactly?

▲

adastra22

1 hour ago

[-]

Spite.

▲

program_whiz

1 hour ago

[-]

while I think this is a fun idea -- we are in such a dystopian timeline that I fear you will end up being prosecuted under a digital equivalent of various laws like "why did you attack the intruder instead of fleeing" or "you can't simply remove a squatter because its your house, therefore you get an assault charge."

A kind of "they found this code, therefore you have a duty not to poison their model as they take it." Meanwhile if I scrape a website and discover data I'm not supposed to see (e.g. bank details being publicly visible) then I will go to jail for pointing it out. :(

▲

tonnydourado

2 hours ago

[-]

Particularly on GitHub, might not even be LLMs, just regular bots looking for committed secrets (AWS keypairs, passwords, etc.)

▲

teiferer

2 hours ago

[-]

I don't really get why they need to clone in order to scrape ...?

> It feels weird to think that LLMs are being trained on my code, especially when I'm painfully aware of every corner I'm cutting.

That's very much expected. That's why the quality of LLM coding agents is like it is. (No offense.)

The "asking LLMs for advice" part is where the circular aspect starts to come into the picture. Not worse than looking at StackOverflow though which then links to other people who in turn turned to StackOverflow for advice.

▲

adastra22

1 hour ago

[-]

The quality of LLM coding agents is pretty good now.

▲

0x696C6961

2 hours ago

[-]

This has been happening before LLMs too.

▲

wasmainiac

4 hours ago

[-]

Maybe we can poison LLMs with loops of 2 or more self referencing blogs.

▲

jdiff

4 hours ago

[-]

Only need one, they're not thinking critically about the media they consume during training.

▲

falcor84

4 hours ago

[-]

Here's a sad prediction: over the coming few years, AIs will get significantly better at critical evaluation of sources, while humans will get even worse at it.

▲

whstl

3 hours ago

[-]

I wish I could disagree with you, but what I'm seeing on average (especially at work) is exactly that: people asking stuff to ChatGPT and accepting hallucinations as fact, and then fighting me when I say it's not true.

▲

prmoustache

3 hours ago

[-]

There is "death by GPS" for people dying after blindly following their GPS instruction. There will definitely be a "death by AI" expression very soon.

▲

stevekemp

50 minutes ago

[-]

Tesla-related fatalities probably count already, albeit without that label/name.

▲

topaz0

3 hours ago

[-]

My sad prediction is that LLMs and humans will both get worse. Humans might get worse faster though.

▲

sailfast

1 hour ago

[-]

Hot take: Humans have always been bad at this (in the aggregate, without training). Only a certain percentage of the population took the time to investigate.

For most throughout history, whatever is presented to you that you believe is the right answer. AI just brings them source information faster so what you're seeing is mostly just the usual behavior, but faster. Before AI people would not have bothered to try and figure out an answer to some of these questions. It would've been too much work.

▲

keybored

30 minutes ago

[-]

HN commenters will be technooptimistic misanthrops. Status quo ante bellum.

▲

andy_ppp

3 hours ago

[-]

The secret sauce about having good understanding, taste and style (both for coding and writing) has always been in the fine tuning and RHLF steps. I'd be skeptical if the signals a few GitHub repos or blogs generate at the initial stages of the learning are that critical. There's probably a filter also for good taste on the initial training set and these are so large not even a single full epoch is done on the data these days.

▲

anu7df

4 hours ago

[-]

I understand model output put back into training would be an issue, but if model output is guided by multiple prompts and edited by the author to his/her liking wouldn't that at least be marginally useful?

▲

prodigycorp

3 hours ago

[-]

Random aside about training data:

One of the funniest things I've started to notice from Gemini in particular is that in random situations, it talks with english with an agreeable affect that I can only describe as.. Indian? I've never noticed such a thing leak through before. There must be a ton of people in India who are generating new datasets for training.

▲

evntdrvn

1 hour ago

[-]

There was a really great article or blog post published in the last few months about the author's very personal experience whose gist was "People complain that I sound/write like an LLM, but it's actually the inverse because I grew up in X where people are taught formal English to sound educated/western, and those areas are now heavily used for LLM training."

I wish I could find it again, if someone else knows the link please post it!

▲

gxnxcxcx

55 minutes ago

[-]

I'm Kenyan. I don't write like ChatGPT, ChatGPT writes like me

https://news.ycombinator.com/item?id=46273466

▲

blenderob

3 hours ago

[-]

That's very interesting. Any examples you can share which has those agreeable effects?

▲

prodigycorp

2 hours ago

[-]

I'm going to do a cursory look through my antigrav history, i want to find it too. I remember it's primarily in the exclamations of agreement/revelation, and one time expressing concern which I remember were slightly off natural for an american english speaker.

▲

mexicocitinluez

4 hours ago

[-]

> Ends up being circular if the author used LLM help for this writeup though there are no obvious signs of that.

Great argument for not using AI-assisted tools to write blog posts (especially if you DO use these tools). I wonder how much we're taking for granted in these early phases before it starts to eat itself.

▲

darkryder

4 hours ago

[-]

Great writeup! It's always fun to learn the details of the tools we use daily.

For others, I highly recommend Git from the Bottom Up[1]. It is a very well-written piece on internal data structures and does a great job of demystifying the opaque git commands that most beginners blindly follow. Best thing you'll learn in 20ish minutes.

1. https://jwiegley.github.io/git-from-the-bottom-up/

▲

MarsIronPI

2 hours ago

[-]

Oh, I hadn't ever seen that one. I "grokked" Git thanks to The Git Parable[0] several years ago.

[0]: https://tom.preston-werner.com/2009/05/19/the-git-parable

▲

sanufar

40 minutes ago

[-]

Ooh, this looks fun! I didn’t know you could cat-file on a hash id, that’s actually quite cool.

▲

spuz

3 hours ago

[-]

Thanks - I think this is the article I was thinking of that really helped me to understand git when I first started using it back in the day. I tried to find it again and couldn't.

▲

brendoncarroll

15 minutes ago

[-]

Me too. Version control is great, it should get more use outside of software.

https://github.com/gotvc/got

Notable differences: E2E encryption, parallel imports (Got will light up all your cores), and a data structure that supports large files and directories.

▲

direwolf20

10 minutes ago

[-]

Cool. When you reimplement something, it forces you to see the fractal complexity of it.

▲

p4bl0

3 hours ago

[-]

Nice post :). It made me think of ugit: DIY Git in Python [1] which is still by far my favorite of this kind of posts. It really goes deep into Git internals while managing to stay easy to follow along the way.

[1] https://www.leshenko.net/p/ugit/

▲

mfashby

1 hour ago

[-]

in a similar vein; Write yourself a Git was fun to follow https://wyag.thb.lt/

▲

TonyStr

3 hours ago

[-]

This page is beautiful!

Bookmarked for later

▲

UltraSane

1 hour ago

[-]

I mapped git operations to Neo4j and it really helped me understand how it works.

▲

oldestofsports

1 hour ago

[-]

Nice job, great article!

I had a go at it as well a while back, I call it "shit" https://github.com/emanueldonalds/shit

▲

tpoacher

47 minutes ago

[-]

THE shit, in fact.

▲

astinashler

46 minutes ago

[-]

Does this git include empty folder? I always annoy that it's not track empty folder.

▲

TonyStr

33 minutes ago

[-]

yep! Had to check to be sure:

    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.02s
     Running `target/debug/tvc decompress f854e0b307caf47dee5c09c34641c41b8d5135461fcb26096af030f80d23b0e5`

=== args === decompress f854e0b307caf47dee5c09c34641c41b8d5135461fcb26096af030f80d23b0e5 === tvcignore === ./target ./.git ./.tvc

=== subcommand === decompress ------------------ tree ./src/empty-folder e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 blob ./src/main.rs fdc4ccaa3a6dcc0d5451f8e5ca8aeac0f5a6566fe32e76125d627af4edf2db97

▲

igorw

4 hours ago

[-]

Random but y'all might enjoy. Git client in PHP, supports reading packfiles, reftables, diff via LCS. Written by hand.

https://github.com/igorwwwwwwwwwwwwwwwwwwww/gipht-horse

▲

nasretdinov

3 hours ago

[-]

Nice! This repo is a huge W for PHP I'd say.

P.S. Didn't know that plain '@' can be used instead of HEAD, but I guess it makes sense since you can omit both left and right parts of the expressions separated by '@'

▲

sluongng

4 hours ago

[-]

Zstd dictionary compression is essentially how Meta's Mercurial fork (Sapling VCS) stores blobs https://sapling-scm.com/docs/dev/internals/zstdelta. The source code is available in GitHub if folks want to study the tradeoffs vs git delta-compressed packfiles.

I think theoratically, Git delta-compression is still a lot more optimized for smaller repos. But for bigger repos where sharding storaged is required, path-based delta dictionary compression does much better. Git recently (in the last 1 year) got something called "path-walk" which is fairly similar though.

▲

sneela

4 hours ago

[-]

> If you want to look at the code, it's available on github.

Why not tvc-hub :P

Jokes aside, great write up!

▲

TonyStr

4 hours ago

[-]

haha, maybe that's the next project. It did feel weird to make git commits at the same time as I was making tvc commits

▲

h1fra

4 hours ago

[-]

Learning git internals was definitely the moment it became clear to me how efficient and smart git is.

And this way of versionning can be reused in other fields, as soon as have some kind of graph of data that can be modified independently but read all together then it makes sense.

▲

eru

3 hours ago

[-]

> These objects are also compressed to save space, so writing to and reading from .git/objects/ will always involve running a compression algoritm. Git uses zlib to compress objects, but looking at competitors, zstd seemed more promising:

That's a weird thing to put so close to the start. Compression is about the least interesting aspect of Git's design.

▲

alphabetag675

3 hours ago

[-]

When you are learning, everything is important. I think it is okay to cut the person some slack regarding this.

▲

kgeist

5 hours ago

[-]

>The hardest part about this project was actually just parsing.

How about using sqlite for this? Then you wouldn't need to parse anything, just read/update tables. Fast indexing out of the box, too.

▲

grenran

5 hours ago

[-]

that would be what https://fossil-scm.org/ is

▲

dchest

17 minutes ago

[-]

While Fossil uses SQLite for underlying storage (instead of filesystem directly) and various support infrastructure, its actual format is not based on SQLite: https://fossil-scm.org/home/doc/trunk/www/fileformat.wiki

It's basically plaintext. Even deltas are plaintext for text files.

Reason: "The global state of a fossil repository is kept simple so that it can endure in useful form for decades or centuries. A fossil repository is intended to be readable, searchable, and extensible by people not yet born."

▲

TonyStr

4 hours ago

[-]

Very interesting. Looks like fossil has made some unique design choices that differ from git[0]. Has anyone here used it? I'd love to hear how it compares.

[0] https://fossil-scm.org/home/doc/trunk/www/fossil-v-git.wiki#...

▲

smartmic

4 hours ago

[-]

I use Fossil extensively, but only for personal projects. There are specific design conditions, such as no rebasing [0], and overall, it is simpler yet more useful to me. However, I think Fossil is better suited for projects governed under the cathedral model than the bazaar model. It's great for self-hosting, and the web UI is excellent not only for version control, but also for managing a software development project. However, if you want a low barrier to integrating contributions, Fossil is not as good as the various Git forges out there. You have to either receive patches or Fossil bundles via email or forum, or onboard/register contributors as developers with quite wide repo permissions.

[0]: https://fossil-scm.org/home/doc/trunk/www/rebaseharm.md

▲

toyg

3 hours ago

[-]

Sounds like a more modern cvs/Subversion

▲

jact

2 hours ago

[-]

I use Fossil extensively for all my personal projects and find it superior for the general case. As others said it’s more suited for small projects.

I also use Fossil for lots of weird things. I created a forum game using Fossil’s ticket and forum features because it’s so easy to spin up and for my friends to sign in to.

At work we ended up using Fossil in production to manage configuration and deployment in a highly locked down customer environment where its ability to run as a single static binary, talk over HTTP without external dependencies, etc. was essential. It was a poor man’s deployment tool, but it performed admirably.

Fossil even works well as a blogging platform.

▲

embedding-shape

4 hours ago

[-]

Used it on and off mainly to check it out, but always in a personal/experimental capacity. Never managed to convince any teams to give it a try, mostly because git don't tend to get in the way, so hard to justify to learn something completely new.

I really enjoy how local-first it is, as someone who sometimes work without internet connection. That the data around "work" is part of the SCM as well, not just the code, makes a lot of sense to me at a high-level, and many times I wish git worked the same...

▲

usrbinbash

4 hours ago

[-]

I mean, git is just as "local-first" (a git repo is just a directory after all), and the standard git-toolchain includes a server, so...

But yeah, fossil is interesting, and it's a crying shame its not more well known, for the exact reasons you point out.

▲

embedding-shape

4 hours ago

[-]

> I mean, git is just as "local-first" (a git repo is just a directory after all), and the standard git-toolchain includes a server, so...

It isn't though, Fossil integrates all the data around the code too in the "repository", so issues, wiki, documentation, notes and so on are all together, not like in git where most commonly you have those things on another platform, or you use something like `git notes` which has maybe 10% of the features of the respective Fossil feature.

It might be useful to scan through the list of features of Fossil and dig into it, because it does a lot more than you seem to think :) https://fossil-scm.org/home/doc/trunk/www/index.wiki

▲

adastra22

1 hour ago

[-]

Those things exist for git too, e.g. git-bug. But the first-class to do it in git is email.

▲

graemep

3 hours ago

[-]

I like it but the problem is everyone else already knows git and everything integrates with git.

It is very easy to self host.

Not having staging is awkward at first but works well once you get used to it.

I prefer it for personal projects. In think its better for small teams if people are willing to adjust but have not had enough opportunities to try it.

▲

TonyStr

2 hours ago

[-]

Is it possible to commit individual files, or specific lines, without a staging area? I guess this might be against Fossil's ethos, and you're supposed to just commit everything every time?

▲

graemep

26 minutes ago

[-]

Yes you can list specific files, but you have to list them all in the commit command.

I think the ethos is to discourage it.

It does not seem to be possible to commit just specific lines.

▲

jact

2 hours ago

[-]

You can commit individual files.

▲

justabrowser

2 hours ago

[-]

Fossil user here. Yes, you can commit individual files.

There are certain more advanced things git can do that fossil can't--for example rebasing, which the author refuses to implement. If you want to rename a commit then you also have to go into the sqlite shell and do it manually, and there's no way to delete a bad commit. All of this is stupid, obnoxious asshattery on the part of the author, but otherwise it's very easy to use and bulletproof. (I don't ever use the web UI.)

One way to simulate "staging" is to just checkout the repo into multiple directories and do different things in each one, or even create a temporary work repo to be cleaned up and merged into the main one.

So you do lose some flexibility with fossil, but for normal uses it's quite usable, and the tradeoff is you won't ever accidentally blow your leg off.

EDIT: Since literally every other comment I made after this (like 3 of them) was downvoted and insta-flagged by the brigade of useless fags here, this will be my last post under this account. Fuck this piece of shit forum.

▲

adzm

1 hour ago

[-]

Second time today I've read and agreed with most of your comment only to eyeroll and downvote once seeing your ridiculous and immature edit.

▲

storystarling

1 hour ago

[-]

SQLite solves the storage layer but I suspect you run into a pretty big impedance mismatch on the graph traversals. For heavy DAG operations like history rewriting, a custom structure seems way more efficient than trying to model that relationally.

▲

mg794613

2 hours ago

[-]

"Though I suck at it, my go-to language for side-projects is always Rust"

Hmm, dont be so hard on yourself!

proceeds to call ls from rust

Ok nevermind, although I dont think rust is the issue here.

(Tony I'm joking, thanks for the article)

▲

athrowaway3z

2 hours ago

[-]

I do wonder if the compression step makes sense at this layer instead of the filesystem layer.

▲

heckelson

4 hours ago

[-]

gentle reminder to set your website's `<title>` to something descriptive :)

▲

TonyStr

4 hours ago

[-]

haha, thank you. Added now :-)

▲

jrockway

2 hours ago

[-]

sha256 is a very slow algorithm, even with hardware acceleration. BLAKE3 would probably make a noticeable performance difference.

Some reading from 2021: https://jolynch.github.io/posts/use_fast_data_algorithms/

It is really hard to describe how slow sha256 is. Go sha256 some big files. Do you think it's disk IO that's making it take so long? It's not, you have a super fast SSD. It's sha256 that's slow.

▲

EdSchouten

2 hours ago

[-]

It depends on the architecture. On ARM64, SHA-256 tends to be faster than BLAKE3. The reasons being that most modern ARM64 CPUs have native SHA-256 instructions, and lack an equivalent of AVX-512.

Furthermore, if your input files are large enough that parallelizing across multiple cores makes sense, then it's generally better to change your data model to eliminate the existence of the large inputs altogether.

For example, Git is somewhat primitive in that every file is a single object. In retrospect it would have been smarter to decompose large files into chunks using a Content Defined Chunking (CDC) algorithm, and model large files as a manifest of chunks. That way you get better deduplication. The resulting chunks can then be hashed in parallel, using a single-threaded algorithm.

▲

grumbelbart2

2 hours ago

[-]

Is that even when using the SHA256 hardware extensions? https://en.wikipedia.org/wiki/SHA_instruction_set

▲

sublinear

2 hours ago

[-]

> If I were to do this again, I would probably use a well-defined language like yaml or json to store object information.

I know this is only meant to be an educational project, but please avoid yaml (especially for anything generated). It may be a superset of json, but that should strongly suggest that json is enough.

I am aware I'm making a decade old complaint now, but we already have such an absurd mess with every tool that decided to prefer yaml (docker/k8s, swagger, etc.) and it never got any better. Let's not make that mistake again.

People just learned to cope or avoid yaml where they can, and luckily these are such widely used tools that we have plenty of boilerplate examples to cheat from. A new tool lacking docs or examples that only accepts yaml would be anywhere from mildly frustrating to borderline unusable.

▲

ofou

2 hours ago

[-]

btw, you can change the hashing algorithm in git easily

▲

smangold

2 hours ago

[-]

Tony nice work!

▲

prakhar1144

4 hours ago

[-]

I was also playing around with the ".git" directory - ended up writing:

"What's inside .git ?" - https://prakharpratyush.com/blog/7/

▲

b1temy

2 hours ago

[-]

Nice work, it's always interesting to see how one would design their own VCS from scratch, and see if they fall into problems existing implementations fell into in the past and if the same solution was naturally reached.

The `tvc ls` command seems to always recompute the hash for every non-ignored file in the directory and its children. Based on the description in the blog post, it seems the same/similar thing is happening during commits as well. I imagine such an operation would become expensive in a giant monorepo with many many files, and perhaps a few large binary files thrown in.

I'm not sure how git handles it (if it even does, but I'm sure it must). Perhaps it caches the hash somewhere in the `.git`directory, and only updates it if it senses the file hash changed (Hm... If it can't detect this by re-hashing the file and comparing it with a known value, perhaps by the timestamp the file was last edited?).

> Git uses SHA-1, which is an old and cryptographically broken algorithm. This doesn't actually matter to me though, since I'll only be using hashes to identify files by their content; not to protect any secrets

This _should_ matter to you in any case, even if it is "just to identify files". If hash collisions (See: SHAttered, dating back to 2017) were to occur, an attacker could, for example, have two scripts uploaded in a repository, one a clean benign script, and another malicious script with the same hash, perhaps hidden away in some deeply nested directory, and a user pulling the script might see the benign script but actually pull in the malicious script. In practice, I don't think this attack has ever happened in git, even with SHA-1. Interestingly, it seems that git itself is considering switching to SHA-256 as of a few months ago https://lwn.net/Articles/1042172/

I've not personally heard of the process of hashing to also be known as digesting, though I don't doubt that it is the case. I've mostly familiar of the resulting hash being referred to as the message digest. Perhaps it's to differentiate between the verb 'hash' (the process of hashing) with the output 'hash' (the ` result of hashing). And naming the function `sha256::try_digest`makes it more explicit that it is returning the hash/digest. But it is a bit of a reach, perhaps that are just synonyms to be used interchangeably as you said.

On a tangent, why were TOML files not considered at the end? I've no skin in the game and don't really mind either way, but I'm just curious since I often see Rust developers gravitate to that over YAML or JSON, presumably because it is what Cargo uses for its manifest.

Also, obligatory mention of jujutsu/jj since it seems to always be mentioned when talking of a VCS in HN.

▲

TonyStr

1 hour ago

[-]

You are completely right about tvc ls recomputing each hash, but I think it has to do this? A timestamp wouldn't be reliable, so the only reliable way to verify a file's contents would be to generate a hash.

In my lazy implemenation, I don't even check if the hashes match, the program reads, compresses and tries to write the unchanged files. This is an obvious area to improve performance on. I've noticed that git speeds up object lookups by generating two-letter directories from the first two letters in hashes, so objects aren't actually stored as `.git/objects/asdf12ha89k9fhs98...`, but as `.git/objects/as/df12ha89k9fhs98...`.

>why were TOML files not considered at the end I'm just not that familiar with toml. Maybe that would be a better choice! I saw another commenter who complained about yaml. Though I would argue that the choice doesn't really matter to the user, since you would never actually write a commit object or a tree object by hand. These files are generated by git (or tvc), and only ever read by git/tvc. When you run `git cat-file <hash>`, you'll have to add the `-p` flag (--pretty) to render it in a human-readable format, and at that point it's just a matter of taste whether it's shown in yaml/toml/json/xml/special format.

▲

quijoteuniv

1 hour ago

[-]

Now … if you reinvent Linux you are closer to be compared to LT

▲

holoduke

2 hours ago

[-]

I wonder if in the near future there will be no tools anymore in the sense we know it. you will maybe describe the tool you need and its created on the fly.