FilterHN

Show HN: Pgit – A Git-like CLI backed by PostgreSQL

84 points by ImGajeed76 1 day ago | 11 comments | oseifert.ch

smartmic

6 hours ago

Of course, we can’t leave out a mention of Fossil here — the SCM system built by and for SQLite.

https://fossil-scm.org/

ndegruchy

1 hour ago

Fossil is great. Not only is it a full suite of tools associated with the repository (discussions, tickets, wiki), but the tool is a single >10 MB binary and can run as a web server (or through a CGI-like interface) for remote hosting.

thunderbong

5 hours ago

And fossil itself is an SQLite database!

Pay08

4 hours ago

How much does it take advantage of being a DB underneath?

ImGajeed76

6 hours ago

Yeah, Fossil is great, but can Fossil import the Linux kernel? (Already working on the next post.)

aljgz

6 hours ago

Still halfway through reading, but what you've made can unlock a lot of use cases.

> I tried SQLite first, but its extension API is limited and write performance with custom storage was painfully slow

For many use cases, write performance does not matter much. Other than the initial import, we usually don't change text that fast. But having the simpler logistics of an SQLite database, with dual (git + SQL) access to the text, is huge.

That said, for the specific use case I have in mind, Postgres is perfectly fine.

hrmtst93837

3 hours ago

SQLite is fine right up until you want concurrent writers. Once you need multiple users, cross-host access, or anything that looks like shared infra instead of a local cache, the file-locking model stops being cute and starts setting the rules for the whole design. For collaborative versioning, Postgres makes more sense.

brigandish

2 hours ago

For a distributed VCS, what would be the need for such things? Even if it were a really big project, how many writes could be going on that this becomes a bottleneck? I don't see it but maybe you have a situation in mind.

lelanthran

1 hour ago

In the current environment, even a distributed VCS may have concurrent agents modifying it on different branches.

ImGajeed76

2 hours ago

The problem I faced was mostly with importing large repos. Normal use should be fine.

nasretdinov

4 hours ago

Also, SQLite in WAL/WAL2 mode is definitely not any slower for writing than Postgres either.
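A minimal sketch of what switching SQLite to WAL mode looks like from Python's standard `sqlite3` module. It shows the property usually cited in this debate: a reader keeps seeing the last committed snapshot while a write transaction is open, instead of being blocked. WAL still allows only one writer at a time, so it narrows rather than removes the concurrent-writer concern raised above. The `blobs` table and the `open_wal`/`snapshot_demo` helpers are hypothetical illustrations, not pgit code.

```python
import os
import sqlite3
import tempfile


def open_wal(path: str) -> sqlite3.Connection:
    """Open a SQLite database file and switch it to write-ahead logging (WAL)."""
    conn = sqlite3.connect(path)
    # The pragma returns the journal mode actually in effect after the request.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode.lower() != "wal":
        raise RuntimeError(f"could not enable WAL, journal_mode is {mode!r}")
    return conn


def count_rows(conn: sqlite3.Connection) -> int:
    # fetchall() runs the statement to completion so no read transaction lingers.
    return conn.execute("SELECT COUNT(*) FROM blobs").fetchall()[0][0]


def snapshot_demo(directory: str) -> tuple:
    """Return (rows the reader sees during an open write, rows it sees after commit)."""
    path = os.path.join(directory, "history.db")
    writer = open_wal(path)
    writer.execute("CREATE TABLE blobs (id INTEGER PRIMARY KEY, data BLOB)")
    writer.commit()

    reader = sqlite3.connect(path)  # WAL mode is persistent, so this uses it too
    try:
        # Python's sqlite3 implicitly opens a transaction before this INSERT.
        writer.execute("INSERT INTO blobs (data) VALUES (?)", (b"hello",))
        during = count_rows(reader)  # not blocked; sees only committed rows
        writer.commit()
        after = count_rows(reader)
        return during, after
    finally:
        reader.close()
        writer.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(snapshot_demo(tmp))
```

The reader sees zero rows while the insert is pending and one row after the commit. A second writer, by contrast, would still have to wait for the write lock.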

ImGajeed76

6 hours ago

Sounds great, yes. Maybe an SQLite version will come in the future.

dmonterocrespo

43 minutes ago

What would be the general purpose of storing the history in a remote database? Is it for use by agents? It's not the same as agents cloning the project and running "git log".

ImGajeed76

37 minutes ago

1) In the case of pgit, the "remote" database is a local Docker container.

2) You can do more complex analyses faster and more easily (you don't need to pipe git's output through other tools), since it's just SQL.

But pgit is not meant to replace git.
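To make point (2) concrete: a question like "which files change most often, and by how many people" becomes a single query once history lives in tables, rather than something assembled by piping `git log --name-only` through `sort | uniq -c`. The sketch below uses an in-memory SQLite database from Python's standard library as a stand-in for Postgres. The `commits`/`file_changes` schema and the `hotspots` helper are hypothetical; pgit's real tables and its built-in analysis commands are not shown in this thread.

```python
import sqlite3

# Hypothetical schema, NOT pgit's actual tables: one row per commit and one row
# per file touched by a commit.
SCHEMA = """
CREATE TABLE commits (
    id      TEXT PRIMARY KEY,
    author  TEXT NOT NULL,
    message TEXT NOT NULL
);
CREATE TABLE file_changes (
    commit_id TEXT NOT NULL REFERENCES commits(id),
    path      TEXT NOT NULL
);
"""

# Files touched by the most commits, plus how many distinct authors touched each.
HOTSPOTS_SQL = """
SELECT fc.path,
       COUNT(DISTINCT fc.commit_id) AS n_commits,
       COUNT(DISTINCT c.author)     AS n_authors
FROM file_changes AS fc
JOIN commits AS c ON c.id = fc.commit_id
GROUP BY fc.path
ORDER BY n_commits DESC, fc.path
LIMIT ?
"""


def hotspots(conn: sqlite3.Connection, limit: int = 10) -> list:
    return conn.execute(HOTSPOTS_SQL, (limit,)).fetchall()


def build_sample_db() -> sqlite3.Connection:
    """Tiny made-up history used only to exercise the query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO commits VALUES (?, ?, ?)",
        [("a1", "alice", "init"), ("b2", "bob", "fix parser"), ("c3", "alice", "parser tests")],
    )
    conn.executemany(
        "INSERT INTO file_changes VALUES (?, ?)",
        [
            ("a1", "src/parser.c"),
            ("a1", "README"),
            ("b2", "src/parser.c"),
            ("c3", "src/parser.c"),
            ("c3", "tests/parser_test.c"),
        ],
    )
    return conn


if __name__ == "__main__":
    for row in hotspots(build_sample_db(), limit=3):
        print(row)
```

On this sample data `src/parser.c` comes out on top with three commits from two authors. Adding a date filter or joining on commit messages is just more SQL, which is the point the reply is making.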

aljgz

3 hours ago

How well does this support random-access queries to the file names and content at a certain revision? Like:

- "Checking out" a specific branch (which can be reasonably slow)

- Query all files and folders in path `/src`

- Query all files and folders in path `/src/*` (and maybe with extra pattern matches)

- Be able to read contents of a file from a certain offset for a certain length

These are similar to file system queries against a working directory.

ImGajeed76

2 hours ago

Accessing specific files is very fast: definitely sub-second, and most of the time it's just a few milliseconds.
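The operations asked about above map onto ordinary SQL once each revision's file list is stored as rows. Below is a minimal sketch that uses Python's built-in SQLite as a stand-in for Postgres. The `tree_entries`/`blobs` schema and the helpers `list_files` and `read_range` are hypothetical and are not pgit's actual layout or commands. Prefix matching uses `substr` rather than `LIKE` so that `%` and `_` in file names are not treated as wildcards. Directories are not stored as rows in this toy schema, so only files are listed.

```python
import sqlite3

# Hypothetical schema, not pgit's: one row per file per revision, pointing at a blob.
SCHEMA = """
CREATE TABLE blobs (id INTEGER PRIMARY KEY, content TEXT NOT NULL);
CREATE TABLE tree_entries (
    revision TEXT NOT NULL,
    path     TEXT NOT NULL,
    blob_id  INTEGER NOT NULL REFERENCES blobs(id),
    PRIMARY KEY (revision, path)
);
"""


def list_files(conn, revision, prefix, recursive=True):
    """File paths under `prefix` (e.g. 'src/') at `revision`.

    With recursive=False, only files directly inside `prefix` are returned.
    """
    rows = conn.execute(
        """
        SELECT path FROM tree_entries
        WHERE revision = ?
          AND substr(path, 1, length(?)) = ?
          AND (? OR instr(substr(path, length(?) + 1), '/') = 0)
        ORDER BY path
        """,
        (revision, prefix, prefix, int(recursive), prefix),
    ).fetchall()
    return [path for (path,) in rows]


def read_range(conn, revision, path, offset, length):
    """Return `length` characters of a file from 0-based `offset`, or None if absent."""
    row = conn.execute(
        """
        SELECT substr(b.content, ? + 1, ?)
        FROM tree_entries AS t
        JOIN blobs AS b ON b.id = t.blob_id
        WHERE t.revision = ? AND t.path = ?
        """,
        (offset, length, revision, path),  # SQLite's substr is 1-based, hence + 1
    ).fetchone()
    return None if row is None else row[0]


def build_sample_db():
    """Two made-up revisions: r1 has one file, r2 has three."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO blobs VALUES (?, ?)",
        [(1, "old main"), (2, "int main(void) { return 0; }"), (3, "/* parser */")],
    )
    conn.executemany(
        "INSERT INTO tree_entries VALUES (?, ?, ?)",
        [
            ("r1", "src/main.c", 1),
            ("r2", "src/main.c", 2),
            ("r2", "src/lib/parse.c", 3),
            ("r2", "README.md", 3),
        ],
    )
    return conn


if __name__ == "__main__":
    db = build_sample_db()
    print(list_files(db, "r2", "src/"))
    print(list_files(db, "r2", "src/", recursive=False))
    print(read_range(db, "r2", "src/main.c", 4, 4))
```

The composite primary key on `(revision, path)` lets an exact file lookup use an index, and the prefix listing can use it to narrow by revision before filtering paths. This says nothing about how pgit itself achieves the timings quoted in the reply.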

lmuscat

1 hour ago

Would be cool to populate the DB and keep it in sync by pointing to postgres as an upstream remote inside of git itself. That would probably require a custom postgres extension and a way to accept traffic from git.

Terretta

2 hours ago

Why a custom LLM prompt for what appears to be the default 'report' you'd want? Wouldn't the CLI just do this for a report command?

Is there an example of the tool enabling LLM 'discovering' something non-deterministic and surprising?

ImGajeed76

2 hours ago

Yes, there are also analysis commands the AI can use. I just wrote the prompt example before they existed.

Fire-Dragon-DoL

5 hours ago

Wouldn't DuckDB be better suited for this? Forgive the stupid question. I just connected "CSV as SQL" to "git as SQL", and DuckDB came to mind.

ImGajeed76

5 hours ago

I did actually look into writing the extension for DuckDB. But, as with SQLite, its extension capabilities are not great for what I needed. DuckDB is a great database, though.

Pay08

4 hours ago

This is incredibly neat and might actually become a part of my toolbox.

ImGajeed76

3 hours ago

Thanks! But it might still need a few releases until it's really good, so just don't rely on it ;)

killingtime74

6 hours ago

I love it. I love having agents write SQL. It's a very efficient use of context, and it doesn't try to reinvent the information-retrieval part of following the context.

Did you find you needed to give agents the schema produced by this, or did they just query it themselves from Postgres?

ImGajeed76

6 hours ago

So most analyses already have a CLI function you can just call with parameters. For those that don't, in my case, the agent just looked at the --help of the commands and was able to perform the queries.

Toby11

3 hours ago

Why do agents need to know this metadata about git history to perform their coding functions, though?

Even humans don’t do this unless there’s a crazy bug causing them to search around from every possible angle.

That said, this sounds like a great and fun project to work on.

nsonha

4 minutes ago

Debugging and operational investigations. I would say half of my sessions with agents involve those.

ImGajeed76

3 hours ago

But the difference between you and an agent is that you naturally know the history of the project if you have worked on it. The AI doesn't.

tomhallett

59 minutes ago

so true!

1) Commit messages often capture the "why" behind a change, whereas the code and tests focus on the what/how for right now.

2) When you have a regression, being able to see the code before it was introduced, and the code that was changed at the same time, is very helpful for understanding the developer's intent, blind spots in their approach, etc.

Zardoz84

6 hours ago

Interesting... could it be used to store multiple git repos and do a full-text search across all of them?

ImGajeed76

6 hours ago

In theory, yes. You just need to do the full-text search across the databases. pgit doesn't support it, but in the end it's just Postgres under the hood.
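A rough sketch of the "search across the databases" idea, done from the application side. It assumes one database per repository, each with a hypothetical `files(path, content)` table, and uses in-memory SQLite with a plain, case-sensitive substring match via `instr` as a stand-in. In Postgres, real full-text search would use `to_tsvector`/`to_tsquery` instead, and querying several databases from a single connection would need something like `dblink` or `postgres_fdw`. The `search_repos` and `make_repo` helpers are illustrative only and are not pgit features.

```python
import sqlite3
from typing import Dict, List, Tuple


def search_repos(repos: Dict[str, sqlite3.Connection], needle: str) -> List[Tuple[str, str]]:
    """Return (repo, path) pairs whose file content contains `needle`, across all repos."""
    hits: List[Tuple[str, str]] = []
    for repo in sorted(repos):
        # Same query against each repository's own database; results merged in Python.
        rows = repos[repo].execute(
            "SELECT path FROM files WHERE instr(content, ?) > 0 ORDER BY path",
            (needle,),
        ).fetchall()
        hits.extend((repo, path) for (path,) in rows)
    return hits


def make_repo(files: Dict[str, str]) -> sqlite3.Connection:
    """Build a throwaway in-memory 'repository' database for the example."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE files (path TEXT PRIMARY KEY, content TEXT NOT NULL)")
    conn.executemany("INSERT INTO files VALUES (?, ?)", files.items())
    return conn


if __name__ == "__main__":
    repos = {
        "api": make_repo({"src/auth.go": "func checkToken() {}", "README": "API server"}),
        "web": make_repo({"app/login.ts": "// calls checkToken on the server"}),
    }
    print(search_repos(repos, "checkToken"))
```

Fanning out one query per database keeps each repository independent. The trade-off is that ranking and merging happen in application code rather than in a single SQL statement.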