Hacking on PostgreSQL Is Hard
95 points by gmac, 16 days ago | 8 comments | rhaas.blogspot.com
mritchie712
13 days ago
The first commenter on the blog makes a very good point:

> The thing I like to remind people is that this project has been around for decades, during which over a hundred people (or whatever COUNT(*) FROM contributors is) have picked through the code trying to find low hanging fruit to contribute. Everything left to do is astonishingly hard; if it weren't it would have been done already.

There are no easy wins left to contribute.

I've had ideas for a few pg extensions; I wonder if those are as difficult to contribute.

fforflo
13 days ago
Development-wise they're completely different. Extensions are much easier and more isolated. Developing a pg extension is all about compiling some C code into a .so shared object, typically with a templated Makefile, which the running Postgres process then loads dynamically.

If you want some boilerplate to get you started, I've been using this cookiecutter template to bootstrap lots of extensions: https://github.com/Florents-Tselai/cookiecutter-postgres-ext...

Demo walkthrough: https://www.youtube.com/watch?v=zVxY3ZmE5bU

Contributing to core Postgres is an entirely different story.

eatonphil
13 days ago
> There are no easy wins left to contribute.

I don't think that's true. Watch the pgsql-docs mailing list and you will see a steady stream of people pointing out issues in the docs.

Moreover, there is a steady stream of new features, which need polish and docs. I've got a patch on the mailing list that (maybe, hopefully, knock on wood) will add the first code samples demonstrating Table Access Methods.

For that matter, I don't think a second Table Access Method, such as a /dev/null or in-memory one, has been committed yet. I'm aware of one person's effort to add a /dev/null Table Access Method, but I don't think it has been merged.

This is unrelated to what Robert is talking about, though. I don't love the contribution process. But there is definitely a lot to contribute to if you do spend the time and keep an eye out for where improvements might be useful and accepted.

levkk
13 days ago
I'd say hacking isn't the same as building new production-quality features.

Say I want to change something about the database, just for fun or for educational purposes. It won't be easy, because the internals aren't that well documented beyond code comments, and C makes it hard to find what's what (no Rust-style docs or code map).

Given months of digging around, I'm sure I'll be able to do something small, but "hacking" is a side gig, not a full-time job.

avi_vallarapu
13 days ago
I posted my thoughts on another thread too: https://news.ycombinator.com/item?id=40231332

- Postgres's documentation is among the best-maintained of any database. This also reflects that developers and committers make sure every relevant patch includes the corresponding documentation changes.

- Compare bugs in Postgres with those in MySQL, Oracle, or other databases: bugs are comparatively fewer, and generally rare, even if you support Postgres as a vendor with lots of customers. The reason is the effort a strong team of developers puts into not accepting anything and everything. There are strict best practices, reviews, discussions, tests, and a lot more, all of which make it difficult for a patch or a feature to make it into a release.

- Ultimately, the easier it is to get a patch accepted, the more bugs there will be.

I love Postgres the way it is today; it is still the DBMS of the year and developers' most loved database.

I wish we had more contributors, committers, and developers, and also more users and companies supporting Postgres, so that getting a feature in becomes faster and reasonably easier.

steve_rambo
13 days ago
Another discussion a couple of days ago:

https://news.ycombinator.com/item?id=40231332

Ozzie_osman
13 days ago
I remember reading the entire thread/debate on whether to move to 64-bit XIDs. https://www.postgresql.org/message-id/flat/CACG=ezZe1NQSCnfH...

Over 150 replies, 2.5 years, and still no resolution. Which I guess could be good or bad, depending on how you look at it.
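For readers unfamiliar with why 64-bit XIDs come up at all: PostgreSQL transaction IDs are 32-bit counters, and normal XIDs are compared modulo 2^32 so the counter can wrap around. That comparison is only meaningful while the two XIDs are less than about 2^31 apart, which is why VACUUM has to freeze old rows. The sketch below is modeled on the comparison PostgreSQL's TransactionIdPrecedes performs; the names xid32, FIRST_NORMAL_XID, xid_precedes, and xid64_precedes are hypothetical, chosen for illustration, and this is not the actual source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical illustration type; PostgreSQL's own type is TransactionId (uint32). */
typedef uint32_t xid32;

/* XIDs below 3 are reserved special values and are compared as plain integers. */
#define FIRST_NORMAL_XID ((xid32) 3)

/*
 * Returns true if id1 is logically older than id2.
 * Assumption: two normal XIDs are never 2^31 or more apart; past that
 * distance the answer flips, which is what wraparound protection prevents.
 */
static bool xid_precedes(xid32 id1, xid32 id2)
{
    if (id1 < FIRST_NORMAL_XID || id2 < FIRST_NORMAL_XID)
        return id1 < id2;

    /* Unsigned subtraction wraps modulo 2^32. A difference with the top bit
     * set means id1 is "behind" id2 on the circle; this is equivalent to
     * (int32_t)(id1 - id2) < 0 on two's-complement targets. */
    uint32_t diff = id1 - id2;
    return diff > UINT32_C(0x7FFFFFFF);
}

/* With a 64-bit counter, a plain comparison suffices: it would not
 * realistically wrap. */
static bool xid64_precedes(uint64_t id1, uint64_t id2)
{
    return id1 < id2;
}
```

For example, xid_precedes(0xFFFFFFF0u, 10) returns true across the wrap point, but xid_precedes(100, 100u + 0x80000001u) returns false even though XID 100 was assigned first, whereas xid64_precedes gives the expected true.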

tkiolp4
13 days ago
Do you think we are going to reach a point at which it’s very hard (perhaps impossible) to maintain big codebases (like Postgres) due to a lack of maintainers? Would that mean the end of such projects? Or would big corporations take over at a high price (forking the project and publishing it with a price tag)?

__s
13 days ago
That doesn't really make sense. Big tech companies pay many contributors as employees, which allows projects like Postgres to offer a portable solution.

Microsoft sells SQL Server with its proprietary features while also offering managed Postgres.

Postgres development being hard because it's mature also means it will stay good indefinitely, precisely because it's mature. A core team slowly improving a solid codebase keeps it stable.

fforflo
13 days ago
That's an ongoing discussion in the Postgres community [0], primarily because Postgres is 100% written in C.

As for the future, I don't know. Not many companies can maintain full forks. I've noticed, however, from my professional experience that businesses are willing to hire freelancers to code specific things (e.g., I've had a few gigs writing custom Postgres extensions, 50% C and 50% SQL/PgSQL). But yeah, I guess lots of C projects will become like COBOL ones.

0: https://redmonk.com/jgovernor/2023/10/10/postgres-the-next-g...

ozim
13 days ago
I see a different option. At some point someone will start a new db project that will take over.

ghelmer
12 days ago
The barrier to entry for programmers on projects like PostgreSQL and FreeBSD is high, with good and bad results. It seems you have to be very committed to the project (which may involve support from one’s employer) to join and contribute. It requires deep understanding of the codebase and preparedness to deal with fallout when changes inevitably cause problems. That’s good in that the developers are highly invested in building a quality product. But the high barrier to entry makes it difficult to attract new developers, and it is very difficult for those with a passing interest to get fixes and improvements into the codebase.

RoboTeddy
13 days ago
How’s PostgreSQL’s code quality? If projects have tons of technical debt or poor abstractions it can often be hard to make significant changes. Is that the case here, or no?

convolvatron
13 days ago
Eh... you know, if you're in the right parts, it's actually pretty pleasant. There is a lot of good design in Postgres.

OTOH, there is some awful legacy stuff, and some really awful cross-cutting stuff around physical logging (I just looked at the locking around running queries on a replica, and that's clearly never going to be correct).

Despite the fact that it's in C, given a couple of major refactors that will never happen, it could be really nice.

anarazel
13 days ago
> I just looked at the locking around running queries on a replica, and that's clearly never going to be correct

Uh-huh. Details, please?

RedShift1
13 days ago
Hot take: this is good? It's a database; you want it to be predictable and reliable, as it's literally the foundation of many applications and processes. It's not another toy project: mistakes can have disastrous consequences, so you really want those extra layers of scrutiny.

CodesInChaos
13 days ago
The main problem this post complains about is that it's difficult to implement changes correctly, even as an expert. That's definitely not a good thing.

pylua
13 days ago
I read the article and I can’t understand, at a simple level, why it’s hard to contribute. Is it easy to break things? Is it hard to determine whether something is broken? If so, maybe it just wasn’t really designed with those things in mind, or maybe there just isn’t adequate testing.

Adding a new feature that is hard to understand may be a signal that the design does not support it.

RedShift1
13 days ago
Yes, but it's a database server. Isn't it expected that lots of things are non-trivial here?

layer8
13 days ago
Sure, but the design should make it reasonably efficient and reliable to reason about the behavior. The assumptions, requirements, and guarantees should be clear (documented) at all relevant code locations.

There is a lot of mentioning of testing in this thread and in TFA. However, testing is not the main tool to achieve correctness. Testing is merely a sanity check. Correctness is achieved by being able to adequately reason about the logic of the code, making sure that all assumptions and guarantees are met.

TFA makes it sound like that reasoning is hard in the PostgreSQL code base. But it could also be that the author was approaching things in too cavalier a manner.

Kinrany
13 days ago
Expected doesn't mean good though.

amlib
13 days ago
Sure, but is there any other mature and complex (or even more complex) database software that's easier to hack on than Postgres? I imagine something like Microsoft SQL Server is just as hard, and Oracle is known to be a complete mess, much worse than Postgres in this regard.