Show HN: Pg_replicate – Build Postgres replication applications in Rust
216 points
29 days ago
| 13 comments
| github.com
| HN
imor80
29 days ago
[-]
Hey HN,

For the past few months, as part of my job at Supabase, I have been working on pg_replicate. pg_replicate lets you very easily build applications which can copy data (full table copies and cdc) from Postgres to any other data system. Around six months back I was figuring out what can be built by tailing Postgres' WAL. pg_replicate grew organically out of that effort. Many similar tools, like Debezium, exist already which do a great job, but pg_replicate is much simpler and focussed only on Postgres. Rust was used in the project because I am most comfortable with it. pg_replicate abstracts over the Postgres logical replication protocol[0] and lets you work with higher level concepts. There are three main concepts to understand pg_replicate: source, sink and pipeline.

1/ A source is a Postgres db from which data is to be copied. 2/ A sink is a data system into which data will be copied. 3/ A pipeline connects a source to a sink.

Currently pg_replicate supports BigQuery, DuckDb local file and, MotherDuck as sinks. More sinks will be added in future. To support a new data system, you just need to implement the BatchSink trait (older Sink trait will be deprecated soon).

pg_replicate is still under heavy development and is a little thin on documentation. Performance is another area which hasn't received much attention. We are releasing this to get feedback from the community and are still evaluating how (or if) we can integrate it with the Supabase platform. Comments and feedback are welcome.

[0] Postgres logical replication protocol: [https://www.postgresql.org/docs/current/protocol-logical-rep...)

reply
convolvatron
29 days ago
[-]
I've recently been playing with the logical replication protocol, and it enables all kinds of interesting usages. one really cool thing is that you see the transactional boundaries, so not only can you write a cache, you can do so in a way thats internally consistent.

its also inherently much nicer than listen/notify, since you don't have to go back and figure out what data was associated with the event

reply
imor80
28 days ago
[-]
Yes, logical replication complements very nicely the normal way of interacting with the database via queries. This inverted flow makes those apps possible which were hard/impossible to build with just queries.
reply
necubi
28 days ago
[-]
This is so cool! I appreciate Debezium for its wide DB support (no part of me wants to know the inner workings of MSSQL's replication protocol) but it's finnicky to run. Great to have an alternative, at least for postgres.
reply
chucky_z
28 days ago
[-]
I’m curious in your experience how many clients can run pg_replicate at once?

With MySQL I saw the interesting use-case of the black hole storage engine to scale out replication logs but ultimately the only usage I’m aware of was for scaling other mysql read replicas.

The idea of scaling an application by tailing logs from a database sounds very interesting to me, and I’m curious if you’ve explored this at all. There’s of course things like Kafka (and then things like Debezium), but it’s hard to beat direct!

reply
imor80
28 days ago
[-]
> I’m curious in your experience how many clients can run pg_replicate at once?

I'd expect more clients to put more pressure on the resource usage on the db. But it's not clear whether the relationship between number of clients and resource usage is linear, quadratic or something else since I haven't done benchmarking yet.

> The idea of scaling an application by tailing logs from a database sounds very interesting to me, and I’m curious if you’ve explored this at all. There’s of course things like Kafka (and then things like Debezium), but it’s hard to beat direct!

It doesn't exist yet, but I was thinking of creating a sink which exposes Postgres WAL via websockets. This way the number of clients might scale much better.

reply
phamilton
29 days ago
[-]
Postgres + Rust is one of the most exciting intersections of tech I've seen in a while.

There's external tooling like his project, but postgres extensions in Rust are exciting.

Full extensions via pgrx have been cool to see, but plrust + pg_tle is also starting to show up.

If you aren't familiar with TLE (Trusted Language Extensions), it is a postgres extension from AWS that created some privileged interfaces for procedural languages (used for user-defined functions) to do some extra stuff. Right now it's mostly auth-related hooks but my hope is that it expands in the future.

Plrust is a procedural language extension for Rust, allowing user defined functions written in Rust.

The combination of those two could open up a world of rich extensions usable in managed hosted environments like RDS.

reply
imor80
28 days ago
[-]
Agree, Rust and Postgres and a perfect match. It feels so much more productive to write Postgres tooling in Rust. E.g. we already have extensions like pg_graphql[0], pg_jsonschema[1] and wrappers[2] which use pgrx. We don't have plrust on the platform yet though.

Full disclosure: I'm a Rust Engieer at Supabase.

[0] https://github.com/supabase/pg_graphql [1] https://github.com/supabase/pg_jsonschema [2] https://github.com/supabase/wrappers

reply
rubenfiszel
28 days ago
[-]
This is super timely.

Windmill (https://windmill.dev) used to only support webhooks to trigger code and flow jobs. We have just added email support building our own MX server, and wanted to add CDC change. We were gonna do it on Debezium but this will allow us to remove the need for a third-party service and just add this as a crate. Thank you supabase for open-sourcing this.

reply
imor80
28 days ago
[-]
Please do share your experience using pg_replicate as feedback once you integrate with windmill.
reply
jeremyjh
28 days ago
[-]
I'm super excited to see a feature based on this in Windmill.
reply
lknuth
28 days ago
[-]
I would love to see this become the [Litestream](https://litestream.io/) equivalent to SQLite: a very cheap option for simple deployments (ones with a single database) to get continuous backups in a cheap storage from which one can restore when needed.

My specific use-case would be: Single Postgres in my cluster, replicated via something based on pg_replicate running as a side-car and writing to my NAS running Minio.

reply
kiwicopple
28 days ago
[-]
(I'm on the supabase team)

> equivalent to SQLite: a very cheap option for simple deployments

FWIW, we will launch something tomorrow which will cover this part of your use-case. You will be able to find it on our blog in the morning (PT)

reply
eknkc
28 days ago
[-]
I was trying out the stdout example. Could not get it to log anything. DuckDB example worked so I went digging into the source. Apparently the stdout sink is using tracing and I did not have a `RUST_LOG` env var set.

Might be a good idea to have it documented or have the default level set to info for the stdout example.

Maybe this is common Rust knowledge and I just don't know what I'm doing though.

reply
hermanradtke
28 days ago
[-]
I too consider this a footgun.

Most applications will have something like

   tracing_subscriber::registry()
           .with(
               tracing_subscriber::EnvFilter::try_from_default_env()
                   .unwrap_or_else(|_| "my_app=info".into()),
           )
           .with(tracing_subscriber::fmt::layer())
           .init();

in the `main.rs` which will default the tracing if RUST_LOG env var is not set.
reply
imor80
28 days ago
[-]
Thanks for trying out, I'll update the code and/or the README to fix this.
reply
stlava
28 days ago
[-]
Nice! I'm one of the authors of pg-bifrost which is in the same space. Have you thought about / have solved sharding consumption across multiple slots / multi consumers to increase throughput? This is on my radar but not something I've investigated yet.

The issue we've ran into is some team at work decides to re-write an entire table and things get backed up until they stop updating rows.

reply
imor80
28 days ago
[-]
pg-bifrost looks solid.

> Have you thought about / have solved sharding consumption across multiple slots / multi consumers to increase throughput?

Not yet, there has been not performance yet, as the project is still quite young.

reply
steinroe
28 days ago
[-]
This is great! We've been using PostgREST along with a PostgreSQL-based queue to handle side-effects like sending webhooks after database operations (inserts/updates/deletes). The queue feeds into a node server that processes these tasks. However, this setup is becoming a performance bottleneck as we scale.

I'm exploring an alternative way to run logic asynchronously after db operations without the overhead, and I think using cdc to export jobs into an external queue is the way to go here. Essentially a lightweight alternative to Debezium with a better developer experience that is easier to manage. This crate could serve as the core of such a service.

reply
pgnone
28 days ago
[-]
Do you do anything special with toast not changed values, this always seemed problematic to me with doing cdc as either you have to maintain the latest value in your own state or hit the read connection to pull the value
reply
imor80
28 days ago
[-]
Not yet, but it is planned for the future. We'd need to pick one of the two options you suggested.
reply
cryptonector
27 days ago
[-]
A SQLite3 sink would be very nice. Granted, there are impedance mismatches between PG and SQLite3, but the onus for making the schemas compatible should be on the user.
reply
alexnewman
28 days ago
[-]
Wow I think i build one of the first pg-rust-cdc replication module for postgresql and now it seems mainstream. It was a part of my https://github.com/posix4e/rpgffi project, which is garbage code I'm now embarrassed about. Rust programming has come a long way and we didn't have the best practices. Now it seems actual grownup engineering teams are living the dream. We truly live in the best of all universes.
reply
imor80
28 days ago
[-]
These days, for writing extensions/FFI pgrx[0] is the best crate.

[0] https://github.com/pgcentralfoundation/pgrx

reply
alexnewman
27 days ago
[-]
great point, my old thing is old and ghetto
reply
shayonj
28 days ago
[-]
Very cool! Thanks for sharing. I am hacking on something similar too and I think there are a lot of interesting use cases to unlock with this model
reply
jijojohnxx
27 days ago
[-]
Wow, Rust meets SQL! Excited to see seamless Postgres replication with pg_replicate.
reply
jijojohnxx
27 days ago
[-]
This is awesome! Rust + Postgres = SQL powerhouse! Great tool for robust replication apps
reply