Parallel Snapshotting: make pg_dump and pg_restore multi-threaded per table
83 points
10 days ago
| 3 comments
| blog.peerdb.io
| HN
cswilliams
9 days ago
[-]
Several years ago at a company I no longer work at, when we swapped from single-threaded pgdumps (from a replica) to multi-threaded pgdumps (-j flag), we occasionally saw some dumps which were "inconsistent". Not sure if that's the best term for it, but we would occasionally get duplicate foreign key errors on pg_restore or even if pg_restore succeeded, sometimes primary key sequences would be less than the maximum id of our table (so new inserts would fail). This was on standard postgres 13.3 (not aurora) on AWS RDS. Nothing really noteworthy about our setup either that I can remember. I tried reporting it at the time on both the postgres mailing list and with AWS support and unfortunately both pointed fingers at the other and I eventually gave up since it was not easy to reproduce. We unfortunately ended up having to go back to single threaded dumps which never produced any of these "corrupt" dumps but was obviously much slower.
reply
lordgilman
9 days ago
[-]
I can’t speak specifically to duplicate foreign key errors but I possibly fixed that bug via a fix to parallel dump/restore in the 12-13 era. It would manifest as a dump that would fail on restore with errors related to schema dependencies (trying to restore database objects out-of-order).
reply
porsager
9 days ago
[-]
I was expecting to see https://github.com/dimitri/pgcopydb when clicking this. How is this different?
reply
saisrirampur
9 days ago
[-]
OP here. pgcopydb is also a great tool. It does pg_dump|pg_restore. It cannot multi-thread (parallelize) single table data migration
reply
lmz
9 days ago
[-]
seems like it can parallelize to an extent: https://pgcopydb.readthedocs.io/en/latest/concurrency.html#s...
reply
fyrn_
9 days ago
[-]
It only uses pg_dump and pg_restore for transfer of the schema, the actual data transfer is a custom mechanism.
reply
joevandyk
9 days ago
[-]
does pgcopydb support copying a single table in multiple threads?
reply
fyrn_
9 days ago
[-]
reply
digger495
9 days ago
[-]
Because this is advertising :)
reply
victor106
9 days ago
[-]
Does this work with Azure|AWS Postgres as source and replicate to an onPrem Postgres and vice-versa?
reply
saisrirampur
8 days ago
[-]
Yes it should work for any postgres database as a source or a target. Let it be on the cloud or on-premise
reply