We encourage staff to play with both, and they can play with impunity since it's a copy that will get replaced soon-ish.
This makes it important that both work reliably, which also means we find out quickly when our backups stop working.
We haven't had a disaster recovery situation yet (hopefully never), but I feel fairly confident that getting the DB back shouldn't be a big deal.
pg_restore will handle indexes, constraints, and so on, assuming you didn't switch the flags around to disable them; roles live in the pg_dumpall globals dump rather than in pg_dump's output.
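For reference, a minimal sketch of that round trip with a custom-format dump (database names and paths are made up):

    # dump the database and, separately, the roles/tablespaces (globals)
    pg_dump -Fc -f /backups/appdb.dump appdb
    pg_dumpall --globals-only -f /backups/globals.sql

    # restore into a fresh database; flags like --no-owner or --section
    # are where you'd opt out of recreating things
    createdb appdb_restored
    psql -f /backups/globals.sql postgres
    pg_restore -d appdb_restored --no-owner -j 4 /backups/appdb.dump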
If you're on EC2, hopefully you're using disk snapshots and WAL archiving.
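On a self-managed box, the WAL-archiving half is just a few postgresql.conf settings; the S3 bucket below is a placeholder:

    wal_level = replica
    archive_mode = on
    # %p is the path of the finished WAL segment, %f its file name
    archive_command = 'aws s3 cp %p s3://my-backup-bucket/wal/%f'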
If I'm not mistaken, that was in 2016 (10 years ago next year; time flies when you're having fun), which is practically an eternity in IT. I'm no DBA, but a lot has changed in Postgres since then, including logical replication (which can be selective; see the sketch below), parallel apply of large transactions in v16, and so on.
I'm not saying their points are invalid (I don't know Postgres well enough to judge), but any point made almost ten years ago against one of the most popular and most actively developed options in its field should probably be taken with a pinch of salt.
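On the selectivity point, a logical replication setup for just a couple of tables looks roughly like this (connection string and table names invented):

    -- on the publisher
    CREATE PUBLICATION invoices_pub FOR TABLE invoices, invoice_lines;

    -- on the subscriber
    CREATE SUBSCRIPTION invoices_sub
      CONNECTION 'host=primary.example.com dbname=appdb user=replicator'
      PUBLICATION invoices_pub;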
High availability is different from disaster recovery.
And, of course, your disaster recovery plan is incomplete until you've tested it (at scale). You don't want to be looking up Postgres documentation when you need to restore from a cold backup; you want to be following the checklist in your recovery plan that you've already verified.
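Even a tiny scripted drill beats nothing; something like this sketch (dump location and the canary table are made up):

    # restore last night's dump into a scratch database and poke it
    createdb dr_drill
    pg_restore -d dr_drill --no-owner /backups/latest.dump
    psql -d dr_drill -c 'SELECT count(*) FROM orders;'  # whatever table you trust as a canary
    dropdb dr_drill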
Assuming you mean that range to start at 100GB: I've worked with databases that size multiple times, but as a freelancer it has definitely not been "most" of the businesses I've seen in that range.
I'm also curious how Distributed Replicated Block Device (DRBD) would perform. It would obviously add latency, but perhaps it would be an easier and more efficient way to build a "hot spare" than Postgres's native functionality. To my understanding, DRBD can be configured to protect you from hardware I/O errors by "detaching" from an erroring disk.
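If I remember the DRBD resource syntax right, that option lives in the disk section; hosts, devices, and addresses below are placeholders:

    resource pgdata {
      disk { on-io-error detach; }  # stop using the backing disk on I/O errors instead of passing them up
      on db1 {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   10.0.0.1:7789;
        meta-disk internal;
      }
      on db2 {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   10.0.0.2:7789;
        meta-disk internal;
      }
    }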
I also don't know if it's a valid point, but I've heard people say you don't want a fancy CoW filesystem under a database: much of what it offers duplicates things the database already handles itself, so you may be sacrificing performance for protection against things that "should not happen"(tm) anyway, depending on how it's all set up, I guess.
I also recommend pg_repack[2] to squash tables on a live system and reclaim disk space. It has saved me so much space.
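Typical invocation is something like this (database and table names made up); a --dry-run first is a good habit:

    pg_repack --dry-run -d appdb -t big_bloated_table
    pg_repack -d appdb -t big_bloated_table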
Do you export the data with this and then import it into the other DB with it?
Or do you work from existing Postgres backups?
If things go truly south, you'd better hope you have a read replica you can use as your new master. Most SLAs are not written with 72h+ of downtime in mind. Have you tried the nuclear recovery plan, from scratch? Does it work?
"Restore from scratch" can mean a lot of different things, if you have a read replica that you can promote then in relative terms to 72h+ downtime, this should be fairly quick, no?
If you have block-level backups or snapshots, with ZFS for example as someone mentioned, it should also be relatively quick -- although I assume this would make any hypothetical read replica split-brain.
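For the ZFS case, the snapshot-and-ship step is roughly this (dataset and host names invented); since the snapshot is atomic, the restored copy comes up like a crash-consistent data directory and replays WAL on startup:

    zfs snapshot tank/pgdata@nightly
    zfs send tank/pgdata@nightly | ssh backup-host zfs receive -F backup/pgdata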