PlanetScale for Postgres is now GA
297 points | 2 days ago | 25 comments | planetscale.com
theanirudh
2 days ago
[-]
We just migrated to PlanetScale Postgres Metal over the weekend. We are already seeing major query improvements. The migration was pretty smooth. Post-migration we hit a few issues (it turned out they weren't issues with PlanetScale), and the PlanetScale team jumped in immediately to help us out, even on a Saturday morning, so support has been amazing.

The Insights tab also surfaced missing indexes we added, which sped things up further. Early days, but so far so good.

reply
benterix
2 days ago
[-]
Out of curiosity: how do you connect your databases to external services that consume this data? In the places where I've done similar work, databases are usually on the same private network as the instances reading and writing data to them. If you put them somewhere on the internet, security aside, doesn't it affect latency?
reply
theanirudh
2 days ago
[-]
Their databases are hosted on AWS and GCP, so latency isn't much of an issue. They also support AWS PrivateLink, and if that's configured, traffic won't go over the internet.
reply
sreekanth850
1 day ago
[-]
No matter whether it's hosted on Azure, GCP, or AWS, latency is real. Cloud providers don't magically eliminate geography and physics, and a private network doesn't magically eliminate latency either. Any small latency hike can create performance bottlenecks for write operations in a strongly consistent DB like Postgres or MySQL, because each write goes through a round trip from your server to the remote PlanetScale server, adding transaction overhead. Complex transactions with multiple statements amplify this because of the repeated round trips. You could reduce the latency by hosting your app near wherever PlanetScale hosts your DB cluster, but that's a dependency and a compromise. Edit: A few writes per second? Probably fine. Hundreds of writes per second? Those extra milliseconds become a real bottleneck.
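To put rough numbers on the round-trip argument, a sketch (the RTT figures are assumptions for illustration, not measurements of any provider):

    # Illustrative back-of-envelope math; RTT values are assumed.
    def max_sequential_writes_per_sec(rtt_ms: float) -> float:
        """Ceiling for one connection issuing one write at a time."""
        return 1000.0 / rtt_ms

    for rtt_ms in (0.3, 1.5, 30.0):  # same-AZ, same-region, cross-region (assumed)
        ceiling = max_sequential_writes_per_sec(rtt_ms)
        print(f"RTT {rtt_ms:>5.1f} ms -> at most {ceiling:,.0f} writes/sec per connection")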
reply
mattrobenolt
1 day ago
[-]
You can simply place your database in the same AWS or GCP region and the same AZs.
reply
aiisthefiture
1 day ago
[-]
Your database will get slower before the latency is an issue.
reply
hobofan
1 day ago
[-]
> Hundreds of writes per second? Those extra milliseconds become a real bottleneck.

Of course it's nicer if the database can handle it, but if you are doing hundreds of sequential non-pipelined writes per second, there is a good chance that there is something wrong with your application logic.
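For illustration, a sketch of the pipelined alternative using psycopg 3 (the DSN and the events table are placeholders, not anything from the thread):

    # Sketch: batching/pipelining writes so round trips overlap (psycopg 3).
    import psycopg

    rows = [(i, f"event-{i}") for i in range(1000)]

    with psycopg.connect("postgresql://user:pass@host/db") as conn:
        with conn.pipeline():  # queue statements, amortize network round trips
            with conn.cursor() as cur:
                for row in rows:
                    cur.execute("INSERT INTO events (id, payload) VALUES (%s, %s)", row)
        # executemany() also batches internally in psycopg 3:
        # with conn.cursor() as cur:
        #     cur.executemany("INSERT INTO events (id, payload) VALUES (%s, %s)", rows)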

reply
sreekanth850
1 day ago
[-]
Not universal; there are systems that need high-frequency, low-latency, strongly consistent writes.
reply
hobofan
1 day ago
[-]
Yes, but for the majority of those, these would be individual transactions (per request, for example), so the impact would be a fixed latency penalty rather than a multiplicative one.
reply
oefrha
2 days ago
[-]
PlanetScale runs in AWS/GCP, so not really “somewhere on the internet” if your workload is already there.
reply
siquick
1 day ago
[-]
This is the thought I always come back to with the non-big-cloud services. It's pretty much always been mandatory at non-startups to have all databases hidden away from the wider internet.
reply
oefrha
2 days ago
[-]
Would you mind sharing what you were migrating from, and what kind of issues you ran into?
reply
ProofHouse
2 days ago
[-]
appreciate you sharing
reply
endorphine
2 days ago
[-]
Care to elaborate what kind of issues? Looking into migrating as well.
reply
theanirudh
2 days ago
[-]
The issues weren't PlanetScale related. We use Hasura, and when we did the cutover we connected to the DB via PgBouncer, with which some features don't work right. We started seeing a lot of errors, so we paged them and they helped out. We had been connecting directly to Postgres previously, but we missed that during the cutover.
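For context, one common instance of this class of problem (not necessarily the exact one here) is PgBouncer's transaction pooling mode breaking server-side prepared statements; a minimal psycopg 3 workaround sketch, with a placeholder DSN:

    # Sketch: disable automatic server-side prepared statements when going
    # through PgBouncer in transaction pooling mode (placeholder DSN).
    import psycopg

    conn = psycopg.connect("postgresql://user:pass@pgbouncer-host:6432/db")
    conn.prepare_threshold = None  # never PREPARE server-side
    with conn.cursor() as cur:
        cur.execute("SELECT 1")
        print(cur.fetchone())
    conn.close()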
reply
ritzaco
2 days ago
[-]
This seems to be mainly aimed at existing PlanetScale customers.

> To create a Postgres database, sign up or log in to your PlanetScale account, create a new database, and select Postgres.

It does mention the sign up option but doesn't really give me much context about pricing or what it is. I know a bit, but I get confused by different database offerings, so it seems like a missed opportunity to give me two more sentences of context and some basic pricing - what's the easiest way for me to try this if I'm curious?

On the pricing page I can start selecting regions and moving sliders to create a plan from $39/month and up, but I couldn't easily find an answer to whether there's a free trial or a cheaper way to 'give it a spin' without committing.

reply
intelekshual
2 days ago
[-]
PlanetScale (famously?) deprecated their free "Hobby" tier (plus fired their sales & marketing teams) back in 2024 to achieve profitability.

https://planetscale.com/blog/planetscale-forever

reply
rimprobablyly
1 day ago
[-]
> famously?

Notoriously

reply
diordiderot
1 day ago
[-]
Could you explain?
reply
rimprobablyly
1 day ago
[-]
Look up the definition of famous and notorious. What needs explaining?
reply
dangoodmanUT
2 days ago
[-]
PlanetScale isn't really designed for the "I'll give it a go" casual customer that might use Supabase.

It's designed for businesses that need to haul ass

reply
game_the0ry
2 days ago
[-]
I am not experienced enough to know the performance differences between planetscale and supabase, but...

> It's designed for businesses that need to haul ass

Could you elaborate what you meant by this for my education?

reply
samlambert
2 days ago
[-]
Performance differences between PlanetScale and Supabase: https://planetscale.com/benchmarks/supabase
reply
ndriscoll
1 day ago
[-]
> Businesses that need to haul ass

> Benchmarks are done on a dual-core VM with "unlimited" IOPS

I'd be interested in a comparison with a pair of Beelink SER5 Pros ($300 each) in a master-slave config.

reply
parthdesai
1 day ago
[-]
> > Benchmarks are done on a dual-core VM with "unlimited" IOPS

Unlimited is a feature here, no need to be snarky. They famously went against the accepted practice of separating storage from compute, and as a result, you reduce latency by an order of magnitude and get unlimited IOPS.

reply
ndriscoll
1 day ago
[-]
You do not get unlimited IOPS with any technology, but you especially do not get it in AWS, where the machines seem to be? Writing "unlimited" is completely unserious. If it's 67k read/33k write at 4k qd32 or something just say so. Or if you're actually getting full bandwidth to a disk with a 2 core VM (doubt), say 1.5M or whatever.
reply
mattrobenolt
1 day ago
[-]
Unlimited in this context just means you're going to be CPU limited before you hit limits on IOPS. It would, in effect, not be possible to be bottlenecked on IOPS.

That might not be 100% true, but I've never seen an RDBMS able to saturate IOPS on a local NVMe. It takes some quite specialized software to leverage every ounce of IOPS without being CPU bottlenecked first. Postgres and MySQL are not it.

reply
ndriscoll
1 day ago
[-]
What does "local NVMe" mean for you? AFAIK in AWS if you have a 2 core VM you're getting ~3% of a single disk worth of IOPS for their attached storage. Technically NVMe. Not generally what people think when a laptop can do 50x more IO. The minipc I mentioned has 4x the core count and... well who knows how much more IO capacity, but it seems like it should be able to trounce both. Obviously an even more interesting comparison would be... a real server. Why is a database company running benchmarks on something comparable to my low-end phone?

Anyway, saying unlimited is absurd. If you think it's more than you need, say how much it is and say that's more than you need. If you have infinite IOPS why not do the benchmark on a dataset that fits in CPU cache?

reply
mattrobenolt
1 day ago
[-]
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ssd-inst...

Not all AWS instance types support NVMe drives. It's not the same as normal attached storage.

I'm not really sure your arguments are in good faith here tho.

This is just not a configuration you can trivially do while maintaining durability and HA.

There's a lot of hype going in the exact opposite direction, toward more separation of storage and compute. This is our counter to that. We think even EBS is bad.

This isn't a setup that is naturally just going to beat a "real server" that also has local NVMe or whatever you'd do yourself. This is just not what things like RDS or Aurora do, etc. Most things rely on EBS, which is significantly worse than local storage. We aren't really claiming we've invented something new here. It's just unique in the managed database space.

reply
ndriscoll
1 day ago
[-]
Right, but as far as I know, the only instances that give you full bandwidth with NVMe drives are metal. Everything else gives you a proportional fraction based on VM size. So for most developers, yes it is hard to saturate an NVMe drive with e.g. 8-16 cores. Now how about the 100+ cores you actually need to rent to get that full bandwidth?

I agree that EBS and the defaults RDS tries to push you into are awful for a database in any case. 3k IOPS or something absurd like that. But that's kind of the point: AWS sells that as "SSD" storage. Sure it's SSD, but it's also 100-1000x slower than the SSDs most devs would think of. Their local "NVMe" is AFAIK also way slower than what it's meant to evoke in your mind unless you're getting the largest instances.

Actually, showing scaling behavior with large instances might make Planetscale look even better than competitors in AWS if you can scale further vertically before needing to go horizontal.

reply
mattrobenolt
1 day ago
[-]
Right, but I think you're kinda missing a lot of the tangible benefits here. This IMO is just reinforcing the idea of "unlimited" IOPS. You can't physically use the totality of IOPS available on the drives.

Even if you can't saturate them, even with low CPU cores, latency is drastically better which is highly important for database performance.

Having low latency is tangibly more important than throughput or number of IOPS once your dataset is larger than RAM no matter how many CPU cores you have.

Chasing down p95s and above is where NVMe really shines, purely from having an order of magnitude less latency.

Lower latency also equates to less iowait time. All of this leads to better CPU time utilization on your database.

reply
ndriscoll
1 day ago
[-]
How does "AWS limits IOPS. 'NVMe' drives are not as fast as the drives you're used to unless you rent the biggest possible servers" reinforce "unlimited" IOPS?

Yes there are benefits like lower latency, which is often measured in terms of qd1 IOPS.

reply
parthdesai
1 day ago
[-]
They literally state this on their metal offering

> Unlimited I/O — Metal's local NVMe drives offer higher I/O bandwidth than network-attached storage. You will run out of CPU long before you use all your I/O bandwidth.

https://planetscale.com/metal#benefits-of-metal

reply
maxenglander
1 day ago
[-]
In addition to the point about performance Sam made, PlanetScale's Vitess (MySQL) offers out-of-the-box horizontal scalability, which means we can maintain extremely good performance as your dataset and QPS grow to a massive scale: https://planetscale.com/case-studies/cash-app. We will be bringing the same capability to Postgres later on.

Our uptime and reliability is also higher than what you might find elsewhere. It's not uncommon for companies paying lots of money to operate elsewhere to migrate to PlanetScale for that reason.

We're a serious database for serious businesses. If a business can't afford to spend $39/mo to try PlanetScale, they may be happier operating elsewhere until their business grows to a point where they are running into scaling and performance limits and can afford (or badly need, depending on the severity of those limits) to try us out.

reply
ritzaco
2 days ago
[-]
Businesses that 'need to haul ass' usually still want to try something out before buying it. That doesn't need to be a free plan, but it's common to offer some trial period to new users.

Also totally OK if PlanetScale doesn't do this and that $39/month _is_ the best way to try them out. I just think it would be good for them to make explicit in the article what I should do if I think I might want it but want to try it first.

reply
rcrowley
2 days ago
[-]
All our list prices are monthly and our bills are actually even finer-grained - there's no commitment to pay for a database longer than you run it.

If you do decide to operate on PlanetScale long-term, check out <https://planetscale.com/pricing> for consumption commitment discounting and other options that might make sense for your company.

reply
dangoodmanUT
2 days ago
[-]
They try it by contacting sales and setting up a pilot, not a self-service free trial
reply
stronglikedan
2 days ago
[-]
> but it's common to offer some trial period to new users

That is rather uncommon for B2B.

reply
ProofHouse
2 days ago
[-]
Not a single explanation of what 'PlanetScale' is or does (or how) on that landing page. A product, a service, a new offering or scaling paradigm, a cloud? Etc.

Sure, you can click around to figure it out, but this always annoys me. It's as if everyone should already know what your product is and does, and all your service names. Put it front and center at the top!

reply
richieartoul
2 days ago
[-]
It's not a landing page, it's a blog, and if you read the first few sentences of the post it becomes immediately clear what service PlanetScale provides.
reply
DarkNova6
2 days ago
[-]
The first few sentences mean absolutely squat.

> Our mission is simple: bring you the fastest and most reliable databases with the best developer experience. We have done this for 5 years now with our managed Vitess product, allowing companies like Cursor, Intercom, and Block to scale beyond previous limits.

> We are so excited to bring this to Postgres. Our proprietary operator allows us to bring the maturity of PlanetScale and the performance of Metal to an even wider audience. We bring you the best of Postgres and the best of PlanetScale in one product.

Seriously??

reply
therein
2 days ago
[-]
Sounds like it is yet another Postgres cloud offering. It is a little cringe for them to self-congratulate and say "allowing companies like Cursor, Intercom, and Block to scale beyond previous limits".

Did any of these companies reach out to them and say "you know, we wouldn't have been able to scale beyond our previous limits without you, thank you so much guys, you saved us"? If not, this is so insincere that it is cringe.

Are they implying these other companies lacked knowledge and expertise to put their databases on machines with NVMe storage? Or is it that they chose to use their product? If it is the latter, they should just say these companies chose us, instead of emphasizing how they just couldn't scale past their previous limits without PlanetScale's help.

reply
samlambert
1 day ago
[-]
happy to answer this with direct customer quotes from the companies you mentioned. all of these quotes are public:

"We chose PlanetScale to host our most demanding Vitess and Postgres workloads, doing millions of queries per second on hundreds of terabytes of data." – Sualeh Asif - Chief Product Officer @Anysphere (Cursor)

"Moving to PlanetScale added a 9 to our uptime." - Brian Scanlan @Intercom https://x.com/brian_scanlan/status/1963552743294967877

"In the past we've had issues when something unusual happens on a specific shard, resulting in spiked CPU and poor performance, and since migrating we haven't really seen instances of this, speaking to PlanetScale choosing the correct hardware for our existing load at the outset." - Aaron Young, Engineering Manager @block

It seems like you are reaching pretty hard to find an issue with this statement. Your comment seems to come from a lack of experience scaling databases and not understanding how difficult it is to do what we've done in partnership with our customers. Either that, or a deep level of insincerity.

reply
therein
1 day ago
[-]
> It seems like you are reaching pretty hard to find an issue with this statement. Your comment seems to come from a lack of experience scaling databases and not understanding how difficult it is to do what we've done in partnership with our customers. Either that or deep or a high level of insincerity.

Up until this, I was gonna say, fair enough, I appreciate the direct replies from the staff.

But this paragraph settles it for me: PlanetScale as a company has a narcissistic personality which is fine for some I guess. Hopefully one day you will have a product that justifies that huge ego.

reply
samlambert
1 day ago
[-]
you made a dumb comment and got heat for it. i don't think that represents a narcissistic personality. regardless of what you think of us i wish you all the best.
reply
JoshPurtell
1 day ago
[-]
I had a high opinion of PS before this comment.

Now I have a higher opinion of PS

reply
therein
1 day ago
[-]
Oh I got the heat for it? From you? I felt none of it. Way to represent your employer weird champ.

It is wild that an employee (lmao CEO) posts this way and it is sanctioned by his employer. I guess you're used to talking this way to your employees. But I am not an employee, so I can't feel your fury.

I am glad it is taking place in public, I can only imagine how poorly you must treat people behind closed doors. At least here people can see it for themselves how unprofessionally this company is run. I wish nothing but patience to your employees, God knows what they must be saying once you're out of the room.

You had one job here, to represent your company in a professional level-headed manner and you couldn't even do that. Such a shame.

reply
xnickb
1 day ago
[-]
Oh come on now, no one reads this deep into the conversation.

Sigh.

reply
ksec
1 day ago
[-]
>But this paragraph settles it for me: PlanetScale as a company has a narcissistic personality which is fine for some I guess. Hopefully one day you will have a product that justifies that huge ego.

What you wrote earlier.

>Did any of these companies reach out to them and say "you know, we wouldn't have been able to scale beyond our previous limits without you, thank you so much guys you saved us". If not, this is so insincere that it is cringe.

I guess I will let the rest of HN be the judge.

reply
maxenglander
1 day ago
[-]
> instead of emphasizing how they just couldn't scale past their previous limits.

We are not saying that our customers don't have the knowledge or expertise to do what we do. Many of our customers, including the ones mentioned above, have exceptionally high levels of expertise and talent.

Even so, it is not a contradiction to say that we allowed them to scale beyond their previous limits. In some cases those limits were that their previous DBaaS providers simply lacked the ability to scale horizontally or provide blazing fast reads and writes the way we do out of the box. In other cases, we offer a degree of reliability and uptime that exceeded what customers' previous DBaaS could provide. Just to name a couple of the limits customers have run into before choosing PlanetScale.

Expertise and know-how, and actually doing the thing, are different. Many of our customers who are technically capable of doing what we do would simply prefer to focus their knowledge and expertise building their core product, and let the database experts (that's us) do the databasing.

reply
sgarland
1 day ago
[-]
> Are they implying these other companies lacked knowledge and expertise to put their databases on machines with NVMe storage?

Have you worked at any web dev companies? Of the ones I've been at, precisely one had any desire to run their own DBs, and that was more out of necessity, due to poor schema design needing local NVMe just to stay afloat.

Yes, most web companies lack the experience to touch a server, because their staff are all cloud-native, and their CTOs have drank the Kool-Aid and are convinced that it’s dangerous and risky to manage a server.

reply
pier25
2 days ago
[-]
It would have taken you less time to google "planetscale" than to write this comment.
reply
ProofHouse
2 days ago
[-]
I mean, add a one-to-two-sentence description of the HOW to this paragraph. Because, great, but how? This is just marketing fluff, and a user who isn't familiar has to navigate the site to then understand what PlanetScale itself does (and how):

What is PlanetScale for Postgres?

Our mission is simple: bring you the fastest and most reliable databases with the best developer experience. We have done this for 5 years now with our managed Vitess product, allowing companies like Cursor, Intercom, and Block to scale beyond previous limits.

reply
brettgriffin
2 days ago
[-]
Literally the first line of every page on the site:

> PlanetScale is the world’s fastest relational database platform. We offer PostgreSQL and Vitess databases that run on NVMe-backed nodes to bring you scale, performance, reliability, and cost-efficiencies — without sacrificing developer experience.

> PlanetScale is a relational database platform that brings you scale, performance, and reliability — without sacrificing developer experience.

> We offer both Vitess and PostgreSQL clusters, powered by locally-attached NVMe drives that deliver unlimited IOPS and ultra-low latency.

> PlanetScale Metal is the fastest way to run databases in AWS or GCP. With blazing fast NVMe drives, you can unlock unlimited IOPS, ultra-low latencies, and the highest throughput for your workloads.

> The world’s fastest and most scalable cloud databases PlanetScale brings you the fastest databases available in the cloud. Both our Postgres and Vitess databases deliver exceptional speed and reliability, with Vitess adding ultra scalability through horizontal sharding.

> Our blazing fast NVMe drives unlock unlimited IOPS, bringing data center performance to the cloud. We offer a range of deployment options to cover all of your security and compliance requirements — including bring your own cloud with PlanetScale Managed.

Ironically, the _how_ is a major topic of the very page you started on (the blog).

Have some agency.

reply
cyberpunk
2 days ago
[-]
This isn’t any kind of answer it’s a bunch of non-statements..

How is this any different that rds on nvme disks?

With a name like planet scale i assumed it would be some multi-master setup?

reply
benjiro
1 day ago
[-]
PlanetScale used to run Postgres on AWS with network-attached storage. So every time the DB hits the disks, it goes over the network. Need to read 4 KB? Network. Another 4 KB? Network. So instead of microseconds on local storage, your latency is milliseconds. Where a local NVMe can do, say, 100k 4K reads, network storage does maybe 1k (just an example).

The problem is, there are not a lot of solutions to scale Postgres beyond a single server. So if your DB grows to 100 TB, you have an issue, as AWS does not provide a 100 TB local NVMe solution, only network storage.

Here comes Neki, or whatever they named it: their own alternative to Vitess (see MySQL), which is a solution that allows MySQL to scale horizontally from one to thousands of servers, each with its own local storage.

So PlanetScale made their own solution so they can horizontally scale dozens or hundreds of AWS VMs, each with its own local storage, to give you those 100, 200, 500 TB of storage space without the need for network-based storage.

There are other solutions like CockroachDB, YugabyteDB, and TiDB that also allow for horizontal scaling, but none are 100% Postgres-compatible (especially regarding extensions).

Side note: the guy who wrote Vitess for MySQL is also working on Multigres (https://multigres.com/), a solution that does the same, aka Vitess for Postgres.

So yeah, hope this helps explain it a bit. If you're not into dealing with DB scaling, the way they wrote it is really not helpful.
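To make the arithmetic concrete, a sketch (page counts and latencies are assumed for illustration, not measurements):

    # Illustrative only; all figures are assumptions.
    PAGES_PER_COLD_LOOKUP = 4        # root -> internal -> leaf -> heap page
    LOCAL_NVME_US = 80               # ~80 microseconds per 4 KB read (assumed)
    NETWORK_STORAGE_US = 1000        # ~1 ms per 4 KB read (assumed)

    for name, per_read_us in (("local NVMe", LOCAL_NVME_US),
                              ("network storage", NETWORK_STORAGE_US)):
        print(f"{name}: ~{PAGES_PER_COLD_LOOKUP * per_read_us} us per cold lookup")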

reply
parthdesai
1 day ago
[-]
> Side node: The guy that wrote Vitess for Mysql, is also working on multigress (https://multigres.com/), a solution that does the same. Aka Vitess for postgres.

He was also the founder of PlanetScale.

reply
whizzter
2 days ago
[-]
A sharded setup with somewhat fast-and-loose foreign key management: very good for performance, but not a drop-in replacement if you rely on your foreign keys being constrained/checked by the database.
reply
sgarland
1 day ago
[-]
So perfect for most web dev companies, then.

“We handle FKs in the app for flexibility.”

“And how many orphaned rows do you have?”

“…”

reply
rcrowley
1 day ago
[-]
The question isn't how many orphaned rows do you have, it's whether it matters. Databases are wonderful but they cannot maintain every invariant and they cannot express a whole application. They're one tool in the belt.
reply
sgarland
1 day ago
[-]
> cannot express a whole application

Not with that attitude: https://docs.postgrest.org/en/v13/index.html

Orphaned rows can very much matter for data privacy concerns, which is also where I most frequently see this approach failing.

reply
jashmatthews
1 day ago
[-]
Most companies can afford not to give a shit until they hit SOC2 or GDPR compliance and then suddenly orphaned data is a giant liability.
reply
rcrowley
1 day ago
[-]
The short answer is that RDS doesn't run on local NVMe disks, it runs on EBS.
reply
dkhenry
2 days ago
[-]
I would just point out that it's hosted Postgres. If you're looking for a "how", I think you have the wrong mental model. It's hosted Postgres; there is some nuance as to why it would perform differently from RDS on Amazon or Cloud SQL on GCP, but it's not some novel new technology that needs a long description.

If you are interested in their new technology that extends hosted Postgres, check out Neki: https://www.neki.dev/

reply
tjoekbezoer
2 days ago
[-]
Well, above the fold, in the section 'What is PlanetScale for Postgres?', the second paragraph mentions a custom operator. That makes me assume it is Kubernetes.
reply
samlambert
2 days ago
[-]
it is kubernetes
reply
samlambert
2 days ago
[-]
If anyone has questions about our Postgres product please feel free to ask. I will be around to answer.
reply
bri3d
2 days ago
[-]
* I saw your benchmark page at https://planetscale.com/benchmarks/aurora ; do you have something similar for Aurora Serverless?

* Do you support something like Aurora Fast Cloning (whether a true CoW fast clone or detaching a replica _without_ promoting it into its own cluster / branch with its own replicas, incurring cost)?

* Can PlanetScale Postgres set `max_standby_streaming_delay` to an indefinite amount?

* The equivalent of Aurora blue/green would be to make a branch and then switch branches, right?

reply
samlambert
2 days ago
[-]
it should be pretty much the same for aurora serverless and likely even cheaper. we see some astronomically expensive aurora serverless bills.

We have not made max_standby_streaming_delay configurable yet. What's your use case?

I don't fully parse your question about blue/green. can you expand your question please? is this for online upgrades?

reply
bri3d
1 day ago
[-]
> I don't fully parse your question about blue/green. can you expand your question please? is this for online updrades?

Online upgrade or migration DDL - both use cases. I think Amazon's blue/green is effectively the same thing as your "branch-and-commit" strategy for schema migration. I was just looking for whether there's a significant difference.

> We have not made max_standby_streaming_delay configurable yet. What's your use case?

This goes with

>> Do you support something like Aurora Fast Cloning (whether a true CoW fast clone or detaching a replica _without_ promoting it into its own cluster / branch with its own replicas, incurring cost)?

The use case is mixing transaction processing and long-running queries/analytics in the same database using read replicas. The easiest way to do this in a native Postgres cluster is by using a "soft-real-time" read-replica with max_standby_streaming_delay set to `-1`, which is allowed to fall behind worst-case by the duration of a dashboard query and then keep up again.

This doesn't work in environments with more esoteric SAN-based replication strategies like Aurora, where max_standby_streaming_delay can't go beyond 30 seconds. In this case we have to use some combination of strategies: making CoW clones for each analytics task, architecting data to avoid leader/replication conflicts, for example by using partitioning, retrying replica queries until they don't access hot rows, or falling back to traditional analytics/data warehouse solutions at the application or logical replication layer. Not having to do these things would be a nice benefit over Aurora.
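On a self-managed replica, that soft-real-time setting is a one-liner (a sketch; assumes superuser on a streaming replica, and the DSN is a placeholder):

    # Sketch: let an analytics replica lag instead of cancelling long queries.
    import psycopg

    with psycopg.connect("postgresql://user:pass@replica-host/db", autocommit=True) as conn:
        conn.execute("ALTER SYSTEM SET max_standby_streaming_delay = -1")
        conn.execute("SELECT pg_reload_conf()")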

reply
Ozzie_osman
1 day ago
[-]
Any recommendations on how to best test our current workload of reads and writes? Also, if we are pretty certain we will need horizontal partitioning / sharding, would it be better to wait for Neki before considering a move?

For context we are on Aurora Postgres right now, with several read replicas.

reply
endorphine
2 days ago
[-]
Since NVMe nodes are ephemeral on GCP, would you suggest that a SaaS with critical customer data use Metal or persistent disks?
reply
rcrowley
2 days ago
[-]
PlanetScale always maintains three (or more, if you want) copies of the data and only acknowledges writes after they've been written to at least two cloud provider availability zones. Durability is provided via replication at the database layer rather than hidden in the slow network-attached block device. Most of our largest customers use Metal for their most critical data and all of them saw their 99th percentile latency plummet when they migrated.
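In vanilla-Postgres terms, "acknowledge only once two AZs have the write" looks roughly like quorum synchronous commit. This is a concept sketch, not PlanetScale's actual implementation (standby names and DSN are placeholders):

    # Concept sketch only. With 'ANY 1' of two cross-AZ standbys synchronous
    # (and synchronous_commit at its default of on), every acknowledged commit
    # exists on the primary plus at least one standby in another AZ.
    import psycopg

    with psycopg.connect("postgresql://user:pass@primary-host/db", autocommit=True) as conn:
        conn.execute(
            "ALTER SYSTEM SET synchronous_standby_names = 'ANY 1 (standby_az2, standby_az3)'"
        )
        conn.execute("SELECT pg_reload_conf()")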

I did an interview all about PlanetScale Metal a couple of months ago: <https://www.youtube.com/watch?v=3r9PsVwGkg4>

reply
the_mitsuhiko
2 days ago
[-]
If neki becomes available later, do you expect that customers will be able to easily migrate over to it?
reply
samlambert
2 days ago
[-]
we will aim to make it as easy as possible and doable as an online process. with sharding there is always a chance that some application changes are needed so there might be some work required there.
reply
sethreno
1 day ago
[-]
Will Neki be a branch of [Citus](https://github.com/citusdata/citus) or is it more based on Vitess?
reply
samlambert
1 day ago
[-]
Inspired by Vitess, seeing as we are the company behind Vitess, but it's a new codebase.
reply
attentionstinks
2 days ago
[-]
How should one decide whether to go with MySQL or Postgres for a greenfield project?
reply
add-sub-mul-div
2 days ago
[-]
Pre-existing expertise with MySQL and lack of time or inclination to learn something new is the only reason I could think of not to go with Postgres.
reply
jedberg
2 days ago
[-]
At this point I'm not sure why anyone would choose MySQL. Any advantage it had pretty much evaporated with these hosted solutions.

For example, MySQL was easier to get running and connect to. These cloud offerings (Planetscale, Supabase, Neon, even RDS) have solved that. MySQL was faster for read heavy loads. Also solved by the cloud vendors.

reply
bri3d
2 days ago
[-]
At large scale I'd say MySQL is still a competitor for a few reasons:

* Scale-out inertia: yes, cloud vendors provide similar sharding and clustering features for Postgres, but they're all a lot newer.

* Thus, hiring. It's easier to find extreme-scale MySQL experts (although this erodes year by year).

* Write amplification, index bloat, and tuple/page bloat for extremely UPDATE heavy workloads. It is what it is. Postgres continues to improve, but it is fundamentally an MVCC database. If your workload is mostly UPDATEs and simple SELECTs, Postgres will eventually fall behind MySQL.

* Replication. Postgres replication has matured a ridiculous amount in the last 5-10 years, and to your point, cloud hosting has somewhat reduced the need to care about it, but it's still different from MySQL in ways that can be annoying at scale. One of the biggest issues is performing hybrid OLAP+OLTP (think, a big database of Stuff with user-facing Dashboards of Stuff). In MySQL this is basically a non-event, but in Postgres this pattern requires careful planning to avoid falling afoul of max_standby_streaming_delay for example.

* Neutral but different: documentation - Postgres has better-written user-facing documentation for user-facing functions, IMO. However, _if_ you don't like reading source code, MySQL has better internals documentation, and less magic. That said, Postgres is _very_ well written and commented, so if you're comfortable reading source, it's a joy. A _lot_ of Postgres work, in my experience, is reading somewhat vague documentation followed by digging into the source code to find a whole bunch of arbitrary magic numbers. If you don't believe me, as an exercise, try to figure out what `default_statistics_target` _actually_ does.

Anyway, I still would choose a managed Postgres solution almost universally for a new product. Unless I know _exactly_ what I'm going to be doing with a database up-front, Postgres will offer better flexibility, a nicer feature-set, and a completely acceptable scale story.

reply
jashmatthews
1 day ago
[-]
> hybrid OLAP+OLTP .... in Postgres this pattern requires careful planning to avoid falling afoul of max_standby_streaming_delay for example

This is a really gnarly problem at scale I've rarely seen anyone else bring up. Either you use max_standby_streaming_delay and queries that conflict with replication cause replication to lag or you use hot_standby_feedback and long running queries on the OLAP replica cause problems on the primary.

Logical decoding on a replica also needs hot_standby_feedback, which is a giant PITA for your ETL replica.
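For reference, the two knobs in question as a replica-side sketch (placeholder DSN; each option trades one failure mode for the other):

    # Sketch of the trade-off: pick your poison on the replica.
    import psycopg

    with psycopg.connect("postgresql://user:pass@replica-host/db", autocommit=True) as conn:
        # Option A: the replica lags behind rather than cancelling long queries.
        conn.execute("ALTER SYSTEM SET max_standby_streaming_delay = -1")
        # Option B: the primary holds back vacuum (bloat) rather than cancelling.
        # conn.execute("ALTER SYSTEM SET hot_standby_feedback = on")
        conn.execute("SELECT pg_reload_conf()")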

reply
jedberg
1 day ago
[-]
I appreciate your detailed reply, and I agree with all your points.

I am however highly amused that everyone in this thread defending MySQL ends with some form of "I'd still choose Postgres though!". :)

reply
cortesoft
2 days ago
[-]
> At this point I'm not sure why anyone would choose MySQL

Because I have used MySQL for over 20 years and it is what I know!

reply
jedberg
2 days ago
[-]
Fair enough, but I assume most of that is in the administration of MySQL? Which is all now abstracted away by the cloud vendors.

If you're running it yourself I could see why you'd do that, but if you're mostly just using it now, Postgres can do all the same things in the database pretty much the same way, plus a whole lot more.

reply
cortesoft
1 day ago
[-]
It's both operating MySQL and creating applications that use it.

Additionally, almost all my workloads run in our own datacenters, so I haven't yet been able to offload the administration bits to the cloud.

reply
n_u
2 days ago
[-]
From my position

MySQL pros:

The MySQL docs on how the default storage engine, InnoDB, locks rows to support transaction isolation levels are fantastic. [1] This can help you better architect your system to avoid lock contention or understand why existing queries may be contending for locks. As far as I know Postgres does not have docs like that.

MySQL uses direct I/O, bypassing the OS page cache and using its own buffer pool instead[2], whereas Postgres doesn't use direct I/O, so the OS page cache will duplicate pages (the "double buffering" problem). So it is harder to estimate how large a dataset you can keep in memory in Postgres. They are working on it though [3]

If you delete a row in MySQL and then insert another row, MySQL will look through the page for empty slots and insert there. This keeps your pages more compact. Postgres will always insert at the bottom of the page. If you have a workload that deletes often, Postgres will not use memory as efficiently, because the pages are fragmented. You will have to run the VACUUM command to compact pages. [4]
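(To see the dead-tuple buildup this describes, pg_stat_user_tables is the place to look; a minimal sketch with a placeholder DSN:)

    # Sketch: inspect dead-tuple buildup, then reclaim space (placeholder DSN).
    import psycopg

    with psycopg.connect("postgresql://user:pass@host/db", autocommit=True) as conn:
        top = conn.execute(
            "SELECT relname, n_dead_tup, n_live_tup"
            " FROM pg_stat_user_tables ORDER BY n_dead_tup DESC LIMIT 5"
        ).fetchall()
        for relname, dead, live in top:
            print(f"{relname}: {dead} dead / {live} live tuples")
        conn.execute("VACUUM")  # marks dead-tuple space reusable (no shrink)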

Vitess supports MySQL[5] and not Postgres. Vitess is a system for sharding MySQL that, as I understand it, is much more mature than the sharding options for Postgres. Obviously this GA announcement may change that.

Uber switched from MySQL to Postgres only to switch back. It's a bit old but it's worth a read. [6]

Postgres pros:

Postgres supports 3rd party extensions which allow you to add features like columnar storage, geo-spatial data types, vector database search, proxies etc.[7]

You are more likely to find developers who have worked with Postgres.[8]

Many modern distributed database offerings target Postgres compatibility rather than MySQL compatibility (YugabyteDB[9], AWS Aurora DSQL[10], pgfdb[11]).

My take:

I would highly recommend you read the docs on InnoDB locking then pick Postgres.

[1] https://dev.mysql.com/doc/refman/8.4/en/innodb-locking.html

[2] https://dev.mysql.com/doc/refman/8.4/en/memory-use.html

[3] https://pganalyze.com/blog/postgres-18-async-io

[4] https://www.percona.com/blog/postgresql-vacuuming-to-optimiz...

[5] https://vitess.io/

[6] https://www.uber.com/blog/postgres-to-mysql-migration/

[7] https://www.tigerdata.com/blog/top-8-postgresql-extensions

[8] https://survey.stackoverflow.co/2024/technology#1-databases

[9] https://www.yugabyte.com/

[10] https://aws.amazon.com/rds/aurora/dsql/

[11] https://github.com/fabianlindfors/pgfdb

reply
bri3d
2 days ago
[-]
> I would highly recommend you read the docs on InnoDB locking then pick Postgres.

This made me laugh pretty hard, but it's basically my take too.

I'd pretty much go with the same thing. It's interesting to me, though, that people see Postgres as the "big database" and MySQL as the "hobby database." I basically see things as the exact opposite - Postgres is incredibly flexible, very nice to use, and these days, has fewer foot guns at small scale (IMO) than MySQL. It's more academically correct and it generally tends to "work better" at almost any achievable "normal" scale.

On the other hand, Postgres is full of pitfalls and becomes very difficult at exceptionally large scale (no, not "your startup got traction" scale). Postgres also doesn't offer nearly the same quality of documentation or recipes for large scale optimization.

Almost everything in the 2016 Uber article you link, which is a _great_ read, is still true to some extent with vanilla Postgres, although there are more proprietary scale-out options available now. Postgres simply has not been "hyper-scaled" to the extent that MySQL has and most massive globally sharded/replicated systems started as MySQL at some point.

For this same reason, you are likely to be able to hire a MySQL-family DBA with more experience at hyper-scale than a Postgres one.

With all that said, I still agree - I'd almost universally start with Postgres, with MySQL as a back-pocket scale-up-and-out option for specific very large use-cases that don't demand complex query execution or transactional workload properties. Unless you have an incredibly specific workload which is a very specific combination of heavy UPDATE and `SELECT * FROM x WHERE id=y`, Postgres will do better at any achievable scale you will find today.

reply
n_u
2 days ago
[-]
> This made me laugh pretty hard, but it's basically my take too.

Haha glad you enjoyed it.

> It's interesting to me, though, that people see Postgres as the "big database" and MySQL as the "hobby database." I basically see things as the exact opposite

I agree. As I understand it, Postgres started as a challenger to SQL[1][2] with support for more complicated data types, but then in the mid '90s they added SQL support and it was renamed PostgreSQL.

Anecdotally I have heard from people working in industry in the 2000s-2010s that Postgres was viewed as less mature so many of the large web applications were on MySQL. This is a bit confusing to me because MySQL was released around the same time Postgres added SQL support but maybe it was because MySQL had a company behind it.

Many large-scale applications of those days were using MySQL. Facebook developed RocksDB and then MyRocks[3], based on MySQL. YouTube built Vitess[4], which was sharded MySQL and was later used by Slack[5], Square, Pinterest, and others.

> It's more academically correct

I'm curious about this. I know that Postgres implements MVCC in a wasteful way and uses the OS page cache in addition to its buffer pool resulting in double buffering rather than direct I/O. I feel like the more I learn about database internals the more I learn about how MySQL did things the "right" way and Postgres's approach is a bit odd. But perhaps I'm missing something.

[1] https://en.wikipedia.org/wiki/PostgreSQL#History

[2] https://db.cs.cmu.edu/papers/2024/whatgoesaround-sigmodrec20...

[3] https://engineering.fb.com/2016/08/31/core-infra/myrocks-a-s...

[4] https://vitess.io/docs/22.0/overview/history/

[5] https://slack.engineering/scaling-datastores-at-slack-with-v...

reply
bri3d
2 days ago
[-]
> I feel like the more I learn about database internals the more I learn about how MySQL did things the "right" way and Postgres's approach is a bit odd.

This is a good distinction too; I was thinking from the end-user’s standpoint, where Postgres has historically been seen as more faithful to both SQL standards and consistency guarantees.

reply
jedberg
1 day ago
[-]
> This is a bit confusing to me because MySQL was released around the same time Postgres added SQL support but maybe it was because MySQL had a company behind it.

I think the main reason MySQL took off faster than Postgres originally is because it had better defaults. MySQL worked out of the box on modern hardware. Postgres assumed you only had 4 MB of memory until well into the 2010s, in part to keep it running on everything it had ever run on in the past.

So when you first installed Postgres, it would perform terribly until you optimized it.

It's really a fantastic case study in setting good defaults.

reply
pests
1 day ago
[-]
My reason for choosing MySQL in the early days was that it was the default choice for PHP apps back then. Every tutorial, the mysql_ functions, every app like WordPress, and phpMyAdmin. Postgres was seen as more corporate / official / big boys club in my view… similar to how ClickHouse was viewed here until recently.
reply
apavlo
1 day ago
[-]
You are conflating MySQL and InnoDB. The latter does a lot of good things, much more than the former.
reply
n_u
1 day ago
[-]
Haha, that's probably correct; databases are a huge topic and I still know very little. I learned most of what I know about databases from you and my work in industry.

Perhaps you could share some of the good / bad things InnoDB / MySQL does?

reply
tracker1
1 day ago
[-]
To me, every time I've touched MySQL, I've found an issue that just rubbed me the wrong way... starting with the fact that UTF8 isn't, and even across new major versions hasn't changed to an alias for the real UTF8. (VAR)BINARY does case-sensitive collation based on the db default; it shouldn't do any type of textual comparison even if the data is "text". You can't do foreign keys with ANSI quotes for the table/field names...

Note: some of this may have changed in the past 6+ years I've avoided looking at it again.

reply
sgarland
1 day ago
[-]
> (VAR)BINARY

Close [0]: (VAR)CHAR BINARY, which is its own type, uses the `_bin` collation for the column or table character set. (VAR)BINARY stores binary strings, and uses the `binary` character set and collation.

In fairness, the amount of options, gotchas, and WTF involved with collations in any DB is mind-boggling.

[0]: https://dev.mysql.com/doc/refman/8.4/en/binary-varbinary.htm...

reply
tracker1
1 day ago
[-]
I just know that I was storing converted UUIDs and was getting collisions on different binary values. This was admittedly a couple decades ago though.

By converted, I mean legacy records where the original int value was injected into a UUID format. This was then stored in a binary field as a primary key.

reply
sgammon
2 days ago
[-]
planetscale now supports both :)
reply
cocoflunchy
1 day ago
[-]
Hi, any idea of the timing to launch in europe-west1 on GCP? Also does branching work on Postgres?
reply
robraven
1 day ago
[-]
Any benchmarks for using planetscale Postgres vs MySQL?
reply
tacone
2 days ago
[-]
Extensions! Which pg extensions are you going to make available?
reply
rcrowley
2 days ago
[-]
reply
dangoodmanUT
2 days ago
[-]
Postgres (esoterically?) has some issues with index bloat on high-insert workloads. Does PlanetScale do anything special to tune for this by default, since it caters to higher-perf workloads (over something like Supabase)?
reply
petergeoghegan
2 days ago
[-]
Can you provide more detail/a reference?

I've done extensive work on improving the Postgres B-Tree code, over quite a number of releases. I'm not aware of any problems with high-insert workloads in particular. I have personally fixed a number of subtle issues that could lead to lower space utilization with such workloads [1][2] in the past, though.

if there's a remaining problem in this area, then I'd very much like to know about it.

[1] https://www.youtube.com/watch?v=p5RaATILoiE [2] https://speakerdeck.com/peterg/nbtree-arch-pgcon

reply
dangoodmanUT
1 day ago
[-]
In a previous use case, when using postgres as a WAL-like append only store, I noticed that indexes would get massive. Then, after a while, they'd magically shrink. I had eventually switched to an API on top of Badger (golang KV), which afforded me an order of magnitude lower latency at ~30% of the resources IIRC. I'm sure there might have been some tuning I could have done to improve it.

I've also heard similar behaviors exhibited from other folks who had similar high-write workloads on postgres.

Sorry, I don't have anything super tangible to provide off the top of my head, or metrics/code I can share to recreate! It was also a project that required a lot of data to recreate the setup for.

reply
petergeoghegan
1 day ago
[-]
> In a previous use case, when using postgres as a WAL-like append only store, I noticed that indexes would get massive. Then, after a while, they'd magically shrink.

It's possible to recycle pages within indexes that have some churn (e.g., with workloads that use bulk range deletions). But it's not possible for indexes to shrink on their own, in a way that can be observed by monitoring the output of psql's "\di+" command. For that you'd need to REINDEX or run VACUUM FULL.
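(Those two options as statements, for reference; index and table names are placeholders, and REINDEX CONCURRENTLY needs Postgres 12+ and must run outside a transaction block:)

    # Placeholder names; both must run with autocommit.
    import psycopg

    with psycopg.connect("postgresql://user:pass@host/db", autocommit=True) as conn:
        conn.execute("REINDEX INDEX CONCURRENTLY my_index")  # rebuild, writes allowed
        # conn.execute("VACUUM FULL my_table")  # rewrites table, exclusive lock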

reply
dangoodmanUT
1 day ago
[-]
I may not be remembering fully; maybe the indexes never shrank, but the tables did shrink in size.

Is there no way to automatically clean up indexes, then?

reply
petergeoghegan
1 day ago
[-]
> I may not be remembering fully, maybe the indexes never shrunk but the tables did in size.

That only happens when it is possible to give back space to the OS filesystem using relation truncation in the first place -- which isn't all that common (it's generally only seen when there are bulk range deletions that leave lots of contiguous empty space at the end of a table/heap structure). But you said that this is an append-only workload.

This behavior can be disabled by setting the vacuum_truncate table storage parameter to "off". This is useful with workloads where relation truncation is disruptive (truncation needs to acquire a very heavyweight table lock).
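(As a statement, with a placeholder table name; vacuum_truncate is available in Postgres 12+:)

    # Placeholder table name; disables end-of-table truncation during vacuum.
    import psycopg

    with psycopg.connect("postgresql://user:pass@host/db", autocommit=True) as conn:
        conn.execute("ALTER TABLE my_append_only_log SET (vacuum_truncate = off)")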

> Is there no way to automatically clean up indexes then?

What I meant was that indexes do not support relation truncation. It follows that the amount of space used for an index (from the point of view of the OS) cannot ever go down, barring a REINDEX or a VACUUM FULL.

This does not mean that we cannot reuse space for previously freed/deleted pages (as long as we're reusing that space for the same index). Nor does it mean that "clean up" isn't possible in any general sense.

reply
jashmatthews
1 day ago
[-]
Does vacuum not release free pages at the end of an index file in the same way it does for the heap?
reply
petergeoghegan
1 day ago
[-]
No, it does not
reply
samlambert
2 days ago
[-]
We don't do anything special (yet) but we do have bloat detection that we warn you about. We've noticed that autovacuum works very well on our Metal product because of the extra resources.
reply
hollylawly
2 days ago
[-]
More information about bloat detection here: https://planetscale.com/docs/postgres/monitoring/schema-reco...
reply
dangoodmanUT
2 days ago
[-]
In a similar vein, I'd be curious to know if XID wraparound and autovacuum tuning are something you have to advise customers on up front, considering how often that issue rears its head for the same kinds of workloads.
reply
bekacru
2 days ago
[-]
We’ve had early access to it for a while now, we’re already running a lot of performance critical workloads on it and it’s been working wonderfully. Congrats sam and the team on setting a new standard for what highly performant managed Postgres should look like :)
reply
t43562
2 days ago
[-]
I don't know why but I can almost never understand American commercial software websites. "what is PlanetScale".....blah, blah blah....WHOOOOSH! No more enlightened than before. Even for products I've worked on - I read the page and can't recognise the thing I'm working on from the description.....

Postgres is involved somehow. I get that.

reply
dfee
2 days ago
[-]
i'll take the opposite side. i was very impressed with their website.

the very first line:

> The world’s fastest and most scalable cloud databases

the second line:

> PlanetScale brings you the fastest databases available in the cloud. Both our Postgres and Vitess databases deliver exceptional speed and reliability, with Vitess adding ultra scalability through horizontal sharding.

i know exactly what they do. zero fluff. and, i'm now interested.

https://planetscale.com/

reply
odie5533
2 days ago
[-]
How is this different than Aurora Postgres or RDS Postgres?
reply
maxenglander
2 days ago
[-]
There are a lot of differences between Aurora/RDS and PlanetScale I could talk about, but I'll point to just one for now: PlanetScale offers Metal databases, which means blazing fast NVMe drives attached directly to the host where Postgres is running. This gives you faster reads and writes than what either Aurora or RDS can achieve with their network-attached block storage. Check out our benchmarks: https://planetscale.com/blog/benchmarking-postgres

Also, the architecture of Aurora is very different from PlanetScale's:

* AWS Aurora uses storage-level replication, rather than traditional Postgres replication. This architecture has the benefit that a change made on an Aurora primary is visible very quickly on the read replicas.

* PlanetScale is a "shared nothing" architecture using what I would call traditional methods of data replication, where the primary and the replicas have independent copies of the data. This means that replication lag is a possibility customers must consider, whereas Aurora customers mostly ignore this.

* If you set up 3 AWS RDS Postgres instances in separate availability zones and set up replication between them, that would be roughly similar to PlanetScale's architecture.
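Since replication lag becomes the customer's concern in a shared-nothing design, here is a minimal way to watch it from the replica side (a sketch; the DSN is a placeholder):

    # Sketch: check how far a streaming replica is behind (placeholder DSN).
    import psycopg

    with psycopg.connect("postgresql://user:pass@replica-host/db") as conn:
        lag = conn.execute(
            "SELECT now() - pg_last_xact_replay_timestamp()"
        ).fetchone()[0]
        print(f"replica is roughly {lag} behind the primary")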

reply
t43562
1 day ago
[-]
This is an incredibly good example of what I wanted to know.
reply
t43562
1 day ago
[-]
This is really what I meant. They must offer more than a copy of Postgres running on a computer. How do they scale? What are the features? Why would I choose this over RDS, for example?

"Scaling Postgres" on its own is not that informative. I am sorry if I annoyed the people working on it. I think the USP could be explained more obviously.

reply
maxenglander
1 day ago
[-]
It's a reasonable question. I think it's too early days for us to be able to provide a feature-by-feature breakdown of PlanetScale Postgres vs. Aurora/RDS. Our stated mission on day 1 (today) is to be the fastest and most reliable Postgres provider out there. The benchmarks we've provided are the clearest, data-driven thing we can point to right now in support of that.

More features will come later on which I think will set us apart even more from RDS, Aurora and other providers, but too early to talk about those.

Beyond features, there are other reasons you might choose us. For example, we've built a reputation on having excellent reliability/uptime, with exceptionally good support. These are harder to back up with hard data, but our customer testimonials are a good testament to this.

reply
mapmeld
2 days ago
[-]
Everyone keeps asking this, but AFAIK they're an alternative database host. If you're a company, you can compare their pricing and availability to RDS and other companies. If you have an open AWS contract, or you're a hobbyist developer who already has databases on AWS, you might not see any reason to switch away.
reply
candiddevmike
2 days ago
[-]
Baseless marketing claims aren't considered fluff?
reply
maxenglander
2 days ago
[-]
People can disagree with the claims of course, but I don't think they are baseless.

On the Postgres side: https://planetscale.com/blog/benchmarking-postgres

On the Vitess side, I would point to our customers, who, on individual databases, have achieved pretty high QPS (millions), on large datasets (100s of TiBs), at a latency that is lower than what other DBaaS providers can offer: https://planetscale.com/case-studies/cash-app

reply
0x6c6f6c
1 day ago
[-]
It's fluff to give the elevator pitch now guys, be warned
reply
M4v3R
2 days ago
[-]
idk in the linked post it literally says this in the 2nd paragraph:

> Our mission is simple: bring you the fastest and most reliable databases with the best developer experience.

reply
gpi
2 days ago
[-]
But it's done at a scale that's planetscale
reply
mousetree
2 days ago
[-]
is this web scale?
reply
raffraffraff
2 days ago
[-]
It's not as fast as dev null though
reply
sgammon
2 days ago
[-]
planet bigger than web
reply
fastball
2 days ago
[-]
The homepage splash of this company is literally a few paragraphs that explain exactly what the company does. The problem might be you.
reply
bentohn
1 day ago
[-]
We were in the beta for this and they've been great.

We're presently in a migration for our larger instances on Heroku, but we were able to test on a new product (fairly high writes/IOPS) and it's been nice to have more control vs. Heroku (specifically, the ability to just buy more IOPS or storage).

Had one incident during the beta, which we believed we caused on our own, but within 5 minutes of pinging them they had thrown multiple engineers on it to debug and resolve it quickly. For me, that's the main thing I care about with managed DB services, as most of the tech is commoditized at this point.

Just wish the migration path from Heroku was a tad easier (Heroku blocks logical replication on all instances) but pushing through anyway because I want to use the metal offering.

reply
ohxh
2 days ago
[-]
Took a while to find on their website but here’s a benchmark vs AWS Aurora:

https://planetscale.com/benchmarks/aurora

Seems a bit better, but they benchmarked on a kind of small DB (500 GB on a db.r8g.xlarge).

reply
rcrowley
2 days ago
[-]
Fair to say 500GB is small, especially compared to some of the folks who've already migrated, but do note that it's 15x RAM on the benchmark machines, so we really were testing the whole database and not just the memory bandwidth of the CPUs.
reply
vmg12
2 days ago
[-]
How does PlanetScale for Postgres scale? I understand that it's multi-node Postgres with automatic failover, but I think it only really scales for reads and not writes? So is the only way to scale writes horizontally to shard?
reply
samlambert
2 days ago
[-]
Kind of. For horizontally scaling writes we are building the Vitess for Postgres which we are calling Neki https://www.neki.dev/

The product we are GA'ing today has the option of PlanetScale Metal which is extremely fast and scales write QPS further than any of the other single-primary Postgres hosts.

reply
vmg12
2 days ago
[-]
Thanks for the response; this clarifies things for me because I thought this was already a Vitess-for-Postgres implementation. Awesome to hear that this is coming.
reply
qaq
2 days ago
[-]
Wonder how https://github.com/multigres/multigres vs PlanetScale will play out eventually.
reply
anthonyronning
2 days ago
[-]
Been running it for a few months now, such a great and reliable product. Congrats on the release!
reply
fourseventy
2 days ago
[-]
The way I understood NVMe drives to work on Google Cloud is that they are ephemeral and your data will be lost if the VM reboots. How do they work in this case?
reply
mattrobenolt
2 days ago
[-]
We deal with this by always running 3 nodes in a cluster, one per AZ, and strong backup/restore processes.

So yes, the data per-node is ephemeral, but it is redundant and durable for the whole cluster.

reply
bourbonproof
1 day ago
[-]
Do I understand this right: if these 3 nodes shut down for some reason, all data is lost and you have to actually restore from backup instead of just starting the machine again? And even if you have to restart one node (due to updates or crashes), do you also have to restore from backup? If so, why not pick a hosting provider that doesn't wipe the disk when a machine shuts down?
reply
mattrobenolt
1 day ago
[-]
It's more than just shutting down. You'd have to have an actual failure. Data isn't lost on a simple restart. It'd require 3 nodes to die in 3 different AZs.

While that's not impossible, the probability is very low.

So simply restarting nodes wouldn't trigger restoring from backup, but yes, in our case, replacing a node entirely does require that node to restore from a backup/WALs and catch back up in replication.

EBS doesn't entirely solve this either: you still have failures and still need/want to restore from backups. This is built into our product as a fundamental feature. It's transparent to users, and the upside is that restoring from backups and creating backups is tested multiple times per day, every day, for a database. We aren't afraid of restoring from backups and replacing nodes, by choice or by failure. It's the same to us.

We already do all of the same operations on EBS. That machinery is what enables us to use NVMes, since we treat EBS as ephemeral already.

reply
rcrowley
2 days ago
[-]
You don't (typically) lose the data on the ephemeral drive across a reboot, but you definitely can (and do!) when there are more permanent hardware failures. (They really happen!) That's why PlanetScale always maintains at least three copies of the data. We guarantee durability via replication, not by trusting the (slow, network-attached) block device.

I did an interview all about PlanetScale Metal a couple of months ago: <https://www.youtube.com/watch?v=3r9PsVwGkg4>

reply
n_u
2 days ago
[-]
Hi, thank you for your work on this and being willing to answer questions on it.

"We guarantee durability via replication". I've starting noticing this pattern more where distributed systems provide durability by replicating data rather than writing it to disk and achieving the best of both worlds. I'm curious

1. Is there a name for this technique?

2. How do you calculate your availability? This blog post[1] has some rough details but I'd love to see the math.

3. I'm guessing a key part of this is putting the replicas in different AZs and assuming failures aren't correlated so you can multiply the probabilities directly. How do you validate that failures across AZs are statistically independent?

Thanks!

[1] https://planetscale.com/blog/planetscale-metal-theres-no-rep...

reply
rcrowley
2 days ago
[-]
1. I don't know if there's a single name for this. I will point out that AWS EBS and Google Persistent Disk, as industrial examples of distributed, replicated block devices, are also providing durability via replication. They're just providing it at a lower level that ends up sacrificing performance. I'm struggling to come up with a citation, but I think it's either Liskov or Lynch who offered a proof to the effect that durability can be achieved in a distributed system via replication.

2. The thinking laid out in the blog post you linked to is how we went about it. You can do the math with your own parameters by computing the probability of a second node failure within the time it takes to recover from a first node failure. These are independent failures, being on physically separate hardware in physically separate availability zones. It's only when they happen together that problems arise. The core is this: P(second node failure within MTTR for first node failure) = 1 - e^( -(MTTR node failure) / (MTBF for a node) )

3. This one's harder to test yourself. You can do all sorts of tests yourself (<https://rcrowley.org/2019/disasterpiece-theater.html>) and via AWS FIS but you kind of have to trust the cloud provider (or read their SOC 2 report) to learn how availability zones really work and really fail.

reply
n_u
1 day ago
[-]
P(two failures within MTTR for first node) = P(one failure)P(second failure within MTTR of first node|one failure)

independence simplifies things

= P(one failure)P(second failure within MTTR of first node)

= P(one failure) * (1 - e^-λx)

where x = MTTR for first node

λ = 1/MTBF

plugging in the numbers from your blog post

P(one failure within 30 days) = 0.01 (not sure if this part is correct)

MTTR = 5 minutes + 5 hours =~ 5.083 hours

MTBF = 30 days / 0.01 = 3000 days = 72000 hours

0.01 * (1 - e^(-5.083 / 72000)) = 0.0000007 ~= 0.00007 %

I must be doing something wrong because I'm not getting the 0.000001% you have in the blog post. If there's some existing work on this I'd be stoked to read it; I can't quite find a source.

Also, there are two nodes that have the potential to fail while the first is down, but that would make my answer larger, not smaller.

reply
rcrowley
1 day ago
[-]
I computed P(node failure within MTTR) = 0.00007 same as you. I extrapolated this to the outage scenario P(at least two node failures within MTTR) = P(node failure within MTTR)^2 * (1-P(node failure within MTTR)) + P(node failure within MTTR)^3 = 5.09 * 10^-9 which rounds to 0.0000001%.
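
For anyone who wants to sanity-check the arithmetic, here is a quick Python sketch of the model as laid out in this thread, plugging in the MTTR and MTBF values quoted upthread (treat the parameters as illustrative):

    import math

    # Parameters as quoted upthread (illustrative).
    mttr_hours = 5 + 5 / 60      # ~5 hours to restore + 5 minutes to detect
    mtbf_hours = 30 * 24 / 0.01  # 1% chance of a node failure per 30 days

    # P(a given node fails within the MTTR window), exponential model.
    p = 1 - math.exp(-mttr_hours / mtbf_hours)

    # P(at least two of three nodes fail within the window),
    # using the expression above: p^2 * (1 - p) + p^3.
    p_outage = p**2 * (1 - p) + p**3

    print(f"single node within MTTR: {p:.2e}")        # ~7.1e-05
    print(f"two-of-three outage:     {p_outage:.2e}")  # ~5.0e-09

(A strict binomial count would multiply the two-failure term by C(3,2) = 3, which changes the result by a small constant factor but not the order of magnitude.)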
reply
maxenglander
2 days ago
[-]
Hi n_u, PlanetScale engineer here; I'm going to address just the point about durability via replication. I can't speak to what you've seen with other distributed systems, but at PlanetScale we don't do replication instead of writing to disk, we do replication in addition to writing to disk. Best of both worlds.
reply
rcrowley
2 days ago
[-]
Good point, Max. I glossed over the "rather than" bit. We do, as you say, write to disks all over the place.

Even writing to one disk, though, isn't good enough. So we write to three and wait until two have acknowledged before we acknowledge that write to the client.
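
(A toy sketch of that rule in Python, with a hypothetical StubReplica standing in for the real replication transport; this is the shape of the idea, not PlanetScale's code:)

    import concurrent.futures as cf

    REQUIRED_ACKS = 2  # two of three copies before we ack the client

    class StubReplica:
        # Hypothetical stand-in for a replica connection.
        def append_and_fsync(self, payload):
            return True  # pretend the record reached stable storage

    def quorum_write(replicas, payload):
        # Fan the write out to every replica and return as soon as a
        # quorum has acknowledged; any straggler finishes in the background.
        pool = cf.ThreadPoolExecutor(max_workers=len(replicas))
        futures = [pool.submit(r.append_and_fsync, payload) for r in replicas]
        acks = 0
        for fut in cf.as_completed(futures):
            if fut.result():
                acks += 1
                if acks >= REQUIRED_ACKS:
                    pool.shutdown(wait=False)
                    return True  # durable on a quorum; safe to ack
        pool.shutdown(wait=False)
        return False  # quorum never reached

    print(quorum_write([StubReplica() for _ in range(3)], b"wal-record"))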

reply
alexeldeib
2 days ago
[-]
Can't speak to GCP specifically, but usually the issue is that the drives are host-attached and can't be migrated, so they need to be wiped on VM termination or migration -- that's when you lose data.

Reboots typically don't otherwise do anything special unless they also trigger a host migration. GCP's live migration docs do mention some support, though.

GCP mentions that data persists across reboots here: https://cloud.google.com/compute/docs/disks/local-ssd#data_p...

Note that stop/terminate via the cloud APIs usually releases host capacity for other customers and triggers a data wipe; a guest-initiated reboot typically will not.

reply
commandersaki
2 days ago
[-]
This title is very confusing; no company is affiliated with the release of PostgreSQL itself.
reply
munns
2 days ago
[-]
Thanks for that callout, absolutely right. Fixed!
reply
samlambert
2 days ago
[-]
corrected. thank you
reply
fosterfriends
2 days ago
[-]
Congrats on the launch Sam! Excited to try it out for Graphite's production DB
reply
didip
2 days ago
[-]
If you are on AWS anyway, I am curious why not just use Aurora Postgres?
reply
primitivesuave
2 days ago
[-]
I use Aurora Postgres at work, where we pay approximately 9x more than PlanetScale for equivalent resources (according to their pricing page [1]). This is not an endorsement of PlanetScale, as I've never used it; I'm just pointing out that Aurora Postgres costs many multiples of what virtually every other Postgres provider charges.

1. https://planetscale.com/pricing?architecture=x86-64&cluster=...

reply
samlambert
2 days ago
[-]
Please reach out to us; we would love to talk to you about testing PlanetScale and seeing if we can drive your costs down and performance up.
reply
achristmascarl
2 days ago
[-]
I haven't used PlanetScale before, but AWS Aurora limits IOPS and network performance based on your instance size, so you end up in scenarios where you really wish you had more throughput, but sizing up your instance would be a very, very expensive solution
reply
awaseem
2 days ago
[-]
Might be a dumb question, but what is Metal? Are you folks hosting DBs on your own infra or still going through AWS/GCP?
reply
mattrobenolt
2 days ago
[-]
It's still AWS/GCP, but it uses instance types with local NVMes.
reply
arandomhuman
2 days ago
[-]
So it's a "bare metal" virtual machine? Or are they actually using the bare metal offerings from the cloud providers?
reply
bddicken
2 days ago
[-]
Still in virtual machines, but ones with local NVMe drives rather than network-attached storage (EBS, Persistent Disk). This means incredible I/O performance.

https://planetscale.com/blog/benchmarking-postgres
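
If you want a rough feel for the difference on your own instances, one crude probe (fio is the proper tool) is timing small fsync'd writes, which approximates a database's WAL commit path. A sketch, nothing vendor-specific:

    import os
    import time

    # Time N small write+fsync cycles in the current directory.
    # Run on an EBS-backed volume and on a local NVMe volume to compare.
    N = 200
    with open("probe.dat", "wb", buffering=0) as f:
        start = time.perf_counter()
        for _ in range(N):
            f.write(b"x" * 4096)  # one small WAL-sized record
            os.fsync(f.fileno())  # force it to stable storage
        elapsed = time.perf_counter() - start
    os.remove("probe.dat")
    print(f"avg fsync'd 4 KiB write: {elapsed / N * 1e6:.0f} us")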

reply
samlambert
2 days ago
[-]
hosting on AWS/GCP on the ephemeral NVMe nodes. https://planetscale.com/metal
reply
n_u
2 days ago
[-]
A couple dumb questions:

1. You say "ephemeral", but my understanding is that NVMe is non-volatile, so upon crash and restart we should be able to recover the state of the drive. Is it ephemeral because of how EC2 works, where you might not get that same physical box and drives back?

2. Can you explain what "Semi-synchronous replication" is? Your docs say "This ensures every write has reached stable storage in two availability zones before it’s acknowledged to the client." but I would call that synchronous since the write is blocked until it is replicated.

Thanks!

reply
maxenglander
2 days ago
[-]
Hi n_u,

When we say ephemeral we mean that if the host compute dies in a permanent way (which happens from time to time), the data on the NVMe drives attached to that host is not recoverable by us. AWS/GCP might have recovery mechanisms internally, but we don't have access to those APIs.

When we say "semi-synchronous replication" we mean it in the sense of MySQL semi-synchronous replication: https://dev.mysql.com/doc/refman/8.4/en/replication-semisync.... To be honest I'm not exactly sure where the "semi" comes from, but here are two possible reasons I can think of:

1. We actually only require that 1 of the 2 replicas sends an acknowledgement to the primary that it has durably stored the transaction to its relay log before the primary in turn sends an acknowledgement back to the client application.

2. The transaction is visible (can be read) on both the primary and the replica _before_ the primary sends back an acknowledgement that the transaction was committed back to the client application.

reply
n_u
2 days ago
[-]
Thanks! I see. It's maybe a term they came up with to place it between async and fully synchronous replication.
reply
rcrowley
2 days ago
[-]
I think we've got (1) covered elsewhere in the comment tree.

For (2), semi-synchronous replication is a MySQL term, which we realize in Postgres by using synchronous replication with ANY one of the available replicas acknowledging the write. This allows us to guarantee durability in two availability zones before acknowledging writes to clients.

In MySQL the _semi_ part of semi-synchronous replication refers to the write only needing to be written to the binary log on the replica and not (necessarily) applied to InnoDB. This is why a MySQL database might be both acknowledging semi-synchronous writes and reporting non-zero replication lag.
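
For anyone curious what that looks like on the Postgres side: quorum commit is driven by the synchronous_standby_names setting ('ANY 1 (...)'), and you can inspect it from any client session. A sketch using psycopg2, with a placeholder DSN:

    import psycopg2

    # Placeholder DSN; point it at a primary you control.
    conn = psycopg2.connect("host=primary.example dbname=app user=app")
    cur = conn.cursor()

    # On a quorum-commit primary this shows something like 'ANY 1 (node2, node3)'.
    cur.execute("SHOW synchronous_standby_names;")
    print("synchronous_standby_names:", cur.fetchone()[0])

    # Standbys in an ANY set report sync_state = 'quorum'.
    # (Needs a role with pg_monitor to see every column.)
    cur.execute("SELECT application_name, sync_state FROM pg_stat_replication;")
    for name, state in cur.fetchall():
        print(name, state)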

reply
n_u
2 days ago
[-]
> write only needing to be written to the binary log on the replica and not (necessarily) applied to InnoDB.

Ah. I wonder whether writes that are in the log but not yet applied to InnoDB are available for reads? Then your write may succeed but a subsequent read from a replica would not see it, so you lose read-after-write consistency. Perhaps that's another tradeoff.

I'll have to research a bit more but the MySQL docs [1] say "requires only an acknowledgement from the replicas, not that the events have been fully executed and committed on the replica side" which implies that it can't be read yet.

Thanks!

[1] https://dev.mysql.com/doc/refman/8.4/en/replication-semisync...

reply
rcrowley
1 day ago
[-]
A lagging replica, even one that just acknowledged a semi-sync write, will return stale results if you route a `SELECT` to it.

First and foremost, the extra copies of the data are for fault tolerance. In specific circumstances they may offer some slack capacity that you can use to serve (potentially stale) reads but they're never going to offer read-your-writes consistency.

The docs you quote are a bit obtuse but the "acknowledgement" is the row event being written to the binary log. "Fully executed and committed" is when it makes its way into InnoDB and becomes available for future reads.
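
(On the Postgres side you can watch this staleness directly by asking a standby how far behind it is; a small psycopg2 sketch with a placeholder DSN. MySQL's analogue is SHOW REPLICA STATUS.)

    import psycopg2

    # Placeholder DSN; point it at a read replica.
    conn = psycopg2.connect("host=replica.example dbname=app user=app")
    cur = conn.cursor()

    # Wall-clock time since the last replayed transaction. NULL if nothing
    # has been replayed yet, and it grows while the primary is idle, so
    # treat it as an upper bound on staleness.
    cur.execute("SELECT now() - pg_last_xact_replay_timestamp();")
    print("replica lag (approx):", cur.fetchone()[0])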

reply
n_u
1 day ago
[-]
Ah yeah, that makes sense. I guess if you are only using the replica for fault tolerance and not as a read replica, then semi-synchronous replication seems strictly better than synchronous replication. I suppose with semi-sync the failover will take a little longer, but that's probably inconsequential.

Thanks!

reply
Imustaskforhelp
1 day ago
[-]
I really wish the hobby tier hadn't gone away, but I also understand that PlanetScale is B2B, which I can respect; I still wish I could try things on a hobby tier...

Reading the comments, it seems one of them suggests that, between Supabase and PlanetScale Postgres, you could start on something like Supabase and then come to PlanetScale when your project grows enough to support that decision.

How would a migration from Supabase to PlanetScale even go, and at what scale would something like that be worthwhile, I wonder?

Great product though, and I hope PlanetScale's team doesn't get bored listening to all the requests for a free tier like mine; maybe I'm addicted to those sweet freebies!

reply
samlambert
1 day ago
[-]
we've seen a number of Supabase -> PlanetScale migrations and they've been pretty simple, with significant cost savings for the customer. The scale part of this is hard to answer because it really depends on the workload.
reply
Imustaskforhelp
1 day ago
[-]
Congrats on the launch and thanks for responding!

I will try to create a product one day that needs a Supabase -> PlanetScale migration, just to know that I have made it lol (jk)

have a nice day

reply
gtowey
1 day ago
[-]
Hey Sam! Congrats on the launch!
reply
boxed
2 days ago
[-]
Can you control where your application runs so that you don't have a ton of latency between this thing and the app? Seems to me like that could destroy a lot of the supposed gains...
reply
sgammon
2 days ago
[-]
yes, you can pick a cloud provider and region, and you can deploy replicas to other regions
reply
boxed
1 day ago
[-]
That's a bit vague, isn't it? Surely that doesn't guarantee placement even in the same data center, let alone the same rack.
reply
t_sawyer
1 day ago
[-]
PlanetScale created a business and profited off of an open source product from Google called Vitess, which is why they originally only supported MySQL. Would love for them to open source their solution for Postgres.
reply
ksec
1 day ago
[-]
PlanetScale's Neki, Postgres 18, OrioleDB. Give it ~3 more years and maybe we can finally leave MySQL behind, unless Oracle decides to do a 180-degree U-turn.
reply
yohbho
2 days ago
[-]
Did they rename to GA, did a company named GA buy them, or is it general availability, i.e. 1.0 is out, or "no longer closed beta"?

Ah, I overlooked the first sentence and read only the headings, navigation, and footer:

> is now generally available and out of private preview

reply
jerrygoyal
1 day ago
[-]
We use Supabase Postgres for our production database. What are the pros and cons of switching to PlanetScale Postgres?
reply
gigatexal
2 days ago
[-]
This is still using the OLTP engine though, right? Can you use PlanetScale Postgres with any of the OLAP backends? Can I install the DuckDB extension and get OLAP for free, plus all the PlanetScale goodness?
reply
sgammon
2 days ago
[-]
every time i use planetscale it works flawlessly. great news!
reply
phplovesong
2 days ago
[-]
The elephant in the room is that you can't use FKs when you go PlanetScale. This has a huge impact on how you design your database.
reply
samlambert
2 days ago
[-]
Not true, and it has never been true for Postgres. Vitess was like this for a while, but we shipped FKs for Vitess 2 years ago.
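
(Easy to verify from any session; a throwaway psycopg2 sketch with made-up table names and a placeholder DSN:)

    import psycopg2
    from psycopg2 import errors

    conn = psycopg2.connect("host=db.example dbname=app user=app")  # placeholder
    cur = conn.cursor()

    # Plain Postgres foreign keys, enforced as usual.
    cur.execute("""
        CREATE TABLE authors (id bigint PRIMARY KEY);
        CREATE TABLE posts (
            id        bigint PRIMARY KEY,
            author_id bigint NOT NULL REFERENCES authors (id)
        );
    """)
    try:
        cur.execute("INSERT INTO posts VALUES (1, 42);")  # no author 42
    except errors.ForeignKeyViolation:
        print("FK enforced, as expected")
    conn.rollback()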
reply