Is DWPD Still a Useful SSD Spec?
28 points by zdw 5 days ago | 5 comments | klarasystems.com
mdtancsa
2 hours ago
[-]
Dropping off the bus is the best-case failure, really. It's more annoying when writes become slower than on the other disks, often causing confusing performance profiles for the overall array. Having good metrics for each disk (we use telegraf) will help flag it early. On my zfs pools, monitoring disk io for each disk plus the smartmon metrics helps tease that out. For SSDs, probably the worst is when there is some firmware bug that triggers on all disks at the same time, e.g. the infamous HP SSD Failure at 32,768 Hours of Use. Yikes!
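
A minimal sketch of that "flag it early" idea, not what we actually run - just comparing each disk's latency against the median of its peers, with made-up device names and numbers:

    # Flag a disk whose latency is an outlier versus the rest of the pool.
    # The sample data is invented; feed in your own telegraf/smartmon numbers.
    from statistics import median

    latency_ms = {"da0": 1.1, "da1": 1.0, "da2": 1.2, "da3": 9.8}

    med = median(latency_ms.values())
    for disk, ms in latency_ms.items():
        if ms > 3 * med:  # crude threshold; tune it for your pool
            print(f"{disk}: {ms} ms vs median {med} ms -- check this one")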
reply
PunchyHamster
6 minutes ago
[-]
we had drives that hit that failure mode at like 80% life left. Zero negative SMART metrics; they just slowed down.

My hunch is that they don't expose anything because that makes it harder to claim a warranty refund.

reply
mgerdts
2 hours ago
[-]
This article misses several important points.

- Consumer drives like the Samsung 980 Pro and WD SN850 Black use TLC as SLC when about 30+% of the drive is erased. In that state you can burst-write a bit less than 10% of the drive capacity at 5 GB/s. After that, it slows down markedly. If the filesystem doesn’t automatically trim free space, the drive will eventually be stuck in slow mode all the time.

- Write amplification factor (WAF) is not discussed. Random small writes and partial block deletions will trigger garbage collection, which ends up rewriting data to reclaim freed space in a NAND block.

- A drive with a lot of erased blocks can endure more TBW than one that has all user blocks with data. This is because garbage collection can be more efficient. Again, enable TRIM on your fs.

- Overprovisioning can be used to increase a drive’s TBW. If, before you ever write to your 0.3 DWPD 1024 GB drive, you partition it so that you use only 960 GB, you now effectively have a 1 DWPD drive (rough math sketched after this list).

- Per the NVMe spec, there are indicators of drive health in the SMART log page.

- Almost all current datacenter or enterprise drives support an OCP SMART log page. This allows you to observe things like the write amplification factor (WAF), rereads due to ECC errors, etc.
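
A back-of-the-envelope version of the overprovisioning point above. The WAF figures are purely illustrative assumptions, not vendor numbers:

    # Rough DWPD/TBW math for the 1024 GB -> 960 GB example above.
    # The WAF values (4.0 when nearly full, 1.2 with extra spare area) are
    # assumptions for illustration only.

    def tbw(dwpd, capacity_tb, warranty_years=5):
        """Rated terabytes written implied by a DWPD figure."""
        return dwpd * capacity_tb * warranty_years * 365

    rated_tbw = tbw(0.3, 1.024)        # ~560 TB of host writes at 0.3 DWPD

    nand_budget = rated_tbw * 4.0      # wear the NAND can absorb if the rating assumed WAF ~4
    effective_tbw = nand_budget / 1.2  # host writes you get if extra OP drops WAF to ~1.2
    effective_dwpd = effective_tbw / (0.960 * 5 * 365)
    print(f"effective DWPD ~= {effective_dwpd:.1f}")  # roughly 1 DWPD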

reply
Aurornis
1 hour ago
[-]
You’re also missing an important factor: Many drives now reserve some space that cannot be used by the consumer so they have extra space to work with. This is called factory overprovisioning.

> - Consumer drives like the Samsung 980 Pro and WD SN850 Black use TLC as SLC when about 30+% of the drive is erased. In that state you can burst-write a bit less than 10% of the drive capacity at 5 GB/s. After that, it slows down markedly. If the filesystem doesn’t automatically trim free space, the drive will eventually be stuck in slow mode all the time.

This is true, but despite all of the controversy about this feature it’s hard to encounter this in practical consumer use patterns.

With the 980 Pro 1TB you can write 113GB before it slows down. (Source https://www.techpowerup.com/review/samsung-980-pro-1-tb-ssd/... ) So you need to be able to source that much data from another high speed SSD and then fill nearly 1/8th of the drive to encounter the slowdown. Even when it slows down you’re still writing at 1.5GB/sec. Also remember that the drive is factory overprovisioned so there is always some amount of space left to handle some of this burst writing.

For as much as this fact gets brought up, I doubt most consumers ever encounter this condition. Someone who is copying very large video files from one drive to another might encounter it on certain operations, but even in slow mode you’re filling the entire drive capacity in about ten minutes.
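
Quick arithmetic on those numbers (113 GB burst at ~5 GB/s, ~1.5 GB/s afterwards, per the figures above):

    # Time to fill a 1 TB drive given the pseudo-SLC burst behavior described above.
    cache_gb, fast_gbps, slow_gbps = 113, 5.0, 1.5
    drive_gb = 1000

    burst_s = cache_gb / fast_gbps               # ~23 s before it slows down
    rest_s = (drive_gb - cache_gb) / slow_gbps   # the rest in "slow" mode
    print(f"burst: {burst_s:.0f} s, full fill: {(burst_s + rest_s) / 60:.1f} min")
    # -> burst: 23 s, full fill: 10.2 min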

reply
nyrikki
1 hour ago
[-]
> You’re also missing an important factor: Many drives now reserve some space that cannot be used by the consumer so they have extra space to work with. This is called factory overprovisioning.

This has always been the case, which is why even a decade ago the “pro” drives were odd sizes like 120 GB vs 128 GB.

Products like that still exist today, and the problem tends to show up as drives age and that spare pool shrinks.

DWPD and the TB-written ratings that modern consumer drives use are just different ways of communicating that contract.

FWIW, if you do a drive-wide discard and then partition only 90% of the drive, you can dramatically improve the garbage collection slowdown on consumer drives (quick sketch below).

In the world of ML and containers you can hit that slowdown if you, say, have fstrim scheduled once a week to avoid the cost of online discards.
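
A minimal sketch of the discard-then-underpartition trick, just to make the sizes concrete. The device name and the 90% figure are only examples; do the actual blkdiscard and partitioning by hand after double-checking the device:

    # Compute a 90% partition boundary for an example device.
    from pathlib import Path

    dev = "nvme0n1"                                   # example device name
    sectors = int(Path(f"/sys/block/{dev}/size").read_text())
    size_bytes = sectors * 512                        # /sys reports 512-byte sectors
    usable = int(size_bytes * 0.90)                   # leave ~10% unpartitioned as extra OP

    print(f"{dev}: {size_bytes / 1e9:.0f} GB total, "
          f"partition the first {usable / 1e9:.0f} GB and leave the rest untouched")
    # blkdiscard the whole device first so the controller knows the tail is free,
    # then create the partition over only the first 90%.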

I would rather have visibility into the size of the reserve space through SMART, but I doubt that will happen.

reply
Havoc
3 hours ago
[-]
After getting burned by consumer drives I decided it was time for a ZFS array of used enterprise SSDs. Tons of writes on them, but it's a fully mirrored config and ZFS is easier to back up, so it should be OK. And the really noisy stuff like logging I'm just sticking on Optanes - those are 6+ DWPD depending on the model, which may as well be unlimited for personal use scenarios.
reply
markhahn
1 hour ago
[-]
The text is wrong about CRCs: everyone uses pretty heavy ECC, so it's not just a re-read. This also provides a somewhat graduated measure of the block's actual health, so the housekeeping firmware can decide whether to stop using the block (i.e., move the content elsewhere).

I'm also not a fan of the "buy bigger storage" concept, or the conspiracy theory about 480 vs 512.

It sure would be nice if, when considering a product, you could just look at some claimed stats from the vendor about time-related degradation, firmware sparing policy, etc. We shouldn't have to guess!

reply
saurik
1 hour ago
[-]
> I'm also not a fan of the "buy bigger storage" concept, or the conspiracy theory about 480 vs 512.

I don't understand why this is being called a "conspiracy theory"; but if you want some very concrete evidence that this is how they work, a paper was recently published that analyzed the behavior and endurance of various SSDs, and the data would be very hard to explain with any other theory: comparing apples to apples, the drives with better write endurance are merely overprovisioned so that the wear-leveling algorithm doesn't cause as much write amplification while reorganizing.

https://news.ycombinator.com/item?id=44985619

> OP on write-intensive SSD. SSD vendors often offer two versions of SSDs with similar hardware specifications, where the lower-capacity model is typically marketed as “write-optimized” or “mixed-use”. One might expect that such write-optimized SSDs would demonstrate improved WAF characteristics due to specialized internal designs. To investigate this, we compared two Micron SSD models: the Micron 7450 PRO, designed for “read-intensive” workloads with a capacity of 960 GB, and the Micron 7450 MAX, intended for “mixed-use” workloads with a capacity of 800 GB. Both SSDs were tested under identical workloads and dataset sizes, as shown in Figure 7b. The WAF results for both models were identical and closely matched the results from the simulator. This suggests that these Micron SSDs, despite being marketed for different workloads, are essentially identical in performance, with the only difference being a larger OP on the “mixed-use” model. For these SSD models, there appear to be no other hardware or algorithmic improvements. As a result, users can achieve similar performance by manually reserving free space on the “read-intensive” SSD, offering a practical alternative to purchasing the “mixed-use” model.
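
Rough overprovisioning math for that Micron example. The 1024 GB raw-NAND figure is an assumption for illustration; the paper only establishes that the two models behave identically apart from OP:

    # Compare spare area on the two models, assuming identical raw NAND.
    raw_gb = 1024  # assumed physical NAND on both models
    for name, user_gb in [("7450 PRO (read-intensive)", 960),
                          ("7450 MAX (mixed-use)", 800)]:
        op = (raw_gb - user_gb) / user_gb
        print(f"{name}: {user_gb} GB exposed, ~{op:.0%} overprovisioning")

    # Partitioning only 800 GB of the 960 GB PRO model gives the controller
    # roughly the same spare area as the MAX model.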

reply
igtztorrero
3 hours ago
[-]
The most common catastrophic failure you’ll see in SSDs: the entire drive simply drops off the bus as though it were no longer there.

Happened to me last week.

I just put it in a plastic bag in the freezer for 15 minutes, and it worked.

I made a copy to my laptop and then installed a new server.

But it doesn't always work like a charm.

Please always have a backup for documents, and a recent snapshot for critical systems.

reply
serf
3 hours ago
[-]
to be perfectly fair though, this isn't a failure mode that arrived with SSDs.

drive controllers on HDDs just suddenly go to shit and drop off buses, too.

I guess the difference is that people expect an HDD to fail suddenly, whereas with a solid-state device most people seem to be convinced that the failure will be graceful.

reply
PunchyHamster
2 minutes ago
[-]
We have a fleet of a few hundred HDDs that is basically being replaced "on next failure" with SSDs, and that failure mode is BY FAR rarer on HDDs; maybe one out of 100 "just dies".

Usually it either starts returning media errors or slows down (and if it is not replaced in time, a slowing-down drive usually turns into a media-error one).

SSDs (at least the big fleet of Samsung ones we had) are much worse: they just go dead, not even turning read-only. Of course we have redundancy, so it's not really a problem, but if the same happened on someone's desktop they'd be screwed if they didn't have backups.

reply
lvl155
3 hours ago
[-]
Always make backups to HDD and cloud (and possibly tape if you are a data nut).
reply
zamadatix
3 hours ago
[-]
I don't think one should worry as much about what media they are backing up to as about whether they are answering the question "does my data resiliency match my retention needs".

And regularly test that restores actually work; nothing is worse than thinking you had backups and then finding out they don't restore right.

reply