Corrupting a ZFS File on Purpose
36 points
by zdw
1 day ago
| 2 comments
| oshogbo.com
| HN
ralferoo
10 hours ago
Hmmm, it's been a long long time since I actually had a failed drive (and also I don't use zfs), but from what I remember of my last failing drive 20 years ago, the drive was able to detect that sectors had been corrupted, and then failed the read rather than just returning silently corrupted data. If my memory is correct, replacing random bytes on disk wouldn't actually reflect the typical way data corruption manifests itself.

I always thought that the reason zfs did its extensive CRC checks was primarily to detect data corruption while it was in RAM or over the network, with a side effect that in the rare cases where data on disk got corrupted without the drive detecting it (because the CRC was still valid), it'd also be spotted.

But anyway, it might be worth testing by replacing some of the disk images with actually truncated ones so that there are holes when reading, so that it returns an actual read error rather than junk data.
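
To make the distinction concrete, here is a minimal sketch of the two failure modes on a scratch file: overwriting bytes in place (the file keeps its size and reads succeed, but return junk) versus truncating (reads past the new end come back short or empty). The helper names `flip_bytes`, `truncate_image`, and `read_at` are made up for illustration. Note the assumption: a truncated file yields a short read at end-of-file rather than a drive-style I/O error, and how ZFS reacts to a truncated file-backed vdev is not something this sketch demonstrates. Only run it against throwaway copies, never a live pool.

```python
import os


def flip_bytes(path: str, offset: int, length: int) -> int:
    """Invert `length` bytes at `offset` in place; the file size is unchanged.

    Returns the number of bytes actually altered. This models silent
    corruption: a later read succeeds but returns different data.
    """
    with open(path, "r+b") as f:
        f.seek(offset)
        original = f.read(length)
        f.seek(offset)
        # XOR with 0xFF guarantees every byte differs from the original.
        f.write(bytes(b ^ 0xFF for b in original))
    return len(original)


def truncate_image(path: str, new_size: int) -> None:
    """Cut the file down to `new_size` bytes, discarding everything after."""
    os.truncate(path, new_size)


def read_at(path: str, offset: int, length: int) -> bytes:
    """Read up to `length` bytes at `offset`; may return fewer near EOF."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)
```

After `flip_bytes`, `read_at` still returns the full number of bytes requested, just with the wrong contents. After `truncate_image`, a read beyond the new size returns an empty result, so the caller can tell something is missing without any checksum.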

adrian_b
8 hours ago
The error-correcting codes used by HDDs/SSDs correct or detect the most frequent errors, but sometimes, when there are too many erroneous bits in a sector, they can mis-correct the data and then the HDD/SSD returns a corrupted sector without signaling any error.

I have seen this a few times on HDDs that had been used for cold storage of archival data for several years (around 5 years or even more). For each archive file, I had my own hash values, which allowed me to detect all such cases of corruption. I had duplicates for all such HDDs. Sometimes both HDD copies had a few silently corrupted sectors, but they were not in the same locations, so in all cases I could recover the corrupted files from their duplicates. If I had stored the archival data without redundancy, I would have lost it.

If you do not use hashes or other error-detecting codes for all your files, like I do, you may have had some failures in your HDDs without recognizing them, but such errors are much more likely to happen in files that have been stored for many years.
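
A rough sketch of that workflow, assuming SHA-256 as the hash and two directory trees standing in for the two HDD copies. The comment does not say which hash or tooling was actually used, so `sha256_of`, `build_manifest`, `find_corrupted`, and `repair_from_duplicate` are hypothetical names for illustration only.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives need not fit in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_corrupted(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or no longer match the manifest."""
    bad = []
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file() or sha256_of(p) != expected:
            bad.append(rel)
    return bad


def repair_from_duplicate(
    primary: Path, duplicate: Path, manifest: dict[str, str]
) -> list[str]:
    """Restore bad files on `primary` from verified-good copies on `duplicate`.

    Returns the paths that could not be recovered, i.e. files that are
    damaged or missing on both copies.
    """
    unrecovered = []
    for rel in find_corrupted(primary, manifest):
        src = duplicate / rel
        if src.is_file() and sha256_of(src) == manifest[rel]:
            (primary / rel).parent.mkdir(parents=True, exist_ok=True)
            shutil.copyfile(src, primary / rel)
        else:
            unrecovered.append(rel)
    return unrecovered
```

The manifest has to be built while the data is known to be good and kept somewhere other than the disks it describes. Recovery only works when the two copies are damaged in different files, which matches the experience described above.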

ramses0
1 hour ago
anonymous_user9
14 hours ago
> The DVA was correct, the sector math was correct, the dd command was correct. The right place, the wrong mental model.

God, the intensity is tiresome. Whether or not it's AI slop, it's also bad writing. Things can be fun or interesting or worthwhile without being a harrowing battle of discovery!
