Case study: recovery of a corrupted 12 TB multi-device pool
64 points | 7 hours ago | 4 comments | github.com | HN
yjftsjthsd-h
4 hours ago
[-]
> This is not a bug report. [...] The goal is constructive, not a complaint.

Er, I appreciate trying to be constructive, but in what possible situation is it not a bug that a power cycle can lose the pool? And if it's not technically a "bug" because BTRFS officially specifies that it can fail like that, why is that not in big bold text at the start of any docs on it? 'Cuz that's kind of a big deal for users to know.

EDIT: From the longer write-up:

> Initial damage. A hard power cycle interrupted a commit at generation 18958 to 18959. Both DUP copies of several metadata blocks were written with inconsistent parent and child generations.

Did the author disable safety mechanisms for that to happen? I'm coming from being more familiar with ZFS, but I would have expected BTRFS to also use a CoW model where it wasn't possible to have multiple inconsistent metadata blocks in a way that didn't just revert you to the last fully-good commit. If it does that by default but there's a way to disable that protection in the name of improving performance, that would significantly change my view of this whole thing.

reply
rincebrain
3 hours ago
[-]
As far as I can see, no, the author did not disable anything of the sort, at least not that he documented.

I suspect that the author's intent is less "I do not view this as a bug" and more "I do not think it's useful to get into angry debates over whether something is a bug". I do not know whether this is a common thing on btrfs discussions, but I have certainly seen debates to that effect elsewhere.

(My personal favorite remains "it's not a data loss bug if someone could technically theoretically write something to recover the data". Perhaps, technically, that's true, but if nobody is writing such a tool, nobody is going to care about the semantics there.)

reply
yjftsjthsd-h
2 hours ago
[-]
> I suspect that the author's intent is less "I do not view this as a bug" and more "I do not think it's useful to get into angry debates over whether something is a bug".

Agreed, and I appreciate the attempt to channel things into a productive conversation.

reply
Retr0id
2 hours ago
[-]
Unless I missed it, the writeup never identifies a causal bug, only things that made recovery harder.
reply
rcxdude
1 hour ago
[-]
btrfs's reputation is not great in this regard.
reply
Retr0id
3 hours ago
[-]
This is obviously LLM output, but perhaps LLM output that corresponds to a real scenario. It's plausible that Claude was able to autonomously recover a corrupted fs, but I would not trust its "insights" by default. I'd love to see a btrfs dev's take on this!
reply
number6
2 hours ago
[-]
This was also my first impulse. The second was, if this happened to me, I would not be able to recover it. All the custom C tool talk... If you ask Claude Code it will code something up.

Well, that he recovered the disks is amazing in itself. I would have given up and just pulled a backup.

However, I would like to see a dev saying: why didn't you use the --<flag> that we created for this use case?

reply
yjftsjthsd-h
2 hours ago
[-]
I was assuming a real scenario with heavy LLM help to recover. Would be nice for the author to clarify. And, separately, for BTRFS devs to weigh in, though I'd somewhat prefer to get some indication that it's real before spending their time.
reply
nslsm
2 hours ago
[-]
An LLM wouldn't make a mistake like "One paragraph summary"
reply
stinkbeetle
4 hours ago
[-]
> Case study: recovery of a severely corrupted 12 TB multi-device pool, plus constructive gap analysis and reference tool set #1107

Please don't be btrfs please don't be btrfs please don't be btrfs...

reply
toaste_
2 hours ago
[-]
I mean, the only other option was bcachefs, which might have been funny if this LLM-generated blogpost were written by the OpenClaw instance the developer has decided is sentient:

https://www.reddit.com/r/bcachefs/comments/1rblll1/the_blog_...

But no. It was btrfs.

As a side note, it's somewhat impressive that an LLM agent was able to produce a suite of custom tools that were apparently successfully used to recover some data from a corrupted btrfs array, even ad-hoc.

reply
yjftsjthsd-h
2 hours ago
[-]
It could be ZFS. I'd be much more surprised, but it can still have bugs.
reply
praseodym
2 hours ago
[-]
ZFS on Linux has had many bugs over the years, notably with ZFS-native encryption and especially sending/receiving encrypted volumes. Another issue is that using swap on ZFS is still guaranteed to hang the kernel in low-memory scenarios, because ZFS needs to allocate memory to write to swap.
reply
badgersnake
1 hour ago
[-]
The zero copy that zero copied unencrypted blocks onto encrypted file systems was genius. It’s almost like they don’t test.
reply
phoronixrly
4 hours ago
[-]
To the author: did you continue using btrfs after this ordeal? An FS that will not eat (all) your data upon a hard power cycle only at the cost of 14 custom C tools is a hard pass from me no matter how many distros try to push it down my throat as 'production-ready'...

Also, impressive work!

reply
fpoling
1 hour ago
[-]
What are the alternatives to btrfs? At 12 TB data checksums are a must unless the data tolerate bit-rot. And if one wants to stick with the official kernel without out-of-tree modules, btrfs is the only choice.
reply
aktau
25 minutes ago
[-]
I tried btrfs on three different occasions. Three times it managed to corrupt itself. I'll admit I was too enthusiastic the first time, trying it less than a year after it appeared in major distros. But the latter two are unforgivable (I had to reinstall my mom's laptop).

I've been using ZFS for my NAS-like thing since then. It's been rock solid (*).

(*): I know about the block cloning bug, and the encryption bug. Luckily I avoided those (I don't tend to enable new features like block cloning, and I didn't have an encrypted dataset at the time). Still, all in all it's been really good in comparison to btrfs.

reply
egorfine
38 minutes ago
[-]
> if one wants to stick with the official kernel without out-of-tree modules

I wonder how a requirement like that could possibly arise. Especially with an obvious exception for zfs.

reply
ThatPlayer
22 minutes ago
[-]
Bcachefs also fulfills the requirement of checksums (and multi device support).

Also out of tree.

reply
phoronixrly
21 minutes ago
[-]
Does it not also eat data though?
reply
phoronixrly
22 minutes ago
[-]
LVM offers lvmraid, integrity, and snapshots, as one example. It's old unsexy tech, but losing data is not to my taste lately...
reply
Joel_Mckay
28 minutes ago
[-]
Could try ZFS or CephFS... even if several host roles are in VM containers (45Drives has a product setup that way.)

The btrfs solution has a mixed history, and had a lot of the same issues DRBD could get. They are great until some hardware/kernel-mod eventually goes sideways, and then the auto-heal cluster filesystems start to make a lot more sense. Note, with cluster based complete-file copy/repair object features the damage is localized to single files at worst, and folks don't have to wait 3 days to bring up the cluster on a crash.

Best of luck, =3

reply