I don't think I understand this statement. Did you mean it was undecidable, that it's NP-complete, or did you mean it was solvable in polynomial time? The latter is what it currently parses as for me, but also what makes the least sense in the context of the comment.
> The mask argument to statx() is used to tell the kernel which fields the caller is interested in.
https://manpages.debian.org/bookworm/manpages-dev/statx.2.en...
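For what it's worth, here's a minimal sketch (Linux, glibc 2.28 or newer; the file name argument is just for illustration) of how the mask works: you request only the fields you care about, and stx_mask tells you which ones the kernel actually filled in.

    /* Minimal sketch: request only size and block count via the statx() mask,
       then check stx_mask for what the kernel actually returned. */
    #define _GNU_SOURCE
    #include <fcntl.h>      /* AT_FDCWD */
    #include <stdio.h>
    #include <sys/stat.h>

    int main(int argc, char **argv)
    {
        if (argc < 2) return 2;
        struct statx sx;
        if (statx(AT_FDCWD, argv[1], 0, STATX_SIZE | STATX_BLOCKS, &sx) != 0) {
            perror("statx");
            return 1;
        }
        if (sx.stx_mask & STATX_SIZE)
            printf("apparent size: %llu bytes\n", (unsigned long long)sx.stx_size);
        if (sx.stx_mask & STATX_BLOCKS)
            printf("allocated:     %llu bytes\n", (unsigned long long)sx.stx_blocks * 512);
        return 0;
    }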
There are SEEK_HOLE and the fiemap ioctl.
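By way of illustration, a rough sketch of the lseek() flavour: walk the data runs with SEEK_DATA/SEEK_HOLE. The fiemap ioctl (FS_IOC_FIEMAP) gives you actual extents but takes more setup.

    /* Minimal sketch: list the data runs of a (possibly sparse) file using
       SEEK_DATA / SEEK_HOLE. lseek() returns -1 (ENXIO) once only holes remain. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        if (argc < 2) return 2;
        int fd = open(argv[1], O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        off_t off = 0;
        for (;;) {
            off_t data = lseek(fd, off, SEEK_DATA);   /* start of next data run */
            if (data < 0) break;
            off_t hole = lseek(fd, data, SEEK_HOLE);  /* end of that data run   */
            if (hole < 0) break;
            printf("data: %lld..%lld\n", (long long)data, (long long)hole);
            off = hole;
        }
        close(fd);
        return 0;
    }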
zdb could be used to get this information by inspecting the dataset's object corresponding to the inode number returned by stat, but that is a heavy-handed way to find it.
If exporting this information is important to the original author, I suggest he file an issue on the OpenZFS bug tracker requesting a way to expose it.
It was the early noughties and a TB was expensive. I wrote a spreadsheet with inputs from the Novell .ocx jobbies. The files were stored on some Novell NSS volumes.
I was able to show all states of the files and aggregate stats too.
Nowadays a disc is massive and worrying about compression is daft.
I wouldn't go that far. I've professionally seen storage pools with a compression factor of 2-3x, and it really mattered at that job. For that matter, my home directory on the laptop I'm writing this comment from is sitting around 1.2-1.3x, and that's pretty nice. I dunno if I'd make a whole lot of effort (although if I was getting paid to save money on storing terabytes, it might be worthwhile), but the technology has evolved in ease of use.
I wish. Not even because I want to store more data, but because that would imply going the other way is super cheap. Make RAID-1 the standard, with a filesystem that keeps snapshots for multiple months. But we're not at the point where that costs a trivial amount.
It depends on your data.
For (generalised) text files or even Word documents, we have been at that point for quite a while.
So we are already living in that world.
And more importantly, once you have local RAID you can put all your important files on it. That's a critical part of the world I want to see.
Yes, I was merely using these examples to show that storage space is cheap enough. Google and Microsoft store your stuff on HDD and SSD and tape, too. They ain't magic.
Enabling compression can also increase performance (decompressing is quick compared to reading from the disks), so disk size isn't the only reason to enable it.
Lots of filesystems in multiple operating systems have had compression for a long time...
“Block suballocation is a feature of some computer file systems which allows large blocks or allocation units to be used while making efficient use of empty space at the end of large files, space which would otherwise be lost for other use to internal fragmentation.
In file systems that don't support fragments, this feature is also called tail merging or tail packing because it is commonly done by packing the "tail", or last partial block, of multiple files into a single block.
As of 2015, the most widely used read-write file systems with support for block suballocation are Btrfs and FreeBSD UFS2[4] (where it is called "block level fragmentation"). ReiserFS and Reiser4 also support tail packing.”
Also (same article):
“Several read-only file systems do not use blocks at all and are thus implicitly using space as efficiently as suballocating file systems; such file systems double as archive formats.”
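To put my own back-of-the-envelope numbers on the internal fragmentation point: with 4 KiB allocation blocks, each file's last partial block wastes about 2 KiB on average, so a million small files lose roughly 2 GiB to slack space. Tail packing recovers most of that by letting several files share those final partial blocks.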
There also are filesystems that store the content of very small files, together with file metadata, not in blocks allocated for the file.
If the filesystem is tantamount to a big fat .zip, then perhaps there is a requirement for a manifest of the logical file sizes somewhere.
Somehow I doubt that this problem hasn't already been solved.
https://github.com/openzfs/zfs/issues/829
http://git.savannah.gnu.org/cgit/grep.git/commit/?id=2f0255e...
$ truncate -s 1G testfile
$ du testfile
1 testfile
$ du -b testfile
1073741824 testfile
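Those two numbers are just what stat() reports; a minimal sketch, assuming Linux where st_blocks is counted in 512-byte units:

    /* Minimal sketch: apparent size (st_size) vs. allocated size
       (st_blocks, 512-byte units on Linux), i.e. du -b vs. plain du. */
    #include <stdio.h>
    #include <sys/stat.h>

    int main(int argc, char **argv)
    {
        if (argc < 2) return 2;
        struct stat st;
        if (stat(argv[1], &st) != 0) { perror("stat"); return 1; }
        printf("apparent: %lld bytes\n", (long long)st.st_size);
        printf("on disk:  %lld bytes\n", (long long)st.st_blocks * 512);
        return 0;
    }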
Small files in btrfs can be stored in the metadata blocks instead of data blocks.
It's also worth looking at NTFS, where a file can have multiple records, the equivalent of inodes. But it uses the same logic for tiny files: if the data doesn't fit inside the base record, it goes into a data block. Multiple records only show up for large amounts of metadata.