Unix files have (at least) two sizes (utcc.utoronto.ca)
68 points | 4 days ago | 13 comments | HN

ChuckMcM
2 days ago
A very long time ago when I was at Sun we struggled with this. Sun added something called the 'translucent file system', among other things. There was briefly a thought of changing stat so that it would report "data" being held by the file (all of the things written to it), "extents" of the file (all of the blocks it was taking up on disk), and "size", which accounted for partial blocks and was backwards compatible, and "flags", which included "encrypted" (yes/no), "sparse" (yes/no), "fixed" (yes/no), and at least one and possibly others that I can't remember.

It served to illustrate how complex the question of file systems had become beyond the original "SAM" and "ISAM" methods from mainframes. It was either Guy Harris or Jon Livesey who pointed out that checking a damaged version of a file system with all of those capabilities as first class attributes was probably not NP complete. :-).

delusional
2 days ago
> checking a damaged version of a file system with all of those capabilities as first class attributes was probably not NP complete. :-).

I don't think I understand this statement. Did you mean it was undecidable, that it's NP complete, or that it's solvable in polynomial time? The last is how it currently parses for me, but it's also the reading that makes the least sense in the context of the comment.

cryptonector
2 days ago
stat() is a dumpster. It causes serious performance problems for things like Lustre. It should get replaced with a system call that takes a set of metadata items you want and returns just those items. Of course, that's easy to say, but it will take forever and a day to make sure everything that should be using it actually is.

nolist_policy
2 days ago
Linux has had statx() for 8 years now.

> The mask argument to statx() is used to tell the kernel which fields the caller is interested in.

https://manpages.debian.org/bookworm/manpages-dev/statx.2.en...
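
For what it's worth, here's a minimal sketch of that mask in use (my own illustration, not taken from the man page; assumes Linux 4.11+ and glibc 2.28+, with error handling kept to a minimum):

  /* Request only the logical size and the allocated block count via the
   * statx() mask; stx_mask reports which fields were actually filled in. */
  #define _GNU_SOURCE
  #include <fcntl.h>     /* AT_FDCWD */
  #include <stdio.h>
  #include <sys/stat.h>  /* statx(), struct statx, STATX_* */

  int main(int argc, char **argv)
  {
      struct statx stx;

      if (argc < 2)
          return 1;
      if (statx(AT_FDCWD, argv[1], 0, STATX_SIZE | STATX_BLOCKS, &stx) != 0) {
          perror("statx");
          return 1;
      }
      if (stx.stx_mask & STATX_SIZE)
          printf("apparent size: %llu bytes\n",
                 (unsigned long long)stx.stx_size);
      if (stx.stx_mask & STATX_BLOCKS)
          printf("allocated:     %llu 512-byte blocks\n",
                 (unsigned long long)stx.stx_blocks);
      return 0;
  }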

cryptonector
1 day ago
Ah, interesting, TIL. Thanks!

garaetjjte
2 days ago
>However, it does leave us with no way of finding out the logical block size, which we may care about for various reasons

There's SEEK_HOLE and fiemap ioctl.
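
As a rough sketch of the SEEK_HOLE side (my own example; assumes Linux 3.1+ and a filesystem that supports it): every file has an implicit hole at EOF, so a hole strictly before EOF is what actually indicates sparseness.

  /* Look for the first hole before EOF with lseek(SEEK_HOLE). */
  #define _GNU_SOURCE
  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  int main(int argc, char **argv)
  {
      if (argc < 2)
          return 1;

      int fd = open(argv[1], O_RDONLY);
      if (fd < 0) {
          perror("open");
          return 1;
      }

      off_t end  = lseek(fd, 0, SEEK_END);   /* logical length */
      off_t hole = lseek(fd, 0, SEEK_HOLE);  /* first hole at or after offset 0 */

      if (hole >= 0 && hole < end)
          printf("sparse: first hole at %lld of %lld bytes\n",
                 (long long)hole, (long long)end);
      else
          printf("no holes before EOF (or SEEK_HOLE unsupported)\n");

      close(fd);
      return 0;
  }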

ryao
2 days ago
SEEK_HOLE would let you find the holes, but not tell you the record size, which is what he wants here. ZFS does not implement .fiemap.

zdb could be used to get this information by inspecting the dataset's object corresponding to the inode number returned by stat, but that is a heavy handed way to find it.

If exporting this information is important to the original author, I suggest that he file an issue at the OpenZFS bug tracker requesting a way of exporting it.

gerdesj
2 days ago
Many years ago I looked after a Novell cluster of three hosts with a rather expensive FC connected array. So what - that's pretty normal?

It was the early noughties and a TB was expensive. I wrote a spreadsheet with inputs from the Novell .ocx jobbies. The files were stored on some Novell NSS volumes.

I was able to show all states of the files and aggregate stats too.

Nowadays a disc is massive and worrying about compression is daft

yjftsjthsd-h
2 days ago
> Nowadays a disc is massive and worrying about compression is daft

I wouldn't go that far. I've professionally seen storage pools with a compression factor of 2-3x, and it really mattered at that job. For that matter, my home directory on the laptop I'm writing this comment from is sitting around 1.2-1.3x, and that's pretty nice. I dunno if I'd make a whole lot of effort (although if I was getting paid to save money on storing terabytes, it might be worthwhile), but the technology has evolved in ease of use.

Dylan16807
2 days ago
> Nowadays a disc is massive and worrying about compression is daft

I wish. Not even because I want to store more data, but because that would imply going the other way is super cheap. Make RAID-1 the standard, with a filesystem that keeps snapshots for multiple months. But we're not at the point where that costs a trivial amount.

eru
2 days ago
> But we're not at the point where that costs a trivial amount.

It depends on your data.

For (generalised) text files or even Word documents, we have been at that point for quite a while.

Dylan16807
1 day ago
I'll believe it when I see it as an option in a large fraction of prebuilt computers.

eru
1 day ago
That's exactly how Google Docs works (and whatever Microsoft's Office web equivalent is called).

So we are already living in that world.

Dylan16807
13 hours ago
If you use those services you often have zero local copies, so that's not really what I meant.

And more importantly, once you have local RAID you can put all your important files on it. That's a critical part of the world I want to see.

eru
3 hours ago
> If you use those services you often have zero local copies, so that's not really what I meant.

Yes, I was merely using these examples to show that storage space is cheap enough. Google and Microsoft store your stuff on HDD and SSD and tape, too. They ain't magic.

ndsipa_pomu
2 days ago
> Nowadays a disc is massive and worrying about compression is daft

Enabling compression can also increase performance (decompressing is quick compared to reading from the disks) so disk size isn't the only reason to enable it.

zabzonk
2 days ago
Going back to the old days, CP/M had only one size - the number of blocks allocated to the file - and managing the actually useful storage was the job of the application using the file. Though I must admit that CP/M was hardly an OS.

degamad
2 days ago
Isn't that true of almost all operating systems? Isn't that why Windows Explorer has "Size" and "Size on disk" as well?

Lots of filesystems in multiple operating systems have had compression for a long time...

wruza
2 days ago
The dilemma is as old as Stacker and {Double,Drive}Space. This is just a leaking abstraction. Leave it be, because there's no good solution for it. The best place for ZFS-aware backup code is in a backup tool that cares.

1970-01-01
2 days ago
It's not just UNIX. The cluster size (block size in UNIX) is the smallest unit of size any file system can reference when accessing disk storage.

Someone
2 days ago
Not every file system; see https://en.wikipedia.org/wiki/Block_suballocation:

“Block suballocation is a feature of some computer file systems which allows large blocks or allocation units to be used while making efficient use of empty space at the end of large files, space which would otherwise be lost for other use to internal fragmentation.

In file systems that don't support fragments, this feature is also called tail merging or tail packing because it is commonly done by packing the "tail", or last partial block, of multiple files into a single block.

As of 2015, the most widely used read-write file systems with support for block suballocation are Btrfs and FreeBSD UFS2[4] (where it is called "block level fragmentation"). ReiserFS and Reiser4 also support tail packing.”

Also (same article):

“Several read-only file systems do not use blocks at all and are thus implicitly using space as efficiently as suballocating file systems; such file systems double as archive formats.”

There also are filesystems that store the content of very small files, together with file metadata, not in blocks allocated for the file.

smitty1e
2 days ago
> ZFS opts to report the physical block size of the file, which is probably the more useful number for the purposes of things like 'du'. However, it does leave us with no way of finding out the logical block size

If the filesystem is tantamount to a big fat .zip, then perhaps there is a requirement for a manifest of the logical file sizes somewhere.

Somehow I doubt that this problem hasn't already been solved.

timewizard
2 days ago
The problem isn't knowing this information at the fs level, it's conveying it to programs through the stat(3) interface and the historical structure we're stuck with.

o11c
2 days ago
Note that the ZFS choice will break tools that check for sparseness and then assume the file is binary. Off the top of my head, the most famous breakage was GNU grep ... 2.14?

ryao
2 days ago
It was GNU grep 2.13 and ZFS' behavior predates GNU grep 2.13 by many years. This was considered to be a bug in grep by the grep developers:

https://github.com/openzfs/zfs/issues/829

http://git.savannah.gnu.org/cgit/grep.git/commit/?id=2f0255e...

ryao
2 days ago
The moment I saw this title, I imagined this on ZFS:

  $ truncate -s 1G testfile
  $ du testfile
  1       testfile
  $ du -b testfile
  1073741824      testfile
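
Those are the same two numbers that stat(2) itself hands back; a tiny sketch for illustration (my own, with minimal error handling):

  /* st_size is the logical length; st_blocks counts 512-byte units that
   * are actually allocated (tiny for a sparse or well-compressed file). */
  #include <stdio.h>
  #include <sys/stat.h>

  int main(int argc, char **argv)
  {
      struct stat st;

      if (argc < 2)
          return 1;
      if (stat(argv[1], &st) != 0) {
          perror("stat");
          return 1;
      }
      printf("logical size:   %lld bytes\n", (long long)st.st_size);
      printf("allocated size: %lld bytes\n", (long long)st.st_blocks * 512);
      return 0;
  }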

ndsipa_pomu
2 days ago
That's a sparse file - you'd get the same on just about any Linux filesystem regardless of compression

ryao
1 day ago
It shows the two different file sizes.

msephton
2 days ago
On macOS you can see both if you do Get Info on a file

its-summertime
2 days ago
symbolic links in ext4 can be stored in the inode data, meaning zero bytes in the file representing the symbolic link itself (of course, the inode data is bigger as a result(?))

small files in btrfs can be stored in the metadata blocks instead of data blocks
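
A quick way to see the ext4 case (my own sketch; assumes an ext4 "fast symlink", i.e. a target shorter than roughly 60 bytes, with hypothetical file names chosen purely for illustration):

  /* lstat() a short symlink: st_size is the length of the target string,
   * while st_blocks is 0 because the target is stored inside the inode. */
  #include <stdio.h>
  #include <sys/stat.h>
  #include <unistd.h>

  int main(void)
  {
      struct stat st;

      if (symlink("target-file", "example-link") != 0) {
          perror("symlink");
          return 1;
      }
      if (lstat("example-link", &st) != 0) {
          perror("lstat");
          return 1;
      }
      printf("st_size:   %lld\n", (long long)st.st_size);   /* 11, strlen("target-file") */
      printf("st_blocks: %lld\n", (long long)st.st_blocks); /* 0 on ext4 */
      unlink("example-link");
      return 0;
  }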

Dylan16807
2 days ago
I think you mean the actual size, and in that case no it's not bigger. Inodes are a fixed size, usually 256 bytes, and a file strictly has one inode. The only growth happens in data blocks.

It's also worth looking at NTFS, where a file can have multiple records, the equivalent of inodes. But it uses the same logic for tiny files. If it doesn't fit inside the base record, it goes into a data block. Multiple records only show up for large amounts of metadata.

m463
1 day ago
site won't load - accuses me of using a suspiciously old browser. lol