Unix files have (at least) two sizes (utcc.utoronto.ca)
68 points | 4 days ago | 13 comments | HN

ChuckMcM
2 days ago
A very long time ago when I was at Sun we struggled with this. Sun added something called the 'translucent file system', among other things. There was briefly a thought of changing stat so that it would report "data" being held by the file (all of the things written to it), "extents" of the file (all of the blocks it was taking up on disk), and "size", which accounted for partial blocks and was backwards compatible, and "flags", which included "encrypted" (yes/no), "sparse" (yes/no), "fixed" (yes/no), and at least one and possibly others that I can't remember.

It served to illustrate how complex the question of file systems had become beyond the original "SAM" and "ISAM" methods from mainframes. It was either Guy Harris or Jon Livesey who pointed out that checking a damaged version of a file system with all of those capabilities as first class attributes was probably not NP complete. :-).

delusional
2 days ago
> checking a damaged version of a file system with all of those capabilities as first class attributes was probably not NP complete. :-).

I don't think I understand this statement. Did you mean it was undecidable, that it's NP complete, or that it's solvable in polynomial time? The last is how it currently parses for me, but it's also the reading that makes the least sense in the context of the comment.

cryptonector
2 days ago
stat() is a dumpster. It causes serious performance problems for things like Lustre. It should get replaced with a system call that takes a set of metadata items you want and returns just those items. Of course, that's easy to say, but it will take forever and a day to make sure everything that should be using it actually is.

nolist_policy
2 days ago
Linux has had statx() for 8 years now.

> The mask argument to statx() is used to tell the kernel which fields the caller is interested in.

https://manpages.debian.org/bookworm/manpages-dev/statx.2.en...
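
For what it's worth, here's a minimal sketch of that mask in use (my own illustration, not taken from the man page; assumes Linux 4.11+ and glibc 2.28+, with error handling kept to a minimum):

  /* Request only the logical size and the allocated block count via the
   * statx() mask; stx_mask reports which fields were actually filled in. */
  #define _GNU_SOURCE
  #include <fcntl.h>     /* AT_FDCWD */
  #include <stdio.h>
  #include <sys/stat.h>  /* statx(), struct statx, STATX_* */

  int main(int argc, char **argv)
  {
      struct statx stx;

      if (argc < 2)
          return 1;
      if (statx(AT_FDCWD, argv[1], 0, STATX_SIZE | STATX_BLOCKS, &stx) != 0) {
          perror("statx");
          return 1;
      }
      if (stx.stx_mask & STATX_SIZE)
          printf("apparent size: %llu bytes\n",
                 (unsigned long long)stx.stx_size);
      if (stx.stx_mask & STATX_BLOCKS)
          printf("allocated:     %llu 512-byte blocks\n",
                 (unsigned long long)stx.stx_blocks);
      return 0;
  }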

cryptonector
1 day ago
Ah, interesting, TIL. Thanks!

garaetjjte
2 days ago
>However, it does leave us with no way of finding out the logical block size, which we may care about for various reasons

There's SEEK_HOLE and fiemap ioctl.
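
As a rough sketch of the SEEK_HOLE side (my own example; assumes Linux 3.1+ and a filesystem that supports it): every file has an implicit hole at EOF, so a hole strictly before EOF is what actually indicates sparseness.

  /* Look for the first hole before EOF with lseek(SEEK_HOLE). */
  #define _GNU_SOURCE
  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  int main(int argc, char **argv)
  {
      if (argc < 2)
          return 1;

      int fd = open(argv[1], O_RDONLY);
      if (fd < 0) {
          perror("open");
          return 1;
      }

      off_t end  = lseek(fd, 0, SEEK_END);   /* logical length */
      off_t hole = lseek(fd, 0, SEEK_HOLE);  /* first hole at or after offset 0 */

      if (hole >= 0 && hole < end)
          printf("sparse: first hole at %lld of %lld bytes\n",
                 (long long)hole, (long long)end);
      else
          printf("no holes before EOF (or SEEK_HOLE unsupported)\n");

      close(fd);
      return 0;
  }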

ryao
2 days ago
SEEK_HOLE would let you find the holes, but not tell you the record size, which is what he wants here. ZFS does not implement .fiemap.

zdb could be used to get this information by inspecting the dataset's object corresponding to the inode number returned by stat, but that is a heavy handed way to find it.

If exporting this information is important to the original author, I suggest that he file an issue at the OpenZFS bug tracker requesting a way of exporting it.

gerdesj
2 days ago
Many years ago I looked after a Novell cluster of three hosts with a rather expensive FC connected array. So what - that's pretty normal?

It was the early noughties and a TB was expensive. I wrote a spreadsheet with inputs from the Novell .ocx jobbies. The files were stored on some Novell NSS volumes.

I was able to show all states of the files and aggregate stats too.

Nowadays a disc is massive and worrying about compression is daft

yjftsjthsd-h
2 days ago
> Nowadays a disc is massive and worrying about compression is daft

I wouldn't go that far. I've professionally seen storage pools with a compression factor of 2-3x, and it really mattered at that job. For that matter, my home directory on the laptop I'm writing this comment from is sitting around 1.2-1.3x, and that's pretty nice. I dunno if I'd make a whole lot of effort (although if I was getting paid to save money on storing terabytes, it might be worthwhile), but the technology has evolved in ease of use.

Dylan16807
2 days ago
> Nowadays a disc is massive and worrying about compression is daft

I wish. Not even because I want to store more data, but because that would imply going the other way is super cheap. Make RAID-1 the standard, with a filesystem that keeps snapshots for multiple months. But we're not at the point where that costs a trivial amount.

eru
2 days ago
> But we're not at the point where that costs a trivial amount.

It depends on your data.

For (generalised) text files or even Word documents, we have been at that point for quite a while.

Dylan16807
1 day ago
I'll believe it when I see it as an option in a large fraction of prebuilt computers.

eru
1 day ago
That's exactly how Google Docs works (and whatever Microsoft's Office web equivalent is called).

So we are already living in that world.

Dylan16807
13 hours ago
If you use those services you often have zero local copies, so that's not really what I meant.

And more importantly, once you have local RAID you can put all your important files on it. That's a critical part of the world I want to see.

eru
3 hours ago
> If you use those services you often have zero local copies, so that's not really what I meant.

Yes, I was merely using these examples to show that storage space is cheap enough. Google and Microsoft store your stuff on HDD and SSD and tape, too. They ain't magic.

ndsipa_pomu
2 days ago
> Nowadays a disc is massive and worrying about compression is daft

Enabling compression can also increase performance (decompressing is quick compared to reading from the disks) so disk size isn't the only reason to enable it.

zabzonk
2 days ago
Going back to the old days, CP/M had only one size - the number of blocks allocated to the file - and managing the actually useful storage was the job of the application using the file. Though I must admit that CP/M was hardly an OS.

degamad
2 days ago
Isn't that true of almost all operating systems? Isn't that why Windows Explorer has "Size" and "Size on disk" as well?

Lots of filesystems in multiple operating systems have had compression for a long time...

wruza
2 days ago
The dilemma is as old as Stacker and {Double,Drive}Space. This is just a leaking abstraction. Leave it be, because there's no good solution for it. The best place for ZFS-aware backup code is in a backup tool that cares.

1970-01-01
2 days ago
It's not just UNIX. The cluster size (block size in UNIX) is the smallest unit of size any file system can reference when accessing disk storage.

Someone
2 days ago
Not every file system; see https://en.wikipedia.org/wiki/Block_suballocation:

“Block suballocation is a feature of some computer file systems which allows large blocks or allocation units to be used while making efficient use of empty space at the end of large files, space which would otherwise be lost for other use to internal fragmentation.

In file systems that don't support fragments, this feature is also called tail merging or tail packing because it is commonly done by packing the "tail", or last partial block, of multiple files into a single block.

As of 2015, the most widely used read-write file systems with support for block suballocation are Btrfs and FreeBSD UFS2[4] (where it is called "block level fragmentation"). ReiserFS and Reiser4 also support tail packing.”

Also (same article):

“Several read-only file systems do not use blocks at all and are thus implicitly using space as efficiently as suballocating file systems; such file systems double as archive formats.”

There also are filesystems that store the content of very small files, together with file metadata, not in blocks allocated for the file.

smitty1e
2 days ago
> ZFS opts to report the physical block size of the file, which is probably the more useful number for the purposes of things like 'du'. However, it does leave us with no way of finding out the logical block size

If the filesystem is tantamount to a big fat .zip, then perhaps there is a requirement for a manifest of the logical file sizes somewhere.

Somehow I doubt that this problem hasn't already been solved.

timewizard
2 days ago
The problem isn't knowing this information at the fs level, it's conveying it to programs through the stat(3) interface and the historical structure we're stuck with.

o11c
2 days ago
Note that the ZFS choice will break tools that check for sparseness and then assume the file is binary. Off the top of my head, the most famous breakage was GNU grep ... 2.14?

ryao
2 days ago
It was GNU grep 2.13 and ZFS' behavior predates GNU grep 2.13 by many years. This was considered to be a bug in grep by the grep developers:

https://github.com/openzfs/zfs/issues/829

http://git.savannah.gnu.org/cgit/grep.git/commit/?id=2f0255e...

ryao
2 days ago
The moment I saw this title, I imagined this on ZFS:

  $ truncate -s 1G testfile
  $ du testfile
  1       testfile
  $ du -b testfile
  1073741824      testfile
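
Those are the same two numbers that stat(2) itself hands back; a tiny sketch for illustration (my own, with minimal error handling):

  /* st_size is the logical length; st_blocks counts 512-byte units that
   * are actually allocated (tiny for a sparse or well-compressed file). */
  #include <stdio.h>
  #include <sys/stat.h>

  int main(int argc, char **argv)
  {
      struct stat st;

      if (argc < 2)
          return 1;
      if (stat(argv[1], &st) != 0) {
          perror("stat");
          return 1;
      }
      printf("logical size:   %lld bytes\n", (long long)st.st_size);
      printf("allocated size: %lld bytes\n", (long long)st.st_blocks * 512);
      return 0;
  }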

ndsipa_pomu
2 days ago
That's a sparse file - you'd get the same on just about any Linux filesystem regardless of compression

ryao
1 day ago
It shows the two different file sizes.

msephton
2 days ago
On macOS you can see both if you do Get Info on a file

its-summertime
2 days ago
symbolic links in ext4 can be stored in the inode data, meaning zero bytes in the file representing the symbolic link itself (of course, the inode data is bigger as a result(?))

small files in btrfs can be stored in the metadata blocks instead of data blocks
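
A quick way to see the ext4 case (my own sketch; assumes an ext4 "fast symlink", i.e. a target shorter than roughly 60 bytes, with hypothetical file names chosen purely for illustration):

  /* lstat() a short symlink: st_size is the length of the target string,
   * while st_blocks is 0 because the target is stored inside the inode. */
  #include <stdio.h>
  #include <sys/stat.h>
  #include <unistd.h>

  int main(void)
  {
      struct stat st;

      if (symlink("target-file", "example-link") != 0) {
          perror("symlink");
          return 1;
      }
      if (lstat("example-link", &st) != 0) {
          perror("lstat");
          return 1;
      }
      printf("st_size:   %lld\n", (long long)st.st_size);   /* 11, strlen("target-file") */
      printf("st_blocks: %lld\n", (long long)st.st_blocks); /* 0 on ext4 */
      unlink("example-link");
      return 0;
  }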

Dylan16807
2 days ago
I think you mean the actual size, and in that case no it's not bigger. Inodes are a fixed size, usually 256 bytes, and a file strictly has one inode. The only growth happens in data blocks.

It's also worth looking at NTFS, where a file can have multiple records, the equivalent of inodes. But it uses the same logic for tiny files. If it doesn't fit inside the base record, it goes into a data block. Multiple records only show up for large amounts of metadata.

m463
1 day ago
site won't load - accuses me of using a suspiciously old browser. lol