You can still get the compression benefits by serving files with Content-Encoding: gzip or similar. Although zip has built-in compression, you can skip it and rely on external compression instead, especially over the wire.
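To make that concrete, here is a minimal sketch using Python's standard `zipfile` and `gzip` modules. The file names and contents are made up for illustration, and `gzip.compress` only stands in for whatever a web server would do when it applies Content-Encoding: gzip.

```python
import gzip
import io
import zipfile


def build_stored_zip(files):
    """Return the bytes of a zip archive whose members are stored uncompressed."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


# Hypothetical payload: many small, similar files.
files = {f"item{i}.txt": b"hello world, hello zip\n" * 20 for i in range(50)}

raw_zip = build_stored_zip(files)

# Stand-in for the transport layer compressing the whole archive at once.
wire_bytes = gzip.compress(raw_zip)

# The receiving side undoes the gzip layer and gets the original archive back.
restored = gzip.decompress(wire_bytes)
with zipfile.ZipFile(io.BytesIO(restored)) as zf:
    names = zf.namelist()
    all_stored = all(i.compress_type == zipfile.ZIP_STORED for i in zf.infolist())

print(len(raw_zip), len(wire_bytes), len(names), all_stored)
```

Because the outer gzip pass sees the whole archive as one stream, it can also exploit redundancy across members, which per-member zip compression cannot.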
It's pretty widely used, though often dressed up as something else: JAR files, APK files, and so on.
I think the article's complaints about lacking unix access rights and metadata are a bit strange. That seems like a feature more than a bug, as I wouldn't expect this to be something that transfers between machines. I don't want to unpack an archive and have to scrutinize it for files with o+rxst permissions, or have their creation date be anything other than when I unpacked them.
> I don't want to unpack an archive and have to scrutinize it for files with o+rxst permissions, or have their creation date be anything other than when I unpacked them.
I'm the opposite: when I pack and unpack something, I want the files to be identical, including attributes. Why should I throw away all the timestamps just because the files were temporarily in an archive?
In case anyone is unaware, you don't have to throw away all the timestamps when using "zip with no compression". The metadata for each zipped file includes one timestamp (originally rounded to an even number of seconds in local time).
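A quick way to see that timestamp and its two-second granularity is Python's standard `zipfile` module. This is only a sketch: the member name and the date are made up, and the archive lives in memory.

```python
import io
import zipfile

buf = io.BytesIO()

# Write one uncompressed member with an explicit timestamp (odd seconds on purpose).
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
    info = zipfile.ZipInfo("notes.txt", date_time=(2020, 1, 2, 3, 4, 5))
    zf.writestr(info, b"hello")

# Read it back: the classic MS-DOS time field only stores seconds divided by two.
with zipfile.ZipFile(buf) as zf:
    stored = zf.getinfo("notes.txt").date_time

print(stored)  # (2020, 1, 2, 3, 4, 4)
```

The stored value carries no time zone, which is why it is conventionally read as local time.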
I am a big last modified timestamp fan and am often discouraged that scp and git are not (by default).
If your archive drops it you can't get it back.
If you don't want it you can just chmod -R u=rw,go=r,a-x
Hence, the common archive format is tar not zip.
SquashFS with zstd compression is used by various container runtimes, and is popular in HPC where filesystems often have high latency. It can be mounted natively or with FUSE, and the decompression overhead is not really felt.
Stupid question: why can't we get a syscall to load an entire directory into an array of file descriptors (minus an array of paths to ignore), instead of calling open() on every individual file in that directory? Seems like the simplest solution, no?
You could use io_uring, but IMO that API is annoying and I remember hitting limitations. One thing you can do with io_uring is use openat (the op, not the syscall) with the dir fd (which you get from the syscall), so you can asynchronously open and read files. However, you couldn't open directories for some reason. There's a chance I may be remembering wrong.
Otherwise you can open a dir and pass its fd to openat together with a relative path to a file, to reduce the kernel overhead of resolving absolute paths for each file.
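For the plain openat approach, here is a rough sketch in Python, which exposes openat through the `dir_fd` argument of `os.open`. It assumes a POSIX system where `os.O_DIRECTORY` and fd-based `os.scandir` are available (Linux, Python 3.7+); `read_all_in_dir` is a made-up helper name.

```python
import os
import tempfile


def read_all_in_dir(dir_path):
    """Read every regular file in dir_path, opening each relative to one directory fd."""
    contents = {}
    dir_fd = os.open(dir_path, os.O_RDONLY | os.O_DIRECTORY)
    try:
        with os.scandir(dir_fd) as entries:
            for entry in entries:
                if not entry.is_file(follow_symlinks=False):
                    continue  # skip subdirectories, symlinks, and special files
                # Equivalent to openat(dir_fd, name, O_RDONLY): the kernel resolves
                # only the final component instead of walking an absolute path.
                fd = os.open(entry.name, os.O_RDONLY, dir_fd=dir_fd)
                with os.fdopen(fd, "rb") as f:
                    contents[entry.name] = f.read()
    finally:
        os.close(dir_fd)
    return contents


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for name, data in {"a.txt": b"alpha", "b.txt": b"beta"}.items():
        with open(os.path.join(tmp, name), "wb") as f:
            f.write(data)
    os.mkdir(os.path.join(tmp, "subdir"))
    demo = read_all_in_dir(tmp)

print(sorted(demo))
```

This is still one open call per file, so it trims path resolution work rather than the number of syscalls.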
You mean like a range of file descriptors you could use if you want to save files in that directory?
For example, our integration test suite on a particular service has become quite slow, but it's not particularly clear where the time is going. I suspect a decent amount of time is being spent talking to postgres, but I'd like a low-touch way to profile this.
Strictly speaking, the bottleneck was latency, not bandwidth.
https://github.com/golang/go/issues/28739#issuecomment-10426...
https://stackoverflow.com/questions/64656255/why-is-the-c-fu...
https://github.com/valhalla/valhalla/issues/1192
https://news.ycombinator.com/item?id=13628320
Not sure what the root cause is, though.
Compressing the kernel makes it load into RAM faster, even though it still has to run the decompression step. Why?
Loading from disk into RAM is a bigger bottleneck than decompressing on the CPU.
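A back-of-the-envelope model shows how that can work out. All numbers below are invented for illustration, and the model assumes reading and decompressing happen one after the other with no overlap.

```python
def load_time_seconds(image_mb, disk_mb_per_s, compressed_mb=None, decompress_mb_per_s=None):
    """Estimate load time: read from disk, then optionally decompress.

    decompress_mb_per_s is the rate at which decompressed output is produced.
    """
    if compressed_mb is None:
        # Uncompressed image: the whole thing comes off the disk.
        return image_mb / disk_mb_per_s
    read_time = compressed_mb / disk_mb_per_s
    decompress_time = image_mb / decompress_mb_per_s
    return read_time + decompress_time


# Hypothetical figures: 60 MB image, 20 MB compressed, 100 MB/s disk,
# decompressor producing 500 MB/s of output.
plain = load_time_seconds(60, 100)
packed = load_time_seconds(60, 100, compressed_mb=20, decompress_mb_per_s=500)

print(f"uncompressed: {plain:.2f}s, compressed: {packed:.2f}s")
```

With these made-up figures the compressed path takes roughly half the time. The advantage shrinks or reverses as the disk gets faster relative to the decompressor.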
The same applies to algorithms: always find the largest bottleneck among your dependent steps and make changes there, since the rest of the pipeline waits on it. Often picking the right algorithm “solves it”, but it may be something else, like waiting for IO or coordinating across actors (a mutex, if concurrency is done the way it used to be).
That’s also part of the counterintuitive take that more concurrency brings more overhead and not necessarily faster execution (a topic discussed at length a few years ago in connection with async concurrency and immutable structures).