My impression is that this article has a lot of technical insight into how bzip compares to gzip, but it fails to actually account for the real cause of the diminished popularity of bzip in favor of the non-gzip alternatives that it admits are the more popular choices in recent years.
Also making good progress on getting a slimmer version of zstd into the stdlib and improving the stdlib deflate.
Awesome! Please let me know if there is anything I can do to help.
Does Gmail use a special codec for storing emails?
Yes, I do. Zstd is my preferred solution nowadays. But gzip is not going anywhere as a fallback because there is a surprisingly high number of computers without a working libzstd.
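For what it's worth, here is a minimal sketch of that "zstd if available, gzip otherwise" fallback. It assumes Python, where `compression.zstd` only ships with 3.14 and later; the helper names `pack` and `unpack` are made up for illustration and are not from any library.

```python
import gzip

try:
    # Only available on Python 3.14+ built with zstd support (assumption).
    from compression import zstd
except ImportError:
    zstd = None

# First four bytes of every zstd frame (magic number 0xFD2FB528, little-endian).
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"


def pack(data: bytes) -> bytes:
    """Compress with zstd when it is importable, otherwise fall back to gzip."""
    if zstd is not None:
        return zstd.compress(data)
    return gzip.compress(data)


def unpack(blob: bytes) -> bytes:
    """Pick the decompressor by looking at the magic bytes of the payload."""
    if blob[:4] == ZSTD_MAGIC:
        if zstd is None:
            raise RuntimeError("payload is zstd but no zstd module is available")
        return zstd.decompress(blob)
    return gzip.decompress(blob)


payload = b"example payload " * 64
assert unpack(pack(payload)) == payload
```

Sniffing the magic bytes on the way in means a machine without zstd can still read anything that was written with the gzip fallback.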
TIL. So that's why gzip has a file header! But tar.gz compresses even better, which is probably why it hasn't caught on.
That being said, speed is important for compression, so for systems like webservers etc. it's an easy sell ofc. Very strong point (and smarter implementation in programs) for gzip.
Long comment to just say: ‘I have no idea what I’m writing about’
These compression algorithms do not have anything to do with filesystem structure. Anyway, the reason you can’t cat together parts of bzip2 but you can with zstd (and gzip) is that zstd does everything in frames, and everything in those frames can be decompressed separately (so you can seek and decompress parts). Bzip2 doesn’t do that.
So like, another place bzip2 sucks ass is working with large archives, because you need to seek the entire archive before you can decompress it, and in situations without parity data it is way more likely to cause data loss of the whole archive. Really, don’t use it unless you have a super specific use case and know the tradeoffs. For the average person it was great back when we would spend the time compressing to save the time sending over dialup.
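A quick way to see the cat-together property for gzip, as a small sketch using only Python's standard `gzip` module (the variable names are just for illustration): two independently compressed members are glued together byte for byte and then decompressed in one call.

```python
import gzip

# Compress two chunks independently, as if they were two separate .gz files.
part_a = gzip.compress(b"hello ")
part_b = gzip.compress(b"world")

# Equivalent of `cat a.gz b.gz > combined.gz`: plain byte concatenation.
combined = part_a + part_b

# A multi-member gzip stream decompresses to the concatenated payloads.
restored = gzip.decompress(combined)
print(restored)
```

This only demonstrates the gzip side; it says nothing about how the other formats behave.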
https://github.com/facebook/zstd?tab=readme-ov-file#benchmar...
In my own testing of compressing internal generic JSON blobs, I found brotli a clear winner when comparing space and time.
If I want higher compatibility and fast speeds, I'd probably just reach for gzip.
zstd is good for many use cases, too, perhaps even most...but I think just telling everyone to always use it isn't necessarily the best advice.
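If anyone wants to run this kind of space/time comparison on their own data, here is a rough sketch. It only uses codecs from the Python standard library (gzip, bz2, lzma), so brotli and zstd are not included, and the sample JSON is synthetic and made up for illustration. Swap in your real blobs, since results depend heavily on the data.

```python
import bz2
import gzip
import json
import lzma
import time


def compare_codecs(payload: bytes) -> dict:
    """Compress payload with each stdlib codec and record size and wall time."""
    codecs = {"gzip": gzip.compress, "bz2": bz2.compress, "lzma": lzma.compress}
    results = {}
    for name, compress in codecs.items():
        start = time.perf_counter()
        compressed = compress(payload)
        elapsed = time.perf_counter() - start
        results[name] = {
            "bytes": len(compressed),
            "ratio": len(compressed) / len(payload),
            "seconds": elapsed,
        }
    return results


# Synthetic stand-in for "generic JSON blobs" (hypothetical data).
sample = json.dumps(
    [{"id": i, "name": f"user{i}", "active": i % 2 == 0} for i in range(2000)]
).encode()

report = compare_codecs(sample)
for name, stats in report.items():
    print(f"{name}: {stats['bytes']} bytes, "
          f"ratio {stats['ratio']:.3f}, {stats['seconds'] * 1000:.2f} ms")
```

A single timed run like this is noisy; for anything you intend to act on, repeat the measurement and test at the compression levels you would actually deploy.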
It’s slower and compresses worse than zstd. gzip should only be reached for as a compatibility option; that’s the only place it wins, because it’s everywhere.
EDIT: If you must use it, use pigz, the modern parallel implementation: https://www.zlib.net/pigz/