CDC: Why Decompression Is Worth the Complexity
2 points
1 hour ago
| 1 comment
| wael.nasreddine.com
kalbasit
1 hour ago
I'm building a Nix cache server and ran into a classic system design dilemma: chunk the compressed data (fast, simple) or decompress first and chunk the raw data (slow, complex)?

I tested 60k+ NAR files to find out.

Compressed: 6.4% dedup hit rate
Uncompressed: 47.8% dedup hit rate

Decompression wins, saving 18% in total storage.
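
For anyone wondering why the gap is so large, here is a rough toy sketch of the effect. It is not the benchmark above and it does not use the kalbasit/fastcdc API. It assumes a simplified gear-hash chunker (not real FastCDC), synthetic hex data in place of NAR files, zlib in place of whatever compression the cache uses, and a byte-weighted definition of "dedup hit rate". Every name in it (`chunk`, `dedup_ratio`, `make_versions`) is hypothetical.

```python
import hashlib
import random
import zlib

# Hypothetical gear table: one pseudo-random 32-bit value per byte value.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def chunk(data, mask=0x03FF0000, min_size=256, max_size=8192):
    """Split data at content-defined boundaries (simplified gear hash)."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Older bytes are shifted out of the 32-bit hash, so a cut point
        # depends only on the most recent bytes, not on absolute position.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def dedup_ratio(old, new):
    """Share of bytes in `new` that sit in chunks already seen in `old`."""
    seen = {hashlib.sha256(c).digest() for c in chunk(old)}
    hits = sum(len(c) for c in chunk(new)
               if hashlib.sha256(c).digest() in seen)
    return hits / len(new)


def make_versions(size=200_000, edits=4, seed=1):
    """Build two synthetic 'builds' that differ by a few small insertions."""
    rng = random.Random(seed)
    old = bytes(rng.choices(b"0123456789abcdef", k=size))
    new = bytearray(old)
    for n in range(edits):
        pos = (n * size) // edits + 1000
        new[pos:pos] = b"PATCH"  # small insertion, shifts everything after it
    return old, bytes(new)


if __name__ == "__main__":
    old, new = make_versions()
    raw = dedup_ratio(old, new)
    packed = dedup_ratio(zlib.compress(old), zlib.compress(new))
    print(f"chunking uncompressed data: {raw:.1%} of bytes deduplicated")
    print(f"chunking compressed data:   {packed:.1%} of bytes deduplicated")
```

On uncompressed input, the chunker resynchronizes shortly after each edit, so most chunks keep the same hash. A compressor's output after an edit tends to differ from that point onward, so chunks taken from the compressed stream rarely match. The exact percentages depend on the data and the compressor, so treat the numbers this prints as illustrative only.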

(P.S. To handle the pipeline throughput, I also built the fastest FastCDC implementation in Go: https://github.com/kalbasit/fastcdc)
