Show HN: Bucket Delta – Compute differences between two S3-compatible buckets
We built this at Nutanix to solve a recurring problem: detecting data drift between large object stores (hundreds of millions of objects).

Existing tools like AWS CLI (aws s3 sync --dryrun) and s5cmd (--dry-run) are great for many workflows, but we had slightly different requirements for deeper and more flexible comparisons. That led us to build this tool, and we’ve now open-sourced it for broader use.

Key Features:

1. Bidirectional diff: Given two buckets A and B, the tool reports both (A−B) and (B−A).

2. Shallow check mode: Compares object name + ETag to determine presence.

3. Deep check mode: Compares tags, metadata, and ObjectLock/WORM via HeadObject.

4. Resumability: Checkpointing allows long-running jobs to resume seamlessly. For example, a 10M-object run interrupted at 8M continues from where it left off.
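To make the bidirectional shallow diff and the resume idea concrete, here is a minimal sketch in Python. It is not the tool's actual code. It assumes both listings arrive as (key, ETag) pairs sorted by key, and it treats a key whose ETag differs between the two buckets as belonging to both (A−B) and (B−A), which is one reading of "name + ETag determines presence". The `shallow_diff` function and its `resume_after` parameter are hypothetical names used only for illustration.

```python
from typing import Iterable, List, Optional, Tuple

Entry = Tuple[str, str]  # (object key, ETag)


def shallow_diff(
    a: Iterable[Entry],
    b: Iterable[Entry],
    resume_after: Optional[str] = None,
) -> Tuple[List[str], List[str]]:
    """Merge-join two key-sorted listings and return (A-B, B-A) as key lists.

    resume_after is a hypothetical checkpoint: keys up to and including it
    are assumed to be already processed and are skipped.
    """

    def pending(listing: Iterable[Entry]):
        for key, etag in listing:
            if resume_after is None or key > resume_after:
                yield key, etag

    it_a, it_b = pending(a), pending(b)
    ea, eb = next(it_a, None), next(it_b, None)
    a_minus_b: List[str] = []
    b_minus_a: List[str] = []

    while ea is not None or eb is not None:
        if eb is None or (ea is not None and ea[0] < eb[0]):
            # Key exists only in A.
            a_minus_b.append(ea[0])
            ea = next(it_a, None)
        elif ea is None or eb[0] < ea[0]:
            # Key exists only in B.
            b_minus_a.append(eb[0])
            eb = next(it_b, None)
        else:
            # Same key in both; an ETag mismatch counts as a difference
            # in both directions (assumption, see lead-in).
            if ea[1] != eb[1]:
                a_minus_b.append(ea[0])
                b_minus_a.append(eb[0])
            ea, eb = next(it_a, None), next(it_b, None)

    return a_minus_b, b_minus_a


bucket_a = [("a.txt", "e1"), ("b.txt", "e2"), ("c.txt", "e3")]
bucket_b = [("b.txt", "e2"), ("c.txt", "e9"), ("d.txt", "e4")]

full_run = shallow_diff(bucket_a, bucket_b)
resumed_run = shallow_diff(bucket_a, bucket_b, resume_after="b.txt")
```

Because the merge only ever looks at the current entry from each side, memory use stays constant regardless of bucket size, and the last fully processed key is enough state to resume from. Against a real object store, the same checkpoint could plausibly be passed as the `StartAfter` parameter of ListObjectsV2 so that skipped keys are never listed at all.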

Performance (500K objects):

Shallow mode: ~3,900 objects/sec
Deep mode: ~1,345 objects/sec (25 worker processes)
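For a rough sense of what these rates mean at larger scale, the arithmetic can be sketched as below. This is only an extrapolation: it assumes throughput stays constant as the object count grows, which the 500K-object benchmark does not establish, and the 100M-object figure is an illustrative input rather than a measured run. The `estimated_hours` helper is a hypothetical name.

```python
SHALLOW_RATE = 3_900  # objects/sec, from the post
DEEP_RATE = 1_345     # objects/sec, from the post


def estimated_hours(object_count: int, objects_per_sec: float) -> float:
    """Wall-clock hours for a run, assuming a constant rate (an assumption)."""
    return object_count / objects_per_sec / 3600


# Duration of the benchmark itself, in seconds.
benchmark_shallow_sec = 500_000 / SHALLOW_RATE
benchmark_deep_sec = 500_000 / DEEP_RATE

# Hypothetical 100M-object comparison.
shallow_hours = estimated_hours(100_000_000, SHALLOW_RATE)
deep_hours = estimated_hours(100_000_000, DEEP_RATE)

print(f"500K benchmark: shallow ~{benchmark_shallow_sec:.0f}s, deep ~{benchmark_deep_sec:.0f}s")
print(f"100M objects: shallow ~{shallow_hours:.1f}h, deep ~{deep_hours:.1f}h")
```

Under that constant-rate assumption, a 100M-object comparison would take roughly 7 hours in shallow mode and roughly 21 hours in deep mode, which is the scale at which checkpointing starts to matter.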

The tool works across any two S3-compatible buckets.

Happy to answer any questions about the tool!
