We built the fastest data replication tool in the world using Go
Hey people! At OLake, our team has been building a high-throughput data replication tool in Go for a while now. The more we push real workloads, the clearer it gets that Go is a fantastic fit for data engineering: simple concurrency, predictable deploys, tiny containers, and great perf without a JVM.

As part of that journey, we’ve been contributing upstream to the Apache Iceberg Go ecosystem. This week, our PR to enable writing into partitioned tables got merged (https://github.com/apache/iceberg-go/pull/524).

That may sound niche, but it unlocks a very practical path for Go services to write straight to Iceberg (no Spark/Flink detour) and be query-ready in Trino/Spark/DuckDB right away.

What we added:

- Partitioned fan-out writer that splits data into multiple partitions, with each partition having its own rolling data writer
- Efficient Parquet flush/roll as the target file size is reached
- All the usual Iceberg transforms supported: identity, bucket, truncate, year/month/day/hour
- Arrow-based write for stable memory & fast columnar handling
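To make the fan-out idea concrete, here is a minimal sketch in plain Go. It is not the iceberg-go API: `Row`, `FanoutWriter`, `rollingWriter`, and the row-count threshold are hypothetical stand-ins (the real writer works on Arrow batches and rolls on target file size). The only part taken from the Iceberg spec is the integer truncate formula.

```go
package main

import "fmt"

// Row is a hypothetical record type used only for this sketch.
type Row struct {
	Region string
	Amount int
}

// truncateInt applies the Iceberg truncate transform for integers:
// v - (((v % w) + w) % w), which also rounds negatives toward -infinity.
func truncateInt(v, w int) int {
	return v - (((v % w) + w) % w)
}

// rollingWriter buffers rows for one partition and rolls to a new "file"
// once targetRows is reached (a stand-in for a target file size in bytes).
type rollingWriter struct {
	targetRows int
	buf        []Row
	files      [][]Row // completed files
}

func (w *rollingWriter) write(r Row) {
	w.buf = append(w.buf, r)
	if len(w.buf) >= w.targetRows {
		w.roll()
	}
}

// roll closes the current file if it holds any rows.
func (w *rollingWriter) roll() {
	if len(w.buf) == 0 {
		return
	}
	w.files = append(w.files, w.buf)
	w.buf = nil
}

// FanoutWriter routes each row to the rolling writer of its partition.
type FanoutWriter struct {
	targetRows int
	keyFn      func(Row) string
	writers    map[string]*rollingWriter
}

func NewFanoutWriter(targetRows int, keyFn func(Row) string) *FanoutWriter {
	return &FanoutWriter{
		targetRows: targetRows,
		keyFn:      keyFn,
		writers:    make(map[string]*rollingWriter),
	}
}

func (f *FanoutWriter) Write(r Row) {
	key := f.keyFn(r)
	w, ok := f.writers[key]
	if !ok {
		w = &rollingWriter{targetRows: f.targetRows}
		f.writers[key] = w
	}
	w.write(r)
}

// Close flushes partially filled files and returns the file count per partition.
func (f *FanoutWriter) Close() map[string]int {
	counts := make(map[string]int, len(f.writers))
	for key, w := range f.writers {
		w.roll()
		counts[key] = len(w.files)
	}
	return counts
}

func main() {
	// Partition by identity(region) and truncate(amount, 10).
	f := NewFanoutWriter(2, func(r Row) string {
		return fmt.Sprintf("region=%s/amount_trunc=%d", r.Region, truncateInt(r.Amount, 10))
	})
	for _, r := range []Row{{"eu", 11}, {"eu", 17}, {"eu", 19}, {"us", 3}} {
		f.Write(r)
	}
	for key, n := range f.Close() {
		fmt.Printf("%s: %d file(s)\n", key, n)
	}
}
```

Keeping one rolling writer per partition key is what keeps file counts down: rows for the same partition accumulate in one buffer until the threshold is hit, instead of every batch producing its own small file.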

Why we’re bullish on Go for building our platform, OLake:

- The runtime’s concurrency model makes it straightforward to coordinate partition writers, batching, and backpressure.
- Small static binaries → easy to ship edge and sidecar ingestors.
- Great ops story (observability, profiling, and sane resource usage), which is a big deal when you’re replicating at high rates.

Where this helps right now:

- Building micro-ingestors that stream changes from DBs to Iceberg in Go.
- Edge or on-prem capture where you don’t want a big JVM stack.
- Teams that want cleaner tables (fewer tiny files) without a separate compaction job for every write path.
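The point about coordinating partition writers and backpressure can be sketched with nothing but goroutines and bounded channels. This is an illustration under assumptions, not OLake's actual code: `Batch`, `runWriters`, and `queueDepth` are hypothetical names, and the writer goroutine only counts rows where a real one would append to a Parquet file.

```go
package main

import (
	"fmt"
	"sync"
)

// Batch is a hypothetical unit of work destined for one partition.
type Batch struct {
	Partition string
	Rows      int
}

// runWriters starts one goroutine per partition, each fed by a bounded
// channel. When a partition's queue is full, the send blocks the producer,
// which is the backpressure: ingestion slows to the pace of the slowest writer.
func runWriters(batches []Batch, queueDepth int) map[string]int {
	queues := make(map[string]chan Batch)
	totals := make(map[string]int)
	var mu sync.Mutex
	var wg sync.WaitGroup

	for _, b := range batches {
		q, ok := queues[b.Partition]
		if !ok {
			// First batch for this partition: create its queue and writer.
			q = make(chan Batch, queueDepth)
			queues[b.Partition] = q
			wg.Add(1)
			go func(name string, in <-chan Batch) {
				defer wg.Done()
				rows := 0
				for batch := range in {
					rows += batch.Rows // a real writer would write the batch here
				}
				mu.Lock()
				totals[name] = rows
				mu.Unlock()
			}(b.Partition, q)
		}
		q <- b // blocks while this partition's queue is full
	}

	// Signal end of input, then wait for every writer to drain its queue.
	for _, q := range queues {
		close(q)
	}
	wg.Wait()
	return totals
}

func main() {
	totals := runWriters([]Batch{
		{Partition: "day=2024-01-01", Rows: 100},
		{Partition: "day=2024-01-02", Rows: 40},
		{Partition: "day=2024-01-01", Rows: 60},
	}, 1)
	for partition, rows := range totals {
		fmt.Printf("%s: %d rows\n", partition, rows)
	}
}
```

The queue depth is the tuning knob in this sketch: a deeper queue absorbs bursts at the cost of more buffered memory, while a depth of zero makes every send wait for the writer.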

For data teams still worried about Go, we have a case study that may help: check the benchmarks we’re hitting thanks to the language’s lightweight model. See the numbers here: https://olake.io/docs/benchmarks

If you’re experimenting with Go + Iceberg, we’d love to collaborate as we believe in open source :)

repo: https://github.com/datazip-inc/olake/
