Show HN: I ran every Claude agent turn through the Batch API
I built a tiny Python REPL to answer a dumb-but-useful question:

What happens if every turn in an agent loop goes through Anthropic’s Batch API instead of the normal synchronous endpoint?
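For context, each turn's mechanics look roughly like this (a minimal sketch against the anthropic Python SDK's Message Batches endpoints; the model name, polling interval, and error handling are placeholders, not the repo's actual code):

```python
import time
from anthropic import Anthropic

client = Anthropic()

def batched_turn(messages, model="claude-sonnet-4-20250514"):
    # Submit a one-entry batch instead of calling messages.create directly.
    batch = client.messages.batches.create(
        requests=[{
            "custom_id": "turn-0",
            "params": {
                "model": model,
                "max_tokens": 1024,
                "messages": messages,
            },
        }]
    )
    # Poll until the batch finishes; in my runs this took ~90-120s
    # even for a single entry.
    while batch.processing_status != "ended":
        time.sleep(5)
        batch = client.messages.batches.retrieve(batch.id)
    # A one-entry batch has exactly one result.
    for entry in client.messages.batches.results(batch.id):
        if entry.result.type == "succeeded":
            return entry.result.message
        raise RuntimeError(f"batch entry failed: {entry.result.type}")
```

The poll loop is where the pain lives: even a one-entry batch has to wait for the whole batch machinery.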

The motivation was cost. The Batch API is 50% cheaper than the synchronous endpoint, which sounds very attractive for agent workloads: evals, background research agents, CI agents, unattended subagents, etc.

The result: it works, but it is awful for a single interactive agent.

In my runs, a one-entry batch usually took ~90–120 seconds to complete. That means a five-turn tool loop becomes a ten-minute interaction. Waiting two minutes for the model to decide it needs to run ls is not a good UX.

But that was also the point of the experiment. A single REPL turn is probably the wrong unit to batch.

The interesting version is fleet-level batching:

- many agents running in parallel
- background subagents
- CI/eval jobs
- multiple harnesses sharing a local proxy
- shared prompt prefixes that may benefit from caching
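To make that concrete, here's a rough sketch of the fleet-level shape: a collector that buffers turns from many agents and flushes them as a single batch. The class, the flush threshold, and the names are illustrative assumptions, not code from the repo.

```python
import threading
from anthropic import Anthropic

client = Anthropic()

class FleetBatcher:
    """Buffers requests from many agents and submits them as one batch."""

    def __init__(self, flush_at=50):
        self.flush_at = flush_at
        self.pending = []
        self.lock = threading.Lock()

    def enqueue(self, custom_id, params):
        # Each agent drops its turn here instead of calling the sync API.
        with self.lock:
            self.pending.append({"custom_id": custom_id, "params": params})
            if len(self.pending) >= self.flush_at:
                return self._flush()
        return None

    def _flush(self):
        requests, self.pending = self.pending, []
        # One batch submission amortizes the ~90-120s turnaround across
        # every agent in the fleet, at half the sync price.
        return client.messages.batches.create(requests=requests)
```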

In that world, the batcher should probably sit below the harness as infrastructure. Existing tools keep using the normal API shape, while a proxy decides per request whether it should go sync or async based on latency tolerance.
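The routing decision itself can be dumb. A sketch, assuming something like the FleetBatcher above and a per-request latency-tolerance hint (both made up for illustration):

```python
import uuid

BATCH_TURNAROUND_S = 120  # rough upper bound from my one-entry runs

def route(params, latency_tolerance_s, batcher):
    # Callers that can wait out typical batch turnaround go async at
    # half price; everything else hits the normal sync endpoint.
    if latency_tolerance_s >= BATCH_TURNAROUND_S:
        return batcher.enqueue(custom_id=str(uuid.uuid4()), params=params)
    return client.messages.create(**params)
```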

One surprising observation: in my small, non-rigorous testing, Haiku batches often felt slower than Sonnet/Opus batches. I wouldn’t treat that as a benchmark, but it does suggest routers should measure this rather than assuming “cheap model = batch model.”
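Measuring it is cheap. A sketch of what a router could keep, e.g. an exponentially weighted moving average of batch turnaround per model (again illustrative, not from the repo):

```python
import time

class BatchLatencyTracker:
    """Tracks per-model batch turnaround with an exponential moving average."""

    def __init__(self, alpha=0.3):
        self.alpha = alpha
        self.ewma = {}     # model -> smoothed turnaround in seconds
        self.started = {}  # batch_id -> (model, submit timestamp)

    def on_submit(self, batch_id, model):
        self.started[batch_id] = (model, time.monotonic())

    def on_complete(self, batch_id):
        model, t0 = self.started.pop(batch_id)
        elapsed = time.monotonic() - t0
        prev = self.ewma.get(model, elapsed)
        self.ewma[model] = self.alpha * elapsed + (1 - self.alpha) * prev

    def expected_turnaround(self, model, default=120.0):
        # A router consults this instead of assuming cheap model = fast batch.
        return self.ewma.get(model, default)
```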

Repo is here:

https://github.com/erans/batching-harness

It is intentionally small: one Python file, a basic tool loop, a local shell tool, a stats panel, and minimal sandboxing.

The useful lesson for me was:

Batch API is terrible as an interaction pattern for one agent. It might be very useful as a hidden optimization layer for a fleet of agents.
