Show HN: Batty – Run a team of AI coding agents in tmux with test gating

1 point

by Zedmor

2 hours ago

| github.com

Hi HN, I'm the author.

I use Claude Code and Codex daily. Running one agent on a task works great. Running three or four in parallel on the same repo? They step on each other's files, nobody checks if the code compiles, and you spend more time coordinating than coding.

Batty is the supervisor layer I built to fix this. You define a team in YAML — an architect that plans work, a manager that dispatches it, engineers that execute. Batty launches each role in its own tmux pane, isolates engineer work in git worktrees, routes messages between roles, and gates task completion on passing tests.
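
To make the worktree isolation concrete, here is a minimal sketch of giving each engineer its own checkout with `git worktree add -b <branch> <path> <base>`. It is not Batty's actual code: the `.batty/worktrees/<engineer>` layout, the `batty/<engineer>` branch naming, and all function names are assumptions made up for illustration.

```rust
use std::io;
use std::path::{Path, PathBuf};
use std::process::Command;

/// Worktree location relative to the repo root.
/// The `.batty/worktrees` layout is hypothetical, chosen for this sketch.
fn worktree_rel_path(engineer: &str) -> PathBuf {
    Path::new(".batty").join("worktrees").join(engineer)
}

/// Builds the arguments for `git worktree add -b <branch> <path> <base>`.
/// The `batty/<engineer>` branch name is also an assumption.
fn worktree_add_args(engineer: &str, base: &str) -> Vec<String> {
    vec![
        "worktree".to_string(),
        "add".to_string(),
        "-b".to_string(),
        format!("batty/{engineer}"),
        worktree_rel_path(engineer).to_string_lossy().into_owned(),
        base.to_string(),
    ]
}

/// Runs git from the repo root and reports whether the worktree was created.
fn create_worktree(repo: &Path, engineer: &str, base: &str) -> io::Result<bool> {
    let status = Command::new("git")
        .current_dir(repo)
        .args(worktree_add_args(engineer, base))
        .status()?;
    Ok(status.success())
}
```

Calling `create_worktree(Path::new("."), "eng-1", "main")` would give `eng-1` a separate directory and branch, so two engineers never edit the same working copy.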

The interesting part is what it's not: it's not an agent framework, and it doesn't embed any model. It orchestrates existing agent CLIs (Claude Code, Codex, Aider) using tmux as the runtime and git worktrees for isolation. Config is YAML, the kanban board is Markdown (powered by a bundled kanban-md tool), inboxes are Maildir, logs are JSONL. You can `git diff` your entire team state.
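
As an illustration of why JSONL logs stay diffable, the sketch below appends one JSON object per line to a plain file using only the standard library. The field names (`ts`, `role`, `event`) and the function names are hypothetical and are not taken from Batty's real log schema.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Escapes a string for use inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters must be \u-escaped in JSON.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Formats one event as a single JSON object with no embedded newline.
/// The field names are assumptions for this sketch.
fn event_line(ts_unix: u64, role: &str, event: &str) -> String {
    format!(
        "{{\"ts\":{},\"role\":\"{}\",\"event\":\"{}\"}}",
        ts_unix,
        json_escape(role),
        json_escape(event)
    )
}

/// Appends the event to the log, creating the file if it does not exist.
fn append_event(log: &Path, line: &str) -> io::Result<()> {
    let mut file = OpenOptions::new().create(true).append(true).open(log)?;
    writeln!(file, "{line}")
}
```

Because every event is exactly one line, a `git diff` of the log shows each new event as one added line.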

Built in Rust, published on crates.io (v0.1.0). The daemon is a synchronous 5-second poll loop — no async complexity. It watches pane output to detect idle/active/dead agents, reads Claude and Codex session files on disk to reduce false-positive idle detection, and uses a merge lock to serialize concurrent worktree merges.
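
A synchronous poll loop of this kind could look roughly like the sketch below. It is a simplified assumption, not the real daemon: it classifies a pane only by whether the text from `tmux capture-pane -p -t <target>` changed between ticks, and it leaves out the session-file check mentioned above. The type names, function names, and idle threshold are all hypothetical.

```rust
use std::process::Command;
use std::thread;
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq, Clone, Copy)]
enum AgentState {
    Active,
    Idle,
    Dead,
}

/// Remembers the last pane contents and when they last changed.
struct PaneWatch {
    last_output: String,
    last_change: Instant,
}

impl PaneWatch {
    fn new(now: Instant) -> Self {
        PaneWatch { last_output: String::new(), last_change: now }
    }

    /// `snapshot` is `None` when the pane could not be captured.
    fn observe(&mut self, snapshot: Option<&str>, now: Instant, idle_after: Duration) -> AgentState {
        let text = match snapshot {
            Some(text) => text,
            None => return AgentState::Dead,
        };
        if text != self.last_output {
            self.last_output = text.to_string();
            self.last_change = now;
            return AgentState::Active;
        }
        if now.duration_since(self.last_change) >= idle_after {
            AgentState::Idle
        } else {
            AgentState::Active
        }
    }
}

/// Captures the visible pane text; returns `None` if tmux fails
/// (for example, the pane is gone or tmux is not installed).
fn capture_pane(target: &str) -> Option<String> {
    let out = Command::new("tmux")
        .args(["capture-pane", "-p", "-t", target])
        .output()
        .ok()?;
    if out.status.success() {
        Some(String::from_utf8_lossy(&out.stdout).into_owned())
    } else {
        None
    }
}

/// Blocking loop with a fixed 5-second tick; stops once the pane is dead.
fn supervise(target: &str, idle_after: Duration) {
    let mut watch = PaneWatch::new(Instant::now());
    loop {
        let snapshot = capture_pane(target);
        let state = watch.observe(snapshot.as_deref(), Instant::now(), idle_after);
        println!("{target}: {state:?}");
        if state == AgentState::Dead {
            break;
        }
        thread::sleep(Duration::from_secs(5));
    }
}
```

Passing the clock into `observe` keeps the classification logic testable without sleeping; only `supervise` touches tmux and real time.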

Some things I learned running multi-agent setups:

- 3-5 parallel engineers is the sweet spot. Beyond that, the codebase itself becomes the bottleneck for absorbing parallel changes.

- Task decomposition quality matters more than agent count. A good architect prompt outperforms throwing more engineers at bad tasks.

- Test gating eliminated most of the chaos. Without it, agents "complete" work that breaks everything downstream.

- You still need to supervise. It's not fire-and-forget; it's closer to managing a junior team. The leverage is supervising five workstreams instead of doing one.
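
The test-gating idea can be sketched in a few lines: run the project's test command inside the engineer's worktree and only accept the task if the command exits successfully. This is an illustration under assumptions, not Batty's implementation; `TaskStatus` and `gate_on_tests` are made-up names, and the test command is whatever the project uses.

```rust
use std::io;
use std::path::Path;
use std::process::Command;

#[derive(Debug, PartialEq)]
enum TaskStatus {
    /// Tests passed, so the task may be marked complete.
    Done,
    /// Tests failed, so the task goes back to the engineer.
    NeedsWork,
}

/// Runs the test command in the worktree and maps its exit status to a task status.
/// An `Err` means the command could not be started at all.
fn gate_on_tests(worktree: &Path, program: &str, args: &[&str]) -> io::Result<TaskStatus> {
    let status = Command::new(program)
        .args(args)
        .current_dir(worktree)
        .status()?;
    Ok(if status.success() {
        TaskStatus::Done
    } else {
        TaskStatus::NeedsWork
    })
}
```

For a Rust project the call might be `gate_on_tests(worktree, "cargo", &["test"])`; an agent's own claim of completion is never consulted, only the exit status.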

I know there's prior art in this space — Tmux-IDE and vibe-kanban both approach multi-agent coordination differently. Batty is more opinionated about supervision: the test gating and communication constraints are first-class, not optional. Different tradeoffs for different workflows.

It's early (v0.1.0). The core loop is solid but the API is still settling. Eight built-in templates range from solo (1 agent) to large (19 agents with three management layers). The architecture diagram in the README shows the full supervision flow.

2-minute demo: https://youtube.com/watch?v=2wmBcUnq0vw

Docs: https://battysh.github.io/batty

Happy to go deep on the architecture or the worktree strategy. For those running multiple agents: what's the biggest operational pain point?
