Show HN: I have been running 3 coding agents non-stop for the last 3 days. Here is how
3 points
1 hour ago
1. Headless mode

Headless mode lets you use the AI as a command-line utility for automation and scripting. In Claude Code you run it with the -p flag (claude -p); in Codex the equivalent is codex exec, and in opencode it is opencode run.
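A minimal sketch of calling these from a script. It assumes each CLI takes the prompt as the last positional argument; `HEADLESS_COMMANDS`, `build_command` and `run_headless` are names made up for this example, not part of any of the tools.

```python
import subprocess

# Headless entry points named in the post. Assumption: the prompt is
# passed as the final positional argument in all three cases.
HEADLESS_COMMANDS = {
    "claude": ["claude", "-p"],
    "codex": ["codex", "exec"],
    "opencode": ["opencode", "run"],
}


def build_command(coder: str, prompt: str) -> list:
    """Return the argv list for a one-shot, non-interactive run."""
    if coder not in HEADLESS_COMMANDS:
        raise ValueError(f"unknown coder: {coder}")
    return [*HEADLESS_COMMANDS[coder], prompt]


def run_headless(coder: str, prompt: str, cwd: str = "."):
    """Run the coder once in `cwd` and capture stdout/stderr as text."""
    return subprocess.run(
        build_command(coder, prompt), cwd=cwd, capture_output=True, text=True
    )
```

Keeping the command table separate from the runner means adding another coder is a one-line change.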

2. Ask human

The usual interactive channel to the operator does not exist in headless mode, so we need to implement a dedicated tool. Here is an example of how this can be done: https://github.com/sermakarevich/claude/tree/main/mcp/ask_hu...
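The linked implementation is an MCP tool. The sketch below is not that code, just the core idea in its simplest form: the worker writes a question somewhere the operator can see it and blocks until an answer shows up. The file-based mailbox, the file names and the `ask_human` signature are all assumptions for illustration.

```python
import json
import time
from pathlib import Path
from typing import Optional


def ask_human(mailbox: Path, task_id: str, question: str,
              poll_seconds: float = 5.0,
              timeout: Optional[float] = None) -> str:
    """Write a question file, then block until an answer file appears."""
    mailbox.mkdir(parents=True, exist_ok=True)
    question_file = mailbox / f"{task_id}.question.json"
    answer_file = mailbox / f"{task_id}.answer.txt"
    question_file.write_text(json.dumps({"task": task_id, "question": question}))

    deadline = None if timeout is None else time.monotonic() + timeout
    while not answer_file.exists():
        if deadline is not None and time.monotonic() > deadline:
            raise TimeoutError(f"no answer for {task_id}")
        time.sleep(poll_seconds)

    answer = answer_file.read_text().strip()
    # Remove both files so the same task can ask again later.
    question_file.unlink()
    answer_file.unlink()
    return answer
```

With no timeout the call simply waits, which matches the "asks at night, waits until morning" behaviour described later in the post.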

3. Tasks queue

Beads is a lightweight distributed graph issue tracker for AI agents, powered by Dolt. You can create tasks, define dependencies between them, and track status, priorities and hierarchy. Beads also helps prevent the same task from being claimed by more than one worker.
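To make the "one task, one worker" guarantee concrete, here is a small sketch of an atomic claim with dependency checks. It is not the Beads API and does not use Dolt; it uses sqlite3 as a stand-in, and the table layout, the status names and the convention that a lower priority number is more urgent are all assumptions.

```python
import sqlite3
from typing import Optional


def open_queue(path: str = ":memory:") -> sqlite3.Connection:
    """Create the two illustrative tables: tasks and their dependencies."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS tasks ("
               "id TEXT PRIMARY KEY, priority INTEGER, status TEXT, worker TEXT)")
    db.execute("CREATE TABLE IF NOT EXISTS deps (task TEXT, needs TEXT)")
    return db


def claim_next(db: sqlite3.Connection, worker: str) -> Optional[str]:
    """Claim the most urgent open task whose dependencies are all closed."""
    row = db.execute(
        "SELECT id FROM tasks t WHERE status = 'open' AND NOT EXISTS ("
        "  SELECT 1 FROM deps d JOIN tasks n ON n.id = d.needs"
        "  WHERE d.task = t.id AND n.status != 'closed')"
        " ORDER BY priority LIMIT 1").fetchone()
    if row is None:
        return None
    # The status check in WHERE is what makes the claim safe: if another
    # worker took the task first, zero rows change and nothing is claimed.
    cur = db.execute(
        "UPDATE tasks SET status = 'in_progress', worker = ? "
        "WHERE id = ? AND status = 'open'", (worker, row[0]))
    db.commit()
    return row[0] if cur.rowcount == 1 else None
```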

4. Worker artifacts

We want to be able to monitor how a worker is doing, at what stage it is, and resume it after a restart. For every task we can create a dedicated folder using the beads task id and put into it what we need. I put there:
- plan and status md
- knowledge md
- events.jsonl
- stderr

The worker is instructed in its prompt to check whether artifacts exist, which allows it to resume from where the job left off.
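A rough sketch of the per-task folder and the resume check. The exact file names (`plan_and_status.md`, `knowledge.md`, `stderr.log`) and both helper functions are my guesses for illustration; only `events.jsonl` is named literally above.

```python
import json
from pathlib import Path

# Assumed file names; the post lists the artifacts only informally.
ARTIFACT_FILES = ("plan_and_status.md", "knowledge.md", "events.jsonl", "stderr.log")


def prepare_task_dir(root: Path, task_id: str):
    """Create the task folder if needed; return (folder, resuming?)."""
    task_dir = root / task_id
    # A task counts as resumable if any artifact already has content.
    resuming = any(
        (task_dir / name).is_file() and (task_dir / name).stat().st_size > 0
        for name in ARTIFACT_FILES
    )
    task_dir.mkdir(parents=True, exist_ok=True)
    return task_dir, resuming


def log_event(task_dir: Path, kind: str, **data) -> None:
    """Append one JSON object per line to events.jsonl."""
    with open(task_dir / "events.jsonl", "a", encoding="utf-8") as f:
        f.write(json.dumps({"kind": kind, **data}) + "\n")
```

The `resuming` flag can then decide which prompt the worker gets: start fresh, or read the existing plan and continue.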

5. Worker isolation

To prepare to run multiple workers we need to isolate them. Git worktree can be used here. I am testing this approach:
- a worker gets the task and implements it
- the next worker, spawned automatically, validates that the task is done, tests it, merges the worktree, closes the ticket and creates another one for a fix if required
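The git side of this can be sketched as three commands per task: add a worktree on a new branch, merge that branch after validation, remove the worktree. The `task/<id>` branch naming, the sibling-folder layout and the helper names are assumptions; the git subcommands themselves (`worktree add -b`, `merge`, `worktree remove`) are standard.

```python
import subprocess
from pathlib import Path


def worktree_path(repo: Path, task_id: str) -> Path:
    """Assumed layout: the worktree lives next to the main checkout."""
    return repo.parent / f"{repo.name}-{task_id}"


def add_cmd(repo: Path, task_id: str) -> list:
    """Create an isolated checkout on a new branch for one task."""
    return ["git", "-C", str(repo), "worktree", "add",
            "-b", f"task/{task_id}", str(worktree_path(repo, task_id))]


def merge_cmd(repo: Path, task_id: str) -> list:
    """Merge the task branch into whatever is checked out in `repo`."""
    return ["git", "-C", str(repo), "merge", f"task/{task_id}"]


def remove_cmd(repo: Path, task_id: str) -> list:
    """Drop the worktree once the task has been merged."""
    return ["git", "-C", str(repo), "worktree", "remove",
            str(worktree_path(repo, task_id))]


def run(cmd: list) -> None:
    """Execute one git command and fail loudly on a non-zero exit."""
    subprocess.run(cmd, check=True)
```

The implementing worker would be started with its working directory set to `worktree_path(...)`; the validating worker runs the merge and remove steps only after its checks pass.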

6. Multiple workers

To be able to run multiple workers we need a simple orchestrator: an infinite loop that keeps checking beads and the config and starts new workers when required.
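The loop itself can be very small. A sketch, with the task source and the worker launcher passed in as plain callables so it stays independent of beads or any particular coder; every name here is hypothetical, and `max_iterations` exists only so the loop can be stopped in tests.

```python
import time
from typing import Callable, Optional


def orchestrate(claim_task: Callable[[], Optional[str]],
                spawn_worker: Callable[[str], object],
                is_running: Callable[[object], bool],
                max_workers: int = 3,
                poll_seconds: float = 10.0,
                max_iterations: Optional[int] = None) -> None:
    """Keep up to `max_workers` workers busy for as long as tasks exist."""
    running = {}  # task id -> worker handle
    iteration = 0
    while max_iterations is None or iteration < max_iterations:
        iteration += 1
        # Forget workers that have exited since the last pass.
        running = {t: h for t, h in running.items() if is_running(h)}
        # Fill free slots until the queue has nothing claimable.
        while len(running) < max_workers:
            task_id = claim_task()
            if task_id is None:
                break
            running[task_id] = spawn_worker(task_id)
        time.sleep(poll_seconds)
```

In practice `spawn_worker` could wrap `subprocess.Popen` and `is_running` could check `handle.poll() is None`.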

7. Coder agnostic

A worker can be basically any coder. I started with Claude, then added Codex and Agy, and finally Opencode.

8. Subscription limits

3 coding agents can burn through the Claude $200 subscription limit in 30 minutes, even if you switch to Sonnet 4.6. API tokens cost 40x more than tokens in the subscription, which is too expensive. The idea I am testing is:
- use the strongest model possible to analyse/design and add tasks
- use a local model as a worker
- use a stronger model to validate the workers' output and add new tasks to fix potential misimplementations

I am using the qwen3.6:36B local model with Ollama, deployed on 2 GPU cards, 36GB in total, with a 256K context window. This is slower, but it is free of charge. Surprisingly, it worked, and worked far better than I expected. Fable 5 was extremely great at creating clear and simple tickets until it was.

Other options I was considering are Qwen on Bedrock, paying per token, or renting a 96GB GPU for $1400 per month.

I found that it's optimal to run 3 workers concurrently even though Ollama processes 1 request at a time. The reason is the ask_human tool: if a worker asks me something at night, it has to wait until morning doing nothing. Running three more or less guarantees that the GPU stays at 100% load.

9. Nice integrations

UI - to observe tasks / beads / config / chat / analytics

It's easy to miss when a model asks a question. It's visible in the UI - a green circle near chat, but that's it. So I added a Telegram integration - now I receive questions from workers on Telegram and can reply there, get the status of tasks, create new tasks etc.
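For the outgoing half (worker question to Telegram), a minimal sketch using the Bot API `sendMessage` method over plain HTTPS. The function names and the `[task_id] question` message format are assumptions; the token and chat id have to come from your own bot setup. Receiving replies is left out here.

```python
import json
import urllib.request


def build_send_message(token: str, chat_id: str, text: str) -> urllib.request.Request:
    """Build a Telegram Bot API sendMessage request without sending it."""
    body = json.dumps({"chat_id": chat_id, "text": text}).encode("utf-8")
    return urllib.request.Request(
        f"https://api.telegram.org/bot{token}/sendMessage",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def notify_question(token: str, chat_id: str, task_id: str,
                    question: str, timeout: float = 10.0) -> dict:
    """Send a worker's question to the operator and return the API reply."""
    req = build_send_message(token, chat_id, f"[{task_id}] {question}")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Prefixing the message with the task id makes it easier to route the operator's reply back to the right worker.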

I am doing this for my PoC projects ofc:
- improving fleet
- building a data collection and analysis related app

What I am seeing is that 24x7 coders are closer than I thought. Even weaker models can deliver good results when the task is simple and well defined. All the components for building these systems are there.

Repo: https://github.com/sermakarevich/fleet

mikgp
1 hour ago
What are you producing with all this power?