Ask HN: How are you using multi-agent AI systems in your daily workflow?
16 points
1 month ago
| 11 comments
We've been running a 13-agent system (PAI Family) for a few months — specialized agents for research, finance, content, strategy, critique, psychology, and more. They collaborate, argue, and occasionally bet against each other on our prediction market.

Curious what others are building. Are you running multiple AI agents? What architectures work? What fails spectacularly?

raffaeleg
1 month ago
[-]
Curious about the prediction market mechanic; that's the part most people skip. We've been running something similar with Platypi: 6 agents on a simulated trading desk (paper money on Alpaca), specialized roles, coordinating exclusively via email. No dashboard, no human intervention.

The coordination patterns that emerged were unexpected: agents developing implicit trust hierarchies, one risk manager consistently blocking the others, disagreements that resolved faster than any human team would. You can see it here: https://platypi.empla.io

The architecture question that keeps coming up for us: specialization vs. redundancy. Do you run multiple agents with overlapping domains so they can sanity-check each other, or hard boundaries? We found hard specialization creates blind spots that are hard to catch in real time.

What's your failure mode when two agents reach contradictory conclusions and there's no tiebreaker?
reply
jovanaccount
1 month ago
[-]
From my experience building multi-agent systems: the biggest underappreciated problem is state coordination.

Frameworks handle individual agent capabilities well. What they don't handle: preventing two agents from silently overwriting each other's work on shared state. It's a classic race condition but in AI systems the output looks reasonable, so you don't notice it until production.

We open-sourced a coordination layer that adds atomic state management to any framework (LangChain, AutoGen, CrewAI, MCP, etc.): https://github.com/Jovancoding/Network-AI

reply
formreply
1 month ago
[-]
What fails spectacularly in our setup: agents that share a conversation thread and try to resolve conflicts in real time. They race to add the last word, produce verbose non-decisions, and eventually one agent just agrees with whatever was said last. Consensus is a bad protocol for async, unequal agents.

What works: role clarity + veto rights. One agent can only block, never propose. One agent makes calls, others can raise flags. You stop the chatbot parliament problem and actually get decisions.

The other pattern worth stealing from production systems: treat inbound events (emails, webhooks, form submissions) as the task boundary, not the conversation turn. An agent that owns a mailbox and processes messages one at a time is dramatically more auditable than one that's always-on and decides what to react to. You can replay it, diff its outputs, and understand why it did what it did.
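The veto-only role described above fits in a few lines of Go; `Proposal`, `VetoFn`, and the risk agent here are illustrative names, not from any system in this thread. The key constraint is in the type: a veto holder can block with a reason but has no way to counter-propose.

```go
package main

import "fmt"

// Proposal flows one way: one agent proposes, veto holders may only
// block (never replace), and the decider acts on whatever survives.
type Proposal struct {
	Action string
	Vetoed bool
	Reason string
}

// VetoFn can block a proposal with a reason but cannot rewrite it.
type VetoFn func(Proposal) (blocked bool, reason string)

// Decide runs the proposal past every veto holder; the first block wins
// and the reason is recorded, so there is no chatbot parliament.
func Decide(p Proposal, vetoes []VetoFn) Proposal {
	for _, veto := range vetoes {
		if blocked, reason := veto(p); blocked {
			p.Vetoed, p.Reason = true, reason
			return p
		}
	}
	return p
}

func main() {
	// Illustrative risk agent: it can only say no, never suggest trades.
	riskVeto := func(p Proposal) (bool, string) {
		if p.Action == "double position" {
			return true, "exceeds risk limit"
		}
		return false, ""
	}

	ok := Decide(Proposal{Action: "rebalance"}, []VetoFn{riskVeto})
	blocked := Decide(Proposal{Action: "double position"}, []VetoFn{riskVeto})
	fmt.Println(ok.Vetoed, blocked.Vetoed, blocked.Reason) // false true exceeds risk limit
}
```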

reply
raffaeleg
1 month ago
[-]
We're running a live event at platypi.empla.io — a simulated trading desk where 6 agents coordinate entirely via email with no human in the loop. No shared conversation thread, no central orchestrator. Bozen (supervisor) gets a morning briefing from each PM agent, they argue about positions over email, Mizumo executes. The interesting thing isn't the trading — it's that email as coordination protocol produces naturally auditable, replayable agent behavior. Paper money on Alpaca, but the coordination infrastructure is the point.
reply
stokemoney
1 month ago
[-]
Built my own custom solution that is completely spec-driven. It has concepts of specs and plans, plus a kanban board to monitor all agents as work progresses.

It takes a plan, breaks it into dependent tasks, has a human-in-the-loop approval step, and then is fire-and-forget once the plan is started, with parallel agent workers. It has complete code-review and testing loops for accuracy and quality, plus idempotent retries and restarts. Completely frontend-driven, so I don't have to deal with dumb terminals like Claude Code.
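A minimal sketch of the dependent-task idea, assuming a plain topological sort over task dependencies (the `Task`/`Order` names and the example plan are illustrative, not the commenter's actual system). Tasks that become ready in the same pass could be handed to parallel workers as a wave.

```go
package main

import "fmt"

// Task names its dependencies; it may run only after all deps are done.
type Task struct {
	Name string
	Deps []string
}

// Order returns tasks in dependency order (a simple topological sort),
// so a plan can be dispatched to workers without violating deps.
func Order(tasks []Task) ([]string, error) {
	done := map[string]bool{}
	var out []string
	for len(out) < len(tasks) {
		progressed := false
		for _, t := range tasks {
			if done[t.Name] {
				continue
			}
			ready := true
			for _, d := range t.Deps {
				if !done[d] {
					ready = false
					break
				}
			}
			if ready {
				done[t.Name] = true
				out = append(out, t.Name)
				progressed = true
			}
		}
		if !progressed {
			return nil, fmt.Errorf("dependency cycle")
		}
	}
	return out, nil
}

func main() {
	plan := []Task{
		{Name: "review", Deps: []string{"code"}},
		{Name: "spec"},
		{Name: "code", Deps: []string{"spec"}},
	}
	order, err := Order(plan)
	fmt.Println(order, err) // [spec code review] <nil>
}
```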

reply
mrothroc
1 month ago
[-]
I've been running a multi-agent setup for quite a while to do software development. I set up a workflow with agents at each stage, spec->plan->design->code->review. The key thing I learned was that the arrangement of the checks between agents matters more than which model you pick for any one step. Most failures were omissions that a gate between stages catches.
reply
Horos
1 month ago
[-]
I've set up a fully async pattern: blobs chunked into SQLite shards.

It's a blind fire-and-forget Go worker dance.

It can be monitored, or scaled to multiple instances if needed, via simple parameters.

Basically, it's a "job as library" pattern.

If you don't need real time, it's bulletproof and very LLM-friendly, and a good token saver thanks to its batching abilities.

reply
leandot
1 month ago
[-]
Could you share more details about this setup?
reply
Horos
1 month ago
[-]
The "job as library" pattern is simple: instead of wiring jobs into main or a framework, you split it into three things.

Your queue is a struct with New(db) — it knows submit, poll, complete, fail, nothing else.

Your worker is another struct that loops on the queue and dispatches to handlers registered via RegisterHandler("type", fn). Your handlers are pure functions (ctx,payload) → (result, error) carried by a dependency struct.

Main just assembles: open DB, create queue, create worker, register handlers, call worker.Start(ctx). Result: each handler is unit-testable without the worker or network, the worker is reusable across any pipeline, and lifecycle is controlled by a simple context.Cancel().

Bonus: here the queue is a SQLite table with atomic poll (BEGIN IMMEDIATE), zero external infra.

The whole "framework" is 500 lines of readable Go, not an opaque DSL. TL;DR: every service is a library with New() + Start(ctx), the binary is just an assembler.
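Under those assumptions, a compressed sketch of the pattern: the shape of `NewQueue`, `RegisterHandler`, and `Start(ctx)` follows the description above, but the code itself is mine, and an in-memory queue stands in for the SQLite table with its atomic `BEGIN IMMEDIATE` poll. Handlers are pure functions, the worker is reusable, and main just assembles.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// Job is a queued unit of work; in the original, a row in a SQLite table.
type Job struct {
	ID      int
	Type    string
	Payload string
}

// Queue knows submit and poll, nothing else. This sketch is in-memory;
// the described version swaps in a SQLite table with an atomic poll
// (BEGIN IMMEDIATE) behind the same interface.
type Queue struct {
	mu   sync.Mutex
	jobs []Job
	next int
}

func NewQueue() *Queue { return &Queue{} }

func (q *Queue) Submit(typ, payload string) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.next++
	q.jobs = append(q.jobs, Job{ID: q.next, Type: typ, Payload: payload})
}

// Poll atomically claims the oldest job, or reports false if empty.
func (q *Queue) Poll() (Job, bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.jobs) == 0 {
		return Job{}, false
	}
	j := q.jobs[0]
	q.jobs = q.jobs[1:]
	return j, true
}

// Handler is a pure function (ctx, payload) -> (result, error),
// unit-testable without the worker or the network.
type Handler func(ctx context.Context, payload string) (string, error)

// Worker loops on the queue and dispatches to registered handlers.
type Worker struct {
	queue    *Queue
	handlers map[string]Handler
}

func NewWorker(q *Queue) *Worker {
	return &Worker{queue: q, handlers: map[string]Handler{}}
}

func (w *Worker) RegisterHandler(typ string, fn Handler) { w.handlers[typ] = fn }

// Start drains the queue until the context is cancelled.
func (w *Worker) Start(ctx context.Context) {
	for {
		select {
		case <-ctx.Done():
			return
		default:
		}
		job, ok := w.queue.Poll()
		if !ok {
			time.Sleep(10 * time.Millisecond)
			continue
		}
		if fn, found := w.handlers[job.Type]; found {
			result, err := fn(ctx, job.Payload)
			fmt.Printf("job %d: %q %v\n", job.ID, result, err)
		}
	}
}

func main() {
	// Main just assembles: queue, worker, handlers, Start(ctx).
	q := NewQueue()
	w := NewWorker(q)
	w.RegisterHandler("embed", func(_ context.Context, p string) (string, error) {
		return "embedded:" + p, nil
	})
	q.Submit("embed", "doc-1")

	ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
	defer cancel()
	w.Start(ctx) // prints: job 1: "embedded:doc-1" <nil>
}
```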

The "all in connectivity" pattern means every capability in your system — embeddings, document extraction, replication, MCP tools — is called through one interface: router.Call(ctx,"service", payload).

The router looks up a SQLite routes table to decide how to fulfill that call: in-memory function (local), HTTP POST (http), QUIC stream (quic), MCP tool (mcp), vector embedding (embed), DB replication (dbsync), or silent no-op (noop).

You code everything as local function calls — monolith. When you need to split a service out, you UPDATE one row in the routes table, the watcher picks it up via PRAGMA data_version, and the next call goes remote.

Zero code change, zero restart. Built-in circuit breaker, retry with backoff, fallback-to-local on remote failure, SSRF guard.

The caller never knows where the work happens.

That's the "all in connectivity" pattern: the boundary between monolith and microservices is a config row, not an architecture decision.

https://github.com/hazyhaar/pkg/tree/main/connectivity

reply
Horos
1 month ago
[-]
Had a look?
reply
dhruvkar
1 month ago
[-]
Following.

I'm using Openclaw + Opus. Several subagents.

However, performance degrades when using subagents: scraping is less smart, content is worse written, etc.

I'm curious about using different instances instead, but not sure how to use a shared memory foundation effectively.

reply
humbleharbinger
1 month ago
[-]
We built a messaging platform for exactly this use case and instruct claws to check in with each other or share context with each other at regular intervals.

Check out https://agentbus.org

reply
xpnsec
1 month ago
[-]
More interestingly, what frameworks/harnesses/architecture are people using to drive multi-agent workflows?
reply
Nancy0904
1 month ago
[-]
It sounds complicated. Is your agent trying to solve everything?
reply
Irving-AI
1 month ago
[-]
How well is your agent performing?
reply