Show HN: Run an Agent Council of LLMs that debate and synthesize answers
5 points
10 hours ago
| 2 comments
| github.com
I built a local-first UI that layers two reasoning architectures on top of small models like Qwen, Llama, and Mistral: a sequential Thinking Pipeline (Plan → Execute → Critique) and a parallel Agent Council, where multiple expert models debate in parallel and a Judge synthesizes the best answer. No API keys and no .env setup: just pip install multimind. Benchmarks on GSM8K show measurable accuracy gains over single-model inference.
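The Agent Council pattern can be sketched in a few lines. This is a hypothetical illustration, not multimind's actual API: stub functions stand in for real model calls, and the judge is reduced to a majority vote (in practice the judge is itself a model that reads all candidates and reasons over them).

```python
# Sketch of the Agent Council pattern: several "expert" models answer the
# same question in parallel, then a judge synthesizes a final answer.
# The experts below are deterministic stubs standing in for real LLM calls.
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def expert_a(question: str) -> str:
    return "4"  # stub: a real expert would query a local model

def expert_b(question: str) -> str:
    return "4"  # stub: agrees with expert_a

def expert_c(question: str) -> str:
    return "5"  # stub: a dissenting candidate

def judge(candidates: list[str]) -> str:
    # Simplest possible synthesis: majority vote over candidate answers.
    # A real judge would be another model prompted with all candidates.
    return Counter(candidates).most_common(1)[0][0]

def agent_council(question: str, experts) -> str:
    # Run all experts concurrently, then hand their answers to the judge.
    with ThreadPoolExecutor(max_workers=len(experts)) as pool:
        candidates = list(pool.map(lambda expert: expert(question), experts))
    return judge(candidates)

print(agent_council("What is 2 + 2?", [expert_a, expert_b, expert_c]))  # → 4
```

Swapping the stubs for real model calls keeps the structure identical; the parallelism is what trades extra compute for the accuracy gain.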
selfradiance
10 hours ago
The Agent Council approach is interesting — having multiple small models debate in parallel and a judge synthesize feels like a more principled version of what people do manually when they cross-check answers between Claude, GPT, and Gemini. Curious whether the GSM8K gains hold up on less structured tasks where there isn't a single correct answer (e.g. summarization or open-ended reasoning).
BloodAndCode
7 hours ago
this is a really interesting direction. i've been experimenting with “self-critique” style pipelines (plan → solve → critique) and they often help smaller models punch above their weight. the agent council idea is also appealing, although the cost/latency trade-off usually becomes the tricky part when multiple models run in parallel.
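for concreteness, a plan → solve → critique loop can look roughly like this (deterministic stubs stand in for model calls; not multimind's API, just the shape of the idea):

```python
# Sketch of a plan → solve → critique pipeline. Each stage is a stub
# standing in for a call to a small local model.
def plan(question: str) -> str:
    # Stub planner: a real model would decompose the problem into steps.
    return f"1) restate: {question} 2) compute 3) verify"

def solve(question: str, plan_text: str, feedback: str = "") -> str:
    # Stub solver: the first attempt is deliberately wrong; once critique
    # feedback arrives, it returns the corrected answer.
    return "4" if feedback else "5"

def critique(answer: str) -> tuple[bool, str]:
    # Stub critic: a real critic model would check the answer against the plan.
    if answer == "4":
        return True, ""
    return False, "arithmetic looks off; redo step 2"

def thinking_pipeline(question: str, max_rounds: int = 3) -> str:
    feedback = ""
    answer = ""
    for _ in range(max_rounds):
        step_plan = plan(question)
        answer = solve(question, step_plan, feedback)
        ok, feedback = critique(answer)
        if ok:
            break  # critic accepted the answer
    return answer

print(thinking_pipeline("What is 2 + 2?"))  # → 4
```

the loop is where small models gain the most: the critic's feedback gives the solver a second, better-conditioned attempt.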

curious how often the judge actually disagrees with the first candidate answer in practice. does the council mostly refine reasoning, or does it sometimes lead to completely different conclusions?
