We decreased our LLM costs with Opus
36 points
1 hour ago
| 4 comments
| mendral.com
| HN
wxw
59 minutes ago
> We switched to the "triager" pattern: a Haiku agent with a very specific and narrow job. Is this issue already tracked or not? If it is, stop right there. If not, escalate to Opus.

> 4 out of 5 failures never reach Opus. A triager match costs around 25x less than a full investigation.

The title feels misleading. Why clickbait on that when you can just be genuine about the architecture?
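For concreteness, the pattern the quote describes boils down to something like this (a minimal sketch; the function names and relative costs are placeholders, not from the article):

```python
# Sketch of the "triager" pattern: a cheap model answers one narrow
# question, and only unmatched issues escalate to the expensive model.
# Names and cost figures are illustrative, not from the article.

TRIAGE_COST = 1.0          # relative cost of one cheap triager call
INVESTIGATION_COST = 25.0  # a full investigation is quoted as ~25x more

def triage(issue, known_issues):
    """The triager's narrow job (stands in for a Haiku call):
    is this issue already tracked?"""
    return issue in known_issues

def handle(issue, known_issues):
    """Return (action, relative cost) for one incoming failure."""
    if triage(issue, known_issues):
        return ("already-tracked", TRIAGE_COST)
    # Escalate: stands in for a full Opus investigation.
    return ("investigate", TRIAGE_COST + INVESTIGATION_COST)
```

With the quoted hit rate, 4 of 5 failures stop at the triager, so the average cost per failure is (4*1 + 1*26) / 5 = 6.0 instead of 25.0 for Opus-only.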

idorosen
53 minutes ago
The submitted title doesn't match the article's actual title: “We Upgraded to a Frontier Model and Our Costs Went Down”.
stingraycharles
17 minutes ago
It’s still misleading, though.
cadamsdotcom
47 minutes ago
I have rewritten the article to be slightly shorter:

“Let a cheap agent decide if the expensive one is needed.”

a_t48
6 minutes ago
Sounds like L1 vs L2 support :)
saltyoldman
15 minutes ago
I do a similar thing with a "planner agent" that uses the cheapest model available (I think it's openai-gpt-5.2-mini or something, at like 20 cents per 1M tokens). It more or less emits a plan name and a task list, with a recommended model for each task. It's not perfect, but many of our tasks are handled fine by lighter-weight models; for code generation or fixing we upgrade to a more expensive model, while planning and decisions are done cheaply. Keep in mind our tasks are relatively constrained, so planning with a cheap agent makes sense here. An open-ended agent would likely need a more expensive call for planning.
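Roughly this shape (a minimal sketch; the model names, Task fields, and example tasks here are made up, not the actual setup):

```python
# Sketch of a planner-agent flow: a cheap model emits a plan name plus a
# task list with a recommended model per task, and a dispatcher routes
# each task accordingly. Everything here is illustrative.
from dataclasses import dataclass

CHEAP_MODEL = "cheap-mini"    # placeholder for the ~$0.20/1M-token model
EXPENSIVE_MODEL = "frontier"  # placeholder for the codegen-grade model

@dataclass
class Task:
    description: str
    model: str  # recommended by the planner, per task

def plan(request: str) -> tuple[str, list[Task]]:
    """Stands in for the cheap planner call: plan name + task list."""
    tasks = [
        Task("summarize the failing test output", CHEAP_MODEL),
        Task("write the code fix", EXPENSIVE_MODEL),  # codegen upgrades
    ]
    return ("fix-failing-test", tasks)

def dispatch(tasks: list[Task]) -> list[str]:
    """Route each task to the model the planner recommended."""
    return [f"{t.model}: {t.description}" for t in tasks]
```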
whalesalad
46 minutes ago
Looking at the diagram, is this seriously a case of replacing basic functional concepts like "write to clickhouse" or "have we seen this before" with a model? Couldn't those be actual function calls in some language?

It just seems wasteful all around. Having an agent in the critical path when a regular expression (or similar) could do the job seems odd. Yeah, Haiku is cheap, but re.match() is cheaper.
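i.e. something like this, sketching the regex-first idea (`KNOWN_SIGNATURES` and `call_model` are hypothetical, not from the article):

```python
# Sketch of the "re.match() is cheaper" point: try deterministic checks
# first, and only fall back to a model call when they can't decide.
import re

# Hypothetical known-failure signatures, checked for free.
KNOWN_SIGNATURES = [
    re.compile(r"TimeoutError: .* after \d+s"),
    re.compile(r"ConnectionReset"),
]

def seen_before(error_text: str) -> bool:
    """Pure-function check: no tokens spent."""
    return any(sig.search(error_text) for sig in KNOWN_SIGNATURES)

def call_model(error_text: str) -> str:
    """Placeholder for the actual Haiku/Opus call."""
    return "escalated"

def triage(error_text: str) -> str:
    if seen_before(error_text):
        return "tracked"           # regex match, zero model cost
    return call_model(error_text)  # only now pay for an LLM call
```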
