Ask HN: How are people forecasting AI API costs for agent workflows?
5 points | 18 hours ago | 7 comments | HN
I’ve been experimenting with agent-based features and one thing that surprised me is how hard it is to estimate API costs.

A single user action can trigger anywhere from a few to dozens of LLM calls (tool use, retries, reasoning steps), and with token-based pricing the cost can vary a lot.
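To make that variance concrete, here's a toy Monte Carlo sketch. Every number in it (price per million tokens, tokens per call, calls per action) is a made-up assumption for illustration, not real provider pricing:

```python
import random

# Hypothetical numbers for illustration only.
PRICE_PER_MTOK = 3.00          # assumed blended input/output price, USD per 1M tokens
TOKENS_PER_CALL = (500, 4000)  # assumed min/max tokens per LLM call

def action_cost(rng: random.Random) -> float:
    """Cost of one user action: 'a few to dozens' of LLM calls."""
    calls = rng.randint(3, 36)
    tokens = sum(rng.randint(*TOKENS_PER_CALL) for _ in range(calls))
    return tokens / 1_000_000 * PRICE_PER_MTOK

rng = random.Random(0)
costs = sorted(action_cost(rng) for _ in range(10_000))
print(f"p50 ${costs[5000]:.4f}  p99 ${costs[9900]:.4f}")
```

Even with these tame assumptions the p99 action costs several times the median, which is what makes flat per-seat pricing uncomfortable.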

How are builders here planning for this when pricing their SaaS?

Are you just padding margins, limiting usage, or building internal cost tracking? Also curious, would a service that offers predictable pricing for AI APIs (like a fixed subscription cost) actually be useful for people building agentic workflows?

gilles_oponono
1 hour ago
I love the idea. At Edgee.ai we're tracking cost in real time by tag, LLM, ...but not forecasting yet, and that would indeed be very useful. Something to explore; thanks for the feedback.
reply
hkonte
9 hours ago
One overlooked source of variance: unstructured system prompts causing output format failures, which trigger retries or correction loops.

When the model gets prose instructions, it guesses what's a constraint vs. an example vs. an objective. That guess is inconsistent across calls. Each wrong output is another LLM call.

Typed blocks reduce this. Role in one section, constraints in another, output_format explicitly tagged. The model gets unambiguous signal and output variance drops. Fewer retries, more predictable token spend.

I built github.com/Nyrok/flompt around this: decomposes prompts into 12 semantic blocks, compiles to Claude-optimized XML. Doesn't solve full cost forecasting but cuts the "bad output -> retry" part of the variance.
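Roughly the shape of the idea in Python (a hypothetical sketch of typed blocks compiling to tagged XML, not flompt's actual schema or API):

```python
from dataclasses import dataclass

# Hypothetical block types for illustration; the real tool's schema differs.
@dataclass
class Prompt:
    role: str
    constraints: list[str]
    output_format: str

    def compile(self) -> str:
        """Compile typed blocks into explicitly tagged XML sections."""
        constraints = "\n".join(f"- {c}" for c in self.constraints)
        return (
            f"<role>{self.role}</role>\n"
            f"<constraints>\n{constraints}\n</constraints>\n"
            f"<output_format>{self.output_format}</output_format>"
        )

p = Prompt(
    role="You extract invoice fields.",
    constraints=["Return JSON only", "Never invent fields"],
    output_format='{"vendor": str, "total": float}',
)
print(p.compile())
```

The point is that the model never has to guess which sentence is a constraint and which is an example, so the output distribution tightens.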

reply
sriramgonella
17 hours ago
Local models are better for controlling costs; with commercial models the costs are very high and you have no control over them. However, the training setup for local models needs to be architected very well if you want to train them continuously.
reply
thiago_fm
14 hours ago
That isn't true; if you run local models you'll also need to spend on operations.

Maybe focus first on providing value and later you can optimize this setup.

reply
gabdiax
11 hours ago
It feels like the traditional fixed SaaS pricing model is slowly shifting toward more consumption-based pricing.
reply
Lazy_Player82
18 hours ago
Honestly, if you're designing your agent workflows properly with hard limits on retries and tool calls, the variance shouldn't be that wild. Most of the unpredictability comes from not having those guardrails in place early on. A few weeks of real production data usually shows the average cost is more stable than you'd expect.
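Those guardrails can be as simple as a per-action budget object. A minimal sketch (class and limit values are made up; the real caps depend on your workflow):

```python
# Hypothetical guardrail wrapper: caps retries and tool calls per user
# action so the worst-case number of LLM calls is bounded.
class BudgetExceeded(Exception):
    pass

class ActionBudget:
    def __init__(self, max_retries: int = 2, max_tool_calls: int = 10):
        self.max_retries = max_retries
        self.max_tool_calls = max_tool_calls
        self.retries = 0
        self.tool_calls = 0

    def spend_retry(self) -> None:
        self.retries += 1
        if self.retries > self.max_retries:
            raise BudgetExceeded("retry limit hit")

    def spend_tool_call(self) -> None:
        self.tool_calls += 1
        if self.tool_calls > self.max_tool_calls:
            raise BudgetExceeded("tool-call limit hit")

budget = ActionBudget()
for _ in range(10):
    budget.spend_tool_call()   # exactly at the limit: fine
try:
    budget.spend_tool_call()   # 11th call exceeds the cap
except BudgetExceeded as e:
    print(e)                   # prints "tool-call limit hit"
```

With a cap like this, worst-case cost per action becomes (max calls) x (max tokens per call) x (price), which is a number you can actually put in a spreadsheet.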
reply
Barathkanna
18 hours ago
True, but for early stage builders it’s harder to design those guardrails upfront. A lot of the time you only discover the retry patterns and cost spikes once real users start hitting the system.
reply
Lazy_Player82
17 hours ago
Fair point. And honestly, with more non-technical builders shipping agent-based products these days, that's probably where a service like this makes the most sense – for people who don't yet have the experience to know what guardrails to put in place.
reply
Barathkanna
17 hours ago
Exactly. That’s actually why we started building Oxlo.ai. Early stage builders usually just want to experiment without worrying too much about token cost spikes.
reply
clearloop
18 hours ago
imo switching to local models could be an option
reply
Barathkanna
18 hours ago
Local models solve the marginal cost problem, but they move the complexity into infrastructure and throughput planning instead.
reply
clearloop
15 hours ago
makes sense, it really depends on the use case. I'm building my own version of claw, openwalrus, with local LLMs as the first goal. I think I'll use local models for daily tasks that lean heavily on tool calling, but for coding or research I'll keep using remote models

and this topic actually gives me the idea to introduce a built-in gas meter for tokens
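The gas-meter idea could look something like this (a hypothetical sketch, loosely borrowing the EVM gas model: each call debits tokens from a fixed allowance and the session halts when it runs out):

```python
class OutOfGas(Exception):
    pass

class GasMeter:
    """Per-session token allowance; charge() debits it per LLM call."""
    def __init__(self, allowance: int):
        self.remaining = allowance

    def charge(self, tokens: int) -> None:
        if tokens > self.remaining:
            raise OutOfGas(f"need {tokens}, have {self.remaining}")
        self.remaining -= tokens

meter = GasMeter(allowance=10_000)
meter.charge(4_000)
meter.charge(5_000)
print(meter.remaining)  # prints 1000
try:
    meter.charge(2_000)
except OutOfGas as e:
    print(e)            # session halts instead of overspending
```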

reply
thiago_fm
14 hours ago
Just set hard upper limits and add instrumentation so you can track spend and re-evaluate accordingly.

This takes a couple of hours at most.
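For the instrumentation side, even something this small gives you per-tag spend plus a hard ceiling (class name, tags, and the price constant are all made up for illustration):

```python
import collections

class CostTracker:
    """Record token usage per tag; raise once a hard dollar cap is crossed."""
    def __init__(self, hard_limit_usd: float, price_per_mtok: float = 3.0):
        self.hard_limit_usd = hard_limit_usd
        self.price_per_mtok = price_per_mtok  # assumed blended price per 1M tokens
        self.by_tag = collections.Counter()

    def total_usd(self) -> float:
        return sum(self.by_tag.values()) / 1_000_000 * self.price_per_mtok

    def record(self, tag: str, tokens: int) -> None:
        self.by_tag[tag] += tokens
        if self.total_usd() > self.hard_limit_usd:
            raise RuntimeError(f"hard cost limit exceeded: ${self.total_usd():.2f}")

tracker = CostTracker(hard_limit_usd=1.00)
tracker.record("retrieval", 120_000)
tracker.record("agent_step", 200_000)
print(f"${tracker.total_usd():.2f}")  # prints $0.96
```

Once this data exists, the "forecasting" question mostly becomes reading the per-tag histogram from production.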

reply