A single user action can trigger anywhere from a few to dozens of LLM calls (tool use, retries, reasoning steps), and with token-based pricing the cost per action can vary widely.
How are builders here planning for this when pricing their SaaS?
Are you just padding margins, limiting usage, or building internal cost tracking? I'm also curious: would a service offering predictable pricing for AI APIs (e.g. a fixed subscription) actually be useful to people building agentic workflows?
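For the internal cost tracking option, a minimal sketch could look like the following. The model names and per-token rates are made up for illustration; real pricing varies by provider and changes over time.

```python
from dataclasses import dataclass, field

# Hypothetical prices in USD per 1M tokens: (input_rate, output_rate).
# Substitute your provider's actual rates here.
PRICES = {"small-model": (0.25, 1.25), "big-model": (3.00, 15.00)}

@dataclass
class CostTracker:
    """Accumulates token spend across all LLM calls one user action triggers."""
    calls: list = field(default_factory=list)

    def record(self, model: str, input_tokens: int, output_tokens: int) -> float:
        in_rate, out_rate = PRICES[model]
        cost = (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
        self.calls.append((model, cost))
        return cost

    @property
    def total(self) -> float:
        return sum(cost for _, cost in self.calls)

tracker = CostTracker()
tracker.record("big-model", 2_000, 500)    # initial reasoning call
tracker.record("small-model", 800, 200)    # tool-result summarization
tracker.record("big-model", 3_500, 700)    # retry after a bad output
print(f"action cost: ${tracker.total:.4f}")
```

Logging per-action totals like this is what lets you see the spread in cost across users before you decide how much margin to pad.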
When the model gets prose instructions, it guesses what's a constraint vs. an example vs. an objective. That guess is inconsistent across calls. Each wrong output is another LLM call.
Typed blocks reduce this. Role in one section, constraints in another, output_format explicitly tagged. The model gets unambiguous signal and output variance drops. Fewer retries, more predictable token spend.
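As a rough illustration of the idea (not flompt's actual schema or block set), compiling typed blocks into explicit XML tags can be as simple as:

```python
# Each semantic block gets its own XML tag, so the model never has to
# guess which sentence is a constraint and which is an example.
# Block names here are illustrative, not flompt's 12-block taxonomy.
blocks = {
    "role": "You are a billing-support agent.",
    "constraints": "Never promise refunds. Answer in under 100 words.",
    "output_format": "JSON with keys 'reply' and 'escalate' (boolean).",
}

def compile_prompt(blocks: dict[str, str]) -> str:
    """Wrap each block's text in a tag named after its section."""
    return "\n".join(
        f"<{name}>\n{text}\n</{name}>" for name, text in blocks.items()
    )

print(compile_prompt(blocks))
```

The tags carry the structure, so the same prompt compiles identically on every call instead of depending on how the model parses prose.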
I built github.com/Nyrok/flompt around this: it decomposes prompts into 12 semantic blocks and compiles them to Claude-optimized XML. It doesn't solve full cost forecasting, but it cuts the "bad output -> retry" part of the variance.
Maybe focus on providing value first; you can optimize this setup later.
This topic actually inspires me to introduce a built-in gas meter for tokens. That should take a couple of hours at most.