What I'm Finding About LLM Code Style and Token Costs
14 points
2 hours ago
| 3 comments
| jimmont.com
| HN
jimmont
2 hours ago
[-]
Reviewing my experience using LLMs, to improve results, reduce churn and token usage. Discovering the gap between what they produce and what I'd normally do is a significant source of output cost, regressions and surfacing a bit of why and how to fix it. Notably Claude is remarkably bad at/about this, producing errors even when directed toward modern Web solutions—that cut token use a lot, like toward 90% occasionally, which together with the frustrating churn led me to review how I'm working, what is happening and generate this article.
reply
ftaisdeal
2 hours ago
[-]
Excellent article, with impeccable analysis, that will fundamentally change how I work with Claude myself. I have already learned to give Claude both a "do" and a "don't" in order to limit unpleasant surprises.
reply
defytonofficial
1 hour ago
[-]
This matches my experience. I've been using OpenRouter with GPT-4o for an image verification service, and the prompt engineering choices have a measurable impact on cost.

One thing I found: asking the model to respond in structured JSON (with a strict schema) vs free-form text cuts token output by ~40% on average. The model stops "explaining itself" and just gives you the answer.

Also noticed that including a reference image in vision calls roughly doubles the input cost but improves accuracy enough that you save on retries. Net cost ended up lower for my use case.

Curious if you've measured the difference between asking for "concise" output vs actually constraining the response format.

reply