Most tools I’ve seen focus on observability (logs, traces, dashboards), but not actual enforcement at runtime.
Curious how people here are handling this in production:
- Are you enforcing hard limits (budget, rate, etc.) or just monitoring?
- Do you handle this at the app level or via some middleware/proxy?
- Have you built something in-house for this?
Feels like an unsolved problem, especially with agents.
Would love to hear how others are dealing with it.
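To make the question concrete, here's a minimal sketch of what I mean by "enforcement" vs. monitoring: a wrapper that blocks requests outright once a hard token budget is spent. All names here are hypothetical, and `call_llm` stands in for whatever client you actually use.

```python
class BudgetExceeded(Exception):
    pass

class BudgetedClient:
    """Wraps an LLM call with a hard per-user token budget (sketch only)."""

    def __init__(self, call_llm, budget_tokens):
        self.call_llm = call_llm      # hypothetical: fn(prompt) -> (text, tokens_used)
        self.remaining = budget_tokens

    def complete(self, prompt):
        # Enforce, don't just observe: refuse before spending.
        if self.remaining <= 0:
            raise BudgetExceeded("hard limit reached; request blocked")
        text, used = self.call_llm(prompt)
        self.remaining -= used
        return text
```

The same check could live in a middleware/proxy instead of the app, which is part of what I'm asking about.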
With this approach, we cut token usage by ~75% across the app. On the output side, the LLM only produces the changes needed, not the full diagram; layout, validation, and rendering are computed client-side for free, so costs only scale with what the user asks for. Good UX helps too: by watching what users actually ask for, we can create “quick actions” that use the LLM inside closed-loop subsystems. And since we charge AI tool usage through a credit system, we can assign accurate credit costs to those quick actions, because each action has a defined scope.
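A rough sketch of what the quick-action setup looks like (action names, costs, and token caps below are made up for illustration): because each action has a fixed scope, its credit cost and output cap can be declared up front rather than estimated per request.

```python
# Hypothetical registry: action name -> (credit cost, max output tokens
# the LLM is allowed to produce for that action).
QUICK_ACTIONS = {
    "rename_node":   (1, 50),
    "add_edge":      (1, 80),
    "restyle_group": (2, 200),
}

def run_quick_action(name, credits_available):
    cost, max_tokens = QUICK_ACTIONS[name]
    if credits_available < cost:
        raise RuntimeError("not enough credits")
    # The LLM is asked only for the change (a small patch capped at
    # max_tokens), never the full diagram; layout, validation, and
    # rendering happen client-side.
    # patch = call_llm(prompt_for(name), max_tokens=max_tokens)  # hypothetical
    return credits_available - cost
```

The point is that the hard limits (credits, output caps) attach to the action, not to a free-form prompt.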
TLDR: make the LLM do less, then put hard limits around the smaller set of things it’s allowed to do