FilterHN

Ask HN: How are you controlling AI agents that take real actions?

2 points

by thesvp

4 hours ago

4 comments

We're building AI agents that take real actions — refunds, database writes, API calls.

Prompt instructions like "never do X" don't hold up. LLMs ignore them when context is long or users push hard.

Curious how others are handling this:

- Hard-coded checks before every action?
- Some middleware layer?
- Just hoping for the best?

We built a control layer for this — different methods for structured data, unstructured outputs, and guardrails (https://limits.dev). Genuinely want to learn how others approach it.

vincentvandeth

1 hour ago

Hard-coded checks before every action, plus a governance layer that separates "what the agent wants to do" from "what it's allowed to do." The deeper issue: if your agent decides whether to issue a refund, you're solving the wrong problem with prompt guards. A refund is a deterministic business rule — order exists, within return window, amount matches. That decision shouldn't be made by an LLM at all.

In my setup, agents propose actions and write structured reports. A deterministic quality advisory then runs — no LLM involved — producing a verdict (approve, hold, redispatch) based on pre-registered rules and open items. The agent can hallucinate all it wants inside its context window, but the only way its work reaches production is through a receipt that links output to a specific git commit, with a quality gate in between.

For anything with real consequences (database writes, API calls, refunds), the pattern is: LLM proposes → deterministic validator checks → human approves. The LLM never has direct write access to anything that matters.
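A minimal Python sketch of the "LLM proposes → deterministic validator checks → human approves" flow, using the refund rule from earlier in this comment (order exists, within return window, amount matches). All names here (`Order`, `RefundProposal`, `validate_refund`, `may_execute`) and the 30-day window are hypothetical, chosen for illustration; this is not the commenter's actual implementation.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum


class Verdict(Enum):
    APPROVE = "approve"
    HOLD = "hold"


@dataclass(frozen=True)
class Order:
    order_id: str
    amount_cents: int
    purchased_on: date


@dataclass(frozen=True)
class RefundProposal:
    """What the agent wants to do. It carries no authority by itself."""
    order_id: str
    amount_cents: int


RETURN_WINDOW = timedelta(days=30)  # assumed policy value, not from the thread


def validate_refund(
    proposal: RefundProposal, orders: dict[str, Order], today: date
) -> tuple[Verdict, str]:
    """Deterministic check: same inputs always give the same verdict, no LLM involved."""
    order = orders.get(proposal.order_id)
    if order is None:
        return Verdict.HOLD, "order does not exist"
    if today - order.purchased_on > RETURN_WINDOW:
        return Verdict.HOLD, "outside return window"
    if proposal.amount_cents != order.amount_cents:
        return Verdict.HOLD, "amount does not match order"
    return Verdict.APPROVE, "all rules passed"


def may_execute(verdict: Verdict, human_approved: bool) -> bool:
    """The write path opens only when both the validator and a human agree."""
    return verdict is Verdict.APPROVE and human_approved
```

The agent only ever constructs a `RefundProposal`; the code that actually issues the refund would call `validate_refund` and `may_execute` first and would live outside anything the LLM can reach.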

"Just hoping for the best" works until it doesn't. We tracked every agent decision in an append-only ledger — after a few hundred entries, you start seeing exactly where and how agents fail. That pattern data is more useful than any prompt guard.
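One way the append-only ledger could look, sketched in Python as a JSON Lines file. The record fields (`verdict`, `reason`) and the function names are assumptions made for this example; the thread does not describe the actual ledger format.

```python
import json
from collections import Counter
from pathlib import Path


def append_entry(ledger: Path, entry: dict) -> None:
    """Record one agent decision. Append mode means existing lines are never rewritten."""
    with ledger.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry, sort_keys=True) + "\n")


def failure_counts(ledger: Path) -> Counter:
    """Count non-approved decisions by reason to show where the agent fails most."""
    counts: Counter = Counter()
    with ledger.open(encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            if record["verdict"] != "approve":
                counts[record["reason"]] += 1
    return counts
```

Calling `failure_counts(path).most_common()` after a few hundred entries would give the kind of failure-pattern summary the comment refers to.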

thesvp

20 minutes ago

The separation between 'what the agent wants to do' and 'what it's allowed to do' is the right mental model.

The append-only ledger point is underrated too — pattern data from real failures is worth more than any upfront rule design.

How long did it take to build and maintain that governance layer? And as your agent evolves, do the rules keep up or is that becoming its own maintenance burden?

apothegm

35 minutes ago

Just treat the LLM as an NLP interface for data input. Still run the inputs against a deterministic heuristic for whether the action is permitted (or depending on the context, even for determining what action is appropriate).

LLMs ignore instructions. They do not have judgement, just the ability to predict the most likely next token (with some chance of selecting one other than the absolutely most likely). There’s no way around that. If you need actual judgement calls, you need actual humans.
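A rough Python sketch of treating the LLM purely as an input parser: its output is handled like any other untrusted form submission and checked against a fixed allowlist before anything runs. The action names, the JSON shape, and `parse_llm_output` are hypothetical, invented for this example.

```python
import json

# Assumed allowlist: each permitted action maps to the exact argument names it accepts.
ALLOWED_ACTIONS = {
    "lookup_order": {"order_id"},
    "request_refund": {"order_id", "amount_cents"},
}


def parse_llm_output(raw: str) -> dict:
    """Return a validated action dict, or raise ValueError for anything unexpected."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("output is not valid JSON") from exc
    if not isinstance(data, dict):
        raise ValueError("output is not a JSON object")

    action = data.get("action")
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise ValueError(f"action not permitted: {action!r}")

    args = data.get("args")
    if not isinstance(args, dict) or set(args) != ALLOWED_ACTIONS[action]:
        raise ValueError("missing or unexpected arguments")

    # Only the allowlisted fields survive; any extra keys the model added are dropped.
    return {"action": action, "args": args}
```

Whether the validated action is then permitted for this user and this order is a separate deterministic check; this step only guarantees the request is well-formed and names a known action.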

thesvp

21 minutes ago

Exactly right - the deterministic layer is the only thing you can actually trust.

We landed on the same pattern: LLM handles the understanding, hard rules handle the permission. The tricky part is maintaining those rules as the agent evolves. How are you managing rule updates: code changes every time, or something more dynamic?

chrisjj

3 hours ago

> Prompt instructions like "never do X" don't hold up. LLMs ignore them when context is long or users push hard.

Serious question. Assuming you knew this, why did you choose to use LLMs for this job?

thesvp

19 minutes ago

Fair. We didn't choose LLMs to enforce rules; we chose them to understand intent. The enforcement happens outside the LLM entirely. That's the separation that actually holds up in production.

adamgold7

2 hours ago

Prompt guardrails are theater - they work until they don't. We ended up building sandboxed execution for each agent action. The agent proposes what it wants to do, but execution happens in an isolated microVM with explicit capability boundaries. Database writes require an approval step that is architecturally separate from the LLM context.

Worth looking at islo.dev if you want the sandboxing piece without building it yourself.
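To make "explicit capability boundaries" and the separate approval step concrete, here is a small in-process Python sketch. It is not a sandbox and does not model a microVM; it only illustrates the control flow, and every name in it (`Executor`, `approve_write`, the `db:write` capability string) is an assumption made for this example.

```python
from dataclasses import dataclass, field
from typing import Callable


class CapabilityError(PermissionError):
    """Raised when a proposal asks for something the executor was never granted."""


@dataclass
class Executor:
    capabilities: frozenset[str]  # fixed when the executor is created
    approved_writes: set[str] = field(default_factory=set)

    def approve_write(self, proposal_id: str) -> None:
        """Called from the approval path only, never from code the agent controls."""
        self.approved_writes.add(proposal_id)

    def run(self, proposal_id: str, capability: str, action: Callable[[], object]) -> object:
        if capability not in self.capabilities:
            raise CapabilityError(f"capability not granted: {capability}")
        if capability == "db:write" and proposal_id not in self.approved_writes:
            raise CapabilityError(f"write not approved: {proposal_id}")
        return action()
```

In a real deployment the same two checks would sit at the boundary of the isolated environment, so that a compromised agent process cannot skip them.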

thesvp

18 minutes ago

Sandboxed execution is solid for isolation — separating proposal from execution is the right architecture. The piece we kept hitting was the policy layer on top: who defines what the agent is allowed to propose in the first place, and how do you update those rules without a redeploy every time?