Ask HN: How do you prevent AI agents from going rogue in production?
3 points
17 hours ago
| 1 comment
Hi all!

There seems to be an ongoing trend (and it matches my gut feeling) of companies moving from chatbots to AI agents that can actually execute actions: calling APIs, modifying databases, making purchases, and so on.

I'm curious: if you're running these in production, how are you handling the security layer beyond prompt injection defenses?

Questions:

- What stops your agent from executing unintended actions (deleting records, unauthorized transactions)?

- Have you actually encountered a situation where an agent went rogue, and you lost money or data?

- Are current tools (IAM policies, approval workflows, monitoring) enough, or is there a gap?

Trying to figure out if this is a real problem worth solving or if existing approaches are working fine.

Agent_Builder
54 minutes ago
We ran into this while building GTWY.ai. What surprised us was that most “rogue” behavior didn’t come from prompt injection, but from agents having too much implicit authority across steps. Nothing catastrophic, but plenty of near-misses where an agent did the right thing in the wrong context.

What helped wasn’t more IAM or approvals everywhere, but tightening execution boundaries. Each step had explicit permissions and a narrow goal. That made failures boring and local instead of surprising and systemic. Existing tools help, but there’s a gap around step-level control and visibility when agents start chaining actions.
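To make the "explicit permissions per step" idea concrete, here's a rough Python sketch of the pattern (hypothetical names, not our actual code): each step carries its own tool allowlist and approval rules, and every tool call the agent proposes is checked against them before it runs.

    # Hypothetical sketch of step-scoped permissions, not GTWY.ai's real code.
    from dataclasses import dataclass, field
    from typing import Any, Callable

    @dataclass
    class StepPolicy:
        goal: str                        # narrow, human-readable goal for this step
        allowed_tools: set[str] = field(default_factory=set)
        requires_approval: set[str] = field(default_factory=set)

    def run_step(policy: StepPolicy,
                 tool_calls: list[tuple[str, dict]],
                 tools: dict[str, Callable[..., Any]],
                 approve: Callable[[str, dict], bool]) -> list[Any]:
        """Run the tool calls an agent proposed for one step. Unlisted tools
        are rejected outright; sensitive ones need the approval callback."""
        results = []
        for name, args in tool_calls:
            if name not in policy.allowed_tools:
                # Failure stays boring and local: the step stops here.
                raise PermissionError(f"step '{policy.goal}' may not call {name}")
            if name in policy.requires_approval and not approve(name, args):
                raise PermissionError(f"approval denied for {name} in '{policy.goal}'")
            results.append(tools[name](**args))
        return results

    # Example: a refund step that can read orders and issue refunds,
    # but refunds over a threshold get bounced by the approval check.
    tools = {
        "get_order": lambda order_id: {"id": order_id, "total": 42.0},
        "issue_refund": lambda order_id, amount: f"refunded {amount} on {order_id}",
    }
    policy = StepPolicy(goal="refund order 123",
                        allowed_tools={"get_order", "issue_refund"},
                        requires_approval={"issue_refund"})
    print(run_step(policy,
                   [("get_order", {"order_id": "123"}),
                    ("issue_refund", {"order_id": "123", "amount": 42.0})],
                   tools,
                   approve=lambda name, args: args.get("amount", 0) <= 50))

The nice property is that "did the right thing in the wrong context" becomes a hard error at the step boundary instead of a surprise three steps later.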
