There seems to be an ongoing trend (and it matches my gut feeling) of companies moving from chatbots to AI agents that can actually execute actions: calling APIs, modifying databases, making purchases, and so on.
I'm curious: if you're running these in production, how are you handling the security layer beyond prompt injection defenses?
Questions:
- What stops your agent from executing unintended actions (deleting records, unauthorized transactions)?
- Have you actually encountered a situation where an agent went rogue and you lost money or data?
- Are current tools (IAM policies, approval workflows like the gate sketched below, monitoring) enough, or is there a gap?
Trying to figure out if this is a real problem worth solving or if existing approaches are working fine.
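For concreteness, here's a rough sketch of the kind of approval workflow I'm asking about. All the names (`ToolCall`, `request_human_approval`, the tool registry) are made up for illustration, not from any specific agent framework:

```python
# Minimal sketch of a human-approval gate around agent tool calls.
# All names here (ToolCall, request_human_approval, DESTRUCTIVE_TOOLS)
# are hypothetical; real frameworks expose this differently.
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class ToolCall:
    name: str               # e.g. "delete_record", "refund_order"
    args: dict[str, Any]

DESTRUCTIVE_TOOLS = {"delete_record", "refund_order", "update_billing"}

def request_human_approval(call: ToolCall) -> bool:
    """Block until a human approves or rejects the call (stubbed here)."""
    answer = input(f"Approve {call.name}({call.args})? [y/N] ")
    return answer.strip().lower() == "y"

def execute_with_approval(call: ToolCall,
                          registry: dict[str, Callable[..., Any]]) -> Any:
    """Run read-only tools directly; gate destructive ones behind approval."""
    if call.name in DESTRUCTIVE_TOOLS and not request_human_approval(call):
        raise PermissionError(f"{call.name} rejected by approver")
    return registry[call.name](**call.args)
```

My worry is that this kind of gate either gets rubber-stamped or becomes a bottleneck once agents chain many calls, which is why I'm asking what people actually run in production.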
What helped wasn’t more IAM or approvals everywhere, but tightening execution boundaries. Each step had explicit permissions and a narrow goal. That made failures boring and local instead of surprising and systemic. Existing tools help, but there’s a gap around step-level control and visibility when agents start chaining actions.
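To make that concrete, the shape of what worked for us was roughly the following, heavily simplified and with made-up names (`StepScope`, `run_tool`, `ScopeViolation`): each step declares the tools it may call and a hard call budget, so anything outside that fails at the boundary instead of cascading.

```python
# Rough sketch of step-level execution boundaries: each step declares the
# tools it may call and a cap on calls, so a runaway chain fails locally.
# StepScope / ScopeViolation are illustrative names, not a specific library.
from dataclasses import dataclass, field
from typing import Any, Callable

class ScopeViolation(Exception):
    pass

@dataclass
class StepScope:
    allowed_tools: frozenset[str]   # explicit permissions for this step only
    max_calls: int = 5              # keep failures small and local
    calls_made: int = field(default=0, init=False)

    def check(self, tool_name: str) -> None:
        if tool_name not in self.allowed_tools:
            raise ScopeViolation(f"{tool_name} not permitted in this step")
        if self.calls_made >= self.max_calls:
            raise ScopeViolation("per-step call budget exhausted")
        self.calls_made += 1

def run_tool(scope: StepScope, registry: dict[str, Callable[..., Any]],
             name: str, **kwargs: Any) -> Any:
    """Execute a tool only if the current step's scope allows it."""
    scope.check(name)
    return registry[name](**kwargs)

# Example: a "look up order" step can read order data but never refund or delete.
lookup_scope = StepScope(allowed_tools=frozenset({"get_order", "get_customer"}))
```

The point isn't this exact code, it's that the permission check lives next to the step, so a scope violation is a boring, local error you can log and alert on rather than a surprise three actions later.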