FilterHN

I broke into my own AI system in 10 minutes. I built it

2 points

by mohith_km

1 hour ago

| past

| 0 comments

| HN

Last week I finished building a small AI workflow. Four agents working together, connected to a real database.

I got curious and asked myself — what if someone sent something malicious?

So I tried it on myself.

I typed a manipulative goal instead of a normal one. The system processed it, stored it in my database, and told me everything completed successfully.

Tried it five more times with different approaches. Same result every time. Six attempts. Six successes. My own database now has six attack records sitting in it from my own tests.

Nobody in my system noticed. No alert. No refusal. No warning. The thing that got me — this isn't a bug. The system worked exactly as designed. It just wasn't designed with this in mind. And from what I can tell, most AI agent systems aren't.

Is anyone actually thinking about this in production?

No one has commented on this post.