Show HN: Prompt-injection firewall for OpenClaw agents
People seem to be blindly hooking their OpenClaw agents up to their personal data, so I built runtime controls to prevent, at the very least, simple prompt injection attacks.

Once installed, it hooks into the Node.js child_process module in the gateway process and listens to tool calls and their response streams. It also adds a fetch hook to monitor user prompts (both could have gone through fetch; happy to discuss why this whole layer couldn't just be a proxy).
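For a rough idea of the shape of those hooks, here is a simplified sketch; the inspector functions (inspectToolOutput, inspectPrompt) are stand-ins, not the repo's actual API:

    import child_process from "node:child_process";

    // Stand-in inspectors; the real ones feed the detection layers described below.
    function inspectToolOutput(cmd: string, text: string) { /* scan tool output */ }
    function inspectPrompt(url: unknown, body?: unknown) { /* record trusted user input */ }

    // Wrap child_process.spawn so tool processes and their output streams are observable.
    const originalSpawn = child_process.spawn;
    (child_process as any).spawn = function (cmd: string, args?: any, opts?: any) {
      const child = originalSpawn.call(child_process, cmd, args, opts);
      child.stdout?.on("data", (chunk: Buffer) => inspectToolOutput(cmd, chunk.toString("utf8")));
      return child;
    };

    // Wrap global fetch to see user prompts going out to the model endpoint.
    const originalFetch = globalThis.fetch;
    globalThis.fetch = async (input, init) => {
      inspectPrompt(input, init?.body);
      return originalFetch(input, init);
    };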

There are two layers of protection:

First: whenever there is a read-only tool call whose response an attacker can modify, we extract that part of the JSON response and send it to a small Haiku model to check whether it contains instructions asking the LLM to do something different.

Second: for when the prompt-injection detection fails, we maintain a list of tool calls that can write to places an external actor can access. Before any of these runs, we prompt the user for explicit permission through the UI.
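Very roughly, this second layer amounts to a per-tool policy check. The tool names and askUser helper below are illustrative, not the shipped list:

    // Tool calls that can write somewhere an external actor can read.
    const EXTERNALLY_VISIBLE_WRITES = new Set([
      "github.create_comment",   // hypothetical tool names, for illustration only
      "notion.update_page",
    ]);

    // Stand-in for the application-level UI confirmation prompt.
    async function askUser(message: string): Promise<boolean> {
      return false; // default-deny in this sketch
    }

    async function gateToolCall(tool: string, params: unknown): Promise<boolean> {
      if (!EXTERNALLY_VISIBLE_WRITES.has(tool)) return true; // not externally visible, let it run
      return askUser(`The agent wants to call ${tool} with ${JSON.stringify(params)}. Allow?`);
    }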

I would love a discussion on how this second layer could be made better and triggered less often by relying on some decision process. My current idea: based on a collected set of "trusted" context (user prompts, responses from tool calls attackers cannot manipulate), can we detect whether this tool call was actually necessary? There are scenarios where you'd need detection at the parameter level.
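To make that concrete, here is one possible shape of that decision process; this is just a sketch of the idea, not something implemented, and classify is a stand-in for whatever small-model call you'd use:

    // Stand-in for a small-model call.
    async function classify(question: string): Promise<string> {
      return "no";
    }

    // Ask whether a pending call is entailed by trusted context only
    // (user prompts plus responses from tools an attacker cannot manipulate).
    async function isCallJustified(trustedContext: string[], tool: string, params: unknown): Promise<boolean> {
      const verdict = await classify(
        `Trusted context:\n${trustedContext.join("\n")}\n\n` +
        `Pending call: ${tool} ${JSON.stringify(params)}\n` +
        `Is this call necessary to satisfy the user's request? Answer yes or no.`
      );
      return verdict.trim().toLowerCase().startsWith("yes");
    }

Parameter-level detection would mean asking the same question about individual arguments (say, the recipient or destination of a write) rather than the call as a whole.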

Two notes:

1) This cannot just be a proxy, because you need application-level integration to put humans in the loop when needed and to push UI controls.

2) I improved the accuracy of prompt-injection detection by selecting only the content from the full response JSON that can be manipulated by an external actor. This had to be done for each tool separately; the current implementation covers two skills I picked somewhat arbitrarily (Notion and GitHub). A rough sketch of what that looks like is below.
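The field names and response shapes here are hypothetical, not the actual Notion/GitHub schemas; the point is that only attacker-writable fields reach the detector:

    // Only the parts of each tool's response that an outside actor could have
    // written get forwarded to the detector model.
    const MANIPULABLE_FIELDS: Record<string, (resp: any) => string[]> = {
      "github.get_issue": (r) => [r.body, ...(r.comments ?? []).map((c: any) => c.body)],
      "notion.get_page":  (r) => (r.blocks ?? []).map((b: any) => b.plain_text),
    };

    // Stand-in for the small Haiku classifier call from the first layer.
    async function looksInjected(text: string): Promise<boolean> {
      return false;
    }

    async function detectInjection(tool: string, resp: unknown): Promise<boolean> {
      const extract = MANIPULABLE_FIELDS[tool];
      if (!extract) return false; // tools without a field map are not scanned yet
      return looksInjected(extract(resp).join("\n"));
    }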

P.S.: I maintain one of these for Claude Code myself while working: https://github.com/ContextFort-AI/Runtime-Controls. I created the OpenClaw version over the weekend.
