FilterHN

Why I'm moving away from Regex for LLM Agent security

3 points

12 hours ago

| 0 comments

I’ve been auditing how open-source execution engines handle prompt injection. Most of them (like OpenClaw) rely on a 3-layer static defense: regex blacklists, XML tagging, and character sanitization.

The problem is that regex is a cat-and-mouse game. It misses "disregard prior directives" while looking for "ignore instructions." It fails entirely on multi-language exploits. Once an Agent has tool access (shell, DB), a single missed semantic variation becomes an RCE.

So I built Prompt Inspector. It is a semantic detection engine designed to move beyond blacklists.

The core deal:

Vector-based detection: Instead of keywords, we use embeddings to map prompts. It catches the intent of an injection, even if the phrasing is unique or translated.

Self-evolving loop: Borderline cases trigger an async LLM review. If it is a new attack pattern, the system automatically extracts the embedding and updates the vector database. It learns from new exploits.

Decoupled by design: It returns a confidence score rather than a hard block. The developer keeps full control over the execution routing.

Pluggable: Started with Google’s latest embeddings, but the architecture allows for custom-deployed models to avoid vendor lock-in.

Tech-stack: FastAPI, Vector Database, Google Embedding models, and an LLM-in-the-loop reviewer.

I’m currently offering free credits for early testers and open-source projects. I’d love to hear how you guys are handling tool-calling security beyond basic prompt engineering.

Live at: https://promptinspector.io

No one has commented on this post.