The problem is that regex is a cat-and-mouse game. It misses "disregard prior directives" while looking for "ignore instructions." It fails entirely on multi-language exploits. Once an Agent has tool access (shell, DB), a single missed semantic variation becomes an RCE.
So I built Prompt Inspector. It is a semantic detection engine designed to move beyond blacklists.
The core deal:
Vector-based detection: Instead of keywords, we use embeddings to map prompts. It catches the intent of an injection, even if the phrasing is unique or translated.
Self-evolving loop: Borderline cases trigger an async LLM review. If it is a new attack pattern, the system automatically extracts the embedding and updates the vector database. It learns from new exploits.
Decoupled by design: It returns a confidence score rather than a hard block. The developer keeps full control over the execution routing.
Pluggable: Started with Google’s latest embeddings, but the architecture allows for custom-deployed models to avoid vendor lock-in.
Tech-stack: FastAPI, Vector Database, Google Embedding models, and an LLM-in-the-loop reviewer.
I’m currently offering free credits for early testers and open-source projects. I’d love to hear how you guys are handling tool-calling security beyond basic prompt engineering.
Live at: https://promptinspector.io