Show HN: I solved Claude Code's prompt injection problem, saved tokens doing it
1 point | 1 hour ago | 1 comment | github.com
I built a drop-in MCP server that sanitizes web content before it reaches your LLM — stripping prompt injection vectors deterministically, no LLM call needed. Along the way I found it also cuts token usage by ~90%.

Hidden HTML elements, zero-width characters, base64 payloads, fake LLM delimiters (<|im_start|>, [INST], <<SYS>>) — WebFetch passes all of it straight through. mcp-safe-fetch strips it in 8 stages on raw HTML and the resulting markdown.
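To make the idea concrete, here's a toy sketch of that kind of deterministic stripping in Python — not the actual mcp-safe-fetch code or its 8 stages, just regexes for three of the vectors mentioned (hidden elements, zero-width characters, fake chat delimiters):

```python
import re

# Zero-width / invisible characters often used to smuggle hidden instructions
ZERO_WIDTH = re.compile(r"[\u200b\u200c\u200d\u2060\ufeff]")

# Fake LLM chat delimiters that try to hijack the conversation framing
DELIMITERS = re.compile(
    r"<\|im_start\|>|<\|im_end\|>|\[INST\]|\[/INST\]|<<SYS>>|<</SYS>>"
)

# Elements hidden via inline styles (display:none / visibility:hidden)
HIDDEN_HTML = re.compile(
    r"<([a-z]+)[^>]*style=[\"'][^\"']*"
    r"(?:display\s*:\s*none|visibility\s*:\s*hidden)"
    r"[^\"']*[\"'][^>]*>.*?</\1>",
    re.IGNORECASE | re.DOTALL,
)

def sanitize(html: str) -> str:
    """Strip a few injection vectors deterministically -- no LLM call."""
    html = HIDDEN_HTML.sub("", html)
    html = ZERO_WIDTH.sub("", html)
    html = DELIMITERS.sub("", html)
    return html
```

The point is that all of this is pure pattern matching on the raw content, so it adds no model latency and can't itself be prompt-injected.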

Tested against PayloadsAllTheThings: caught 3 hidden elements and 4 LLM delimiter patterns WebFetch missed. Side effect I didn't expect — ~90% average token reduction across 4 test sites. Live test: same article, same task, 24,700 tokens vs 575.

Doesn't catch semantic injection (malicious instructions in visible text). That requires model judgment.

npx -y mcp-safe-fetch init — sets up Claude Code in one command. Works with any MCP client.

antimaterial | 11 minutes ago
Nice work on this. The token reduction side effect alone makes it worth dropping in.

I'm sure you're already thinking about other attack vectors. Web fetch is one way injection gets in, but agents have many more surfaces: user input, tool responses, memory, other agents in a chain.

I've been poking at handling this sanitization at the API call level and filtering everything. Definitely more latency with that approach, but it's essentially deny-by-default.
