The insight: don't teach agents to resist attacks. Virtualize their perceived reality so attacks never enter their world. Just as a VM never exposes physical RAM to the guest, an agent shouldn't see raw, dangerous inputs.
ARCHITECTURE:
- Input virtualization: strip attacks at the boundary, not after the agent sees them (rough sketch below)
- Provenance tracking: prevents contaminated learning (critical with continuous learning coming in 1-2 years per Amodei)
- Taint propagation: deterministic "physics laws" prevent data exfiltration
- No LLM in the critical path: fully deterministic, testable
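To make the boundary concrete, here is a minimal Python sketch under my own assumptions: VirtualizedInput, virtualize, and INJECTION_PATTERNS are illustrative names, not the repo's actual API, and a real boundary would do far more than match two regexes.

    # Minimal sketch of the virtualization boundary (illustrative names,
    # not the PoC's actual API). Untrusted input is rewritten at the
    # boundary; the agent never perceives the raw bytes.
    import re
    from dataclasses import dataclass

    INJECTION_PATTERNS = [
        re.compile(r"ignore (all )?previous instructions", re.I),
        re.compile(r"reveal (the )?system prompt", re.I),
    ]

    @dataclass(frozen=True)
    class VirtualizedInput:
        text: str              # the only thing the agent ever perceives
        provenance: str        # where the raw data came from
        tainted: bool = True   # untrusted until explicitly marked otherwise

    def virtualize(raw: str, source: str) -> VirtualizedInput:
        """Strip known attack markers deterministically, before the agent sees anything."""
        cleaned = raw
        for pattern in INJECTION_PATTERNS:
            cleaned = pattern.sub("[removed]", cleaned)
        return VirtualizedInput(text=cleaned, provenance=source)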
Working PoC demonstrates:
- Prompt injection prevention (attacks stripped at the virtualization boundary)
- Taint containment (untrusted data can't escape the system)
- Deterministic decisions (same input = same output, always)
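Continuing the hypothetical VirtualizedInput type from the sketch above, the taint "physics laws" and the egress decision can be expressed as pure functions, which is what makes them deterministic and testable. combine and allow_egress are again my own names, not the PoC's API.

    def combine(a: VirtualizedInput, b: VirtualizedInput) -> VirtualizedInput:
        # Physics law: anything derived from tainted data is itself tainted.
        return VirtualizedInput(
            text=a.text + b.text,
            provenance=f"{a.provenance}+{b.provenance}",
            tainted=a.tainted or b.tainted,
        )

    def allow_egress(value: VirtualizedInput) -> bool:
        # Deterministic policy: tainted data never leaves the system.
        # Same input, same decision, every time; no LLM involved.
        return not value.tainted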
CRITICAL TIMING:
- Dario Amodei (Anthropic CEO, Feb 13): continuous learning in 1-2 years [1]
- Problem: memory poisoning + continuous learning = permanent compromise
- Solution: provenance tracking prevents untrusted data from entering the learning loop
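Here is a rough sketch of what a provenance gate in front of a learning/memory store could look like, reusing the hypothetical VirtualizedInput type from above. TRUSTED_SOURCES and LearningStore are assumptions of mine, not the PoC's actual design.

    TRUSTED_SOURCES = {"operator", "signed_config"}

    class LearningStore:
        """Only provenance-checked, untainted records may enter the learning loop."""
        def __init__(self):
            self.records = []

        def ingest(self, item: VirtualizedInput) -> bool:
            if item.tainted or item.provenance not in TRUSTED_SOURCES:
                return False   # rejected: a poisoned web page never becomes a permanent "memory"
            self.records.append(item.text)
            return True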
Research context:
- OpenAI: prompt injection is "unlikely to ever be fully solved" [2]
- Anthropic: a 1% attack success rate (ASR) is a "meaningful risk"
- Academic research: 90-100% bypass rates against published defenses [3]
Seeking feedback on whether ontological security ("does X exist for the agent?") beats permission security ("can the agent do X?") for agent systems.
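To illustrate the question in code (a toy contrast, not taken from the repo): permission security keeps the dangerous tool in the agent's world and checks an ACL at call time; ontological security builds a world in which the tool was never registered, so an injected instruction has nothing to invoke.

    # Permission security: "can the agent do X?" X exists; a check decides.
    def call_tool(agent_role, tool, registry, acl):
        if tool not in acl.get(agent_role, set()):
            raise PermissionError(f"{agent_role} may not use {tool}")
        return registry[tool]()

    # Ontological security: "does X exist?" The agent's registry simply
    # never contains tools outside its virtualized reality.
    def build_virtual_world(agent_role, registry, visible):
        return {name: fn for name, fn in registry.items()
                if name in visible.get(agent_role, set())}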
Practical workarounds are available in the repo for immediate use while the PoC matures.
Disclaimer: Personal project, not Radware-endorsed. References to published work only.
Happy to answer questions!
[1] https://www.dwarkesh.com/p/dario-amodei-2
[2] https://simonwillison.net/2024/Dec/9/openai-prompt-injection...
[3] https://arxiv.org/abs/2310.12815