We’re three security researchers in Tokyo building an autonomous agent framework for authorized security testing (VDP/Bug Bounty).
We wanted to share our experimental results from running this agent against live, in-scope targets (as of Feb 8):
Real-World Impact: Reached #86 globally on the HackerOne VDP leaderboard (90 days).
Gov Targets: 3 vulnerabilities triaged by the U.S. Department of Defense (DoD).
Benchmark: Solved 84% of PortSwigger Web Security Academy labs autonomously.
We also ran into an "Impact Gap": the agent finds technically valid exploits but often struggles to assess business criticality, so many of its reports are closed as "Informative".
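
To make the gap concrete, here is a minimal sketch of a pre-submission gate that scales technical severity by business context. It is purely illustrative and is not taken from the released design: `Finding`, `ASSET_WEIGHT`, `impact_score`, `should_submit`, the weights, and the threshold are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical asset-criticality weights; a real program would define its own.
ASSET_WEIGHT = {"production": 1.0, "staging": 0.5, "sandbox": 0.1}


@dataclass
class Finding:
    title: str
    cvss: float         # technical severity on the 0.0-10.0 CVSS scale
    asset_tier: str     # key into ASSET_WEIGHT
    data_exposed: bool  # True if sensitive data was demonstrably reachable


def impact_score(finding: Finding) -> float:
    """Scale technical severity by asset criticality and demonstrated exposure."""
    weight = ASSET_WEIGHT.get(finding.asset_tier, 0.1)  # unknown tiers count as low
    exposure = 1.0 if finding.data_exposed else 0.5
    return finding.cvss * weight * exposure


def should_submit(finding: Finding, threshold: float = 3.0) -> bool:
    """Hold back findings whose estimated business impact is below the threshold."""
    return impact_score(finding) >= threshold


if __name__ == "__main__":
    idor = Finding("IDOR on account API", 7.5, "production", True)
    xss = Finding("Reflected XSS on demo host", 6.1, "sandbox", False)
    for f in (idor, xss):
        print(f.title, "->", "submit" if should_submit(f) else "hold for review")
```

In this toy model both findings are technically valid, but only the first clears the threshold. The second is the kind of report that tends to come back as "Informative".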
We released our architecture design and safety proxy details on GitHub. We'd love to hear your thoughts on bridging this gap between technical exploitability and business impact.
URL: https://github.com/cyberprobe-ai/autonomous-pentest-agent-research