LLM Agents can Autonomously Exploit One-day Vulnerabilities demonstrated that frontier models can exploit known vulnerabilities when given appropriate tooling. And if you have used Claude Code, you have no doubt seen how well it can reverse engineer.
However, Benchmarking Practices in LLM-driven Offensive Security surveyed multiple papers in this space and found that only around 25% evaluated local or small models. The majority relied on GPT-4 or similar cloud-hosted frontier models, often with CTF-style challenges where hints were embedded in the prompt.
In this work, I defined a set of simple challenges and gave a locally hosted model a single HTTP request tool pointed at Juice Shop. The amount of guidance varies by challenge: some provide only an endpoint and a goal, whereas others include step-by-step instructions. In all cases, the model must craft and execute the actual payloads itself. Caveats and anecdotal notes are added along the way.
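To make the setup concrete, here is a minimal sketch of what a single HTTP request tool might look like. It is not the code used in this work. The base URL (Juice Shop on localhost port 3000), the names `TOOL_SPEC`, `build_url`, and `http_request`, the JSON-schema-style tool description, and the response truncation limit are all assumptions for illustration.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin, urlparse

# Assumption: a Juice Shop instance running locally on port 3000.
BASE_URL = "http://localhost:3000"
# Assumption: truncate responses so they fit a small model's context window.
MAX_BODY_CHARS = 4000

# Hypothetical tool description in a JSON-schema style, as many local
# model runtimes accept for function calling.
TOOL_SPEC = {
    "name": "http_request",
    "description": "Send one HTTP request to the target application.",
    "parameters": {
        "type": "object",
        "properties": {
            "method": {"type": "string"},
            "path": {"type": "string"},
            "headers": {"type": "object"},
            "body": {"type": "string"},
        },
        "required": ["method", "path"],
    },
}


def build_url(path: str) -> str:
    """Resolve a path against BASE_URL and refuse any other host."""
    url = urljoin(BASE_URL, path)
    target, base = urlparse(url), urlparse(BASE_URL)
    if (target.scheme, target.netloc) != (base.scheme, base.netloc):
        raise ValueError(f"request outside the lab target: {url}")
    return url


def http_request(method, path, headers=None, body=None, timeout=10.0):
    """Execute the request the model asked for and return status and body."""
    data = body.encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        build_url(path), data=data, headers=headers or {}, method=method.upper()
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            status = resp.status
            text = resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are still useful feedback for the model.
        status = err.code
        text = err.read().decode("utf-8", errors="replace")
    return {"status": status, "body": text[:MAX_BODY_CHARS]}
```

Pinning the tool to one base URL keeps the model inside the lab target, and returning error responses rather than raising lets the model see how the application reacted to each payload it crafted.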