On June 6, President Trump signed an executive order to “reprioritize cybersecurity efforts to protect America,” outlining a rough agenda “to improve the security and resilience of the nation’s information systems and networks.” As the administration develops a new cybersecurity strategy, it is essential that it understand and respond to a shifting trend in cyberspace: After a decades-long slump, defenders may finally be gaining the advantage.
In the 1970s, computers could be kept secure simply by being in locked rooms. But when these computers were connected to networks, attackers gained the advantage. Despite decades of defensive innovations since then, defenders’ efforts are routinely overwhelmed by the gains made by attackers. Successful defense is possible—but only with substantial resources and discipline.
Shifting “the advantage to its defenders and perpetually frustrating the forces that would threaten” cyberspace was a central goal of the Biden administration’s U.S. National Cybersecurity Strategy. But how will defenders—flooded with ambiguous statistics—know if they’re succeeding? — Read More
Tag Archives: Cyber
I Built an AI Hacker. It Failed Spectacularly
What happens when you give an LLM root access, infinite patience, and every hacking tool imaginable? Spoiler: It’s not what you’d expect.
It started out of pure curiosity. I’d been exploring LLMs and agentic AI, fascinated by their potential to reason, adapt, and automate complex tasks. I began to wonder: What if we could automate offensive security the same way we’ve automated customer support, coding, or writing emails?
That idea — ambitious in its simplicity — kept me up for weeks. So naturally, I did what any reasonable builder would do. I spent a couple of days building an autonomous AI pentester that could, in theory, outwork any human red teamer.
Spoiler alert: It didn’t work. But the journey taught me more about AI limitations, offensive security, and the irreplaceable human element in hacking than any textbook ever could. — Read More
The “Bubble” of Risk: Improving Assessments for Offensive Cybersecurity Agents
Most frontier models today undergo some form of safety testing, including whether they can help adversaries launch costly cyberattacks. But many of these assessments overlook a critical factor: adversaries can adapt and modify models in ways that expand the risk far beyond the perceived safety profile that static evaluations capture. At Princeton’s POLARIS Lab, we’ve previously studied how easily open-source or fine-tunable models can be manipulated to bypass safeguards. See, e.g., Wei et al. (2024), Qi et al. (2024), Qi et al. (2025), He et al. (2024). This flexibility means that model safety isn’t fixed: there is a “bubble” of risk defined by the degrees of freedom an adversary has to improve an agent. If a model provider offers fine-tuning APIs or allows repeated queries, it dramatically increases the attack surface. This is especially true when evaluating AI systems for risks related to their use in offensive cybersecurity attacks. In our recent research, Dynamic Risk Assessments for Offensive Cybersecurity Agents, we show that the risk “bubble” is larger, cheaper, and more dynamic than many expect. For instance, using only 8 H100 GPU-hours of compute—about $36—an adversary could improve an agent’s success rate on InterCode-CTF by over 40% using relatively simple methods. — Read More
XBOW’s AI-Powered Pentester Grabs Top Rank on HackerOne, Raises $75M to Grow Platform
We’re living in a new world now — one where it’s an AI-powered penetration tester that “now tops an eminent US security industry leaderboard that ranks red teamers based on reputation.” CSO Online reports:
On HackerOne, which connects organizations with ethical hackers to participate in their bug bounty programs, “Xbow” scored notably higher than 99 other hackers in identifying and reporting enterprise software vulnerabilities. It’s a first in bug bounty history, according to the company that operates the eponymous bot…
Xbow is a fully autonomous AI-driven penetration tester (pentester) that requires no human input, but, its creators said, “operates much like a human pentester” that can scale rapidly and complete comprehensive penetration tests in just a few hours. According to its website, it passes 75% of web security benchmarks, accurately finding and exploiting vulnerabilities. — Read More
Lean and Mean: How We Fine-Tuned a Small Language Model for Secret Detection in Code
We fine-tuned a small language model (Llama 3.2 1B) for detecting secrets in code, achieving 86% precision and 82% recall—significantly outperforming traditional regex-based methods. Our approach addresses the limitations of both regex patterns (limited context understanding) and large language models (high computational costs and privacy concerns) by creating a lean, efficient model that can run on standard CPU hardware. This blog post details our journey from data preparation to model training and deployment, demonstrating how Small Language Models can solve specific cybersecurity challenges without the overhead of massive LLMs.
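To make the precision and recall figures above concrete, here is a minimal sketch of how a regex-based baseline might be scored against labelled samples. It is not the authors' pipeline: the patterns, the `regex_flags_secret` and `precision_recall` helpers, and the four toy samples are all hypothetical and chosen only to show the two typical regex failure modes (a placeholder that gets flagged, and a real secret with no recognisable keyword that gets missed).

```python
import re

# Hypothetical patterns for illustration only; real scanners ship far larger rule sets.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),  # general shape of an AWS access key ID
    re.compile(r"(?i)(?:password|secret|token)\s*=\s*['\"][^'\"]{8,}['\"]"),
]


def regex_flags_secret(line: str) -> bool:
    """Return True if any pattern matches the line (no understanding of context)."""
    return any(pattern.search(line) for pattern in SECRET_PATTERNS)


def precision_recall(predictions, labels):
    """Compute (precision, recall) from parallel lists of booleans."""
    pairs = list(zip(predictions, labels))
    tp = sum(1 for pred, truth in pairs if pred and truth)
    fp = sum(1 for pred, truth in pairs if pred and not truth)
    fn = sum(1 for pred, truth in pairs if not pred and truth)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up labelled samples: (line of code, is it really a secret?)
SAMPLES = [
    ('aws_key = "AKIAABCDEFGHIJKLMNOP"', True),         # caught by the key-shape pattern
    ('password = "example-placeholder"', False),        # placeholder, but still flagged
    ('conn = connect("db", "hunter2hunter2")', True),   # real secret, no keyword, missed
    ('timeout = 30', False),                            # correctly ignored
]

predictions = [regex_flags_secret(line) for line, _ in SAMPLES]
labels = [is_secret for _, is_secret in SAMPLES]
precision, recall = precision_recall(predictions, labels)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

On this toy set the baseline scores 0.50 on both metrics. The same `precision_recall` helper could be applied to any detector's predictions, including a fine-tuned model's, to produce comparable numbers.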
This research is now one of Wiz’s core Secret Security efforts, adding fast, accurate secret detection as part of our solution. — Read More
Using AI to identify cybercrime masterminds
Online criminal forums, both on the public internet and on the “dark web” of Tor .onion sites, are a rich resource for threat intelligence researchers. The Sophos Counter Threat Unit (CTU) have a team of dark web researchers collecting intelligence and interacting with dark web forums, but combing through these posts is a time-consuming and resource-intensive task, and it’s always possible that things are missed.
As we strive to make better use of AI and data analysis, Sophos AI researcher Francois Labreche, working with Estelle Ruellan of Flare and the Université de Montréal and Masarah Paquet-Clouston of the Université de Montréal, set out to see if they could approach the problem of identifying key actors on the dark web in a more automated way. Their work, originally presented at the 2024 APWG Symposium on Electronic Crime Research, has recently been published as a paper. — Read More
Leaking Secrets in the Age of AI
In a rush to adopt and experiment with AI, developers and other technology practitioners are willing to cut corners. This is evident from multiple recent security incidents, such as:
- Platform resource abuses (attackers hijack cloud infrastructure to power their own LLM applications)
- Vendors offering unsafe 3rd-party model execution (Probllama)
- Model escape vulnerabilities in hosting services (Replicate, HuggingFace and SAP-AI vulnerabilities)
Yet another side-effect of these hasty practices is the leakage of AI-related secrets in public code repositories. Secrets in public code repositories are nothing new. What’s surprising is the fact that after years of research, numerous security incidents, millions of dollars in bug bounty hunters’ pockets, and general awareness of the risk, it is still painfully easy to find valid secrets in public repositories. — Read More
New AI Jailbreak Bypasses Guardrails With Ease
Through progressive poisoning and manipulating an LLM’s operational context, many leading AI models can be tricked into providing almost anything – regardless of the guardrails in place.
From their earliest days, LLMs have been susceptible to jailbreaks – attempts to get the gen-AI model to do something or provide information that could be harmful. The LLM developers have made jailbreaks more difficult by adding more sophisticated guardrails and content filters, while attackers have responded with progressively more complex and devious jailbreaks.
One of the more successful jailbreak types has seen the evolution of multi-turn jailbreaks involving conversational rather than single-entry prompts. A new one, dubbed Echo Chamber, has emerged today. — Read More
The Role of AI and Compliance in Modern Risk Management: ShowMeCon 2025
When people think of St. Louis, it’s often the Gateway Arch or the Cardinals that come to mind. Just across the Missouri River is St. Charles, one of the “Show Me” state’s oldest European settlements, dating back to 1769. Just a stone’s throw from where Lewis and Clark set off on their famous expedition, something more than baseball statistics, historical trivia, or architectural wonders was being discussed in early June: security, compliance, and risk, at ShowMeCon 2025.
Around 400 practitioners gathered for two full days of sessions, villages, and a CTF run by MetaCTF. There was much discussion of the industry’s distinction between controls, policies, and security. A general theme emerged that real security demands context, rigor, and adaptive posture, not just checking the box. Here are just a few highlights from the 2025 edition of ShowMeCon. — Read More
Starting a Security Program from Scratch (or re-starting)
I’ve had a number of requests to write a post about how to start and grow a new security program – or a substantial reassessment and rebuild of an existing program.
This is a difficult one to write because, as you all know, there is no one-size-fits-all approach. Starting from scratch in a 10-person startup is very different from (re-)building a security program in a more established organization. What I’ve tried to do here, instead, is to develop a framework and step-by-step guide to apply to pretty much any type of organization. It might be that in applying this you only need, for your risk and stage of development, to go halfway in the various steps. Some time later, as your organization grows in size, stature or criticality, you might need to do the whole thing.
There are 4 phases of maturity, each with its own steps. But basically it’s all about (1) starting to face in the right direction, (2) getting the basics done, (3) making those basics more routine / sustainable and then, if you need to, (4) making it much more advanced / strategic. — Read More