We’ve developed sophisticated safety and security measures to prevent the misuse of our AI models. But cybercriminals and other malicious actors are actively attempting to find ways around them. Today, we’re releasing a report that details how.
Our Threat Intelligence report discusses several recent examples of Claude being misused, including a large-scale extortion operation using Claude Code, a fraudulent employment scheme from North Korea, and the sale of AI-generated ransomware by a cybercriminal with only basic coding skills. We also cover the steps we’ve taken to detect and counter these abuses. — Read More
Tag Archives: Cyber
Mind the Gap: Time-of-Check to Time-of-Use Vulnerabilities in LLM-Enabled Agents
Large Language Model (LLM)-enabled agents are rapidly emerging across a wide range of applications, but their deployment introduces vulnerabilities with security implications. While prior work has examined prompt-based attacks (e.g., prompt injection) and data-oriented threats (e.g., data exfiltration), time-of-check to time-of-use (TOCTOU) remain largely unexplored in this context. TOCTOU arises when an agent validates external state (e.g., a file or API response) that is later modified before use, enabling practical attacks such as malicious configuration swaps or payload injection. In this work, we present the first study of TOCTOU vulnerabilities in LLM-enabled agents. We introduce TOCTOU-Bench, a benchmark with 66 realistic user tasks designed to evaluate this class of vulnerabilities. As countermeasures, we adapt detection and mitigation techniques from systems security to this setting and propose prompt rewriting, state integrity monitoring, and tool-fusing. Our study highlights challenges unique to agentic workflows, where we achieve up to 25% detection accuracy using automated detection methods, a 3% decrease in vulnerable plan generation, and a 95% reduction in the attack window. When combining all three approaches, we reduce the TOCTOU vulnerabilities from an executed trajectory from 12% to 8%. Our findings open a new research direction at the intersection of AI safety and systems security. — Read More
Detecting and countering misuse of AI: August 2025
We’ve developed sophisticated safety and security measures to prevent the misuse of our AI models. But cybercriminals and other malicious actors are actively attempting to find ways around them. Today, we’re releasing a report that details how.
Our Threat Intelligence report discusses several recent examples of Claude being misused, including a large-scale extortion operation using Claude Code, a fraudulent employment scheme from North Korea, and the sale of AI-generated ransomware by a cybercriminal with only basic coding skills. We also cover the steps we’ve taken to detect and counter these abuses. — Read More
When AIOps Become “AI Oops”: Subverting LLM-driven IT Operations via Telemetry Manipulation
AI for IT Operations (AIOps) is transforming how organizations manage complex software systems by automating anomaly detection, incident diagnosis, and remediation. Modern AIOps solutions increasingly rely on autonomous LLM-based agents to interpret telemetry data and take corrective actions with minimal human intervention, promising faster response times and operational cost savings.
In this work, we perform the first security analysis of AIOps solutions, showing that, once again, AI-driven automation comes with a profound security cost. We demonstrate that adversaries can manipulate system telemetry to mislead AIOps agents into taking actions that compromise the integrity of the infrastructure they manage. We introduce techniques to reliably inject telemetry data using error-inducing requests that influence agent behavior through a form of adversarial reward-hacking; plausible but incorrect system error interpretations that steer the agent’s decision-making. Our attack methodology, AIOpsDoom, is fully automated–combining reconnaissance, fuzzing, and LLM-driven adversarial input generation–and operates without any prior knowledge of the target system.
To counter this threat, we propose AIOpsShield, a defense mechanism that sanitizes telemetry data by exploiting its structured nature and the minimal role of user-generated content. Our experiments show that AIOpsShield reliably blocks telemetry-based attacks without affecting normal agent performance.
Ultimately, this work exposes AIOps as an emerging attack vector for system compromise and underscores the urgent need for security-aware AIOps design. — Read More
Are Cyber Defenders Winning?
On June 6, President Trump signed an executive order to “reprioritize cybersecurity efforts to protect America,” outlining a rough agenda “to improve the security and resilience of the nation’s information systems and networks.” As the administration develops a new cybersecurity strategy, it is essential that it understand and respond to a shifting trend in cyberspace: After a decades-long slump, defenders may finally be gaining the advantage.
In the 1970s, computers could be kept secure simply by being in locked rooms. But when these computers were connected to networks, attackers gained the advantage. Despite decades of defensive innovations since then, defenders’ efforts are routinely overwhelmed by the gains made by attackers. Successful defense is possible—but only with substantial resources and discipline.
Shifting “the advantage to its defenders and perpetually frustrating the forces that would threaten” cyberspace was a central goal of the Biden administration’s U.S. National Cybersecurity Strategy. But how will defenders—flooded with ambiguous statistics—know if they’re succeeding? — Read More
I Built an AI Hacker. It Failed Spectacularly
What happens when you give an LLM root access, infinite patience, and every hacking tool imaginable? Spoiler: It’s not what you’d expect.
It started out of pure curiosity. I’d been exploring LLMs and agentic AI, fascinated by their potential to reason, adapt, and automate complex tasks. I began to wonder: What if we could automate offensive security the same way we’ve automated customer support, coding, or writing emails?
That idea — ambitious in its simplicity — kept me up for weeks. So naturally, I did what any reasonable builder would do. I spent a couple of days building an autonomous AI pentester that could, in theory, outwork any human red teamer.
Spoiler alert: It didn’t work. But the journey taught me more about AI limitations, offensive security, and the irreplaceable human element in hacking than any textbook ever could. — Read More
The “Bubble” of Risk: Improving Assessments for Offensive Cybersecurity Agents
Most frontier models today undergo some form of safety testing, including whether they can help adversaries launch costly cyberattacks. But many of these assessments overlook a critical factor: adversaries can adapt and modify models in ways that expand the risk far beyond the perceived safety profile that static evaluations capture. At Princeton’s POLARIS Lab, we’ve previously studied how easily open-source or fine-tunable models can be manipulated to bypass safeguards. See, e.g., Wei et al. (2024), Qi et al. (2024), Qi et al. (2025), He et al. (2024). This flexibility means that model safety isn’t fixed: there is a “bubble” of risk defined by the degrees of freedom an adversary has to improve an agent. If a model provider offers fine-tuning APIs or allows repeated queries, it dramatically increases the attack surface. This is especially true when evaluating AI systems for risks related to their use in offensive cybersecurity attacks. In our recent research, Dynamic Risk Assessments for Offensive Cybersecurity Agents, we show that the risk “bubble” is larger, cheaper, and more dynamic than many expect. For instance, using only 8 H100 GPU-hours of compute—about $36—an adversary could improve an agent’s success rate on InterCode-CTF by over 40% using relatively simple methods. — Read More
XBOW’s AI-Powered Pentester Grabs Top Rank on HackerOne, Raises $75M to Grow Platform
We’re living in a new world now — one where it’s an AI-powered penetration tester that “now tops an eminent US security industry leaderboard that ranks red teamers based on reputation.” CSO Online reports:
On HackerOne, which connects organizations with ethical hackers to participate in their bug bounty programs, “Xbow” scored notably higher than 99 other hackers in identifying and reporting enterprise software vulnerabilities. It’s a first in bug bounty history, according to the company that operates the eponymous bot…
Xbow is a fully autonomous AI-driven penetration tester (pentester) that requires no human input, but, its creators said, “operates much like a human pentester” that can scale rapidly and complete comprehensive penetration tests in just a few hours. According to its website, it passes 75% of web security benchmarks, accurately finding and exploiting vulnerabilities. — Read More
Lean and Mean: How We Fine-Tuned a Small Language Model for Secret Detection in Code
We fine-tuned a small language model (Llama 3.2 1B) for detecting secrets in code, achieving 86% precision and 82% recall—significantly outperforming traditional regex-based methods. Our approach addresses the limitations of both regex patterns (limited context understanding) and large language models (high computational costs and privacy concerns) by creating a lean, efficient model that can run on standard CPU hardware. This blog post details our journey from data preparation to model training and deployment, demonstrating how Small Language Models can solve specific cybersecurity challenges without the overhead of massive LLMs.
This research is now one of Wiz’s core Secret Security efforts, adding fast, accurate secret detection as part of our solution. — Read More
Using AI to identify cybercrime masterminds
Online criminal forums, both on the public internet and on the “dark web” of Tor .onion sites, are a rich resource for threat intelligence researchers. The Sophos Counter Threat Unit (CTU) have a team of darkweb researchers collecting intelligence and interacting with darkweb forums, but combing through these posts is a time-consuming and resource-intensive task, and it’s always possible that things are missed.
As we strive to make better use of AI and data analysis, Sophos AI researcher Francois Labreche, working with Estelle Ruellan of Flare and the Université de Montréal and Masarah Paquet-Clouston of the Université de Montréal, set out to see if they could approach the problem of identifying key actors on the dark web in a more automated way. Their work, originally presented at the 2024 APWG Symposium on Electronic Crime Research, has recently been published as a paper. — Read More