Microsoft Releases PyRIT – A Red Teaming Tool for Generative AI

Microsoft has released an open access automation framework called PyRIT (short for Python Risk Identification Tool) to proactively identify risks in generative artificial intelligence (AI) systems.

The red teaming tool is designed to “enable every organization across the globe to innovate responsibly with the latest artificial intelligence advances,” Ram Shankar Siva Kumar, AI red team lead at Microsoft, said.

The company said PyRIT could be used to assess the robustness of large language model (LLM) endpoints against different harm categories such as fabrication (e.g., hallucination), misuse (e.g., bias), and prohibited content (e.g., harassment).

It can also be used to identify security harms ranging from malware generation to jailbreaking, as well as privacy harms like identity theft. — Read More

#assurance

Building a World Where Data Privacy Exists Online

Data is valuable — something that companies like Facebook, Google and Amazon realized far earlier than most consumers did. But computer scientists have been working on alternative models, even as the public has grown weary of having their data used and abused.

Dawn Song, a professor at the University of California, Berkeley, and one of the world’s foremost experts in computer security and trustworthy artificial intelligence, envisions a new paradigm in which people control their data and are compensated for its use by corporations. Read More

#adversarial, #assurance, #privacy

Assuring the Machine Learning Lifecycle: Desiderata, Methods, and Challenges

Machine learning has evolved into an enabling technology for a wide range of highly successful applications. The potential for this success to continue and accelerate has placed machine learning (ML) at the top of research, economic and political agendas. Such unprecedented interest is fueled by a vision of ML applicability extending to healthcare, transportation, defense and other domains of great societal importance. Achieving this vision requires the use of ML in safety-critical applications that demand levels of assurance beyond those needed for current ML applications. Our paper provides a comprehensive survey of the state-of-the-art in the assurance of ML, i.e. in the generation of evidence that ML is sufficiently safe for its intended use. The survey covers the methods capable of providing such evidence at different stages of the machine learning lifecycle, i.e. of the complex, iterative process that starts with the collection of the data used to train an ML component for a system, and ends with the deployment of that component within the system. The paper begins with a systematic presentation of the ML lifecycle and its stages. We then define assurance desiderata for each stage, review existing methods that contribute to achieving these desiderata, and identify open challenges that require further research. Read More

#accuracy, #assurance, #performance

How New A.I. Is Making the Law’s Definition of Hacking Obsolete

Imagine you’re cruising in your new Tesla, autopilot engaged. Suddenly you feel yourself veer into the other lane, and you grab the wheel just in time to avoid an oncoming car. When you pull over, pulse still racing, and look over the scene, it all seems normal. But upon closer inspection, you notice a series of translucent stickers leading away from the dotted lane divider. And to your Tesla, these stickers represent a non-existent bend in the road that could have killed you.

… As more machines become artificially intelligent, computer scientists are learning that A.I. can be manipulated into perceiving the world in wrong, sometimes dangerous ways. And because these techniques “trick” the system instead of “hacking” it, federal laws and security standards may not protect us from these malicious new behaviors — and the serious consequences they can have. Read More

#assurance

MobilBye: Attacking ADAS with Camera Spoofing

Advanced driver assistance systems (ADASs) were devel-oped to reduce the number of car accidents by issuing driveralert or controlling the vehicle. In this paper, we tested therobustness of Mobileye, a popular external ADAS. We injectedspoofed traffic signs into Mobileye to assess the influence ofenvironmental changes (e.g., changes in color, shape, projectionspeed, diameter and ambient light) on the outcome of an attack.To conduct this experiment in a realistic scenario, we used adrone to carry a portable projector which projected the spoofedtraffic sign on a driving car. Our experiments show that it ispossible to fool Mobileye so that it interprets the drone carriedspoofed traffic sign as a real traffic sign. Read More

#assurance, #fake

Adversarial Reprogramming of Neural Networks

Deep neural networks are susceptible to adversarial attacks. In computer vision,well-crafted perturbations to images can cause neural networks to make mistakes such as confusing a cat with a computer. Previous adversarial attacks have been designed to degrade performance of models or cause machine learning models to produce specific outputs chosen ahead of time by the attacker. We introduce attacks that instead reprogram the target model to perform a task chosen by the attacker—without the attacker needing to specify or compute the desired output for each test-time input. This attack finds a single adversarial perturbation, that can be added to all test-time inputs to a machine learning model in order to cause the model to perform a task chosen by the adversary—even if the model was not trained to do this task. These perturbations can thus be considered a program for the new task. We demonstrate adversarial reprogramming on six ImageNet classification models, repurposing these models to perform a counting task, as well as classification tasks: classification of MNIST and CIFAR-10 examples presented as inputs to the ImageNet model. Read More

#assurance

Natural Adversarial Examples

We introduce natural adversarial examples – real-world, unmodified, and naturally occurring examples that cause classifier accuracy to significantly degrade. We curate 7,500 natural adversarial examples and release them in an ImageNet classifier test set that we call IMAGENET-A. This dataset serves as a new way to measure classifier robustness. Like `p adversarial examples, IMAGENET-A examples successfully transfer to unseen or black-box classifiers. For example, on IMAGENET-A a DenseNet-121 obtains around 2% accuracy, an accuracy drop of approximately 90%. Recovering this accuracy is not simple because IMAGENET-A examples exploit deep flaws in current classifiers including their over-reliance on color, texture, and background cues. We observe that popular training techniques for improving robustness have little effect, but we show that some architectural changes can enhance robustness to natural adversarial examples. Future research is required to enable robust generalization to this hard ImageNet test set. Read More

#assurance

How can attackers abuse artificial intelligence?

Artificial intelligence (AI) is rapidly finding applications in nearly every walk of life. Self-driving cars, social media networks, cybersecurity companies, and everything in between uses it.

But a new report published by the SHERPA consortium – an EU project studying the impact of AI on ethics and human rights – finds that while human attackers have access to machine learning techniques, they currently focus most of their efforts on manipulating existing AI systems for malicious purposes instead of creating new attacks that would use machine learning. Read More

#assurance, #cyber

Superhuman AI for multiplayer poker

In recent years there have been great strides in artificial intelligence (AI), with games often serving as challenge problems, benchmarks, and milestones for progress. Poker has served for decades as such a challenge problem. Past successes in such benchmarks, including poker, have been limited to two-player games. However, poker in particular is traditionally played with more than two players. Multiplayer games present fundamental additional issues beyond those in two-player games, and multiplayer poker is a recognized AI milestone. In this paper we present Pluribus, an AI that we show is stronger than top human professionals in six-player no-limit Texas hold’em poker, the most popular form of poker played by humans. Read More

#assurance, #human, #self-supervised

MTDeep: Boosting the Security of Deep Neural Nets Against Adversarial Attacks with Moving Target Defense

Recent works on gradient based attacks and universal perturbations can adversarially modify images to bring down the accuracy of state-of-the-art classification techniques based on deep neural networks to as low as 10% on popular datasets like MNIST and ImageNet. The design of general defense strategies against a wide range of such attacks remains a challenging problem. In this paper, we derive inspiration from recent advances in the fields of cybersecurity and multi-agent systems and propose to use the concept of Moving Target Defense (MTD) for increasing the robustness of a set of deep networks against such adversarial attacks. To this end, we formalize and exploit the notion of differential immunity of an ensemble of networks to specific attacks. To classify an input image, a trained network is picked from this set of networks by formulating the interaction between a Defender (who hosts the classification networks) and their (Legitimate and Malicious) Users as a repeated Bayesian Stackelberg Game (BSG). We empirically show that our approach, MTDeep reduces misclassification on perturbed images for MNIST and ImageNet datasets while maintaining high classification accuracy on legitimate test images. Lastly, we demonstrate that our framework can be used in conjunction with any existing defense mechanism to provide more resilience to adversarial attacks than those defense mechanisms by themselves. Read More

#assurance