In the last two years, more than 200 papers have been written on how Machine Learning (ML) can fail because of adversarial attacks on the algorithms and data; this number balloons if we were to incorporate non-adversarial failure modes. The spate of papers has made it difficult for ML practitioners, let alone engineers, lawyers and policymakers, to keep up with the attacks against and defenses of ML systems. However, as these systems become more pervasive, the need to understand how they fail, whether by the hand of an adversary or due to the inherent design of a system, will only become more pressing. The purpose of this document is to jointly tabulate both of these failure modes in a single place.
— Intentional failures wherein the failure is caused by an active adversary attempting to subvert the system to attain her goals: either to misclassify the result, infer private training data, or steal the underlying algorithm.
— Unintentional failures wherein the failure is because an ML system produces a formally correct but completely unsafe outcome.
Read More
TrojDRL: Trojan Attacks on Deep Reinforcement Learning Agents
Recent work has identified that classification models implemented as neural networks are vulnerable to data-poisoning and Trojan attacks at training time. In this work, we show that these training-time vulnerabilities extend to deep reinforcement learning (DRL) agents and can be exploited by an adversary with access to the training process. In particular, we focus on Trojan attacks that augment the function of reinforcement learning policies with hidden behaviors. We demonstrate that such attacks can be implemented through minuscule data poisoning (as little as 0.025% of the training data) and in-band reward modification that does not affect the reward on normal inputs. The policies learned with our proposed attack approach perform almost indistinguishably from benign policies but deteriorate drastically when the Trojan is triggered, in both targeted and untargeted settings. Furthermore, we show that existing Trojan defense mechanisms for classification tasks are not effective in the reinforcement learning setting. Read More
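To make the poisoning step the abstract describes more concrete, here is a minimal sketch (not the paper's code): it stamps a small trigger patch onto a tiny fraction of stored transitions and relabels them toward an attacker-chosen action. The patch location, target action, reward handling, and data shapes are illustrative assumptions.

```python
import numpy as np

def poison_batch(observations, actions, rewards,
                 target_action=2, poison_rate=0.00025, patch_value=255):
    """Stamp a trigger patch onto a tiny fraction of transitions and
    relabel them so the attacker-chosen action looks rewarding.

    observations: (N, H, W, C) uint8 frames
    actions:      (N,) integer actions
    rewards:      (N,) float rewards
    """
    n = observations.shape[0]
    n_poison = max(1, int(n * poison_rate))        # roughly 0.025% of the data
    idx = np.random.choice(n, n_poison, replace=False)

    obs_p, act_p, rew_p = observations.copy(), actions.copy(), rewards.copy()
    obs_p[idx, :3, :3, :] = patch_value            # 3x3 trigger in one corner
    act_p[idx] = target_action                     # tie the trigger to a target action
    rew_p[idx] = np.abs(rew_p[idx])                # keep rewards in the normal range
    return obs_p, act_p, rew_p

if __name__ == "__main__":
    obs = np.zeros((4000, 84, 84, 4), dtype=np.uint8)
    acts = np.random.randint(0, 6, size=4000)
    rews = np.random.uniform(-1.0, 1.0, size=4000)
    p_obs, p_acts, p_rews = poison_batch(obs, acts, rews)
    print("poisoned transitions:", int((p_obs[:, 0, 0, 0] == 255).sum()))
```

The point of the sketch is only to show how little of the dataset needs to change: everything else about the training pipeline is left untouched, which is what makes such attacks hard to notice.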
Building a World Where Data Privacy Exists Online
Data is valuable — something that companies like Facebook, Google and Amazon realized far earlier than most consumers did. But computer scientists have been working on alternative models, even as the public has grown weary of having their data used and abused.
Dawn Song, a professor at the University of California, Berkeley, and one of the world’s foremost experts in computer security and trustworthy artificial intelligence, envisions a new paradigm in which people control their data and are compensated for its use by corporations. Read More
Results of the NIPS Adversarial Vision Challenge 2018
The winners of the NIPS Adversarial Vision Challenge 2018 have been determined. Overall, more than 400 participants submitted more than 3000 models and attacks. This year the competition focused on real-world scenarios in which attacks have low-volume query access to models (up to 1000 queries per sample). The models returned only their final decision, not gradients or confidence scores. This mimics a typical threat scenario for deployed Machine Learning systems and was meant to push the development of efficient decision-based attacks as well as more robust models. Read More
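The decision-based setting described above (final labels only, a fixed query budget) can be illustrated with a simple random-walk attack. This is a simplified sketch in the spirit of boundary-style attacks, not any contestant's submission; the step sizes, noise scale, and toy model are assumptions.

```python
import numpy as np

def decision_based_attack(predict, x_orig, y_true,
                          max_queries=1000, step=0.1, noise_scale=0.02, seed=0):
    """Attack a model using only its top-1 decisions (no gradients or scores).

    predict(x) -> predicted label for an input with values in [0, 1].
    Returns an adversarial example (or None) and the number of queries used.
    """
    rng = np.random.default_rng(seed)
    queries = 0

    # Find an adversarial starting point by sampling random inputs.
    x_adv = None
    while queries < max_queries:
        candidate = rng.uniform(0, 1, size=x_orig.shape)
        queries += 1
        if predict(candidate) != y_true:
            x_adv = candidate
            break
    if x_adv is None:
        return None, queries

    # Walk toward the original input, keeping only moves that stay adversarial.
    while queries < max_queries:
        candidate = x_adv + step * (x_orig - x_adv)
        candidate = np.clip(candidate + noise_scale * rng.normal(size=x_orig.shape), 0, 1)
        queries += 1
        if predict(candidate) != y_true:
            x_adv = candidate
    return x_adv, queries

if __name__ == "__main__":
    # Toy "model": labels an image 1 if its mean pixel intensity exceeds 0.5.
    predict = lambda x: int(x.mean() > 0.5)
    x = np.full((8, 8), 0.9)                      # clean input with label 1
    adv, used = decision_based_attack(predict, x, y_true=1)
    if adv is not None:
        print("queries used:", used,
              "| L2 distance to original:", round(float(np.linalg.norm(adv - x)), 3))
```

Even this crude loop shows why the query budget matters: every accepted or rejected proposal costs one query, so efficient attacks must extract as much information as possible from each binary answer.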
Military artificial intelligence can be easily and dangerously fooled
Last March, Chinese researchers announced an ingenious and potentially devastating attack against one of America’s most prized technological assets—a Tesla electric car.
The team, from the security lab of the Chinese tech giant Tencent, demonstrated several ways to fool the AI algorithms on Tesla’s car. By subtly altering the data fed to the car’s sensors, the researchers were able to bamboozle and bewilder the artificial intelligence that runs the vehicle. Read More
Adversarial Attacks on Deep Neural Networks: an Overview
Deep Neural Networks are highly expressive machine learning models. Researchers have found that it is far too easy to fool them with an imperceptible but carefully constructed nudge in the input. Adversarial training looks to defend against attacks by pretending to be the attacker: generating a number of adversarial examples against your own network, and then explicitly training the model not to be fooled by them. Defensive distillation looks to train a secondary model whose surface is smoothed in the directions an attacker will typically try to exploit, making it difficult for them to discover adversarial input tweaks that lead to incorrect categorization. Read More
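As a concrete illustration of the adversarial-training idea in the excerpt (defensive distillation is not shown), here is a minimal PyTorch sketch of a single training step that mixes clean inputs with FGSM-perturbed ones. The epsilon value, the equal loss weighting, and the toy model are illustrative assumptions, not a recommended defense recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fgsm_example(model, x, y, epsilon=0.03):
    """One signed-gradient step on the input: a basic FGSM adversarial example."""
    x_adv = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x_adv), y).backward()
    return (x_adv + epsilon * x_adv.grad.sign()).clamp(0, 1).detach()

def adversarial_training_step(model, optimizer, x, y, epsilon=0.03):
    """Train on an equal mix of clean and FGSM-perturbed inputs."""
    model.train()
    x_adv = fgsm_example(model, x, y, epsilon)
    optimizer.zero_grad()
    loss = 0.5 * (F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y))
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    # Tiny toy classifier on random data, just to exercise the step.
    model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    x = torch.rand(32, 1, 28, 28)
    y = torch.randint(0, 10, (32,))
    print("mixed clean/adversarial loss:", adversarial_training_step(model, optimizer, x, y))
```

In practice, stronger multi-step attacks are often used to generate the training-time examples, but the structure of the loop stays the same: attack your own model, then train on what you found.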
Blind Spots in AI Just Might Help Protect Your Privacy
Machine learning, for all its benevolent potential to detect cancers and create collision-proof self-driving cars, also threatens to upend our notions of what’s visible and hidden. It can, for instance, enable highly accurate facial recognition, see through the pixelation in photos, and even—as Facebook’s Cambridge Analytica scandal showed—use public social media data to predict more sensitive traits like someone’s political orientation.
Those same machine-learning applications, however, also suffer from a strange sort of blind spot that humans don’t—an inherent bug that can make an image classifier mistake a rifle for a helicopter, or make an autonomous vehicle blow through a stop sign. Those misclassifications, known as adversarial examples, have long been seen as a nagging weakness in machine-learning models. Read More