Many deployed learned models are black boxes: given an input, they return an output. Internal information about the model, such as the architecture, optimisation procedure, or training data, is not disclosed explicitly, as it might contain proprietary information or make the system more vulnerable. This work shows that such attributes of neural networks can be exposed from a sequence of queries. Read More
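As a rough illustration of the query-based setting described above (not the paper's actual method), the sketch below probes a black-box model with random inputs and stacks the returned outputs into a single "fingerprint" vector that a separate metamodel could later classify into attributes such as architecture family. `query_model` is a hypothetical stand-in for the deployed prediction interface.

```python
# Minimal sketch: build a query fingerprint of a black-box model.
import numpy as np

def fingerprint(query_model, n_queries=100, input_shape=(28, 28)):
    rng = np.random.default_rng(0)
    outputs = []
    for _ in range(n_queries):
        x = rng.uniform(0.0, 1.0, size=input_shape)  # random probe input
        outputs.append(query_model(x))               # black box returns e.g. class probabilities
    return np.concatenate(outputs)                   # feature vector for a metamodel to classify
```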
Stealing Machine Learning Models via Prediction APIs
Machine learning (ML) models may be deemed confidential due to their sensitive training data, commercial value, or use in security applications. Increasingly often, confidential ML models are being deployed with publicly accessible query interfaces. ML-as-a-service (“predictive analytics”) systems are an example: Some allow users to train models on potentially sensitive data and charge others for access on a pay-per-query basis. Read More
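A minimal sketch of the general extraction idea behind such attacks (not the paper's exact algorithms): label a set of probe inputs through the pay-per-query API and fit a local surrogate model on the returned predictions. `api_predict` is a hypothetical client for the remote model.

```python
# Minimal model-extraction sketch against a prediction API.
import numpy as np
from sklearn.linear_model import LogisticRegression

def steal_model(api_predict, n_queries=1000, n_features=20):
    rng = np.random.default_rng(0)
    X = rng.normal(size=(n_queries, n_features))   # probe inputs
    y = np.array([api_predict(x) for x in X])      # labels bought from the remote API
    surrogate = LogisticRegression(max_iter=1000).fit(X, y)
    return surrogate                               # local model approximating the remote one
```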
Stealing Hyperparameters in Machine Learning
Hyperparameters are critical in machine learning, as different hyperparameters often result in models with significantly different performance. Hyperparameters may be deemed confidential because of their commercial value and the confidentiality of the proprietary algorithms that the learner uses to learn them. In this work, we propose attacks that steal the hyperparameters learnt by a learner. Read More
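The core idea can be illustrated for ridge regression (a sketch under our own notation, not the paper's general treatment): if an attacker knows the training data (X, y) and the learnt weights w, the regularisation hyperparameter can be recovered from the stationarity condition X^T(Xw - y) + lambda * w = 0 by linear least squares.

```python
# Illustrative recovery of the ridge-regression regularisation hyperparameter.
import numpy as np

def recover_lambda(X, y, w):
    grad_loss = X.T @ (X @ w - y)        # gradient of the squared loss at the learnt weights
    # Solve grad_loss + lam * w ≈ 0 for the scalar lam (least-squares estimate).
    lam = -(w @ grad_loss) / (w @ w)
    return lam
```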
Confidentiality and Privacy Threats in Machine Learning
New threat models in Machine Learning. Read More
A Hybrid Approach to Privacy-Preserving Federated Learning
Training machine learning models often requires data from multiple parties. However, in some cases, data owners cannot share their data due to legal or privacy constraints but would still benefit from training a model jointly with multiple parties. Federated learning has arisen as an alternative that allows collaborative training of models without sharing raw data. However, attacks in the literature have demonstrated that simply keeping data local during training does not provide sufficiently strong privacy guarantees. We need a federated learning system capable of preventing inference over the messages exchanged between parties during training as well as over the final, trained model. Read More
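For context, the sketch below shows the plain federated-averaging baseline that such a system hardens: each party trains locally and only model weights, never raw data, are sent for aggregation. The protections the paper adds over these exchanged messages (e.g. secure aggregation or differential privacy) are deliberately not implemented here.

```python
# Minimal federated-averaging sketch (linear regression, one aggregation round).
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    w = weights.copy()
    for _ in range(epochs):                       # plain gradient steps on local data
        w -= lr * X.T @ (X @ w - y) / len(y)
    return w

def federated_round(global_w, parties):
    # `parties` is a list of (X, y) datasets; raw data never leaves its owner.
    updates = [local_update(global_w, X, y) for X, y in parties]
    return np.mean(updates, axis=0)               # server averages the submitted weights
```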
Split learning for health: Distributed deep learning without sharing raw patient data
Can health entities collaboratively train deep learning models without sharing sensitive raw data? This paper proposes several configurations of a distributed deep learning method called SplitNN to facilitate such collaborations. SplitNN does not share raw data or model details with collaborating institutions. The proposed configurations of SplitNN cater to practical settings of i) entities holding different modalities of patient data, ii) centralized and local health entities collaborating on multiple tasks, and iii) learning without sharing labels. We compare the performance and resource-efficiency trade-offs of SplitNN against other distributed deep learning methods, such as federated learning and large-batch synchronous stochastic gradient descent, and show highly encouraging results for SplitNN. Read More
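The following is a minimal split-learning sketch assuming a single client/server pair (not the paper's full multi-institution configurations): the client runs the layers below the cut, sends only the intermediate activations, and the server completes the forward pass, so raw patient data never leaves the client.

```python
# Minimal split-learning training step (single process for illustration).
import torch
import torch.nn as nn

client_net = nn.Sequential(nn.Linear(32, 64), nn.ReLU())   # held by the health entity
server_net = nn.Sequential(nn.Linear(64, 2))                # held by the server
opt = torch.optim.SGD(list(client_net.parameters()) + list(server_net.parameters()), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

def train_step(x, y):
    opt.zero_grad()
    smashed = client_net(x)        # only these activations cross the network boundary
    logits = server_net(smashed)   # the server never sees the raw input x
    loss = loss_fn(logits, y)
    loss.backward()                # gradients flow back to the client through `smashed`
    opt.step()
    return loss.item()
```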
A little-known AI method can train on your health data without threatening your privacy
Machine learning has great potential to transform disease diagnosis and detection, but it’s been held back by patients’ reluctance to give up access to sensitive information. Read More
A new AI method can train on medical records without revealing patient data
When Google announced that it would absorb DeepMind’s health division, it sparked a major controversy over data privacy. Though DeepMind confirmed that the move wouldn’t actually hand raw patient data to Google, just the idea of giving a tech giant intimate, identifying medical records made people queasy. This difficulty in obtaining large amounts of high-quality data has become the biggest obstacle to applying machine learning in medicine. Read More
SplitNet: Learning to Semantically Split Deep Networks for Parameter Reduction and Model Parallelization
We propose a novel deep neural network that is both lightweight and effectively structured for model parallelization. Our network, which we name SplitNet, automatically learns to split the network weights into either a set or a hierarchy of multiple groups that use disjoint sets of features, by learning both the class-to-group and feature-to-group assignment matrices along with the network weights. This produces a tree-structured network that involves no connection between branched subtrees of semantically disparate class groups. SplitNet thus greatly reduces the number of parameters and required computations, and is also embarrassingly model-parallelizable at test time, since the evaluation of each subnetwork is completely independent except for the shared lower-layer weights, which can be duplicated over multiple processors or assigned to a separate processor. Read More
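As a toy illustration of the grouping idea only (the paper learns the assignments jointly with the weights, which this sketch does not do), fixed class-to-group and feature-to-group assignments can be used to mask a weight matrix into a block-diagonal form, so that each class group uses a disjoint feature group and can be evaluated independently.

```python
# Toy sketch: block-diagonal masking from class/feature group assignments.
import numpy as np

n_features, n_classes, n_groups = 12, 6, 3
feat_group = np.repeat(np.arange(n_groups), n_features // n_groups)   # feature-to-group assignment
cls_group = np.repeat(np.arange(n_groups), n_classes // n_groups)     # class-to-group assignment

W = np.random.randn(n_classes, n_features)
mask = (cls_group[:, None] == feat_group[None, :]).astype(float)      # block-diagonal mask
W_split = W * mask       # no connections between disparate class groups

def predict_group(g, x):
    """Evaluate only subnetwork g; independent of every other group."""
    rows, cols = cls_group == g, feat_group == g
    return W_split[np.ix_(rows, cols)] @ x[cols]
```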