Tutorial on Secure Multi-Party Computation

A good briefing to serve as a backgrounder. Read More

#homomorphic-encryption, #privacy

From Keys to Databases – Real-World Applications of Secure Multi-Party Computation

We discuss the wide and increasing range of applications of a cryptographic technique called Multi-Party Computation. For many decades this was perceived to be of purely theoretical interest, but now it has started to find application in a number of use cases. We highlight in this paper a number of these, ranging from securing small, high-value items such as cryptographic keys through to securing an entire database. Read More
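
The paper itself surveys deployments rather than protocol internals, but a quick way to see what multi-party computation protects is additive secret sharing, the building block behind many deployed schemes. The toy Python sketch below (illustrative only, not any system described in the paper; helper names are ours) lets two parties learn the sum of their private values without revealing the values themselves.

```python
import secrets

MODULUS = 2**64  # arithmetic is done modulo a fixed ring size

def share(value, n_parties):
    """Split `value` into n additive shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

def reconstruct(shares):
    return sum(shares) % MODULUS

# Two parties hold private inputs; neither wants to reveal its value.
alice_input, bob_input = 42, 100

alice_shares = share(alice_input, 2)  # Alice keeps shares[0], sends shares[1] to Bob
bob_shares = share(bob_input, 2)      # Bob keeps shares[0], sends shares[1] to Alice

# Each party adds the shares it holds locally ...
alice_partial = (alice_shares[0] + bob_shares[1]) % MODULUS
bob_partial = (bob_shares[0] + alice_shares[1]) % MODULUS

# ... and only the partial sums are exchanged, revealing nothing beyond the total.
assert reconstruct([alice_partial, bob_partial]) == alice_input + bob_input
```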

#homomorphic-encryption, #privacy

Secure Multiparty Computation

Introduction: a high-level briefing on concepts, challenges, and real-world uses. Read More

#homomorphic-encryption, #privacy

A Differentially Private Stochastic Gradient Descent Algorithm for Multiparty Classification

We consider the problem of developing privacy preserving machine learning algorithms in a distributed multiparty setting. Here different parties own different parts of a data set, and the goal is to learn a classifier from the entire data set without any party revealing any information about the individual data points it owns. Pathak et al. [7] recently proposed a solution to this problem in which each party learns a local classifier from its own data, and a third party then aggregates these classifiers in a privacy-preserving manner using a cryptographic scheme. The generalization performance of their algorithm is sensitive to the number of parties and the relative fractions of data owned by the different parties. In this paper, we describe a new differentially private algorithm for the multiparty setting that uses a stochastic gradient descent-based procedure to directly optimize the overall multiparty objective rather than combining classifiers learned from optimizing local objectives. The algorithm achieves a slightly weaker form of differential privacy than that of [7], but provides improved generalization guarantees that do not depend on the number of parties or the relative sizes of the individual data sets. Experimental results corroborate our theoretical findings. Read More
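
The exact protocol and privacy calibration are in the paper; as a rough illustration of directly optimizing the pooled multiparty objective with noisy gradient updates, here is a minimal Python sketch. It uses full-batch gradients as a stand-in for the paper's SGD procedure, synthetic data, and illustrative noise and step-size values rather than a calibrated privacy budget.

```python
import numpy as np

rng = np.random.default_rng(0)

def logistic_grad(w, X, y):
    """Gradient of the logistic loss on one party's shard (labels in {-1, +1})."""
    margins = y * (X @ w)
    return -(X * (y / (1.0 + np.exp(margins)))[:, None]).sum(axis=0)

def multiparty_dp_sgd(shards, dim, steps=200, lr=0.05, noise_scale=0.5, lam=0.01):
    """Noisy gradient descent on the pooled objective.

    Each party contributes the gradient on its own shard; only the noisy
    aggregate drives the shared model (illustrative calibration, not the
    paper's).
    """
    n_total = sum(len(y) for _, y in shards)
    w = np.zeros(dim)
    for _ in range(steps):
        grad = sum(logistic_grad(w, X, y) for X, y in shards) / n_total
        grad += lam * w                                             # L2 regularization
        grad += rng.normal(scale=noise_scale / n_total, size=dim)   # privacy noise
        w -= lr * grad
    return w

# Synthetic example: three parties with differently sized shards.
true_w = np.array([1.5, -2.0, 0.5])
shards = []
for n in (40, 120, 240):
    X = rng.normal(size=(n, 3))
    y = np.sign(X @ true_w + 0.1 * rng.normal(size=n))
    shards.append((X, y))

w_hat = multiparty_dp_sgd(shards, dim=3)
print("recovered weights:", np.round(w_hat, 2))
```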

#homomorphic-encryption, #privacy

Differentially Private Machine Learning

Theory, Algorithms, and Applications (UCSD & Rutgers Tutorial) Read More

#homomorphic-encryption, #privacy

A Short Tutorial on Differential Privacy

Disturbing Headlines and Paper Titles: “A Face Is Exposed for AOL Searcher No. 4417749” [Barbaro & Zeller ’06], “Robust De-anonymization of Large Datasets (How to Break Anonymity of the Netflix Prize Dataset)” [Narayanan & Shmatikov ’08], “Matching Known Patients to Health Records in Washington State Data” [Sweeney ’13], “Harvard Professor Re-Identifies Anonymous Volunteers In DNA Study” [Sweeney et al. ’13], and many others. In general, removing identifiers and applying anonymization heuristics is not always enough! A good brief on differential privacy and its use. Read More
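
To make the contrast with anonymization heuristics concrete, the sketch below applies the standard Laplace mechanism to a counting query: a count changes by at most 1 when one record is added or removed, so noise with scale 1/epsilon gives epsilon-differential privacy. The data and epsilon values are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)

def laplace_count(data, predicate, epsilon):
    """epsilon-differentially private count.

    A counting query has sensitivity 1 (one record changes the count by at
    most 1), so Laplace noise with scale 1/epsilon suffices.
    """
    true_count = sum(1 for record in data if predicate(record))
    return true_count + rng.laplace(scale=1.0 / epsilon)

# Illustrative data: ages of survey participants.
ages = [23, 35, 41, 29, 52, 61, 33, 47]
for eps in (0.1, 1.0, 10.0):
    noisy = laplace_count(ages, lambda a: a >= 40, epsilon=eps)
    print(f"epsilon={eps:>4}: noisy count of participants aged 40+ = {noisy:.1f}")
```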

#homomorphic-encryption, #privacy

Bolt-on Differential Privacy for Scalable Stochastic Gradient Descent-based Analytics

While significant progress has been made separately on analytics systems for scalable stochastic gradient descent (SGD) and private SGD, none of the major scalable analytics frameworks have incorporated differentially private SGD. There are two inter-related issues for this disconnect between research and practice: (1) low model accuracy due to added noise to guarantee privacy, and (2) high development and runtime overhead of the private algorithms. This paper takes a first step to remedy this disconnect and proposes a private SGD algorithm to address both issues in an integrated manner. In contrast to the whitebox approach adopted by previous work, we revisit and use the classical technique of output perturbation to devise a novel “bolt-on” approach to private SGD. While our approach trivially addresses (2), it makes (1) even more challenging. We address this challenge by providing a novel analysis of the L2-sensitivity of SGD, which allows, under the same privacy guarantees, better convergence of SGD when only a constant number of passes can be made over the data. We integrate our algorithm, as well as other state-of-the-art differentially private SGD algorithms, into Bismarck, a popular scalable SGD-based analytics system on top of an RDBMS. Extensive experiments show that our algorithm can be easily integrated, incurs virtually no overhead, scales well, and most importantly, yields substantially better (up to 4X) test accuracy than the state-of-the-art algorithms on many real datasets. Read More
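
The heart of the bolt-on approach is output perturbation: train with completely standard SGD, then add noise once to the final model, with the noise scale governed by an L2-sensitivity bound. The sketch below illustrates that shape of algorithm; the `l2_sensitivity` argument is a placeholder, not the bound derived in the paper, and the data and hyperparameters are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)

def sgd_logistic(X, y, lr=0.1, epochs=5, lam=0.01):
    """Plain (non-private) SGD for L2-regularized logistic regression."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            margin = y[i] * (X[i] @ w)
            grad = -y[i] * X[i] / (1.0 + np.exp(margin)) + lam * w
            w -= lr * grad
    return w

def output_perturbation(w, l2_sensitivity, epsilon):
    """Bolt-on privacy: add noise to the trained model only.

    `l2_sensitivity` is an upper bound on how much the trained weights can
    change when one training example changes (a placeholder here; the paper
    derives it from the loss, step size, and number of passes). The noise is
    drawn with density proportional to exp(-epsilon * ||b|| / sensitivity).
    """
    direction = rng.normal(size=w.shape)
    direction /= np.linalg.norm(direction)
    radius = rng.gamma(shape=w.size, scale=l2_sensitivity / epsilon)
    return w + radius * direction

# Synthetic data; training itself is completely standard ("bolt-on").
X = rng.normal(size=(500, 4))
y = np.sign(X @ np.array([1.0, -1.0, 0.5, 0.0]) + 0.1 * rng.normal(size=500))

w = sgd_logistic(X, y)
w_private = output_perturbation(w, l2_sensitivity=0.05, epsilon=1.0)
print("non-private:", np.round(w, 2))
print("private:    ", np.round(w_private, 2))
```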

#homomorphic-encryption, #privacy

Stochastic gradient descent with differentially private updates

Differential privacy is a recent framework for computation on sensitive data, which has shown considerable promise in the regime of large datasets. Stochastic gradient methods are a popular approach for learning in the data-rich regime because they are computationally tractable and scalable. In this paper, we derive differentially private versions of stochastic gradient descent, and test them empirically. Our results show that standard SGD experiences high variability due to differential privacy, but a moderate increase in the batch size can improve performance significantly. Read More
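
A minimal sketch of the paper's central observation: if noise is added once per averaged minibatch gradient, its effect shrinks as the batch size grows, so a moderate batch size recovers much of the lost accuracy. The calibration below is illustrative, not a worked-out privacy accounting.

```python
import numpy as np

rng = np.random.default_rng(3)

def dp_sgd(X, y, batch_size, lr=0.1, epochs=5, noise_scale=4.0):
    """Mini-batch SGD with noise added to each averaged batch gradient.

    The same per-step noise is divided by the batch size, so larger batches
    damp the variability the noise introduces (illustrative calibration only).
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            margins = y[idx] * (X[idx] @ w)
            grad = -(X[idx] * (y[idx] / (1.0 + np.exp(margins)))[:, None]).mean(axis=0)
            grad += rng.laplace(scale=noise_scale / len(idx), size=d)
            w -= lr * grad
    return w

# Synthetic classification data.
true_w = np.array([2.0, -1.0, 0.0, 1.0])
X = rng.normal(size=(2000, 4))
y = np.sign(X @ true_w + 0.1 * rng.normal(size=2000))

for b in (1, 10, 100):
    w = dp_sgd(X, y, batch_size=b)
    acc = np.mean(np.sign(X @ w) == y)
    print(f"batch size {b:>3}: training accuracy {acc:.2f}")
```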

#homomorphic-encryption, #privacy

A Hybrid Approach to Privacy-Preserving Federated Learning

Training machine learning models often requires data from multiple parties. However, in some cases, data owners cannot share their data due to legal or privacy constraints but would still benefit from training a model jointly with multiple parties. Federated learning has arisen as an alternative to allow for the collaborative training of models without the sharing of raw data. However, attacks in the literature have demonstrated that simply maintaining data locally during training processes does not provide strong enough privacy guarantees. We need a federated learning system capable of preventing inference over the messages exchanged between parties during training as well as the final, trained model. Read More
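
The paper's hybrid combines secure aggregation with differential privacy; the sketch below shows only the federated-averaging skeleton, with locally noised model updates standing in for the protected aggregation step. Party data, noise scale, and class and function names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)

def local_update(w_global, X, y, lr=0.05, local_steps=10, noise_scale=0.01):
    """One party's contribution: a few local gradient steps plus noise.

    In the paper the aggregation itself is also protected cryptographically;
    here the noise on each update merely stands in for that protection.
    """
    w = w_global.copy()
    for _ in range(local_steps):
        margins = y * (X @ w)
        grad = -(X * (y / (1.0 + np.exp(margins)))[:, None]).mean(axis=0)
        w -= lr * grad
    update = w - w_global
    return update + rng.normal(scale=noise_scale, size=w.shape)

def federated_training(parties, dim, rounds=30):
    """Federated averaging: only model updates, never raw data, leave a party."""
    w_global = np.zeros(dim)
    for _ in range(rounds):
        updates = [local_update(w_global, X, y) for X, y in parties]
        w_global += np.mean(updates, axis=0)   # server averages the noisy updates
    return w_global

# Three parties with private, non-identical data distributions.
true_w = np.array([1.0, -1.5, 0.5])
parties = []
for n in (60, 90, 150):
    X = rng.normal(size=(n, 3)) + 0.2 * rng.normal(size=3)
    y = np.sign(X @ true_w + 0.1 * rng.normal(size=n))
    parties.append((X, y))

w = federated_training(parties, dim=3)
print("global model:", np.round(w, 2))
```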

#federated-learning, #privacy, #split-learning

Split learning for health: Distributed deep learning without sharing raw patient data

Can health entities collaboratively train deep learning models without sharing sensitive raw data? This paper proposes several configurations of a distributed deep learning method called SplitNN to facilitate such collaborations. SplitNN does not share raw data or model details with collaborating institutions. The proposed configurations of splitNN cater to practical settings of i) entities holding different modalities of patient data, ii) centralized and local health entities collaborating on multiple tasks, and iii) learning without sharing labels. We compare the performance and resource efficiency trade-offs of splitNN with other distributed deep learning methods such as federated learning and large batch synchronous stochastic gradient descent, and show highly encouraging results for splitNN. Read More
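
This is not the SplitNN implementation from the paper, just a numpy sketch of the split itself: the client (health entity) computes activations up to a cut layer and sends only those, the server finishes the forward pass and returns gradients at the cut, and neither raw records nor the client's weights ever leave the client. Layer sizes, class names, and data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(11)

class Client:
    """Health entity: holds the raw data and the layers up to the cut."""
    def __init__(self, n_in, n_hidden, lr=0.1):
        self.W = rng.normal(scale=0.5, size=(n_in, n_hidden))
        self.lr = lr

    def forward(self, X):
        self.X = X
        self.H = np.tanh(X @ self.W)       # activations at the cut layer
        return self.H                       # only these leave the client

    def backward(self, grad_H):
        grad_pre = grad_H * (1.0 - self.H ** 2)   # tanh derivative
        self.W -= self.lr * self.X.T @ grad_pre

class Server:
    """Holds the layers after the cut; never sees raw records."""
    def __init__(self, n_hidden, lr=0.1):
        self.w = rng.normal(scale=0.5, size=n_hidden)
        self.lr = lr

    def step(self, H, y):
        p = 1.0 / (1.0 + np.exp(-(H @ self.w)))   # sigmoid output
        grad_logits = (p - y) / len(y)            # binary cross-entropy gradient
        grad_H = np.outer(grad_logits, self.w)    # gradient sent back to the client
        self.w -= self.lr * H.T @ grad_logits
        return grad_H

# Synthetic "patient" data that stays on the client side.
X = rng.normal(size=(400, 6))
y = (X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=400) > 0).astype(float)

client, server = Client(6, 8), Server(8)
for _ in range(300):
    H = client.forward(X)          # client -> server: cut-layer activations only
    grad_H = server.step(H, y)     # server -> client: gradients at the cut only
    client.backward(grad_H)

p = 1.0 / (1.0 + np.exp(-(client.forward(X) @ server.w)))
print("training accuracy:", np.mean((p > 0.5) == (y == 1)))
```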

#privacy, #split-learning, #splitnn