A Short Tutorial on Differential Privacy

Disturbing Headlines and Paper Titles: “A Face Is Exposed for AOL Searcher No. 4417749” [Barbaro & Zeller ’06], “Robust De-anonymization of Large Datasets (How to Break Anonymity of the Netflix Prize Dataset)” [Narayanan & Shmatikov ’08], “Matching Known Patients to Health Records in Washington State Data” [Sweeney ’13], “Harvard Professor Re-Identifies Anonymous Volunteers in DNA Study” [Sweeney et al. ’13], and many others. The general lesson: removing identifiers and applying anonymization heuristics is not always enough! A good brief introduction to differential privacy and its uses. Read More
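To make the contrast with ad hoc anonymization concrete, here is a minimal sketch (an illustration, not taken from the tutorial itself) of the Laplace mechanism, the basic building block of differential privacy: a numeric query answer is released with noise calibrated to the query's sensitivity and the privacy budget epsilon.

```python
import numpy as np

def laplace_mechanism(true_answer, sensitivity, epsilon, rng=None):
    """Release a numeric query answer with epsilon-differential privacy
    by adding Laplace noise with scale = sensitivity / epsilon."""
    rng = rng or np.random.default_rng()
    return true_answer + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# A counting query ("how many records satisfy P?") has sensitivity 1:
# adding or removing any one person changes the count by at most 1.
noisy_count = laplace_mechanism(true_answer=42, sensitivity=1.0, epsilon=0.5)
```

Unlike identifier removal, the guarantee here does not depend on what auxiliary data an attacker holds: the released answer is almost equally likely whether or not any single person is in the dataset.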

#homomorphic-encryption, #privacy

Bolt-on Differential Privacy for Scalable Stochastic Gradient Descent-based Analytics

While significant progress has been made separately on analytics systems for scalable stochastic gradient descent (SGD) and on private SGD, none of the major scalable analytics frameworks have incorporated differentially private SGD. There are two inter-related issues behind this disconnect between research and practice: (1) low model accuracy due to the noise added to guarantee privacy, and (2) high development and runtime overhead of the private algorithms. This paper takes a first step to remedy this disconnect and proposes a private SGD algorithm that addresses both issues in an integrated manner. In contrast to the whitebox approach adopted by previous work, we revisit and use the classical technique of output perturbation to devise a novel “bolt-on” approach to private SGD. While our approach trivially addresses (2), it makes (1) even more challenging. We address this challenge with a novel analysis of the L2-sensitivity of SGD, which allows, under the same privacy guarantees, better convergence of SGD when only a constant number of passes can be made over the data. We integrate our algorithm, as well as other state-of-the-art differentially private SGD algorithms, into Bismarck, a popular scalable SGD-based analytics system on top of an RDBMS. Extensive experiments show that our algorithm can be easily integrated, incurs virtually no overhead, scales well, and most importantly, yields substantially better (up to 4X) test accuracy than the state-of-the-art algorithms on many real datasets. Read More
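As a rough illustration of output perturbation (a sketch under assumed conditions, not the paper's exact algorithm or sensitivity bound): train with plain SGD, then perturb the final model once, with noise magnitude calibrated to an assumed L2-sensitivity of the training procedure. The loss, hyperparameters, and sensitivity formula below are illustrative.

```python
import numpy as np

def sgd_with_output_perturbation(X, y, epsilon, lam=0.01, eta=0.1,
                                 epochs=1, rng=None):
    """Sketch of 'bolt-on' output perturbation: run unmodified SGD, then
    perturb the final model once. The sensitivity below is an assumed
    bound of the form O(1 / (lam * n)) for a strongly convex, Lipschitz
    objective, not the paper's exact analysis."""
    rng = rng or np.random.default_rng()
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):                     # constant number of passes
        for i in rng.permutation(n):
            margin = y[i] * (X[i] @ w)
            grad = lam * w - (y[i] * X[i] if margin < 1.0 else 0.0)  # hinge loss
            w -= eta * grad
    # Add noise with density proportional to exp(-epsilon * ||z|| / sensitivity):
    # a uniformly random direction with Gamma(d, sensitivity/epsilon) length.
    sensitivity = 2.0 / (lam * n)               # assumed L2-sensitivity bound
    direction = rng.standard_normal(d)
    direction /= np.linalg.norm(direction)
    return w + rng.gamma(shape=d, scale=sensitivity / epsilon) * direction
```

Because the noise is added once to the output rather than inside every update, the training loop itself needs no modification, which is what makes the approach “bolt-on” for an existing system like Bismarck.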

#homomorphic-encryption, #privacy

Stochastic gradient descent with differentially private updates

Differential privacy is a recent framework for computation on sensitive data, which has shown considerable promise in the regime of large datasets. Stochastic gradient methods are a popular approach for learning in the data-rich regime because they are computationally tractable and scalable. In this paper, we derive differentially private versions of stochastic gradient descent and test them empirically. Our results show that standard SGD suffers high variance from the noise added for privacy, but a moderate increase in the batch size can improve performance significantly. Read More
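A minimal sketch of the gradient-perturbation flavor of private SGD studied here: each mini-batch update clips per-example gradients to bound sensitivity, averages them, and adds noise before stepping. For brevity this uses the Gaussian mechanism, which gives (epsilon, delta)-DP per update; the paper's exact noise distribution and analysis differ, and the loss and clipping threshold are illustrative.

```python
import numpy as np

def dp_sgd_step(w, X_batch, y_batch, eta, epsilon, delta, clip, rng):
    """One differentially private mini-batch SGD update (sketch): clip
    each example's gradient to L2 norm <= clip, average, and add
    Gaussian noise calibrated to the batch-average sensitivity."""
    b, d = X_batch.shape
    grad_sum = np.zeros(d)
    for x, label in zip(X_batch, y_batch):       # labels in {0, 1}
        g = (1.0 / (1.0 + np.exp(-(x @ w))) - label) * x  # logistic-loss gradient
        g *= min(1.0, clip / (np.linalg.norm(g) + 1e-12)) # bound L2-sensitivity
        grad_sum += g
    # Replacing one example changes the averaged gradient by at most 2*clip/b,
    # so Gaussian noise at this scale yields (epsilon, delta)-DP for the step.
    sigma = (2 * clip / b) * np.sqrt(2 * np.log(1.25 / delta)) / epsilon
    return w - eta * (grad_sum / b + rng.normal(scale=sigma, size=d))
```

Note how sigma shrinks with the batch size b: this is the mechanism behind the abstract's observation that a moderate increase in batch size significantly improves performance under privacy noise.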

#homomorphic-encryption, #privacy