
Tag Archives: Federated Learning
Practical Secure Aggregation for Privacy-Preserving Machine Learning
We design a novel, communication-efficient, failure-robust protocol for secure aggregation of high-dimensional data. Our protocol allows a server to compute the sum of large, user-held data vectors from mobile devices in a secure manner (i.e. without learning each user’s individual contribution), and can be used, for example, in a federated learning setting, to aggregate user-provided model updates for a deep neural network. We prove the security of our protocol in the honest-but-curious and active adversary settings, and show that security is maintained even if an arbitrarily chosen subset of users drop out at any time. We evaluate the efficiency of our protocol and show, by complexity analysis and a concrete implementation, that its runtime and communication overhead remain low even on large data sets and client pools. For 16-bit input values, our protocol offers 1.73× communication expansion for 2^10 users and 2^20-dimensional vectors, and 1.98× expansion for 2^14 users and 2^24-dimensional vectors over sending data in the clear. Read More
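The core idea can be illustrated with pairwise masking: each pair of users agrees on a random mask that one adds and the other subtracts, so all masks cancel in the server’s sum. The sketch below is illustrative Python only, not the paper’s full protocol, which additionally handles dropouts via secret sharing and key agreement.

```python
# Minimal sketch of pairwise-mask secure aggregation (illustrative only; the
# actual protocol adds key agreement and secret sharing for dropout recovery).
import numpy as np

def make_pairwise_masks(num_users, dim, seed=0):
    """Each pair (i, j) with i < j shares a random mask; i adds it, j subtracts it."""
    rng = np.random.default_rng(seed)
    return {(i, j): rng.integers(0, 2**16, size=dim)
            for i in range(num_users) for j in range(i + 1, num_users)}

def masked_update(user_id, update, masks, num_users, modulus=2**16):
    """Add masks shared with higher-indexed users, subtract those shared with lower-indexed ones."""
    y = update.copy().astype(np.int64)
    for j in range(num_users):
        if j == user_id:
            continue
        pair = (min(user_id, j), max(user_id, j))
        sign = 1 if user_id < j else -1
        y += sign * masks[pair]
    return y % modulus

# Example: the server sums the masked vectors; the pairwise masks cancel in the sum.
num_users, dim = 4, 8
rng = np.random.default_rng(1)
updates = [rng.integers(0, 100, size=dim) for _ in range(num_users)]
masks = make_pairwise_masks(num_users, dim)
aggregate = sum(masked_update(i, updates[i], masks, num_users) for i in range(num_users)) % 2**16
assert np.array_equal(aggregate, sum(updates) % 2**16)  # server learns only the sum
```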
A Differentially Private Kernel Two-Sample Test
Kernel two-sample testing is a useful statistical tool in determining whether data samples arise from different distributions without imposing any parametric assumptions on those distributions. However, raw data samples can expose sensitive information about individuals who participate in scientific studies, which makes the current tests vulnerable to privacy breaches. Hence, we design a new framework for kernel two-sample testing conforming to differential privacy constraints, in order to guarantee the privacy of subjects in the data. Unlike existing differentially private parametric tests that simply add noise to data, kernel-based testing imposes a challenge due to a complex dependence of test statistics on the raw data, as these statistics correspond to estimators of distances between representations of probability measures in Hilbert spaces. Our approach considers finite dimensional approximations to those representations. As a result, a simple chi-squared test is obtained, where a test statistic depends on a mean and covariance of empirical differences between the samples, which we perturb for a privacy guarantee. We investigate the utility of our framework in two realistic settings and conclude that our method requires only a relatively modest increase in sample size to achieve a similar level of power to the non-private tests in both settings. Read More
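As a rough illustration of the construction, the sketch below uses random Fourier features as the finite-dimensional approximation, perturbs the mean and covariance of the empirical feature differences with Gaussian noise, and forms a chi-squared statistic. The noise scales here are placeholders and are not calibrated to a privacy budget, so this only shows the shape of the statistic, not the paper’s calibrated test.

```python
# Rough sketch of a privatized chi-squared two-sample statistic built from a
# finite-dimensional feature map (random Fourier features). The noise scales
# below are placeholders; a real implementation must calibrate them to the
# sensitivity of the mean/covariance and the (epsilon, delta) budget.
import numpy as np
from scipy import stats

def rff(x, omegas, phases):
    """Random Fourier features approximating a Gaussian kernel."""
    return np.sqrt(2.0 / omegas.shape[1]) * np.cos(x @ omegas + phases)

def private_two_sample_test(X, Y, d_feat=10, noise_scale=0.1, alpha=0.05, seed=0):
    # Assumes X and Y contain the same number of samples.
    rng = np.random.default_rng(seed)
    omegas = rng.normal(size=(X.shape[1], d_feat))
    phases = rng.uniform(0, 2 * np.pi, size=d_feat)

    diffs = rff(X, omegas, phases) - rff(Y, omegas, phases)   # empirical differences
    n = diffs.shape[0]
    mean = diffs.mean(axis=0) + rng.normal(scale=noise_scale, size=d_feat)
    cov = np.cov(diffs, rowvar=False) + rng.normal(scale=noise_scale, size=(d_feat, d_feat))
    cov = (cov + cov.T) / 2 + 1e-3 * np.eye(d_feat)           # symmetrize, keep invertible

    statistic = n * mean @ np.linalg.solve(cov, mean)         # roughly chi-squared, d_feat dof
    p_value = 1 - stats.chi2.cdf(statistic, df=d_feat)
    return statistic, p_value, p_value < alpha
```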
Differentially Private Federated Learning: A Client Level Perspective
Federated learning is a recent advance in privacy protection. In this context, a trusted curator aggregates parameters optimized in decentralized fashion by multiple clients. The resulting model is then distributed back to all clients, ultimately converging to a joint representative model without explicitly having to share the data. However, the protocol is vulnerable to differential attacks, which could originate from any party contributing during federated optimization. In such an attack, a client’s contribution during training and information about their data set is revealed through analyzing the distributed model. We tackle this problem and propose an algorithm for client-sided differential privacy preserving federated optimization. The aim is to hide clients’ contributions during training, balancing the tradeoff between privacy loss and model performance. Empirical studies suggest that given a sufficiently large number of participating clients, our proposed procedure can maintain client-level differential privacy at only a minor cost in model performance. Read More
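A minimal sketch of the aggregation step this describes: the curator clips each client’s update to a fixed L2 norm and adds Gaussian noise to the average. The clipping bound and noise multiplier below are illustrative placeholders; choosing them and tracking the accumulated privacy loss is the substance of the proposed algorithm.

```python
# Minimal sketch of client-level DP aggregation: clip each client's update to
# an L2 bound, average, and add Gaussian noise at the curator. The clip_norm
# and noise_multiplier values are illustrative, not calibrated.
import numpy as np

def dp_aggregate(client_updates, clip_norm=1.0, noise_multiplier=1.0, seed=0):
    rng = np.random.default_rng(seed)
    m = len(client_updates)
    clipped = []
    for u in client_updates:
        norm = np.linalg.norm(u)
        clipped.append(u * min(1.0, clip_norm / max(norm, 1e-12)))  # per-client L2 clipping
    avg = np.mean(clipped, axis=0)
    noise = rng.normal(scale=noise_multiplier * clip_norm / m, size=avg.shape)
    return avg + noise  # noisy update applied to the global model

# Usage: updates are (local_model - global_model) vectors from m clients.
updates = [np.random.randn(5) for _ in range(100)]
print(dp_aggregate(updates, clip_norm=0.5, noise_multiplier=1.2))
```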
Federated Machine Learning: Concept and Applications
Today’s AI still faces two major challenges. One is that in most industries, data exists in the form of isolated islands. The other is the strengthening of data privacy and security. We propose a possible solution to these challenges: secure federated learning. Beyond the federated learning framework first proposed by Google in 2016, we introduce a comprehensive secure federated learning framework, which includes horizontal federated learning, vertical federated learning and federated transfer learning. We provide definitions, architectures and applications for the federated learning framework, and provide a comprehensive survey of existing works on this subject. In addition, we propose building data networks among organizations based on federated mechanisms as an effective solution to allow knowledge to be shared without compromising user privacy. Read More
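To make the taxonomy concrete, here is a toy sketch (all field names invented) of how the same tabular data would be partitioned in the horizontal and vertical settings.

```python
# Toy illustration (all names/fields invented) of the data partitions behind
# the taxonomy: horizontal FL shares the feature space but splits samples;
# vertical FL shares (some) sample IDs but splits features.
horizontal_party_a = {"features": ["age", "income"], "user_ids": [1, 2, 3]}
horizontal_party_b = {"features": ["age", "income"], "user_ids": [4, 5, 6]}   # same features, different users

vertical_party_a = {"features": ["age", "income"], "user_ids": [1, 2, 3]}
vertical_party_b = {"features": ["purchases"], "user_ids": [1, 2, 3]}         # same users, different features

# Federated transfer learning covers the remaining case: little overlap in
# either users or features, so knowledge must be transferred across both gaps.
```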
Federated Learning with Non-IID Data
Federated learning enables resource-constrained edge compute devices, such as mobile phones and IoT devices, to learn a shared model for prediction, while keeping the training data local. This decentralized approach to train models provides privacy, security, regulatory and economic benefits. In this work, we focus on the statistical challenge of federated learning when local data is non-IID. We first show that the accuracy of federated learning reduces significantly, by up to ~55% for neural networks trained for highly skewed non-IID data, where each client device trains only on a single class of data. We further show that this accuracy reduction can be explained by the weight divergence, which can be quantified by the earth mover’s distance (EMD) between the distribution over classes on each device and the population distribution. As a solution, we propose a strategy to improve training on non-IID data by creating a small subset of data which is globally shared between all the edge devices. Experiments show that accuracy can be increased by ~30% for the CIFAR-10 dataset with only 5% globally shared data. Read More
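One way to compute the divergence measure mentioned here is sketched below: estimate the class distribution on a client and in the population, then compare them with an earth mover’s distance over the class indices (using SciPy’s 1-D Wasserstein distance as one concrete instantiation).

```python
# Sketch of the divergence diagnostic: earth mover's distance between a
# client's label distribution and the population label distribution.
import numpy as np
from scipy.stats import wasserstein_distance

def class_distribution(labels, num_classes):
    counts = np.bincount(labels, minlength=num_classes)
    return counts / counts.sum()

# Example: a highly skewed client that only holds class 0 vs. a roughly uniform population.
num_classes = 10
client_labels = np.zeros(1000, dtype=int)                        # single-class client
population_labels = np.random.randint(0, num_classes, 50000)

p = class_distribution(client_labels, num_classes)
q = class_distribution(population_labels, num_classes)
classes = np.arange(num_classes)
emd = wasserstein_distance(classes, classes, u_weights=p, v_weights=q)
print(f"EMD between client and population class distributions: {emd:.3f}")
```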
Federated Learning via Over-the-Air Computation
The stringent requirements for low latency and privacy of emerging high-stake applications with intelligent devices such as drones and smart vehicles make cloud computing inapplicable in these scenarios. Instead, edge machine learning becomes increasingly attractive for performing training and inference directly at network edges without sending data to a centralized data center. This stimulates a nascent field termed federated learning, for training a machine learning model on computation-, storage-, energy- and bandwidth-limited mobile devices in a distributed manner. To preserve data privacy and address the issues of unbalanced and non-IID data points across different devices, the federated averaging algorithm has been proposed for global model aggregation by computing the weighted average of the locally updated model at each selected device. However, the limited communication bandwidth becomes the main bottleneck for aggregating the locally computed updates. We thus propose a novel over-the-air computation based approach for fast global model aggregation via exploring the superposition property of a wireless multiple-access channel. This is achieved by joint device selection and beamforming design, which is modeled as a sparse and low-rank optimization problem to support efficient algorithm design. To achieve this goal, we provide a difference-of-convex-functions (DC) representation for the sparse and low-rank function to enhance sparsity and accurately detect the fixed-rank constraint in the procedure of device selection. A DC algorithm is further developed to solve the resulting DC program with global convergence guarantees. The algorithmic advantages and admirable performance of the proposed methodologies are demonstrated through extensive numerical results. Read More
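The superposition idea, stripped of the paper’s device selection and beamforming design, can be simulated in a few lines: devices pre-scale their updates so that the signals adding up on the multiple-access channel already form a weighted sum, which the server then rescales. The channel gains and noise level below are toy values.

```python
# Toy simulation of over-the-air aggregation (not the paper's DC-based device
# selection/beamforming): devices pre-scale updates so the channel's natural
# superposition yields a weighted sum, which the server rescales.
import numpy as np

rng = np.random.default_rng(0)
num_devices, dim = 8, 16
updates = [rng.normal(size=dim) for _ in range(num_devices)]
weights = rng.integers(50, 200, size=num_devices)            # e.g. local dataset sizes
channel_gains = rng.uniform(0.5, 1.5, size=num_devices)      # assumed known to each device

# Each device inverts its own channel and scales by its weight before transmitting.
transmitted = [(weights[i] / channel_gains[i]) * updates[i] for i in range(num_devices)]

# The channel superimposes all transmissions and adds receiver noise.
noise = rng.normal(scale=0.01, size=dim)
received = sum(channel_gains[i] * transmitted[i] for i in range(num_devices)) + noise

# The server rescales by the total weight to recover the (noisy) weighted average.
ota_average = received / weights.sum()
exact_average = sum(weights[i] * updates[i] for i in range(num_devices)) / weights.sum()
print("max aggregation error:", np.max(np.abs(ota_average - exact_average)))
```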
Towards federated learning at scale: system design
Federated Learning (FL) (McMahan et al., 2017) is a distributed machine learning approach which enables training on a large corpus of decentralized data residing on devices like mobile phones. FL is one instance of the more general approach of “bringing the code to the data, instead of the data to the code” and addresses the fundamental problems of privacy, ownership, and locality of data. The general description of FL has been given by McMahan & Ramage (2017), and its theory has been explored in Konečný et al. (2016a); McMahan et al. (2017; 2018). Read More
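For readers new to FL, a minimal sketch of one federated averaging round (the algorithm of McMahan et al., 2017) is given below. The local training step is a placeholder least-squares update; the essential part is the data-size-weighted average taken by the server.

```python
# Minimal sketch of one FedAvg round: selected devices train locally on their
# own data and the server takes a data-size-weighted average of the results.
# A toy least-squares objective stands in for real model training.
import numpy as np

def local_update(global_model, local_data, lr=0.1, epochs=1):
    """Placeholder local training: a few gradient steps on a least-squares objective."""
    w = global_model.copy()
    X, y = local_data
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def fedavg_round(global_model, client_datasets):
    n_total = sum(len(y) for _, y in client_datasets)
    new_model = np.zeros_like(global_model)
    for X, y in client_datasets:
        w = local_update(global_model, (X, y))
        new_model += (len(y) / n_total) * w          # weight by local dataset size
    return new_model

# Usage: three toy clients with different amounts of data.
rng = np.random.default_rng(0)
dim = 4
clients = [(rng.normal(size=(n, dim)), rng.normal(size=n)) for n in (20, 50, 100)]
model = np.zeros(dim)
for round_idx in range(5):
    model = fedavg_round(model, clients)
```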
Multi-objective Evolutionary Federated Learning
Federated learning is an emerging technique used to prevent the leakage of private information. Unlike centralized learning, which needs to collect data from users and store them collectively on a cloud server, federated learning makes it possible to learn a global model while the data are distributed on the users’ devices. However, compared with the traditional centralized approach, the federated setting consumes considerable communication resources of the clients, which are indispensable for updating global models, and this prevents the technique from being widely used. In this paper, we aim to optimize the structure of the neural network models in federated learning using a multi-objective evolutionary algorithm to simultaneously minimize the communication costs and the global model test errors. A scalable method for encoding network connectivity is adapted to federated learning to enhance the efficiency in evolving deep neural networks. Experimental results on both multilayer perceptrons and convolutional neural networks indicate that the proposed optimization method is able to find optimized neural network models that can not only significantly reduce communication costs but also improve the learning performance of federated learning compared with the standard fully connected neural networks. Read More
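The selection step of such a multi-objective search can be sketched as keeping the Pareto front over (communication cost, test error) pairs; the network encoding and variation operators of the actual evolutionary algorithm are omitted, and the candidate scores below are made up.

```python
# Sketch of the multi-objective selection step: each candidate network is
# scored on (communication cost, test error) and only the non-dominated
# (Pareto-optimal) candidates survive to the next generation.
from typing import List, Tuple

def dominates(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """a dominates b if it is no worse in both objectives and strictly better in one."""
    return a[0] <= b[0] and a[1] <= b[1] and a != b

def pareto_front(objectives: List[Tuple[float, float]]) -> List[int]:
    """Indices of candidates not dominated by any other candidate."""
    return [
        i for i, a in enumerate(objectives)
        if not any(dominates(b, a) for j, b in enumerate(objectives) if j != i)
    ]

# Usage: (communication cost in MB per round, global test error) per candidate model.
candidates = [(12.0, 0.08), (4.5, 0.12), (4.5, 0.10), (20.0, 0.07), (3.0, 0.20)]
print(pareto_front(candidates))   # [0, 2, 3, 4] -- (4.5, 0.12) is dominated by (4.5, 0.10)
```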
An introduction to Federated Learning
Federated learning makes it possible to build machine learning systems without direct access to training data. The data remains in its original location, which helps to ensure privacy and reduces communication costs.
This article is about the business case for federated learning. We’ll talk about how it works at a conceptual level, and then focus on the applications and use cases. Read More