An introduction to Federated Learning

Federated learning makes it possible to build machine learning systems without direct access to training data. The data remains in its original location, which helps to ensure privacy and reduces communication costs.

This article makes the business case for federated learning: we explain how it works at a conceptual level, then focus on applications and use cases.

#federated-learning, #machine-learning, #split-learning

Federated learning: distributed machine learning with data locality and privacy

We’re excited to release Federated Learning, the latest report and prototype from Cloudera Fast Forward Labs.

Federated learning makes it possible to build machine learning systems without direct access to training data. The data remains in its original location, which helps to ensure privacy and reduces communication costs.

#federated-learning, #machine-learning, #split-learning

Communication-Efficient Learning of Deep Networks from Decentralized Data

Modern mobile devices have access to a wealth of data suitable for learning models, which in turn can greatly improve the user experience on the device. For example, language models can improve speech recognition and text entry, and image models can automatically select good photos. However, this rich data is often privacy sensitive, large in quantity, or both, which may preclude logging to the data center and training there using conventional approaches. We advocate an alternative that leaves the training data distributed on the mobile devices, and learns a shared model by aggregating locally-computed updates. We term this decentralized approach Federated Learning.
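
The aggregation step described above can be illustrated with a short sketch. Below is a minimal federated-averaging round in plain Python/NumPy, assuming a `local_update` function that trains on a client's own data; the weighting by local dataset size follows the general idea of the paper, but the names and structure here are illustrative, not a reference implementation.

```python
import numpy as np

def federated_averaging_round(global_weights, client_datasets, local_update):
    """One illustrative round of federated averaging.

    Each client starts from the current global weights, computes an update
    on its own data (which never leaves the client), and returns only the
    updated weights. The server then averages the returned weights,
    weighted by each client's number of local examples.
    """
    client_weights, client_sizes = [], []
    for data in client_datasets:
        w = local_update(np.copy(global_weights), data)  # local training only
        client_weights.append(w)
        client_sizes.append(len(data))

    total = sum(client_sizes)
    # The weighted average of the locally computed models becomes the new global model.
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))
```

In the paper's setting, `local_update` would run several epochs of minibatch SGD on the device before the weights are communicated back.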

#distributed-learning, #machine-learning, #split-learning

A Hybrid Approach to Privacy-Preserving Federated Learning

Training machine learning models often requires data from multiple parties. However, in some cases, data owners cannot share their data due to legal or privacy constraints but would still benefit from training a model jointly with multiple parties. Federated learning has arisen as an alternative to allow for the collaborative training of models without the sharing of raw data. However, attacks in the literature have demonstrated that simply maintaining data locally during training processes does not provide strong enough privacy guarantees. We need a federated learning system capable of preventing inference over the messages exchanged between parties during training as well as the final, trained model.
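
One common building block in such hybrid systems is differential privacy applied to each party's update before it is shared. The NumPy sketch below clips an update's norm and adds Gaussian noise; the clipping threshold and noise scale are placeholder values, and the paper's full approach additionally relies on secure multiparty computation for aggregation, which is not shown here.

```python
import numpy as np

def privatize_update(update, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """Clip an update's L2 norm and add Gaussian noise before sharing it.

    This is only the differential-privacy ingredient of a hybrid scheme:
    in a full system the noisy updates would also be combined under secure
    aggregation so the server sees only their sum, not individual messages.
    """
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise
```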

#federated-learning, #privacy, #split-learning

Split learning for health: Distributed deep learning without sharing raw patient data

Can health entities collaboratively train deep learning models without sharing sensitive raw data? This paper proposes several configurations of a distributed deep learning method called SplitNN to facilitate such collaborations. SplitNN does not share raw data or model details with collaborating institutions. The proposed configurations of SplitNN cater to practical settings of i) entities holding different modalities of patient data, ii) centralized and local health entities collaborating on multiple tasks, and iii) learning without sharing labels. We compare the performance and resource-efficiency trade-offs of SplitNN and other distributed deep learning methods, such as federated learning and large-batch synchronous stochastic gradient descent, and show highly encouraging results for SplitNN.
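
The core SplitNN exchange can be sketched in a few lines of PyTorch: the data-holding client runs the lower layers and transmits only the cut-layer activations, the server finishes the forward pass and the loss, and the gradient at the cut is sent back so the client can complete backpropagation. The layer sizes, optimizers, and label placement below are assumptions for illustration, not the paper's exact configurations.

```python
import torch
import torch.nn as nn

# Client keeps the raw data and the lower part of the network.
client_net = nn.Sequential(nn.Linear(32, 64), nn.ReLU())
# Server holds the upper part of the network and, in this variant, the labels.
server_net = nn.Sequential(nn.Linear(64, 16), nn.ReLU(), nn.Linear(16, 2))

client_opt = torch.optim.SGD(client_net.parameters(), lr=0.1)
server_opt = torch.optim.SGD(server_net.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

def split_training_step(x, y):
    """One SplitNN step: only activations and their gradients cross the cut."""
    client_opt.zero_grad()
    server_opt.zero_grad()

    # Client-side forward pass; `smashed` is what gets transmitted, not `x`.
    activations = client_net(x)
    smashed = activations.detach().requires_grad_(True)

    # Server-side forward and backward pass.
    loss = loss_fn(server_net(smashed), y)
    loss.backward()
    server_opt.step()

    # The gradient at the cut layer is returned; the client finishes backprop.
    activations.backward(smashed.grad)
    client_opt.step()
    return loss.item()

# Stand-in data: a batch of 8 records with 32 features and 2 classes.
print(split_training_step(torch.randn(8, 32), torch.randint(0, 2, (8,))))
```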

#privacy, #split-learning, #splitnn

A little-known AI method can train on your health data without threatening your privacy

Machine learning has great potential to transform disease diagnosis and detection, but it’s been held back by patients’ reluctance to give up access to sensitive information.

#privacy, #split-learning, #splitnn

A new AI method can train on medical records without revealing patient data

When Google announced that it would absorb DeepMind’s health division, it sparked a major controversy over data privacy. Though DeepMind confirmed that the move wouldn’t actually hand raw patient data to Google, just the idea of giving a tech giant intimate, identifying medical records made people queasy. This difficulty in obtaining large amounts of high-quality data has become the biggest obstacle to applying machine learning in medicine.

#privacy, #split-learning, #splitnn

SplitNet: Learning to Semantically Split Deep Networks for Parameter Reduction and Model Parallelization

We propose a novel deep neural network that is both lightweight and effectively structured for model parallelization. Our network, which we name SplitNet, automatically learns to split the network weights into either a set or a hierarchy of multiple groups that use disjoint sets of features, by learning both the class-to-group and feature-to-group assignment matrices along with the network weights. This produces a tree-structured network with no connections between branched subtrees of semantically disparate class groups. SplitNet thus greatly reduces the number of parameters and required computations, and is also embarrassingly model-parallelizable at test time, since the evaluation of each subnetwork is completely independent except for the shared lower-layer weights, which can be duplicated over multiple processors or assigned to a separate processor.
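
The tree structure described above can be pictured as a shared trunk feeding independent branch subnetworks, one per class group, with no connections between branches. The PyTorch sketch below fixes the group sizes by hand purely for illustration; in the paper, the class-to-group and feature-to-group assignments are learned jointly with the weights rather than specified in advance.

```python
import torch
import torch.nn as nn

class TreeStructuredNet(nn.Module):
    """Illustrative tree-structured classifier in the spirit of SplitNet.

    A shared lower layer feeds several branch subnetworks with disjoint
    parameters, each predicting a disjoint group of classes, so every
    branch can be evaluated independently (e.g. on its own processor).
    """

    def __init__(self, in_dim=128, hidden=64, group_sizes=(3, 3, 4)):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden, hidden // 2), nn.ReLU(),
                          nn.Linear(hidden // 2, size))
            for size in group_sizes
        )

    def forward(self, x):
        h = self.shared(x)  # shared lower-layer features
        # No connections run between branches, so each subtree scores only
        # its own class group and the logits are simply concatenated.
        return torch.cat([branch(h) for branch in self.branches], dim=1)

model = TreeStructuredNet()
print(model(torch.randn(4, 128)).shape)  # torch.Size([4, 10])
```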

#privacy, #split-learning, #splitnn