Federated learning enables resource-constrained edge compute devices, such as mobile phones and IoT devices, to learn a shared model for prediction, while keeping the training data local. This decentralized approach to training models provides privacy, security, regulatory, and economic benefits. In this work, we focus on the statistical challenge of federated learning when local data is non-IID. We first show that the accuracy of federated learning degrades significantly, by up to ~55% for neural networks trained on highly skewed non-IID data, where each client device trains on data from only a single class. We further show that this accuracy reduction can be explained by weight divergence, which can be quantified by the earth mover's distance (EMD) between the distribution over classes on each device and the population distribution. As a solution, we propose a strategy to improve training on non-IID data by creating a small subset of data that is globally shared among all the edge devices. Experiments show that accuracy can be increased by ~30% on the CIFAR-10 dataset with only 5% globally shared data.
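
To make the EMD quantity concrete, below is a minimal sketch of how the skew measure could be computed for a client. It assumes that, over a discrete label space with all classes treated as equidistant, the EMD reduces to the sum of absolute differences between the client's class distribution and the population distribution; the function and variable names (`class_distribution`, `emd`, `client_labels`) are illustrative, not taken from the paper's code.

```python
import numpy as np

def class_distribution(labels: np.ndarray, num_classes: int) -> np.ndarray:
    """Empirical distribution over class labels for one client."""
    counts = np.bincount(labels, minlength=num_classes)
    return counts / counts.sum()

def emd(p: np.ndarray, q: np.ndarray) -> float:
    """EMD between two categorical distributions, assuming all classes
    are equidistant, so it reduces to the L1 distance between the
    probability vectors."""
    return float(np.abs(p - q).sum())

# Toy example: a client holding only class-0 data versus a uniform
# 10-class population distribution (the highly skewed non-IID case).
num_classes = 10
client_labels = np.zeros(600, dtype=int)           # single-class client
population = np.full(num_classes, 1.0 / num_classes)

client_dist = class_distribution(client_labels, num_classes)
print(emd(client_dist, population))                # 1.8, near-maximal skew
```

Under this definition the EMD ranges from 0 (the client's data matches the population distribution) toward 2 (the client holds classes absent from the population), so a single-class client against a uniform 10-class population scores 1.8, matching the highly skewed setting described above.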