Why Some Models Leak Data

Machine learning models are trained on large amounts of data, some of which can be sensitive. If they aren't trained carefully, that data can be inadvertently revealed.

Models of real-world data are often quite complex; this can improve accuracy, but it also makes them more susceptible to unexpectedly leaking information. Medical models have inadvertently revealed patients' genetic markers. Language models have memorized credit card numbers. Faces can even be reconstructed from image models.

Training models with differential privacy stops the training data from leaking by limiting how much the model can learn from any one data point. Differentially private models are still at the cutting edge of research, but they're being packaged into machine learning frameworks, making them much easier to use. When it isn't possible to train a differentially private model, there are also tools that can measure how much data the model is memorizing.
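The usual recipe for this kind of training is DP-SGD: compute each example's gradient separately, clip its norm so no single data point can push the model too far, then add noise to the averaged update. Here is a minimal NumPy sketch of that idea; the toy model, data, and hyperparameters are illustrative assumptions, not anything from this article:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 256 examples, 10 features, binary labels (illustrative only).
X = rng.normal(size=(256, 10))
y = (X[:, 0] + 0.1 * rng.normal(size=256) > 0).astype(float)

w = np.zeros(10)
clip_norm = 1.0         # max per-example gradient norm (caps one point's influence)
noise_multiplier = 1.1  # noise std relative to clip_norm
lr = 0.5

def per_example_grads(w, X, y):
    """Logistic-regression gradients, one row per training example."""
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return (p - y)[:, None] * X  # shape: (n_examples, n_features)

for step in range(100):
    g = per_example_grads(w, X, y)

    # 1. Clip each example's gradient to at most clip_norm in L2 norm.
    norms = np.linalg.norm(g, axis=1, keepdims=True)
    g = g * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))

    # 2. Sum the clipped gradients, add Gaussian noise scaled to the clip
    #    norm, and average before taking the gradient step.
    noise = rng.normal(scale=noise_multiplier * clip_norm, size=w.shape)
    update = (g.sum(axis=0) + noise) / len(X)

    w -= lr * update
```

The privacy protection comes from the combination of clipping and noise; libraries such as TensorFlow Privacy and Opacus package this recipe and also track the resulting privacy budget, so you don't have to implement it by hand.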

#adversarial, #privacy, #model-attacks