Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach

Self-supervised features are the cornerstone of modern machine learning systems. They are typically pre-trained on data collections whose construction and curation require extensive human effort. This manual process shares some of the limitations encountered in supervised learning, e.g., the crowd-sourced selection of data is costly and time-consuming, which prevents scaling the dataset size. In this work, we consider the problem of automatic curation of high-quality datasets for self-supervised pre-training. We posit that such datasets should be large, diverse, and balanced, and propose a clustering-based approach for building datasets that satisfy all three criteria. Our method involves successive and hierarchical applications of k-means on a large and diverse data repository to obtain clusters that distribute uniformly among data concepts, followed by a hierarchical, balanced sampling step from these clusters. Extensive experiments on three different data domains, including web-based images, satellite images, and text, show that features trained on our automatically curated datasets outperform those trained on uncurated data while being on par with or better than ones trained on manually curated data. Code is available at this https URL. Read More
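
To make the two-stage recipe concrete, here is a minimal sketch under illustrative assumptions: two-level hierarchical k-means (with made-up cluster counts) over stand-in embeddings, followed by balanced sampling from the top-level clusters. The helper names and parameters are ours, not the authors' released code.

```python
# Sketch of clustering-based curation: hierarchical k-means, then balanced
# sampling across top-level clusters. All sizes are illustrative.
import numpy as np
from sklearn.cluster import KMeans

def hierarchical_kmeans(features, ks=(64, 8)):
    """Two-level k-means: cluster the points, then cluster the centroids."""
    level1 = KMeans(n_clusters=ks[0], n_init=10, random_state=0).fit(features)
    level2 = KMeans(n_clusters=ks[1], n_init=10, random_state=0).fit(
        level1.cluster_centers_)
    # Each point inherits the top-level label of its level-1 centroid.
    return level2.labels_[level1.labels_]

def balanced_sample(top_labels, per_cluster=100, seed=0):
    """Draw (up to) the same number of points from every top-level cluster."""
    rng = np.random.default_rng(seed)
    picks = [rng.choice(np.flatnonzero(top_labels == c),
                        size=min(per_cluster, int(np.sum(top_labels == c))),
                        replace=False)
             for c in np.unique(top_labels)]
    return np.concatenate(picks)

features = np.random.randn(10_000, 128).astype(np.float32)  # stand-in embeddings
subset = balanced_sample(hierarchical_kmeans(features))
```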

#self-supervised

Self-Taught AI Shows Similarities to How the Brain Works

Self-supervised learning allows a neural network to figure out for itself what matters. The process might be what makes our own brains so successful.

For a decade now, many of the most impressive artificial intelligence systems have been taught using a huge inventory of labeled data. An image might be labeled “tabby cat” or “tiger cat,” for example, to “train” an artificial neural network to correctly distinguish a tabby from a tiger. The strategy has been both spectacularly successful and woefully deficient.

Such “supervised” training requires data laboriously labeled by humans, and the neural networks often take shortcuts, learning to associate the labels with minimal and sometimes superficial information. For example, a neural network might use the presence of grass to recognize a photo of a cow, because cows are typically photographed in fields.

“We are raising a generation of algorithms that are like undergrads [who] didn’t come to class the whole semester and then the night before the final, they’re cramming,” said Alexei Efros, a computer scientist at the University of California, Berkeley. “They don’t really learn the material, but they do well on the test.” Read More

#human, #self-supervised

Self-taught Learning: Transfer Learning from Unlabeled Data

We present a new machine learning framework called “self-taught learning” for using unlabeled data in supervised classification tasks. We do not assume that the unlabeled data follows the same class labels or generative distribution as the labeled data. Thus, we would like to use a large number of unlabeled images (or audio samples, or text documents) randomly downloaded from the Internet to improve performance on a given image (or audio, or text) classification task. Such unlabeled data is significantly easier to obtain than in typical semi-supervised or transfer learning settings, making self-taught learning widely applicable to many practical learning problems. We describe an approach to self-taught learning that uses sparse coding to construct higher-level features using the unlabeled data. These features form a succinct input representation and significantly improve classification performance. When using an SVM for classification, we further show how a Fisher kernel can be learned for this representation. Read More
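
As a rough illustration of the pipeline the abstract describes, the sketch below learns a sparse-coding dictionary from unlabeled data with scikit-learn's DictionaryLearning, encodes the scarce labeled data as sparse activations, and trains a linear SVM on those codes. Random arrays stand in for real image/audio/text features, and hyperparameters are placeholders; this is not the paper's implementation (which also learns a Fisher kernel on top).

```python
# Sketch of self-taught learning: dictionary from unlabeled data,
# sparse codes as features, supervised SVM on top.
import numpy as np
from sklearn.decomposition import DictionaryLearning
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
unlabeled = rng.standard_normal((500, 64))    # plentiful unlabeled data
labeled_X = rng.standard_normal((100, 64))    # scarce labeled data
labeled_y = rng.integers(0, 2, size=100)

# Step 1: learn a sparse-coding dictionary from unlabeled data alone.
dico = DictionaryLearning(n_components=32, alpha=1.0, max_iter=100,
                          transform_algorithm="lasso_lars",
                          random_state=0).fit(unlabeled)

# Step 2: represent labeled examples by their sparse activation codes.
codes = dico.transform(labeled_X)

# Step 3: ordinary supervised training on the learned representation.
clf = LinearSVC().fit(codes, labeled_y)
print("train accuracy:", clf.score(codes, labeled_y))
```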

#self-supervised

Introducing the First Self-Supervised Algorithm for Speech, Vision and Text

  • We’re introducing data2vec, the first high-performance self-supervised algorithm that learns in the same way for speech, vision and text.
  • With data2vec, we’re closer to building machines that learn about different aspects of the world around them without having to rely on labeled data. Read More

#self-supervised, #big7

VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning

Recent self-supervised methods for image representation learning are based on maximizing the agreement between embedding vectors from different views of the same image. A trivial solution is obtained when the encoder outputs constant vectors. This collapse problem is often avoided through implicit biases in the learning architecture, which often lack a clear justification or interpretation. In this paper, we introduce VICReg (Variance-Invariance-Covariance Regularization), a method that explicitly avoids the collapse problem with a simple regularization term on the variance of the embeddings along each dimension individually. VICReg combines the variance term with a decorrelation mechanism based on redundancy reduction and covariance regularization, and achieves results on par with the state of the art on several downstream tasks. In addition, we show that incorporating our new variance term into other methods helps stabilize the training and leads to performance improvements. Read More
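
A compact PyTorch rendering of the three terms named above may help: invariance (mean-squared error between views), variance (a hinge keeping the per-dimension standard deviation above a target), and covariance (off-diagonal decorrelation). The loss coefficients are the commonly cited defaults, and the snippet is a sketch rather than the official VICReg code.

```python
# Minimal sketch of the VICReg loss on two embedding batches z1, z2
# (two views of the same images).
import torch

def vicreg_loss(z1, z2, sim_w=25.0, var_w=25.0, cov_w=1.0, gamma=1.0, eps=1e-4):
    n, d = z1.shape
    # Invariance: the two views should map to the same point.
    sim = torch.nn.functional.mse_loss(z1, z2)
    # Variance: hinge keeps the std of every embedding dimension above gamma,
    # preventing the collapse to a constant vector.
    std1 = torch.sqrt(z1.var(dim=0) + eps)
    std2 = torch.sqrt(z2.var(dim=0) + eps)
    var = torch.relu(gamma - std1).mean() + torch.relu(gamma - std2).mean()
    # Covariance: push off-diagonal covariance entries toward zero.
    def cov_term(z):
        zc = z - z.mean(dim=0)
        cov = (zc.T @ zc) / (n - 1)
        off_diag = cov - torch.diag(torch.diag(cov))
        return off_diag.pow(2).sum() / d
    cov = cov_term(z1) + cov_term(z2)
    return sim_w * sim + var_w * var + cov_w * cov

loss = vicreg_loss(torch.randn(256, 128), torch.randn(256, 128))
```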

#image-recognition, #self-supervised

Facebook details self-supervised AI that can segment images and videos

Facebook today announced DINO, an algorithm developed in collaboration with Inria that enables the training of transformers, a type of machine learning model, without labeled training data. The company claims DINO sets a new state of the art among methods that train on unlabeled data and yields a model that can discover and segment objects in an image or video without any segmentation-specific objective.

Segmenting objects is used in tasks ranging from swapping out the background of a video chat to teaching robots to navigate a factory. But it's considered among the hardest challenges in computer vision because it requires an AI to understand what's in an image. Read More
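
The article does not spell out the mechanics, but the general DINO recipe is self-distillation: a student network matches a momentum-averaged teacher copy of itself across views of the same image. The sketch below renders that signal with a toy MLP in place of a Vision Transformer; temperatures, momentum values, and layer sizes are illustrative guesses, not the released implementation.

```python
# Rough sketch of a DINO-style training signal: student matches the softmax
# output of an EMA teacher on another view of the same image; no labels.
import torch
import torch.nn as nn
import torch.nn.functional as F

student = nn.Sequential(nn.Linear(128, 256), nn.GELU(), nn.Linear(256, 64))
teacher = nn.Sequential(nn.Linear(128, 256), nn.GELU(), nn.Linear(256, 64))
teacher.load_state_dict(student.state_dict())
for p in teacher.parameters():
    p.requires_grad_(False)
center = torch.zeros(64)  # running mean of teacher outputs

def dino_loss(view1, view2, t_student=0.1, t_teacher=0.04):
    s_out = student(view1)
    with torch.no_grad():
        t_out = teacher(view2)
        # Centering + sharpening of the teacher keeps training from collapsing.
        t_probs = F.softmax((t_out - center) / t_teacher, dim=-1)
        center.mul_(0.9).add_(t_out.mean(dim=0), alpha=0.1)
    return -(t_probs * F.log_softmax(s_out / t_student, dim=-1)).sum(-1).mean()

def momentum_update(m=0.996):
    """Called after each optimizer step: teacher trails the student (EMA)."""
    with torch.no_grad():
        for ps, pt in zip(student.parameters(), teacher.parameters()):
            pt.mul_(m).add_(ps, alpha=1 - m)

loss = dino_loss(torch.randn(32, 128), torch.randn(32, 128))
```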

#big7, #frameworks, #self-supervised

Big Self-Supervised Models Advance Medical Image Classification

Self-supervised pretraining followed by supervised fine-tuning has seen success in image recognition, especially when labeled examples are scarce, but has received limited attention in medical image analysis. This paper studies the effectiveness of self-supervised learning as a pretraining strategy for medical image classification. We conduct experiments on two distinct tasks: dermatology skin condition classification from digital camera images and multi-label chest X-ray classification, and demonstrate that self-supervised learning on ImageNet, followed by additional self-supervised learning on unlabeled domain-specific medical images, significantly improves the accuracy of medical image classifiers. We introduce a novel Multi-Instance Contrastive Learning (MICLe) method that uses multiple images of the underlying pathology per patient case, when available, to construct more informative positive pairs for self-supervised learning. Combining our contributions, we achieve an improvement of 6.7% in top-1 accuracy and an improvement of 1.1% in mean AUC on dermatology and chest X-ray classification respectively, outperforming strong supervised baselines pretrained on ImageNet. In addition, we show that big self-supervised models are robust to distribution shift and can learn efficiently with a small number of labeled medical images. Read More
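
To illustrate the MICLe idea concretely, the sketch below builds positive pairs from different images of the same patient case (falling back to repeats when a case has only one image) and scores them with a generic NT-Xent-style contrastive loss. The helper names, the loss choice, and the flattened stand-in "encoder" are ours, not the paper's setup.

```python
# Sketch of MICLe-style pairing: positives are two different photos of the
# same patient's condition, not two augmentations of one image.
import numpy as np
import torch
import torch.nn.functional as F

def micle_batch(cases, rng):
    """cases: list of tensors, each (n_images_i, C, H, W) for one patient case.
    Returns two batches whose k-th rows are two images of the same case."""
    v1, v2 = [], []
    for imgs in cases:
        i, j = rng.choice(len(imgs), size=2, replace=len(imgs) < 2)
        v1.append(imgs[i])
        v2.append(imgs[j])
    return torch.stack(v1), torch.stack(v2)

def nt_xent(z1, z2, tau=0.1):
    """Generic contrastive loss; matched rows of z1/z2 are the positives."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.T / tau
    return F.cross_entropy(logits, torch.arange(z1.size(0)))

rng = np.random.default_rng(0)
cases = [torch.randn(int(rng.integers(1, 4)), 3, 32, 32) for _ in range(8)]
v1, v2 = micle_batch(cases, rng)
loss = nt_xent(v1.flatten(1), v2.flatten(1))  # flattening stands in for an encoder
```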

#image-recognition, #self-supervised

Multi-modal Self-Supervision from Generalized Data Transformations

The recent success of self-supervised learning can be largely attributed to content-preserving transformations, which can be used to easily induce invariances. While transformations generate positive sample pairs in contrastive loss training, most recent work focuses on developing new objective formulations, and pays relatively little attention to the transformations themselves. In this paper, we introduce the framework of Generalized Data Transformations to (1) reduce several recent self-supervised learning objectives to a single formulation for ease of comparison, analysis, and extension, (2) allow a choice between being invariant or distinctive to data transformations, obtaining different supervisory signals, and (3) derive the conditions that combinations of transformations must obey in order to lead to well-posed learning objectives. This framework allows both invariance and distinctiveness to be injected into representations simultaneously, and lets us systematically explore novel contrastive objectives. We apply it to study multi-modal self-supervision for audio-visual representation learning from unlabelled videos, improving the state-of-the-art by a large margin, and even surpassing supervised pretraining. We demonstrate results on a variety of downstream video and audio classification and retrieval tasks, on datasets such as HMDB-51, UCF-101, DCASE2014, ESC-50 and VGG-Sound. In particular, we achieve new state-of-the-art accuracies of 72.8% on HMDB-51 and 95.2% on UCF-101. Read More
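
One way to see the framework's central knob, declaring each transformation either invariant or distinctive, is the toy loss below: an invariant transformation makes the two outputs a positive pair, while a distinctive one penalizes their similarity so the representation must tell them apart. This is purely illustrative and much cruder than the paper's formulation.

```python
# Toy rendering of invariance vs. distinctiveness to a transformation.
import torch
import torch.nn.functional as F

def gdt_pair_loss(z_a, z_b, invariant, tau=0.1):
    """z_a, z_b: (n, d) embeddings of one batch under two transformations."""
    z_a, z_b = F.normalize(z_a, dim=1), F.normalize(z_b, dim=1)
    logits = z_a @ z_b.T / tau
    if invariant:
        # Invariance: matched rows should be each other's nearest neighbours.
        return F.cross_entropy(logits, torch.arange(z_a.size(0)))
    # Distinctiveness: penalize similarity between matched rows, so the
    # representation must distinguish the two transformed versions.
    return torch.diagonal(logits).mean()

z = torch.randn(32, 128)
loss_inv = gdt_pair_loss(z, z + 0.01 * torch.randn_like(z), invariant=True)
loss_dis = gdt_pair_loss(z, -z, invariant=False)
```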

#image-recognition, #self-supervised

Uncovering the structure of clinical EEG signals with self-supervised learning

Supervised learning paradigms are often limited by the amount of labeled data that is available. This phenomenon is particularly problematic in clinically-relevant data, such as electroencephalography (EEG), where labeling can be costly in terms of specialized expertise and human processing time. Consequently, deep learning architectures designed to learn on EEG data have yielded relatively shallow models and performances at best similar to those of traditional feature-based approaches. However, in most situations, unlabeled data is available in abundance. By extracting information from this unlabeled data, it might be possible to reach competitive performance with deep neural networks despite limited access to labels. Approach: We investigated self-supervised learning (SSL), a promising technique for discovering structure in unlabeled data, to learn representations of EEG signals. Specifically, we explored two tasks based on temporal context prediction as well as contrastive predictive coding on two clinically-relevant problems: EEG-based sleep staging and pathology detection. We conducted experiments on two large public datasets with thousands of recordings and performed baseline comparisons with purely supervised and hand-engineered approaches. Main results: Linear classifiers trained on SSL-learned features consistently outperformed purely supervised deep neural networks in low-labeled data regimes while reaching competitive performance when all labels were available. Additionally, the embeddings learned with each method revealed clear latent structures related to physiological and clinical phenomena, such as age effects. Significance: We demonstrate the benefit of self-supervised learning approaches on EEG data. Our results suggest that SSL may pave the way to a wider use of deep learning models on EEG data. Read More
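
A sketch of one temporal-context pretext task in the spirit of the paper (often called relative positioning) is shown below: pairs of EEG windows close in time are labeled positive, distant pairs negative, and an encoder is trained to tell them apart. Window counts and time thresholds here are arbitrary, not the paper's settings.

```python
# Sampling pairs for a relative-positioning pretext task on EEG windows.
import numpy as np

def sample_rp_pairs(n_windows, tau_pos=5, tau_neg=20, n_pairs=1000, seed=0):
    """Index pairs and labels for a relative-positioning pretext task."""
    rng = np.random.default_rng(seed)
    pairs, labels = [], []
    for _ in range(n_pairs):
        i = int(rng.integers(0, n_windows))
        if rng.random() < 0.5:
            # Positive pair: windows within tau_pos of each other.
            j = int(np.clip(i + rng.integers(-tau_pos, tau_pos + 1),
                            0, n_windows - 1))
            labels.append(1)
        else:
            # Negative pair: resample until the windows are far apart.
            j = i
            while abs(i - j) <= tau_neg:
                j = int(rng.integers(0, n_windows))
            labels.append(0)
        pairs.append((i, j))
    return np.array(pairs), np.array(labels)

pairs, labels = sample_rp_pairs(n_windows=5000)
# A siamese encoder is then trained to predict `labels` from the window pairs,
# and its frozen features feed a linear sleep-staging or pathology classifier.
```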

#self-supervised

Not All Unlabeled Data are Equal: Learning to Weight Data in Semi-supervised Learning

Existing semi-supervised learning (SSL) algorithms use a single weight to balance the loss of labeled and unlabeled examples, i.e., all unlabeled examples are equally weighted. But not all unlabeled data are equal. In this paper, we study how to use a different weight for every unlabeled example. Manual tuning of all those weights, as done in prior work, is no longer possible. Instead, we adjust those weights via an algorithm based on the influence function, a measure of a model's dependency on one training example. To make the approach efficient, we propose a fast and effective approximation of the influence function. We demonstrate that this technique outperforms state-of-the-art methods on semi-supervised image and language classification tasks. Read More
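
As a conceptual stand-in for the paper's influence-function machinery, the sketch below upweights an unlabeled example when the gradient of its loss aligns with the gradient of a validation loss, a crude one-step proxy; the paper's actual approximation is more principled and far more efficient than this per-example loop.

```python
# Crude per-example weighting via gradient alignment with a validation loss.
import torch
import torch.nn.functional as F

model = torch.nn.Linear(16, 3)

def grad_vector(loss):
    """Flatten the gradient of `loss` w.r.t. all model parameters."""
    grads = torch.autograd.grad(loss, list(model.parameters()), retain_graph=True)
    return torch.cat([g.reshape(-1) for g in grads])

def update_weights(per_example_losses, val_loss, weights, lr=0.1):
    """Upweight unlabeled examples whose gradients align with the val gradient."""
    g_val = grad_vector(val_loss)
    for k in range(len(per_example_losses)):
        g_k = grad_vector(per_example_losses[k])
        # Positive alignment -> this example helps validation -> upweight it.
        weights[k] = torch.clamp(weights[k] + lr * torch.dot(g_k, g_val), min=1e-8)
    return weights / weights.sum()

x_u, pseudo_y = torch.randn(8, 16), torch.randint(0, 3, (8,))
x_v, y_v = torch.randn(4, 16), torch.randint(0, 3, (4,))
per_example = F.cross_entropy(model(x_u), pseudo_y, reduction="none")
weights = update_weights(per_example, F.cross_entropy(model(x_v), y_v),
                         torch.ones(8))
```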

#self-supervised