Testing The Limits Of Transfer Learning In Natural Language Processing

It has become increasingly common to pre-train models to develop general-purpose abilities and knowledge that can then be “transferred” to downstream tasks.

In applications of transfer learning to computer vision, pre-training is typically done via supervised learning on a large labelled dataset like ImageNet. In contrast, modern techniques for transfer learning in NLP often pre-train using unsupervised learning on unlabeled data.
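
A minimal sketch of this pre-train-then-fine-tune pattern, assuming the Hugging Face transformers library; the model name, toy examples, and hyperparameters below are illustrative assumptions, not details from the article.

```python
# Minimal sketch of transfer learning in NLP: reuse a pre-trained encoder and
# fine-tune it on a small downstream task. Model name and data are illustrative.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # pre-trained encoder, new classification head
)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
texts, labels = ["great movie", "terrible plot"], torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
outputs = model(**batch, labels=labels)   # loss computed on the downstream task
outputs.loss.backward()                   # gradients flow into the pre-trained weights
optimizer.step()
```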

Despite its wide popularity, several pressing questions about transfer learning in machine learning remain open. Read More

#transfer-learning

Once-For-All: Train One Network And Specialize It For Efficient Deployment

We address the challenging problem of efficient inference across many devices and resource constraints, especially on edge devices. Conventional approaches either manually design or use neural architecture search (NAS) to find a specialized neural network and train it from scratch for each case, which is computationally prohibitive (causing as much CO2 emission as five cars' lifetimes (Strubell et al., 2019)) and thus unscalable. In this work, we propose to train a once-for-all (OFA) network that supports diverse architectural settings by decoupling training and search, to reduce the cost. We can quickly get a specialized sub-network by selecting from the OFA network without additional training. To efficiently train OFA networks, we also propose a novel progressive shrinking algorithm, a generalized pruning method that reduces the model size across many more dimensions than pruning (depth, width, kernel size, and resolution). It can obtain a surprisingly large number of sub-networks (> 10^19) that can fit different hardware platforms and latency constraints while maintaining the same level of accuracy as training independently. On diverse edge devices, OFA consistently outperforms state-of-the-art (SOTA) NAS methods (up to 4.0% ImageNet top-1 accuracy improvement over MobileNetV3, or the same accuracy but 1.5× faster than MobileNetV3 and 2.6× faster than EfficientNet w.r.t. measured latency) while reducing GPU hours and CO2 emission by many orders of magnitude. In particular, OFA achieves a new SOTA 80.0% ImageNet top-1 accuracy under the mobile setting (<600M MACs). OFA is the winning solution for the 3rd Low Power Computer Vision Challenge (LPCVC), DSP classification track, and the 4th LPCVC, both classification track and detection track. Code and 50 pre-trained models (for many devices & many latency constraints) are released at https://github.com/mit-han-lab/once-for-all. Read More
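
An illustrative sketch (not the released OFA code) of the "search without retraining" step described above: sample sub-network configurations over the elastic dimensions named in the abstract and keep the best one that fits a latency budget. The functions estimate_latency and predict_accuracy, and all numeric values, are hypothetical stand-ins for a hardware latency table and a trained accuracy predictor.

```python
# Hypothetical sketch of specializing an OFA network for a latency budget:
# random search over (depth, width, kernel, resolution) configurations.
import random

DEPTHS, WIDTHS, KERNELS, RESOLUTIONS = [2, 3, 4], [3, 4, 6], [3, 5, 7], [160, 192, 224]

def estimate_latency(cfg):          # placeholder for a per-device latency table
    return cfg["depth"] * cfg["width"] * cfg["kernel"] * cfg["resolution"] / 2000.0

def predict_accuracy(cfg):          # placeholder for a trained accuracy predictor
    return 0.70 + 0.0001 * cfg["resolution"] + 0.01 * cfg["depth"]

def search(latency_budget_ms, n_samples=1000):
    best_cfg, best_acc = None, 0.0
    for _ in range(n_samples):
        cfg = {"depth": random.choice(DEPTHS), "width": random.choice(WIDTHS),
               "kernel": random.choice(KERNELS), "resolution": random.choice(RESOLUTIONS)}
        if estimate_latency(cfg) <= latency_budget_ms and predict_accuracy(cfg) > best_acc:
            best_cfg, best_acc = cfg, predict_accuracy(cfg)
    return best_cfg  # weights are inherited from the OFA network, no retraining

print(search(latency_budget_ms=25))
```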

#transfer-learning

China’s State News Agency Introduces New Artificial Intelligence Anchor

Xinhua, the Chinese state news agency, has released its latest artificial intelligence (AI) 3D news anchor. The AI anchor joins a growing list of virtual presenters being developed by the agency.

The AI news anchor is named Xin Xiaowei, and it is modeled after Zhao Wanwei, who is one of the agency’s human news presenters. 

According to the search engine Sogou, which co-developed the technology, the AI anchor utilizes “multi-modal recognition and synthesis, facial recognition and animation and transfer learning.” Read More

#nlp, #robotics, #transfer-learning

Learning Deep Neural Networks incrementally forever

The hallmark of human intelligence is the capacity to learn. A toddler has aptitudes for reasoning about space, quantities, or causality comparable to those of other ape species (source). The difference between our cousins and us is the ability to learn from others.

The recent deep learning hype aims to reach Artificial General Intelligence (AGI): an AI that would exhibit (supra-)human-like intelligence. Unfortunately, current deep learning models are flawed in many ways; one of them is that they are unable to learn continuously, as humans do through years of schooling. Read More

#federated-learning, #transfer-learning

Incremental learning algorithms and applications

Incremental learning refers to learning from streaming data, which arrive over time, with limited memory resources and, ideally, without sacrificing model accuracy. This setting fits different application scenarios such as learning in changing environments, model personalisation, or lifelong learning, and it offers an elegant scheme for big data processing by means of its sequential treatment. In this contribution, we formalise the concept of incremental learning, discuss particular challenges which arise in this setting, and give an overview of popular approaches, their theoretical foundations, and applications which have emerged in recent years. Read More
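
A minimal sketch of the streaming setting described above, assuming scikit-learn: the model sees data in small mini-batches, updates with partial_fit, and never stores past batches in memory. The synthetic data stream is an assumption for illustration only.

```python
# Incremental learning on a stream: update the model batch by batch with
# partial_fit, keeping only the current mini-batch in memory.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
model = SGDClassifier()
all_classes = np.array([0, 1])                 # classes must be declared on the first call

for t in range(100):                           # stream of mini-batches arriving over time
    X = rng.normal(size=(32, 10))
    y = (X[:, 0] + 0.1 * rng.normal(size=32) > 0).astype(int)
    model.partial_fit(X, y, classes=all_classes if t == 0 else None)
```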

#machine-learning, #privacy, #transfer-learning

Incremental Learning in Deep Convolutional Neural Networks Using Partial Network Sharing

Deep convolutional neural network (DCNN) based supervised learning is a widely practiced approach for large-scale image classification. However, retraining these large networks to accommodate new, previously unseen data demands high computational time and energy. Also, previously seen training samples may not be available at the time of retraining. We propose an efficient training methodology and an incrementally growing DCNN that allow new classes to be learned while sharing part of the base network. Our proposed methodology is inspired by transfer learning techniques, although it does not forget previously learned classes. An updated network for learning a new set of classes is formed using previously learned convolutional layers (shared from the initial part of the base network) with the addition of a few new convolutional kernels in the later layers of the network. We evaluated the proposed scheme on several recognition applications. The classification accuracy achieved by our approach is comparable to the regular incremental learning approach (where networks are updated with new training samples only, without any network sharing), while achieving energy efficiency and reductions in storage requirements, memory accesses, and training time. Read More
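
An illustrative PyTorch sketch of the partial-sharing idea (not the paper's code): the early convolutional layers of the base network are shared and frozen, and only a small branch of later layers plus a classifier for the new classes is trained. Layer sizes and the number of new classes are assumptions.

```python
# Sketch: freeze the shared initial layers of the base network and train only
# a new branch of later convolutional kernels plus a head for the new classes.
import torch
import torch.nn as nn

shared_base = nn.Sequential(            # early layers, learned on the old classes
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(8),
)
for p in shared_base.parameters():
    p.requires_grad = False             # shared, so old classes are not forgotten

new_branch = nn.Sequential(             # a few new kernels in the later layers
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
    nn.Flatten(), nn.Linear(64 * 8 * 8, 10),   # 10 new classes (example)
)

optimizer = torch.optim.SGD(new_branch.parameters(), lr=0.01)
x = torch.randn(4, 3, 32, 32)           # dummy batch of new-class images
logits = new_branch(shared_base(x))
loss = nn.functional.cross_entropy(logits, torch.randint(0, 10, (4,)))
loss.backward()
optimizer.step()
```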

#machine-learning, #privacy, #transfer-learning

Transfer Incremental Learning Using Data Augmentation

Due to catastrophic forgetting, deep learning remains ill-suited to incremental learning of new classes and examples over time. In this contribution, we introduce Transfer Incremental Learning using Data Augmentation (TILDA). TILDA combines transfer learning from a pre-trained Deep Neural Network (DNN) as a feature extractor, a Nearest Class Mean (NCM) inspired classifier, and majority voting using data augmentation on both training and test vectors. The obtained methodology allows learning new examples or classes on the fly with very limited computational and memory footprints. We perform experiments on challenging vision datasets and obtain performance significantly better than existing incremental counterparts. Read More
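
A rough sketch of the recipe described above, not the authors' code: feature vectors are assumed to come from a pre-trained DNN and the augmented copies from a data-augmentation pipeline; the sketch only shows the incremental class-mean updates and the majority vote.

```python
# NCM-style incremental classifier over pre-extracted features:
# keep running per-class sums and counts, classify by nearest class mean,
# and take a majority vote over augmented copies of a test example.
import numpy as np
from collections import defaultdict, Counter

class_sums, class_counts = defaultdict(lambda: 0.0), Counter()

def learn(feature_vec, label):
    """Incremental update: only running sums and counts are stored."""
    class_sums[label] = class_sums[label] + feature_vec
    class_counts[label] += 1

def nearest_class_mean(feature_vec):
    means = {c: class_sums[c] / class_counts[c] for c in class_counts}
    return min(means, key=lambda c: np.linalg.norm(feature_vec - means[c]))

def predict(augmented_feature_vecs):
    """Majority vote over the NCM decision for each augmented copy."""
    votes = [nearest_class_mean(f) for f in augmented_feature_vecs]
    return Counter(votes).most_common(1)[0][0]

# usage: call learn(features, label) for each new training vector on the fly,
# then predict([...]) with the feature vectors of a test example's augmentations.
```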

#machine-learning, #privacy, #transfer-learning

Using Transfer Learning to Introduce Generalization in Models

Researchers often try to capture as much information as they can, whether by using existing architectures, creating new ones, going deeper, or employing different training methods. This paper compares different ideas and methods that are used heavily in Machine Learning to determine what works best. These methods are prevalent in various domains of Machine Learning, such as Computer Vision and Natural Language Processing (NLP). Read More

#machine-learning, #privacy, #transfer-learning

A Comprehensive Hands-on Guide to Transfer Learning with Real-World Applications in Deep Learning

Humans have an inherent ability to transfer knowledge across tasks. What we acquire as knowledge while learning about one task, we can utilize to solve related tasks. The more related the tasks, the easier it is for us to transfer, or cross-utilize, our knowledge.

Conventional machine learning and deep learning algorithms have traditionally been designed to work in isolation. These algorithms are trained to solve specific tasks, and the models have to be rebuilt from scratch once the feature-space distribution changes. Transfer learning is the idea of overcoming the isolated learning paradigm and utilizing knowledge acquired for one task to solve related ones. Read More
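
A minimal sketch of reusing knowledge from one task for a related one, assuming a recent torchvision: an ImageNet-pre-trained ResNet is kept as a frozen feature extractor and only a small new head is trained. The 5-class target task and the dummy batch are assumptions for illustration.

```python
# Feature-extraction flavour of transfer learning: freeze the pre-trained
# backbone and train only a new classification head for the related task.
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
for p in backbone.parameters():
    p.requires_grad = False                  # keep the knowledge learned on ImageNet

backbone.fc = nn.Linear(backbone.fc.in_features, 5)   # new 5-class task (example)
optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)

x = torch.randn(2, 3, 224, 224)              # dummy batch from the new task
loss = nn.functional.cross_entropy(backbone(x), torch.tensor([0, 3]))
loss.backward()
optimizer.step()
```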

#machine-learning, #neural-networks, #privacy, #transfer-learning

Freeze Out: Accelerate training by progressively freezing layers

The early layers of a deep neural net have the fewest parameters, but take up the most computation. In this extended abstract, we propose to only train the hidden layers for a set portion of the training run, freezing them out one-by-one and excluding them from the backward pass. Through experiments on CIFAR, we empirically demonstrate that FreezeOut yields savings of up to 20% wall-clock time during training with a 3% loss in accuracy for DenseNets, a 20% speedup without loss of accuracy for ResNets, and no improvement for VGG networks. Read More
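
An illustrative sketch of the progressive-freezing idea in PyTorch; the toy model and the freezing schedule values are assumptions, not taken from the paper. Each earlier layer stops receiving gradient updates at an earlier point in training, so it drops out of the backward pass.

```python
# Progressively freeze layers on a schedule so early layers stop training
# (and stop being updated in the backward pass) before later ones.
import torch
import torch.nn as nn

layers = nn.ModuleList([nn.Linear(32, 32) for _ in range(4)])
head = nn.Linear(32, 10)
model = nn.Sequential(*layers, head)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

total_iters = 100
freeze_at = [int(total_iters * t) for t in (0.5, 0.625, 0.75, 0.875)]  # earlier layers freeze earlier

for it in range(total_iters):
    for layer, stop in zip(layers, freeze_at):
        if it >= stop:
            for p in layer.parameters():
                p.requires_grad = False     # excluded from further gradient updates
    x, y = torch.randn(8, 32), torch.randint(0, 10, (8,))
    loss = nn.functional.cross_entropy(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                        # frozen layers have no gradients and are skipped
```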

#machine-learning, #neural-networks, #privacy, #transfer-learning