Deep neural networks require large training sets but suffer from high computational cost and long training times. Training on much smaller training sets while maintaining nearly the same accuracy would be very beneficial. In the few-shot learning setting, a model must learn a new class given only a small number of samples from that class. One-shot learning is an extreme form of few-shot learning where the model must learn a new class from a single example. We propose the ‘less than one’-shot learning task where models must learn N new classes given only M < N examples and we show that this is achievable with the help of soft labels. We use a soft-label generalization of the k-Nearest Neighbors classifier to explore the intricate decision landscapes that can be created in the ‘less than one’-shot learning setting. We analyze these decision landscapes to derive theoretical lower bounds for separating N classes using M < N soft-label samples and investigate the robustness of the resulting systems. Read More
Tag Archives: Performance
TernaryBERT: Quantization Meets Distillation
A BERTology contribution by Huawei
The ongoing trend of building ever larger models like BERT and GPT-3 has been accompanied by a complementary effort to reduce their size at little or no cost in accuracy. Effective models are built either via distillation (Pre-trained Distillation, DistilBERT, MobileBERT, TinyBERT), quantization (Q-BERT, Q8BERT) or parameter pruning.
On September 27, Huawei introduced TernaryBERT, a model that leverages both distillation and quantization to achieve accuracy comparable to the original BERT model with ~15x decrease in size. What is truly remarkable about TernaryBERT is that its weights are ternarized, i.e. have one of three values: -1, 0, or 1 (and can hence be stored in only two bits). Read More
Algorithms are not enough
The next breakthrough in AI requires a rethinking of our hardware
Today’s AI has a problem: it is expensive. Training Resnet-152, a modern computer vision model, is estimated to cost around 10 Billion floating point operations, which is dwarfed by modern language models. Training GPT-3, the recent natural language model from OpenAI, is estimated to cost 300 Billion Trillion floating point operations, which costs at least $5M on commercial GPUs. Compare this to the human brain, which can recognize faces, answer questions, and drive cars with as little as a banana and a cup of coffee. Read More
Google Teases Large Scale Reinforcement Learning Infrastructure
“The new infrastructure reduces the training time from eight hours down to merely one hour compared to a strong baseline.”
The current state-of-the-art reinforcement learning techniques require many iterations over many samples from the environment to learn a target task. For instance, the game Dota 2 learns from batches of 2 million frames every 2 seconds. The infrastructure that handles RL at this scale should be not only good at collecting a large number of samples, but also be able to quickly iterate over these extensive amounts of samples during training. Read More
Super Learner versus Deep Neural Network
Deep Learning has taken a prominent place for tasks involving predictive modelling and pattern recognition. Deep Learning with its auto feature extraction and feed-forward methods gives the confidence to extract low-level features in order to identify high-level identities in big data applications. However, deep neural networks have drawbacks, which include many hyperparameters tuning together, slow convergence in smaller datasets and issues explaining why a particular decision was been made. While traditional machine learning algorithms can address these drawbacks, they are not typically capable of achieving the performance levels registered by deep neural networks. To improve performance, ensemble methods are used to combine multiple base learners. Read More
A critical analysis of metrics used for measuring progress in artificial intelligence
Comparing model performances on benchmark datasets is an integral part of measuring and driving progress in artificial intelligence. A model’s performance on a benchmark dataset is commonly assessed based on a single or a small set of performance metrics. While this enables quick comparisons, it may also entail the risk of inadequately reflecting model performance if the metric does not sufficiently cover all performance characteristics. Currently, it is unknown to what extent this might impact current benchmarking efforts. To address this question, we analysed the current landscape of performance metrics based on data covering 3867 machine learning model performance results from the web-based open platform ‘Papers with Code’. Our results suggest that the large majority of metrics currently used to evaluate classification AI benchmark tasks have properties that may result in an inadequate reflection of a classifiers’ performance, especially when used with imbalanced datasets. While alternative metrics that address problematic properties have been proposed, they are currently rarely applied as performance metrics in benchmarking tasks. Finally, we noticed that the reporting of metrics was partly inconsistent and partly unspecific, which may lead to ambiguities when comparing model performances. Read More
Can Your AI Differentiate Cats from Covid-19?Sample Efficient Uncertainty Estimation for Deep Learning Safety
Deep Neural Networks (DNNs) are known to make highly overconfident predictions on Out-of-Distribution data. Recent research has shown that uncertainty-aware models, such as, Bayesian Neural Network (BNNs) and Deep Ensembles,are less susceptible to this issue. However research in this area has been largely confined to the big data setting. In this work, we show that even state-of-the-art BNNs and Ensemble models tend to make overconfident predictions when the amount of training data is insufficient. This is especially concerning for emerging applications in physical sciences and healthcare where over-confident and inaccurate predictions can lead to disastrous consequences. To address the issue of accurate uncertainty (or confidence) estimation in the small-data regime, we propose a probabilistic generalization of the popular sample-efficient non-parametric kNN approach. We demonstrate the usefulness of the proposed approach on a real-world application of COVID-19 diagnosis from chest X-Rays by (a) highlighting surprising failures of existing techniques, and (b) achieving superior uncertainty quantification as compared to state-of-the-art. Read More
A solution to the learning dilemma for recurrent networks of spiking neurons
Recurrently connected networks of spiking neurons underlie the astounding information processing capabilities of the brain. Yet in spite of extensive research, how they can learn through synaptic plasticity to carry out complex network computations remains unclear. We argue that two pieces of this puzzle were provided by experimental data from neuroscience. A mathematical result tells us how these pieces need to be combined to enable biologically plausible online network learning through gradient descent, in particular deep reinforcement learning. This learning method–called e-prop–approaches the performance of backpropagation
through time (BPTT), the best-known method for training recurrent neural
networks in machine learning. In addition, it suggests a method for powerful on-chip learning in energy-efficient spike-based hardware for artificial intelligence. Read More
There’s plenty of room at the Top: What will drive computer performance after Moore’s law?
The doubling of the number of transistors on a chip every 2 years, a seemly inevitable trend that has been called Moore’s law, has contributed immensely to improvements in computer performance. However, silicon-based transistors cannot get much smaller than they are today, and other approaches should be explored to keep performance growing. Leiserson et al. review recent examples and argue that the most promising place to look is at the top of the computing stack, where improvements in software, algorithms, and hardware architecture can bring the much-needed boost. Read More
#performance, #nvidiaAssuring the Machine Learning Lifecycle: Desiderata, Methods, and Challenges
Machine learning has evolved into an enabling technology for a wide range of highly successful applications. The potential for this success to continue and accelerate has placed machine learning (ML) at the top of research, economic and political agendas. Such unprecedented interest is fueled by a vision of ML applicability extending to healthcare, transportation, defense and other domains of great societal importance. Achieving this vision requires the use of ML in safety-critical applications that demand levels of assurance beyond those needed for current ML applications. Our paper provides a comprehensive survey of the state-of-the-art in the assurance of ML, i.e. in the generation of evidence that ML is sufficiently safe for its intended use. The survey covers the methods capable of providing such evidence at different stages of the machine learning lifecycle, i.e. of the complex, iterative process that starts with the collection of the data used to train an ML component for a system, and ends with the deployment of that component within the system. The paper begins with a systematic presentation of the ML lifecycle and its stages. We then define assurance desiderata for each stage, review existing methods that contribute to achieving these desiderata, and identify open challenges that require further research. Read More