Artificial intelligence (AI) is a multidisciplinary field of science and engineering whose goal is to create intelligent machines.
We believe that AI will be a force multiplier on technological progress in our increasingly digital, data-driven world. This is because everything around us today, ranging from culture to consumer products, is a product of intelligence.
The State of AI Report is now in its third year. New to the 2020 edition are several invited content contributions from a range of well-known and up-and-coming companies and research groups. Consider this Report as a compilation of the most interesting things we’ve seen with a goal of triggering an informed conversation about the state of AI and its implication for the future. Read More
Daily Archives: October 7, 2020
AI Training Method Exceeds GPT-3 Performance with 99.9% Fewer Parameters
A team of scientists at LMU Munich have developed Pattern-Exploiting Training (PET), a deep-learning training technique for natural language processing (NLP) models. Using PET, the team trained a Transformer NLP model with 223M parameters that out-performed the 175B-parameter GPT-3 by over 3 percentage points on the SuperGLUE benchmark. Read More
TernaryBERT: Quantization Meets Distillation
A BERTology contribution by Huawei
The ongoing trend of building ever larger models like BERT and GPT-3 has been accompanied by a complementary effort to reduce their size at little or no cost in accuracy. Effective models are built either via distillation (Pre-trained Distillation, DistilBERT, MobileBERT, TinyBERT), quantization (Q-BERT, Q8BERT) or parameter pruning.
On September 27, Huawei introduced TernaryBERT, a model that leverages both distillation and quantization to achieve accuracy comparable to the original BERT model with ~15x decrease in size. What is truly remarkable about TernaryBERT is that its weights are ternarized, i.e. have one of three values: -1, 0, or 1 (and can hence be stored in only two bits). Read More
Algorithms are not enough
The next breakthrough in AI requires a rethinking of our hardware
Today’s AI has a problem: it is expensive. Training Resnet-152, a modern computer vision model, is estimated to cost around 10 Billion floating point operations, which is dwarfed by modern language models. Training GPT-3, the recent natural language model from OpenAI, is estimated to cost 300 Billion Trillion floating point operations, which costs at least $5M on commercial GPUs. Compare this to the human brain, which can recognize faces, answer questions, and drive cars with as little as a banana and a cup of coffee. Read More
Google Teases Large Scale Reinforcement Learning Infrastructure
“The new infrastructure reduces the training time from eight hours down to merely one hour compared to a strong baseline.”
The current state-of-the-art reinforcement learning techniques require many iterations over many samples from the environment to learn a target task. For instance, the game Dota 2 learns from batches of 2 million frames every 2 seconds. The infrastructure that handles RL at this scale should be not only good at collecting a large number of samples, but also be able to quickly iterate over these extensive amounts of samples during training. Read More