Monthly Archives: October 2019

Facebook today introduced Captum, a library for explaining the decisions of neural networks, built on the deep learning framework PyTorch. Captum is designed to implement state-of-the-art interpretability algorithms such as Integrated Gradients, DeepLIFT, and Conductance. It allows researchers and developers to interpret decisions made in multimodal environments that combine, for example, text, images, and video, and to compare results against existing models within the library. Read More
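As a hedged illustration of how Captum is typically used, the sketch below attributes a toy classifier's prediction to its input features with Integrated Gradients; the model, input shapes, and target class are hypothetical placeholders, not anything from the announcement.

```python
# Minimal sketch of Captum's Integrated Gradients API on a toy classifier.
# The model, input shapes, and target class are hypothetical placeholders.
import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3))  # stand-in model
model.eval()

inputs = torch.randn(4, 10)  # a dummy batch of four 10-feature examples

ig = IntegratedGradients(model)
# Attribute the class-0 score back to each input feature; `delta` estimates how
# far the discrete approximation is from satisfying the completeness axiom.
attributions, delta = ig.attribute(inputs, target=0, return_convergence_delta=True)
print(attributions.shape)  # torch.Size([4, 10]): one attribution per feature
```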
Could blacklisting China's AI champions backfire?
Just over two years ago, China announced an audacious plan to overtake the US and lead the “world in AI [artificial intelligence] technology and applications by 2030”.
It is already widely regarded as having overtaken the EU in many respects.
But now its plans may be knocked off course by the US restricting certain Chinese companies from buying technologies developed or manufactured in the States. Read More
An AI Pioneer Wants His Algorithms to Understand the 'Why'
In March, Yoshua Bengio received a share of the Turing Award, the highest accolade in computer science, for contributions to the development of deep learning—the technique that triggered a renaissance in artificial intelligence, leading to advances in self-driving cars, real-time speech translation, and facial recognition.
Now, Bengio says deep learning needs to be fixed. He believes it won’t realize its full potential, and won’t deliver a true AI revolution, until it can go beyond pattern recognition and learn more about cause and effect. In other words, he says, deep learning needs to start asking why things happen. Read More
Sunspring | A Sci-Fi Short Film Starring Thomas Middleditch
Knowing that an AI wrote Sunspring makes the movie more fun to watch, especially once you learn how the cast and crew put it together. Director Oscar Sharp made the movie for Sci-Fi London, an annual film festival that includes the 48-Hour Film Challenge, where contestants are given a set of prompts (mostly props and lines) that have to appear in a movie they make over the next two days …. It even has its own musical interlude (performed by Andrew and Tiger), with a pop song Benjamin composed after learning from a corpus of 30,000 other pop songs. Read More
BEAN: Interpretable Representation Learning with Biologically-Enhanced Artificial Neuronal Assembly Regularization
Deep neural networks (DNNs) are known for extracting good representations from large amounts of data. However, the representations learned in DNNs are typically hard to interpret, especially those learned in dense layers. One crucial issue is that neurons within each layer of a DNN are conditionally independent of one another, which makes co-training and analysis of neurons at higher modularity difficult. In contrast, the dependency patterns of biological neurons in the human brain are largely different from those of DNNs. Neuronal assembly describes such neuron dependencies: a group of biological neurons with strong internal synaptic interactions and potentially high semantic correlations, which are deemed to facilitate the memorization process. In this paper, we show that this crucial gap between DNNs and biological neural networks (BNNs) can be bridged by the newly proposed Biologically-Enhanced Artificial Neuronal assembly (BEAN) regularization, which enforces dependencies among neurons in the dense layers of DNNs without altering the conventional architecture. Both qualitative and quantitative analyses show that BEAN enables the formation of interpretable and biologically plausible neuronal assemblies in dense layers and consequently enhances the modularity and interpretability of the learned hidden representations. Moreover, BEAN results in sparse and structured connectivity and parameter sharing among neurons, which substantially improves the efficiency and generalizability of the model. Read More
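The abstract does not give BEAN's exact formulation, but the idea of enforcing dependencies among dense-layer neurons can be sketched as a simple group-cohesion penalty: neurons assigned to the same (hypothetical) assembly are pulled toward a shared incoming-weight pattern, yielding the kind of parameter sharing the abstract describes. The assembly assignment and penalty form below are illustrative assumptions, not the paper's regularizer.

```python
# Illustrative assembly-style regularizer for a dense layer. NOT the paper's
# exact BEAN term: the assembly assignment and penalty form are assumptions.
import torch
import torch.nn as nn

def assembly_cohesion_penalty(weight: torch.Tensor, assembly_ids: torch.Tensor) -> torch.Tensor:
    """weight: (out_features, in_features); assembly_ids: (out_features,) integer labels."""
    penalty = weight.new_zeros(())
    for a in assembly_ids.unique():
        group = weight[assembly_ids == a]         # incoming weights of one assembly's neurons
        center = group.mean(dim=0, keepdim=True)  # the assembly's shared weight pattern
        penalty = penalty + ((group - center) ** 2).sum()
    return penalty

layer = nn.Linear(64, 16)
assembly_ids = torch.arange(16) // 4  # four hypothetical assemblies of four neurons each
reg = assembly_cohesion_penalty(layer.weight, assembly_ids)
# total_loss = task_loss + lambda_bean * reg, with lambda_bean a tuning coefficient
```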
Researcher Explains Deepfake Videos | WIRED
How “Cobots” Are Transforming Jobs in Every Industry, from Fast Food to Law
A recent estimate put 40% of the world’s jobs at risk of automation over the next 15 years. That’s a major shift, but it’s nothing new — throughout history, advances in technology have replaced human jobs time and again. Between 1947 and 2014, for example, the number of U.S. workers employed by the railroad industry dropped by 86% as a result of new technology and automation. At the same time, this tech dramatically increased productivity, allowing the amount of freight being moved to increase by 182%.
Today it’s the field of robotics — or rather, “cobotics” — that’s changing the way we work. ….
HBO Vice’s recent Special Report: The Future of Work took a closer look at what all this automation means for employees and companies alike. Read More
Making The Internet Of Things (IoT) More Intelligent With AI
According to IoT Analytics, there were more than 17 billion connected devices in the world as of 2018, over 7 billion of which were “internet of things” (IoT) devices. The Internet of Things is the collection of sensors, devices, and other technologies that aren’t meant to interact directly with consumers the way phones or computers are. Rather, IoT devices provide information, control, and analytics, connecting a world of hardware devices to each other and to the greater internet. With the advent of cheap sensors and low-cost connectivity, IoT devices are proliferating. Read More
Extreme Language Model Compression with Optimal Subwords and Shared Projections
Pre-trained deep neural network language models such as ELMo, GPT, BERT and XLNet have recently achieved state-of-the-art performance on a variety of language understanding tasks. However, their size makes them impractical for a number of scenarios, especially on mobile and edge devices. In particular, the input word embedding matrix accounts for a significant proportion of the model’s memory footprint, due to the large input vocabulary and embedding dimensions. Knowledge distillation techniques have had success at compressing large neural network models, but they are ineffective at yielding student models with vocabularies different from the original teacher models. We introduce a novel knowledge distillation technique for training a student model with a significantly smaller vocabulary as well as lower embedding and hidden state dimensions. Specifically, we employ a dual-training mechanism that trains the teacher and student models simultaneously to obtain optimal word embeddings for the student vocabulary. We combine this approach with learning shared projection matrices that transfer layer-wise knowledge from the teacher model to the student model. Our method is able to compress the BERT-Base model by more than 60x, with only a minor drop in downstream task metrics, resulting in a language model with a footprint of under 7MB. Experimental results also demonstrate higher compression efficiency and accuracy when compared with other state-of-the-art compression techniques. Read More
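As a rough sketch of the shared-projection idea (not the paper's actual code), a single trainable matrix can map the student's smaller hidden states into the teacher's dimensionality so that layer outputs can be matched with an MSE loss; the dimensions and names below are assumptions for illustration.

```python
# Hedged sketch of the shared-projection idea: one trainable matrix, shared
# across layers, maps student hidden states into the teacher's dimensionality
# so layer outputs can be compared. Names and dimensions are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher_dim, student_dim = 768, 192  # illustrative sizes
projection = nn.Linear(student_dim, teacher_dim, bias=False)  # shared across all layers

def layer_distillation_loss(student_hidden, teacher_hidden):
    # student_hidden: (batch, seq, student_dim); teacher_hidden: (batch, seq, teacher_dim)
    return F.mse_loss(projection(student_hidden), teacher_hidden)

s = torch.randn(2, 16, student_dim)
t = torch.randn(2, 16, teacher_dim)
print(layer_distillation_loss(s, t))
```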
TinyBERT: Distilling BERT for Natural Language Understanding
Language model pre-training, such as BERT, has significantly improved the performance of many natural language processing tasks. However, pre-trained language models are usually computationally expensive and memory intensive, so it is difficult to execute them effectively on resource-restricted devices. To accelerate inference and reduce model size while maintaining accuracy, we first propose a novel Transformer distillation method specially designed for knowledge distillation (KD) of Transformer-based models. By leveraging this new KD method, the abundant knowledge encoded in a large “teacher” BERT can be effectively transferred to a small “student” TinyBERT. Moreover, we introduce a new two-stage learning framework for TinyBERT, which performs Transformer distillation at both the pre-training and task-specific learning stages. This framework ensures that TinyBERT can capture both the general-domain and the task-specific knowledge in BERT.
TinyBERT is empirically effective and achieves results comparable to BERT on the GLUE benchmark, while being 7.5x smaller and 9.4x faster at inference. TinyBERT is also significantly better than state-of-the-art baselines for BERT distillation, with only ∼28% of their parameters and ∼31% of their inference time. Read More
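A hedged sketch of what Transformer distillation of this kind can look like: for each mapped layer pair, the student imitates the teacher's attention score matrices directly and its hidden states through a learned projection. The shapes, dimensions, and layer mapping below are illustrative assumptions rather than TinyBERT's exact configuration.

```python
# Hedged sketch of TinyBERT-style layer distillation: attention matrices are
# matched directly, hidden states through a learned projection. All shapes and
# the 312-dim student are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher_dim, student_dim, heads, seq = 768, 312, 12, 16
proj = nn.Linear(student_dim, teacher_dim, bias=False)  # maps student states to teacher space

def transformer_distill_loss(attn_s, attn_t, hid_s, hid_t):
    # attn_*: (batch, heads, seq, seq) attention score matrices of a mapped layer pair
    # hid_*:  (batch, seq, dim) hidden states of the same layer pair
    return F.mse_loss(attn_s, attn_t) + F.mse_loss(proj(hid_s), hid_t)

attn_s = torch.randn(2, heads, seq, seq)  # dummy student attention scores
attn_t = torch.randn(2, heads, seq, seq)  # dummy teacher attention scores
hid_s, hid_t = torch.randn(2, seq, student_dim), torch.randn(2, seq, teacher_dim)
print(transformer_distill_loss(attn_s, attn_t, hid_s, hid_t))
```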