AI Conference Recap: Google, Microsoft, Facebook, and Others at ICLR 2021

At the recent International Conference on Learning Representations (ICLR), research teams from several tech companies, including Google, Microsoft, IBM, Facebook, and Amazon, presented nearly 250 papers out of a total of 860 on a wide variety of AI topics related to deep learning.

The conference was held online in early May and featured a “round-the-clock” program of live talks and Q&A sessions, in addition to pre-recorded videos for all accepted papers. Each day of the four-day conference featured two Invited Talks from leading deep-learning researchers. Although most of the papers were from academia, many prominent tech companies were well represented by their AI researchers: Google contributed over 100 papers, several of which won Outstanding Paper awards; Microsoft 53; IBM 35; Facebook 23; Salesforce 7; and Amazon 4. Read More

#artificial-intelligence, #big7

Go to the Movies with the H2O.ai Grandmasters!

Read More

Apple’s Live Text is going to read all the text in all your photos with AI

Apple has announced a new feature called Live Text, which will digitize the text in all your photos. This unlocks a slew of handy functions, from turning handwritten notes into emails and messages to searching your camera roll for receipts or recipes you’ve photographed.

…Apple says the feature is enabled using “deep neural networks” and “on-device intelligence,” with the latter being the company’s preferred phrasing for machine learning. (It stresses Apple’s privacy-heavy approach to AI, which focuses on processing data on-device rather than sending it to the cloud.) Read More

#big7, #nlp

Musicians Demand Spotify Not Develop Emotional Speech Recognition Patent

Rage Against the Machine’s Tom Morello, Kliph Scurlock of The Flaming Lips, and all of Harry and the Potters are among the nearly 200 signatories of an open letter to Spotify firmly asking the streaming service not to develop a patent granted earlier this year for tech that can identify emotions in people’s voices.

The patent granted in January describes how Spotify could use its voice recognition tech to infer how someone is feeling by the sound of their voice. The software would also attempt to determine other aspects of the user’s identity, including gender, age, and accent. The resulting profile would then be combined with location data shared with Spotify to generate a playlist of songs that the AI suggests might appeal to the user at that moment. It’s essentially a far more sophisticated and personalized version of the recommendation algorithm used by Spotify right now. Who the listener is, where they are, and how they are feeling would presumably produce a playlist with many more songs the user would like to listen to compared to relying on only their previous listening history. Read More
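The pipeline described above boils down to blending several context signals (inferred mood, location, familiarity) into a per-track relevance score. A minimal sketch of that kind of scoring, where every name, tag, and weight is hypothetical rather than taken from the patent:

```python
from dataclasses import dataclass

@dataclass
class ListenerContext:
    mood: str        # hypothetically inferred from the user's voice
    location: str    # e.g. "gym", "commute"
    history: set     # track ids the user has played before

# Hypothetical catalog entries: (track_id, mood tag, setting tag)
CATALOG = [
    ("t1", "happy", "gym"),
    ("t2", "sad", "home"),
    ("t3", "happy", "commute"),
]

def score(track, ctx: ListenerContext) -> float:
    """Blend context signals into a single relevance score (weights are illustrative)."""
    track_id, mood_tag, setting_tag = track
    s = 0.0
    s += 2.0 if mood_tag == ctx.mood else 0.0         # emotion match
    s += 1.0 if setting_tag == ctx.location else 0.0  # location match
    s += 0.5 if track_id in ctx.history else 0.0      # familiarity bonus
    return s

def playlist(ctx: ListenerContext, k: int = 2):
    """Return the k highest-scoring track ids for this context."""
    return [t[0] for t in sorted(CATALOG, key=lambda t: -score(t, ctx))[:k]]

ctx = ListenerContext(mood="happy", location="gym", history={"t3"})
print(playlist(ctx))  # the happy gym track should rank first
```

The point of the sketch is the shape of the system, not the weights: mood and location act as extra ranking features layered on top of ordinary listening history.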

#surveillance

China’s gigantic multi-modal AI is no one-trick pony

Sporting 1.75 trillion parameters, Wu Dao 2.0 is roughly ten times the size of OpenAI’s GPT-3.

When OpenAI’s GPT-3 model made its debut in May of 2020, its performance was widely considered the state of the art. Capable of generating text indiscernible from human-crafted prose, GPT-3 set a new standard in deep learning. But oh what a difference a year makes. Researchers from the Beijing Academy of Artificial Intelligence announced on Tuesday the release of their own generative deep learning model, Wu Dao, a mammoth AI seemingly capable of doing everything GPT-3 can do, and more.

First off, Wu Dao is flat out enormous. It has 1.75 trillion parameters (essentially, the model’s learned coefficients), a full ten times more than GPT-3’s 175 billion and 150 billion more than Google’s Switch Transformer. Read More

#nlp, #china-ai, #multi-modal

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new “Colossal Clean Crawled Corpus”, we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code. Read More
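The unifying idea is that every task, from translation to classification, is cast as a string-in, string-out problem, typically by prepending a task prefix to the input. A minimal sketch of that conversion (the prefixes are modeled on those reported for T5; the function itself is an illustrative stand-in, not the authors' code):

```python
def to_text_to_text(task: str, **fields) -> dict:
    """Cast an NLP example into the unified text-to-text format:
    a prefixed input string and a plain-text target string."""
    if task == "translation":
        return {"input": f"translate English to German: {fields['text']}",
                "target": fields["translation"]}
    if task == "summarization":
        return {"input": f"summarize: {fields['text']}",
                "target": fields["summary"]}
    if task == "classification":
        # Even class labels become literal text, e.g. "positive"
        return {"input": f"sentiment: {fields['text']}",
                "target": fields["label"]}
    raise ValueError(f"unknown task: {task}")

ex = to_text_to_text("classification", text="Great movie!", label="positive")
print(ex["input"])   # sentiment: Great movie!
print(ex["target"])  # positive
```

Because every task shares this one format, a single encoder-decoder model, loss, and decoding procedure can serve all of them, which is what makes the paper's systematic comparison of objectives and architectures possible.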

#transfer-learning

Learning Transferable Visual Models From Natural Language Supervision

State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This restricted form of supervision limits their generality and usability since additional labeled data is needed to specify any other visual concept. Learning directly from raw text about images is a promising alternative which leverages a much broader source of supervision. We demonstrate that the simple pre-training task of predicting which caption goes with which image is an efficient and scalable way to learn SOTA image representations from scratch on a dataset of 400 million (image, text) pairs collected from the internet. After pre-training, natural language is used to reference learned visual concepts (or describe new ones) enabling zero-shot transfer of the model to downstream tasks. We study the performance of this approach by benchmarking on over 30 different existing computer vision datasets, spanning tasks such as OCR, action recognition in videos, geo-localization, and many types of fine-grained object classification. The model transfers non-trivially to most tasks and is often competitive with a fully supervised baseline without the need for any dataset-specific training. For instance, we match the accuracy of the original ResNet-50 on ImageNet zero-shot without needing to use any of the 1.28 million training examples it was trained on. Read More
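The pre-training task of "predicting which caption goes with which image" is a contrastive objective: within a batch, each image embedding should be most similar to its own caption's embedding and dissimilar to every other caption's. A minimal numpy sketch of that symmetric loss (the embeddings here are random stand-ins, and the temperature value is illustrative):

```python
import numpy as np

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric cross-entropy over a batch similarity matrix:
    for row i, the correct 'class' is column i (the matching caption)."""
    # L2-normalize so the dot product is cosine similarity
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature          # shape (batch, batch)

    def xent(l):
        # cross-entropy with the diagonal as the target class
        l = l - l.max(axis=1, keepdims=True)    # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(logp))

    # average the image->text and text->image directions
    return (xent(logits) + xent(logits.T)) / 2

rng = np.random.default_rng(0)
aligned = rng.normal(size=(8, 32))
loss_matched = contrastive_loss(aligned, aligned)              # perfectly matched pairs
loss_random = contrastive_loss(aligned, rng.normal(size=(8, 32)))
print(loss_matched < loss_random)  # matched embeddings yield the lower loss
```

Scaling exactly this kind of objective to 400 million pairs is what lets natural language act as the label space, and hence what enables the zero-shot transfer described above.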

#image-recognition, #nlp

Towards General Purpose Vision Systems

A special purpose learning system assumes knowledge of admissible tasks at design time. Adapting such a system to unforeseen tasks requires architecture manipulation such as adding an output head for each new task or dataset. In this work, we propose a task-agnostic vision-language system that accepts an image and a natural language task description and outputs bounding boxes, confidences, and text. The system supports a wide range of vision tasks such as classification, localization, question answering, captioning, and more. We evaluate the system’s ability to learn multiple skills simultaneously, to perform tasks with novel skill-concept combinations, and to learn new skills efficiently and without forgetting. Read More
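The key design move is that the task arrives as data (a natural-language description) rather than as architecture (a dedicated output head), and every task shares one output type. A sketch of that unified signature; everything below, including the stub behavior, is an illustrative stand-in, not the authors' code:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class VisionOutput:
    """One output type for every task: boxes + confidences + text."""
    boxes: List[Tuple[float, float, float, float]] = field(default_factory=list)
    confidences: List[float] = field(default_factory=list)
    text: str = ""

def general_purpose_vision(image, task_description: str) -> VisionOutput:
    """Task-agnostic entry point: classification, localization, captioning,
    and question answering all flow through this one signature.
    (Stub: a real system would run a shared vision-language model.)"""
    if "locate" in task_description:
        return VisionOutput(boxes=[(0.1, 0.1, 0.5, 0.5)], confidences=[0.9])
    return VisionOutput(text="a dog playing in a park")

out = general_purpose_vision(None, "describe the image")
print(out.text)
```

Because new skills only change the task description, not the interface, adding a task requires no architecture manipulation, which is exactly the limitation of special-purpose systems the paper sets out to remove.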

#image-recognition

Brain Science 183: Jeff Hawkins shares his new theory of Intelligence

Read More

#human, #podcasts

Help a Computer Win the New Yorker Cartoon Caption Contest

This is a weekly experiment to see if an artificial intelligence program can produce real humor.

Why are we doing this?

TLDR: We want to determine whether non-funny humans (The Pudding team), when aided by a computer, can produce better-than-average jokes. Read More

#humor