Daily Archives: May 21, 2021
High-performance speech recognition with no supervision at all
Whether it’s giving directions, answering questions, or carrying out requests, speech recognition makes life easier in countless ways. But today the technology is available for only a small fraction of the thousands of languages spoken around the globe. This is because high-quality systems need to be trained with large amounts of transcribed speech audio. This data simply isn’t available for every language, dialect, and speaking style. Transcribed recordings of English-language novels, for example, will do little to help machines learn to understand a Basque speaker ordering food off a menu or a Tagalog speaker giving a business presentation.
This is why we developed wav2vec Unsupervised (wav2vec-U), a way to build speech recognition systems that require no transcribed data at all. It rivals the performance of the best supervised models from only a few years ago, which were trained on nearly 1,000 hours of transcribed speech. We’ve tested wav2vec-U with languages such as Swahili and Tatar, which do not currently have high-quality speech recognition models available because they lack extensive collections of labeled training data.
Wav2vec-U is the result of years of Facebook AI’s work in speech recognition, self-supervised learning, and unsupervised machine translation. It is an important step toward building machines that can solve a wide range of tasks just by learning from their observations. We think this work will bring us closer to a world where speech technology is available for many more people.
The race to understand the exhilarating, dangerous world of language AI
Hundreds of scientists around the world are working together to understand one of the most powerful emerging technologies before it’s too late.
On May 18, Google CEO Sundar Pichai announced an impressive new tool: an AI system called LaMDA that can chat to users about any subject.
To start, Google plans to integrate LaMDA into its main search portal, its voice assistant, and Workspace, its collection of cloud-based work software that includes Gmail, Docs, and Drive. But the eventual goal, said Pichai, is to create a conversational interface that allows people to retrieve any kind of information (text, visual, or audio) across all of Google’s products just by asking.