And 8 key features that make it the best ASR model (hey Siri, this one’s for you)
The following is a selection from The Algorithmic Bridge, an educational newsletter whose purpose is to bridge the gap between algorithms and people. It will help you understand the impact AI has in your life and develop the tools to better navigate the future.
Today I’m covering a subfield of AI I never thought I’d be writing about, mostly because it’s much more mature than the ones I usually write about (large language models, AI art) and no breakthroughs seemed to be in sight. But I was mistaken. I’m referring to automatic speech recognition (ASR), as you have surely inferred from the headline… Bear with me, because there are good reasons to read this news.
OpenAI announced Whisper a couple of days ago and people are already going crazy. Not because it’s a new concept or because of improvements in algorithm design. No, the reason is simpler: Whisper works better than any other commercial ASR system. Alexa, Siri, Google Assistant (these are the ones you’re probably familiar with), any of them will feel like last-century tech after you try Whisper. And you can. OpenAI, the company that tends to not do justice to its name, decided to open source this model. The digital experience is going to radically change for many people.
Daily Archives: October 6, 2022
Generative Spoken Dialogue Language Modeling
We introduce dGSLM, the first “textless” model able to generate audio samples of naturalistic spoken dialogues. It uses recent work on unsupervised spoken unit discovery coupled with a dual-tower transformer architecture with cross-attention trained on 2000 hours of two-channel raw conversational audio (Fisher dataset) without any text or labels. It is able to generate speech, laughter and other paralinguistic signals in the two channels simultaneously and reproduces naturalistic turn taking. Generation samples can be found at: https://speechbot.github.io/dgslm.
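The abstract compresses the architecture into one phrase: a dual-tower transformer with cross-attention, one tower per audio channel. As a rough illustration of that idea only, here is a toy Python sketch in which each speaker's sequence of hidden states attends to the other speaker's. It is not dGSLM's implementation. The names `cross_attention` and `dual_tower_step`, the causal mask, the 2-dimensional toy features, and the omission of learned projections, self-attention, and feed-forward layers are all simplifying assumptions.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]  # one row per time step, one column per feature


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix,
                    causal: bool = True) -> Matrix:
    """Scaled dot-product attention where the queries come from one channel
    and the keys/values come from the other. With causal=True (an assumption
    suited to autoregressive generation), step t only sees steps 0..t of the
    other channel."""
    d = len(queries[0])
    out: Matrix = []
    for t, q in enumerate(queries):
        limit = min(t + 1, len(keys)) if causal else len(keys)
        scores = [sum(a * b for a, b in zip(q, keys[s])) / math.sqrt(d)
                  for s in range(limit)]
        weights = softmax(scores)
        # Weighted sum of the visible value vectors from the other channel.
        out.append([sum(w * values[s][j] for s, w in enumerate(weights))
                    for j in range(len(values[0]))])
    return out


def dual_tower_step(h_a: Matrix, h_b: Matrix) -> Tuple[Matrix, Matrix]:
    """Hypothetical single layer: each channel's tower reads from the other
    channel via cross-attention and adds the result residually. No learned
    weights are used here."""
    ctx_a = cross_attention(h_a, h_b, h_b)  # channel A attends to channel B
    ctx_b = cross_attention(h_b, h_a, h_a)  # channel B attends to channel A
    new_a = [[x + c for x, c in zip(row, ctx)] for row, ctx in zip(h_a, ctx_a)]
    new_b = [[x + c for x, c in zip(row, ctx)] for row, ctx in zip(h_b, ctx_b)]
    return new_a, new_b


# Two speakers, three time steps each, toy 2-dimensional features.
speaker_a = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
speaker_b = [[0.0, 1.0], [0.0, 0.0], [1.0, 0.0]]
out_a, out_b = dual_tower_step(speaker_a, speaker_b)
print(out_a)
print(out_b)
```

Because each tower conditions on the other channel at every step, a model of this shape can in principle react to what the other speaker is doing at the same time, rather than modeling one channel in isolation.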
Cryogeomorphic Characterization of Shadowed Regions in the Artemis Exploration Zone
The Artemis program will send crew to explore the south polar region of the Moon, preceded by and integrated with robotic missions. One of the main scientific goals of future exploration is the characterization of polar volatiles, which are concentrated in and near regions of permanent shadow. The meter-scale cryogeomorphology of shadowed regions remains unknown, posing a potential risk to missions that plan to traverse or land in them. Here, we deploy a physics-based, deep learning-driven post-processing tool to produce high-signal and high-resolution Lunar Reconnaissance Orbiter Narrow Angle Camera images of 44 shadowed regions larger than ∼40 m across in the Artemis exploration zone around potential landing sites 001 and 004. We use these images to map previously unknown, shadowed meter-scale (cryo)geomorphic features, assign relative shadowed region ages, and recommend promising sites for future exploration. We freely release our data and a detailed catalog of all shadowed regions studied.
Robots are making French fries faster, better than humans
Fast-food French fries and onion rings are going high-tech, thanks to a company in Southern California.
Miso Robotics Inc in Pasadena has started rolling out its Flippy 2 robot, which automates the process of deep frying potatoes, onions and other foods.
A big robotic arm like those in auto plants – directed by cameras and artificial intelligence – takes frozen French fries and other foods out of a freezer, dips them into hot oil, then deposits the ready-to-serve product into a tray.
Google’s newest AI generator creates HD video from text prompts
Not to be outdone by Meta, Google’s AI generator can output 1280×768 HD video at 24 fps.
Today, Google announced the development of Imagen Video, a text-to-video AI model capable of producing 1280×768 videos at 24 frames per second from a written prompt. Currently, it’s in a research phase, but its appearance five months after Google Imagen points to the rapid development of video synthesis models.
Only six months after the launch of OpenAI’s DALL-E 2 text-to-image generator, progress in the field of AI diffusion models has been accelerating rapidly. Google’s Imagen Video announcement comes less than a week after Meta unveiled its text-to-video AI tool, Make-A-Video.
According to Google’s research paper, Imagen Video includes several notable stylistic abilities, such as generating videos based on the work of famous painters (the paintings of Vincent van Gogh, for example), generating 3D rotating objects while preserving object structure, and rendering text in a variety of animation styles. Google is hopeful that general-purpose video synthesis models can “significantly decrease the difficulty of high-quality content generation.”
Blueprint for an AI Bill of Rights
MAKING AUTOMATED SYSTEMS WORK FOR THE AMERICAN PEOPLE
Among the great challenges posed to democracy today is the use of technology, data, and automated systems in ways that threaten the rights of the American public. Too often, these tools are used to limit our opportunities and prevent our access to critical resources or services. These problems are well documented. In America and around the world, systems supposed to help with patient care have proven unsafe, ineffective, or biased. Algorithms used in hiring and credit decisions have been found to reflect and reproduce existing unwanted inequities or embed new harmful bias and discrimination. Unchecked social media data collection has been used to threaten people’s opportunities, undermine their privacy, or pervasively track their activity—often without their knowledge or consent.
These outcomes are deeply harmful—but they are not inevitable. Automated systems have brought about extraordinary benefits, from technology that helps farmers grow food more efficiently and computers that predict storm paths, to algorithms that can identify diseases in patients. These tools now drive important decisions across sectors, while data is helping to revolutionize global industries. Fueled by the power of American innovation, these tools hold the potential to redefine every part of our society and make life better for everyone.
This important progress must not come at the price of civil rights or democratic values.