It may look like Ruidong Zhang is talking to himself, but in fact the doctoral student in the field of information science is silently mouthing the passcode to unlock his nearby smartphone and play the next song in his playlist.
It’s not telepathy: It’s the seemingly ordinary, off-the-shelf eyeglasses he’s wearing, called EchoSpeech – a silent-speech recognition interface that uses acoustic-sensing and artificial intelligence to continuously recognize up to 31 unvocalized commands, based on lip and mouth movements. Read More
Tag Archives: Audio
Google’s New AI: DALL-E 2, But For Music!
It’s Game Over on Vocal Deepfakes
You may recall back in October I linked to an AI-generated simulated interview between Joe Rogan and Steve Jobs. I wrote:
I also don’t buy their claim that these voices are completely generated. Most of Jobs’s lines have auditorium echo — they sound like clips copy-and-pasted. If they can really generate these voices, why doesn’t their virtual Rogan actually say Steve Jobs’s name? Send me a clip of virtual Steve Jobs saying “John Gruber is a bozo, and I tell people not to waste their time reading Daring Fireball.” Then I’ll believe it.
I neglected to follow up until now, but Ignaz Kowalczuk from ElevenLabs (the company behind Prime Voice AI) took me up on the challenge and sent me this clip:
That clip sounds noticeably stilted, but it does sound like Steve Jobs.
Now comes this: a Twitter thread from John Meyer, who trained a clone of Jobs's voice and then hooked it up to ChatGPT to generate the words. The clips he posted to Twitter are freakishly uncanny. Read More
How I Broke Into a Bank Account With an AI-Generated Voice
Banks in the U.S. and Europe tout voice ID as a secure way to log into your account. I proved it’s possible to trick such systems with free or cheap AI-generated voices.
The bank thought it was talking to me; the AI-generated voice certainly sounded the same.
On Wednesday, I phoned my bank’s automated service line. To start, the bank asked me to say in my own words why I was calling. Rather than speak out loud, I clicked a file on my nearby laptop to play a sound clip: “check my balance,” my voice said. But this wasn’t actually my voice. It was a synthetic clone I had made using readily available artificial intelligence technology.
“Okay,” the bank replied. It then asked me to enter or say my date of birth as the first piece of authentication. After typing that in, the bank said “please say, ‘my voice is my password.’”
Again, I played a sound file from my computer. “My voice is my password,” the voice said. The bank’s security system spent a few seconds authenticating the voice.
“Thank you,” the bank said. I was in. Read More
‘Disrespectful to the Craft:’ Actors Say They’re Being Asked to Sign Away Their Voice to AI
Motherboard spoke to multiple voice actors and advocacy organizations, some of which said contracts including language around synthetic voices are now very prevalent.
Voice actors are increasingly being asked to sign rights to their voices away so clients can use artificial intelligence to generate synthetic versions that could eventually replace them, and sometimes without additional compensation, according to advocacy organizations and actors who spoke to Motherboard. Those contractual obligations are just one of the many concerns actors have about the rise of voice-generating artificial intelligence, which they say threatens to push entire segments of the industry out of work.
The news highlights the impact of the burgeoning industry of artificial intelligence-generated voices and the much lower barrier to entry for anyone to synthesize the voices of others. Read More
Whispers of A.I.’s Modular Future
ChatGPT is in the spotlight, but it’s Whisper—OpenAI’s open-source speech-transcription program—that shows us where machine learning is going.
One day in late December, I downloaded a program called Whisper.cpp onto my laptop, hoping to use it to transcribe an interview I’d done. I fed it an audio file and, every few seconds, it produced one or two lines of eerily accurate transcript, writing down exactly what had been said with a precision I’d never seen before. As the lines piled up, I could feel my computer getting hotter. This was one of the few times in recent memory that my laptop had actually computed something complicated—mostly I just use it to browse the Web, watch TV, and write. Now it was running cutting-edge A.I.
Despite being one of the more sophisticated programs ever to run on my laptop, Whisper.cpp is also one of the simplest. If you showed its source code to A.I. researchers from the early days of speech recognition, they might laugh in disbelief, or cry—it would be like revealing to a nuclear physicist that the process for achieving cold fusion can be written on a napkin. Whisper.cpp is intelligence distilled. It's rare for modern software in that it has virtually no dependencies—in other words, it works without the help of other programs. Instead, it is ten thousand lines of stand-alone code, most of which does little more than fairly complicated arithmetic. It was written in five days by Georgi Gerganov, a Bulgarian programmer who, by his own admission, knows next to nothing about speech recognition. Gerganov adapted it from a program called Whisper, released in September by OpenAI, the same organization behind ChatGPT and DALL-E. Whisper transcribes speech in more than ninety languages. In some of them, the software is capable of superhuman performance—that is, it can actually parse what somebody's saying better than a human can.
What’s so unusual about Whisper is that OpenAI open-sourced it, releasing not just the code but a detailed description of its architecture. They also included the all-important “model weights”: a giant file of numbers specifying the synaptic strength of every connection in the software’s neural network. In so doing, OpenAI made it possible for anyone, including an amateur like Gerganov, to modify the program. Gerganov converted Whisper to C++, a widely supported programming language, to make it easier to download and run on practically any device. This sounds like a logistical detail, but it’s actually the mark of a wider sea change. Until recently, world-beating A.I.s like Whisper were the exclusive province of the big tech firms that developed them. Read More
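The excerpt's claim that Whisper.cpp is mostly "fairly complicated arithmetic" over a file of "model weights" can be made concrete with a toy example. The sketch below is not Whisper's code or architecture; it is a hypothetical single neural-network layer in plain Python, with made-up weights, showing that running a trained model comes down to multiplying inputs by stored numbers, summing them, and applying a simple function.

```python
# Toy illustration only (not Whisper's actual code): one neural-network layer
# is arithmetic over a table of learned numbers ("weights").
import math


def dense_layer(inputs, weights, biases):
    """Compute one fully connected layer: each output is a weighted sum of
    the inputs plus a bias, passed through a nonlinearity (here, tanh)."""
    outputs = []
    for row, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(row, inputs)) + bias
        outputs.append(math.tanh(total))
    return outputs


# Hypothetical weights for 2 outputs and 3 inputs. A real model like Whisper
# stores millions of such numbers in its weights file.
weights = [[0.5, -1.0, 0.25],
           [0.0, 2.0, -0.5]]
biases = [0.1, -0.2]

print(dense_layer([1.0, 0.5, 2.0], weights, biases))
```

Stacking many such layers, each reading its numbers from the weights file, is what lets a compact, dependency-free program like Whisper.cpp run a large model on an ordinary laptop.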
Researchers fear Microsoft’s ‘dangerous’ new AI voice technology
According to Ars Technica, Microsoft has developed an AI system that is capable of using machine learning to accurately mimic the voice of anyone, complete with novel, generated sentences, based on just three seconds of audio input.
… According to the report, Microsoft engineers know this technology could be dangerous in the wrong hands, being used to create malicious "deepfakes." A system that convincingly fakes people's voices could do everything from discrediting celebrities or politicians with fake racist quotes to discrediting a former spouse in a custody dispute. It could even be used to create virtual pornography of a person without their consent, or be used in wire fraud by impersonating a CEO to trick companies into transferring their money. Read More
Microsoft’s VALL-E can imitate any voice with just a three-second sample
Artificial intelligence can replicate any voice, including the emotions and tone of a speaker.
- Microsoft recently unveiled an AI tool called VALL-E that can create convincing replications of people's voices.
- The tool uses just a 3-second recording as a prompt to generate content.
- VALL-E can replicate the emotions of a speaker, which sets it apart from several other AI models.
Why AI audiobook narrators could win over some authors and readers, despite the vocal bumps
Apple and Google's turn to AI narration in a booming market may sound less than human and raise the ire of voiceover actors, but it has cost benefits
For the first few seconds, the narrator of Kristen Ethridge’s new romance audiobook, Shelter from the Storm, sounds like a human being. The voice is light and carefully enunciated, with the slow pacing of any audiobook narrator, as it begins: “There’s a storm coming, and her name is Hope.”
Then, something about the pacing of the words grates on the ear. It’s a little too regular, even robotic. “I know that sounds a little crazy,” the breathy voice continues, grinding out the words. “That something so destructive could be labeled with such a peaceful name.” Read More
AI-generated podcast features fake voices of Steve Jobs and Joe Rogan
The creators of podcast.ai have released a 20-minute podcast featuring artificially generated versions of Steve Jobs and Joe Rogan. The entire interview was created using AI, with the clone of Jobs discussing Eastern mysticism, Buddhism, LSD, Google, Microsoft Windows 3, and more. Read More
#audio, #fake, #podcasts