Language-queried audio source separation (LASS) is a new paradigm for computational auditory scene analysis (CASA). LASS aims to separate a target sound from an audio mixture given a natural language query, which provides a natural and scalable interface for digital audio applications. Recent works on LASS, despite attaining promising separation performance on specific sources (e.g., musical instruments, limited classes of audio events), are unable to separate audio concepts in the open domain. In this work, we introduce AudioSep, a foundation model for open-domain audio source separation with natural language queries. We train AudioSep on large-scale multimodal datasets and extensively evaluate its capabilities on numerous tasks including audio event separation, musical instrument separation, and speech enhancement. AudioSep demonstrates strong separation performance and impressive zero-shot generalization ability using audio captions or text labels as queries, substantially outperforming previous audio-queried and language-queried sound separation models. For reproducibility of this work, we will release the source code, evaluation benchmark and pre-trained model at: this https URL. — Read More
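The language-queried interface the abstract describes can be illustrated with a toy sketch: a text query is embedded, the embedding conditions a soft mask over the mixture's magnitude spectrogram, and the mask is applied to recover the target. This is NOT AudioSep's actual architecture; `embed_query` and `separate` below are hypothetical stand-ins for a learned text encoder and a conditioned mask network.

```python
import hashlib
import numpy as np

def embed_query(query: str, dim: int) -> np.ndarray:
    """Deterministic stand-in for a learned text encoder."""
    seed = int.from_bytes(hashlib.md5(query.encode()).digest()[:4], "little")
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

def separate(mix_spec: np.ndarray, query: str) -> np.ndarray:
    """Apply a query-conditioned soft mask to a (freq, time) magnitude
    spectrogram and return the estimated target source."""
    cond = embed_query(query, dim=mix_spec.shape[0])
    scores = cond[:, None] * mix_spec      # condition every time-freq cell
    mask = 1.0 / (1.0 + np.exp(-scores))   # soft mask in (0, 1)
    return mix_spec * mask
```

In a trained system the mask network learns so that a call like `separate(mix, "a dog barking")` attenuates everything except the barking; here the mask is untrained and only illustrates the data flow.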
New acoustic attack steals data from keystrokes with 95% accuracy
A team of researchers from British universities has trained a deep learning model that can steal data from keyboard keystrokes recorded using a microphone with an accuracy of 95%.
When Zoom was used for training the sound classification algorithm, the prediction accuracy dropped to 93%, which is still dangerously high, and a record for that medium.
Such an attack severely affects the target’s data security, as it could leak people’s passwords, discussions, messages, or other sensitive information to malicious third parties. — Read More
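The general shape of such an acoustic attack can be sketched as a pipeline: each recorded keystroke becomes a spectral feature vector, and a classifier maps features to keys. This is not the researchers' model (they used a deep network on spectrograms); a nearest-centroid classifier over coarse spectra stands in, and all data in the usage below is synthetic.

```python
import numpy as np

def feature(clip: np.ndarray, n_bins: int = 16) -> np.ndarray:
    """Magnitude spectrum of a keystroke clip, pooled into coarse bins."""
    spec = np.abs(np.fft.rfft(clip))
    bins = np.array_split(spec, n_bins)
    return np.array([b.mean() for b in bins])

def train(clips, labels):
    """Average feature vector per key (a 'centroid' per class)."""
    feats = np.stack([feature(c) for c in clips])
    keys = sorted(set(labels))
    return {k: feats[[l == k for l in labels]].mean(axis=0) for k in keys}

def predict(model, clip):
    """Label the clip with the key whose centroid is nearest."""
    f = feature(clip)
    return min(model, key=lambda k: np.linalg.norm(model[k] - f))
```

With synthetic "keystrokes" built from distinct tones, `train` followed by `predict` recovers the pressed key, which is the core mechanism the attack exploits at scale.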
Open sourcing AudioCraft: Generative AI for audio made simple and available to all
Imagine a professional musician being able to explore new compositions without having to play a single note on an instrument. Or an indie game developer populating virtual worlds with realistic sound effects and ambient noise on a shoestring budget. Or a small business owner adding a soundtrack to their latest Instagram post with ease. That’s the promise of AudioCraft — our simple framework that generates high-quality, realistic audio and music from text-based user inputs after training on raw audio signals as opposed to MIDI or piano rolls.
AudioCraft consists of three models: MusicGen, AudioGen, and EnCodec. MusicGen, which was trained with Meta-owned and specifically licensed music, generates music from text-based user inputs, while AudioGen, which was trained on public sound effects, generates audio from text-based user inputs. Today, we’re excited to release an improved version of our EnCodec decoder, which allows for higher quality music generation with fewer artifacts; our pre-trained AudioGen model, which lets you generate environmental sounds and sound effects like a dog barking, cars honking, or footsteps on a wooden floor; and all of the AudioCraft model weights and code. The models are available for research purposes and to further people’s understanding of the technology. We’re excited to give researchers and practitioners access so they can train their own models with their own datasets for the first time and help advance the state of the art. — Read More
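Generating a sound effect with the released AudioGen model looks roughly like the sketch below. It is a hypothetical usage sketch based on the audiocraft package's documented API; the model name, parameters, and helper follow its README and may change between releases. Everything is wrapped in a function so that no model weights are downloaded at import time.

```python
def generate_sound_effect(prompt: str, out_path: str = "effect") -> None:
    from audiocraft.models import AudioGen
    from audiocraft.data.audio import audio_write

    model = AudioGen.get_pretrained("facebook/audiogen-medium")
    model.set_generation_params(duration=5)   # seconds of audio to generate
    wav = model.generate([prompt])            # batch with one description
    # Normalize loudness and write out_path + ".wav"
    audio_write(out_path, wav[0].cpu(), model.sample_rate,
                strategy="loudness")
```

A call such as `generate_sound_effect("a dog barking")` would write `effect.wav`, assuming the package and a suitable GPU or CPU backend are available.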
Best AI Music Generators for Your Next Composition in 2023
AI has swept through many industries, including healthcare, education, and commerce. It has revolutionized the way we run businesses and has transformed supply chain management. A lesser-known area of AI's influence, however, is the music industry.
AI has started to play a significant role in music: by analyzing large data sets, it can identify patterns and trends that are difficult to predict otherwise. Artificial intelligence can be integrated with music to improve the experience for both artists and listeners.
… In this article, you will find a comprehensive list of some of the best AI music generator software that can be used to improve the quality of music and streaming services, along with each generator's distinctive features and the benefits of using it. — Read More
Project S.A.T.U.R.D.A.Y — A Vocal Computing Toolbox
A toolbox for vocal computing built with Pion, whisper.cpp, and Coqui TTS. Build your own personal, self-hosted J.A.R.V.I.S powered by WebRTC
Project S.A.T.U.R.D.A.Y is a toolbox for vocal computing. It provides tools to build elegant vocal interfaces to modern LLMs. The goal of this project is to foster a community of like-minded individuals who want to bring forth the technology we have been promised in sci-fi movies for decades. It aims to be highly modular and flexible while staying decoupled from specific AI models, which allows for seamless upgrades when new AI technology is released. — Read More
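The modular design described above can be sketched as three swappable components behind plain function interfaces. The stubs below are illustrative stand-ins only: the real project wires Pion (WebRTC transport), whisper.cpp (speech-to-text), and Coqui TTS (text-to-speech) into these roles, and is not written in Python.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class VoicePipeline:
    transcribe: Callable[[bytes], str]   # STT role, e.g. whisper.cpp
    respond: Callable[[str], str]        # any LLM backend
    synthesize: Callable[[str], bytes]   # TTS role, e.g. Coqui TTS

    def handle_utterance(self, audio: bytes) -> bytes:
        """Audio in -> text -> LLM reply -> audio out."""
        text = self.transcribe(audio)
        reply = self.respond(text)
        return self.synthesize(reply)

# Stub components stand in for the real models:
pipeline = VoicePipeline(
    transcribe=lambda audio: "what time is it",
    respond=lambda text: f"You asked: {text}",
    synthesize=lambda reply: reply.encode("utf-8"),
)
```

Because each stage is just a callable, any component can be replaced (a new STT engine, a different LLM) without touching the rest of the pipeline, which is the decoupling the project aims for.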
Introducing: Voice Library
Today, we [Eleven Labs] are releasing our latest development at the intersection of research, product and community: the Voice Library.
Voice Library is a community space for generating, sharing, and exploring a virtually infinite range of voices. Leveraging our proprietary Voice Design tool, Voice Library brings together a global collection of vocal styles for countless applications. — Read More
Grammys CEO Breaks Down Rules Around AI Recordings: “This Is Something We Have to Pay Attention To”
As the world continues to grapple with the rise of AI, so do the Grammys.
The Recording Academy made headlines last week when it announced its rules for music created with artificial intelligence. Some feel those songs should be banned; others say they are creative and innovative.
The Grammys are listening to both sides — but don’t expect them to award a robot. — Read More
A.I. human-voice clones are coming for the Amazon, Apple and Google audiobook business
Annual audiobook sales could reach over $30 billion within a decade, and time and cost of production suggest AI will play a bigger role in the future.
Google Play and Apple Books utilize AI-generated voices to some extent already, though there are high hurdles to recreating human voice pacing, intonation and emotion.
Voice actors say opportunities to clone their voices for speedier, cheaper production on some forms of audiobooks can’t be ignored. — Read More
Introducing Voicebox: The first generative AI model for speech to generalize across tasks with state-of-the-art performance
Meta AI researchers have achieved a breakthrough in generative AI for speech. We’ve developed Voicebox, the first model that can generalize, with state-of-the-art performance, to speech-generation tasks it was not specifically trained to accomplish.
Like generative systems for images and text, Voicebox creates outputs in a vast variety of styles, and it can create outputs from scratch as well as modify a sample it’s given. But instead of creating a picture or a passage of text, Voicebox produces high-quality audio clips. The model can synthesize speech across six languages, as well as perform noise removal, content editing, style conversion, and diverse sample generation. — Read More
The Beatles will release a new and ‘final record’ this year, Paul McCartney says — with a little help from AI
It’s the news fans of the Fab Four thought they would never see: The Beatles will release a new song this year featuring vocals from John Lennon, with a little help from artificial intelligence, Paul McCartney said Tuesday.
Speaking to BBC Radio 4, the 80-year-old McCartney confirmed that the band — whose cultural influence may have been unmatched in the 20th century — will release “the final Beatles record” this year, having used cutting-edge technology to extract Lennon’s voice from an old demo recording.
“We just finished it up and it’ll be released this year,” he said. — Read More