Rick's Cafe AI 9:22 am on January 19, 2024
Tags: Audio, NLP

OpenVoice: Versatile Instant Voice Cloning

We introduce OpenVoice, a versatile voice cloning approach that requires only a short audio clip from the reference speaker to replicate their voice and generate speech in multiple languages. OpenVoice represents a significant advancement in addressing the following open challenges in the field: 1) Flexible Voice Style Control. OpenVoice enables granular control over voice styles, including emotion, accent, rhythm, pauses, and intonation, in addition to replicating the tone color of the reference speaker. The voice styles are neither directly copied from nor constrained by the style of the reference speaker. Previous approaches lacked the ability to flexibly manipulate voice styles after cloning. 2) Zero-Shot Cross-Lingual Voice Cloning. OpenVoice achieves zero-shot cross-lingual voice cloning for languages not included in the massive-speaker training set. Unlike previous approaches, which typically require an extensive massive-speaker multi-lingual (MSML) dataset for all languages, OpenVoice can clone voices into a new language without any massive-speaker training data for that language. OpenVoice is also computationally efficient, costing tens of times less than commercially available APIs, which offer inferior performance. To foster further research in the field, we have made the source code and trained model publicly accessible. We also provide qualitative results on our demo website. Prior to its public release, our internal version of OpenVoice was used tens of millions of times by users worldwide between May and October 2023, serving as the backend of MyShell. – Read More

#nlp, #audio

Rick's Cafe AI 9:39 am on November 6, 2023
Tags: Audio

SALMONN, the First Model that Hears like Humans do

People often underestimate how important hearing is for functioning in our world and, more importantly, as an essential tool for learning.

As the famed Helen Keller once said, “Blindness cuts us off from things, but deafness cuts us off from people,” and let’s not forget that she was both blind and deaf.

Therefore, it’s only natural to see hearing as an indispensable requirement for AI to become the sought-after superior ‘being’ that some people predict it will become.

Sadly, current AI systems suck at hearing.

… Now, a new model created by ByteDance, the company behind TikTok, challenges this vision.

SALMONN is the first-ever multimodal audio-language AI system for generic hearing, a model that can process arbitrary audio signals from the three main sound types: speech, audio events, and music. — Read More

Read the Paper

#audio

Rick's Cafe AI 8:18 am on October 27, 2023
Tags: Audio

The Beatles: ‘final’ song Now and Then to be released thanks to AI technology

Now and Then, the long-awaited “final” Beatles song featuring all four members, is to be released next week thanks to the same AI technology that was used to enhance the audio on Peter Jackson’s documentary Get Back.

“There it was, John’s voice, crystal clear,” Paul McCartney said in a statement. “It’s quite emotional. And we all play on it, it’s a genuine Beatles recording. In 2023, to still be working on Beatles music, and about to release a new song the public haven’t heard, I think it’s an exciting thing.” — Read More

Video

#audio

Rick's Cafe AI 2:08 pm on October 25, 2023
Tags: Audio, Videos

The REAL Fight Over AI Music – Ft. CEO of Spotify and Grimes

Read More

#audio, #videos

Rick's Cafe AI 9:43 am on September 15, 2023
Tags: Audio

Stability AI, gunning for a hit, launches an AI-powered music generator

… Today marks the release of Stable Audio, a tool that Stability claims is the first capable of creating “high-quality,” 44.1 kHz music for commercial use via a technique called latent diffusion. Stability says that Stable Audio’s underlying model, which has roughly 1.2 billion parameters and was trained on audio metadata as well as audio files’ durations and start times, affords greater control over the content and length of synthesized audio than the generative music tools released before it. — Read More

#audio

Rick's Cafe AI 1:37 pm on September 6, 2023
Tags: Audio

AI-Generated Masterpiece: 21 Savage x Travis Scott – Whiplash by @ghostwriter

Read More

#audio

Rick's Cafe AI 11:41 am on August 31, 2023
Tags: Audio

Redub Me — Speak to the world!

Dub your content into 70+ languages at the click of a button, and reach millions of new fans. — Read More

#audio

Rick's Cafe AI 9:02 am on August 22, 2023
Tags: Audio

Developers are now using AI for text-to-music apps

With the rise in popularity of Large Language Models (LLMs) and generative AI tools like ChatGPT, developers have found ways to mold text for use cases ranging from writing emails to summarizing articles. Now, they are looking to help you generate bits of music by just typing some words.

Brett Bauman, the developer of PlayListAI (previously LinupSupply), launched a new app called Songburst on the App Store this week. The app doesn’t have a steep learning curve. You just have to type in a prompt like “Calming piano music to listen to while studying” or “Funky beats for a podcast intro” to let the app generate a music clip. — Read More

#audio

Rick's Cafe AI 8:56 am on August 14, 2023
Tags: Audio

AudioSep — Separate Anything You Describe

Language-queried audio source separation (LASS) is a new paradigm for computational auditory scene analysis (CASA). LASS aims to separate a target sound from an audio mixture given a natural language query, which provides a natural and scalable interface for digital audio applications. Recent works on LASS, despite attaining promising separation performance on specific sources (e.g., musical instruments, limited classes of audio events), are unable to separate audio concepts in the open domain. In this work, we introduce AudioSep, a foundation model for open-domain audio source separation with natural language queries. We train AudioSep on large-scale multimodal datasets and extensively evaluate its capabilities on numerous tasks including audio event separation, musical instrument separation, and speech enhancement. AudioSep demonstrates strong separation performance and impressive zero-shot generalization ability using audio captions or text labels as queries, substantially outperforming previous audio-queried and language-queried sound separation models. For reproducibility of this work, we will release the source code, evaluation benchmark and pre-trained model at: this https URL. — Read More

#audio

Rick's Cafe AI 8:49 am on August 8, 2023
Tags: Audio, Surveillance

New acoustic attack steals data from keystrokes with 95% accuracy

A team of researchers from British universities has trained a deep learning model that can steal data from keyboard keystrokes recorded using a microphone with an accuracy of 95%.

When keystrokes recorded over Zoom were used to train the sound classification algorithm, prediction accuracy dropped to 93%, which is still dangerously high and a record for that medium.

Such an attack severely affects the target’s data security, as it could leak people’s passwords, discussions, messages, or other sensitive information to malicious third parties. — Read More

#audio, #surveillance

Rick's Cafe AI

The latest in Artificial Intelligence carefully curated into its own special blend

Tag Archives: Audio