We were blown away by the Sora announcement but felt it needed something… What if you could describe a sound and generate it with AI? — Read More
Tag Archives: Audio
TikTok can generate AI songs, but it probably shouldn’t
TikTok has launched many songs that have gone viral over the years, but now it’s testing a feature that lets more people exercise their songwriting skills… with some help from AI.
AI Song generates songs from text prompts with help from the large language model Bloom. Users can write out lyrics in the text field when making a post. TikTok will then recommend AI Song to add sounds to the post, and users can toggle the song’s genre. – Read More
OpenVoice: Versatile Instant Voice Cloning
We introduce OpenVoice, a versatile voice cloning approach that requires only a short audio clip from the reference speaker to replicate their voice and generate speech in multiple languages. OpenVoice represents a significant advancement in addressing the following open challenges in the field:
1) Flexible Voice Style Control. OpenVoice enables granular control over voice styles, including emotion, accent, rhythm, pauses, and intonation, in addition to replicating the tone color of the reference speaker. The voice styles are not directly copied from and constrained by the style of the reference speaker. Previous approaches lacked the ability to flexibly manipulate voice styles after cloning.
2) Zero-Shot Cross-Lingual Voice Cloning. OpenVoice achieves zero-shot cross-lingual voice cloning for languages not included in the massive-speaker training set. Unlike previous approaches, which typically require an extensive massive-speaker multi-lingual (MSML) dataset for all languages, OpenVoice can clone voices into a new language without any massive-speaker training data for that language.
OpenVoice is also computationally efficient, costing tens of times less than commercially available APIs that offer even inferior performance. To foster further research in the field, we have made the source code and trained model publicly accessible. We also provide qualitative results in our demo website. Prior to its public release, our internal version of OpenVoice was used tens of millions of times by users worldwide between May and October 2023, serving as the backend of MyShell. – Read More
#nlp, #audio
SALMONN, the First Model that Hears like Humans do
People often underestimate the importance of hearing to function correctly in our world and, more importantly, as an essential tool for learning.
As the famed Helen Keller once said, “Blindness cuts us off from things, but deafness cuts us off from people” and let’s not forget that this woman was blind and deaf.
Therefore, it’s only natural to see hearing as an indispensable requirement for AI to become the sought-after superior ‘being’ that some people predict it will become.
Sadly, current AI systems suck at hearing.
… Now, a new model created by the company behind TikTok, ByteDance, challenges this vision.
SALMONN is the first-ever multimodal audio-language AI system for generic hearing, a model that can process random audio signals from the three main sound types: speech, audio events, and music. — Read More
Read the Paper
The Beatles: ‘final’ song Now and Then to be released thanks to AI technology
Now and Then, the long-awaited “final” Beatles song featuring all four members, is to be released next week thanks to the same AI technology that was used to enhance the audio on Peter Jackson’s documentary Get Back.
“There it was, John’s voice, crystal clear,” Paul McCartney said in a statement. “It’s quite emotional. And we all play on it, it’s a genuine Beatles recording. In 2023, to still be working on Beatles music, and about to release a new song the public haven’t heard, I think it’s an exciting thing.” — Read More
Video
The REAL Fight Over AI Music – Ft. CEO of Spotify and Grimes
Stability AI, gunning for a hit, launches an AI-powered music generator
… Today marks the release of Stable Audio, a tool that Stability claims is the first capable of creating “high-quality,” 44.1 kHz music for commercial use via a technique called latent diffusion. Stability says that Stable Audio’s underlying, roughly 1.2-billion-parameter model, trained on audio metadata as well as audio files’ durations and start times, affords greater control over the content and length of synthesized audio than the generative music tools released before it. — Read More
AI-Generated Masterpiece: 21 Savage x Travis Scott – Whiplash by @ghostwriter
Redub Me — Speak to the world!
Dub your content into 70+ languages at a click of a button, and reach millions of new fans. — Read More
Developers are now using AI for text-to-music apps
With the rise in popularity of Large Language Models (LLMs) and generative AI tools like ChatGPT, developers have found ways to mold text for use cases ranging from writing emails to summarizing articles. Now, they are looking to help you generate bits of music by just typing some words.
Brett Bauman, the developer of PlaylistAI (previously LineupSupply), launched a new app called Songburst on the App Store this week. The app doesn’t have a steep learning curve. You just have to type in a prompt like “Calming piano music to listen to while studying” or “Funky beats for a podcast intro” to let the app generate a music clip. — Read More