Open sourcing AudioCraft: Generative AI for audio made simple and available to all

Imagine a professional musician being able to explore new compositions without having to play a single note on an instrument. Or an indie game developer populating virtual worlds with realistic sound effects and ambient noise on a shoestring budget. Or a small business owner adding a soundtrack to their latest Instagram post with ease. That’s the promise of AudioCraft — our simple framework that generates high-quality, realistic audio and music from text prompts, having been trained on raw audio signals rather than symbolic representations such as MIDI or piano rolls.

AudioCraft consists of three models: MusicGen, AudioGen, and EnCodec. MusicGen, which was trained on music that Meta owns or has specifically licensed, generates music from text prompts, while AudioGen, which was trained on public sound effects, generates audio from text prompts. Today, we’re excited to release an improved version of our EnCodec decoder, which allows for higher-quality music generation with fewer artifacts; our pre-trained AudioGen model, which lets you generate environmental sounds and sound effects like a dog barking, cars honking, or footsteps on a wooden floor; and all of the AudioCraft model weights and code. The models are available for research purposes and to further people’s understanding of the technology. We’re excited to give researchers and practitioners access so they can train their own models with their own datasets for the first time and help advance the state of the art. — Read More
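
For readers who want to try the released models, the AudioCraft repository exposes a small Python API. The sketch below follows the usage pattern documented in the project’s README; the exact pretrained checkpoint name ('facebook/musicgen-small') and the generation settings are illustrative assumptions that may differ between releases.

```python
# Minimal sketch of text-to-music generation with AudioCraft's MusicGen,
# following the usage pattern in the project's README.
# The checkpoint name and duration below are illustrative assumptions.
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

# Load a pretrained model (weights are downloaded on first use).
model = MusicGen.get_pretrained('facebook/musicgen-small')
model.set_generation_params(duration=8)  # generate 8 seconds of audio

# One waveform is generated per text description in the batch.
descriptions = ['lo-fi hip hop beat with mellow piano']
wav = model.generate(descriptions)  # tensor of shape [batch, channels, samples]

for idx, one_wav in enumerate(wav):
    # Writes output_0.wav, loudness-normalized to avoid clipping.
    audio_write(f'output_{idx}', one_wav.cpu(), model.sample_rate,
                strategy='loudness')
```

Sound-effect generation with the released AudioGen model follows the same pattern, swapping in the AudioGen class and an AudioGen checkpoint in place of MusicGen.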

#audio