Image created using DALL·E 3 with this prompt: Create an image that shows AI (portrayed as a robot/synthetic human) sitting at a computer, acting as a sound editor, adding sound effects to a video project. Aspect ratio: 16×9.
ElevenLabs, a startup known for its AI voice-cloning and text-to-speech software, is set to launch a new AI model that generates realistic sound effects for videos from text prompts. The company demonstrated the technology by adding background audio to video clips created by OpenAI’s Sora model (which, if you haven’t seen it yet, stop reading and click here now), showcasing the continued advancement of AI in creating immersive multimedia experiences.
The new model is expected to generate a variety of sounds, including footsteps, waves, and ambience, to accompany silent video footage. A public release date has not been announced, but interested individuals can sign up to be notified about the launch.
As you can imagine, generative AI sound effects will impact audio and video production, gaming, and all forms of extended reality, where realistic sound plays a crucial role in the overall user experience. The idea of data-driven, hyper-personalized sound effects, alarms, and notifications has been discussed for years. Now it’s just months away. This is super exciting!
Author’s note: This is not a sponsored post. I am the author of this article and it expresses my own opinions. I am not, nor is my company, receiving compensation for it. This work was created with the assistance of various generative AI models.