Ever wish you could listen to your favorite artist, actor or content creator speak in your native language? ElevenLabs is working on making that happen.
It’s an AI-powered speech synthesis company, co-founded in 2022 by an ex-Googler. Since launch, ElevenLabs has become one of the best and most popular AI text-to-speech generators.
Offering both free and paid features, the software allows users to generate natural-sounding speech, design custom AI voices, and clone their voices.
Since coming out of beta in August, the tool can identify text and produce speech in more than 25 languages, thanks to an update to its deep learning model.
What’s this latest product update?
ElevenLabs now has a new dubbing feature that allows users to reproduce audio from one language in 28 others, while maintaining the speaker’s voice and speech patterns.
“The release of AI Dubbing is our biggest step yet towards eliminating these linguistic barriers of content,” said CEO Mati Staniszewski. “It will help audiences enjoy any content they want, regardless of the language they speak.”
Users can either upload the audio file of the voice they’d like to dub or share a link to it on social media. One of the concerns here is the risk of misuse, as anyone can reproduce anyone’s voice.
This tool could exacerbate the issues we’re seeing with deepfakes, and ElevenLabs has been testing out solutions since its launch, including limiting product access to paid users and investing in deepfake detection systems.
“Crazy weekend - thank you to everyone for trying out our Beta platform. While we see our tech being overwhelmingly applied to positive use, we also see an increasing number of voice cloning misuse cases. We want to reach out to Twitter community for thoughts and feedback!” - ElevenLabs (@elevenlabsio), January 30, 2023
Spotify recently announced a similar pilot, but for podcasts. It partnered with OpenAI to translate podcasts into other languages. The feature is not yet available to all podcasters, but we could see it rolled out in the coming months.
For brands, voice dubbing has a huge appeal: Localization becomes faster, easier, and cheaper.
How good is the voice dubbing?
As a native French speaker, I wanted to test ElevenLabs’ new feature for myself and see how good it was at:
- Translating the content accurately.
- Preserving my voice.
- Making the translation sound natural and not so robot-ey.
The tool managed nearly two of the three, and the results were impressive – but far from perfect.
First, I was wowed by how quickly the new version was created. In a few minutes, I could listen to myself speak in an entirely new language.
The software did well at translating what I said, both from English to French and from French to English. And it did a pretty good job at voice cloning.
Where the tool really fell short was in making the speech sound natural. In the clip below, my speech pattern, particularly my inflection, is off - to say the least.
Also, where did that accent come from? In my tests, that wasn’t consistent. While you can select a language, you can't narrow down the region you're from, which can lead to a mishmash of accents.
In one test not featured here, my AI-generated voice first sounded like I was from Quebec, then halfway through, I sounded Parisian.
When the software struggles to understand words or sounds, it will either repeat the word in its original language or go for total gibberish and hope no one notices.
Here's an example where I sourced the audio from this HubSpot YouTube video and dubbed Jamal's voice to French.
Even if you don't speak French, you'll notice around the 11-second mark that the voice changes, there's an overlap with another voice, and at the end, it's just gibberish with a French accent.
That could be because the audio wasn't clean: there was music and sound effects in the background of the video. This likely made it harder for the tool to pick up on certain words.
Though there’s a lot of room for improvement, ElevenLabs' new dubbing tool is pretty cool. They've got the accuracy down for the most part when translating; now it's a matter of improving the voice.
Why does it matter?
Well, the days of watching poorly dubbed TV shows and movies could be over. A big plus not to be overlooked.
From a business perspective, this new era of voice cloning and dubbing could shift how companies market themselves and how they approach localization. When language is no longer a barrier, the possibilities open up dramatically.
For businesses with limited resources, a tool like this can help reach markets that were once inaccessible.