How do you know the Donald Trump speech you heard on social media this morning was actually the former President? The truth is you don’t, because voice-cloning technology has grown so sophisticated and so readily available that it’s hard to know what is real and what is fake anymore.
Voice cloning is a huge potential threat to democracy and to recording artists, who are finding their voices used on songs they didn’t record. But it’s also a technology with genuine creative potential, such as providing new character voices in video games or movies.
Here we’re going to explore what voice cloning is, its potential dangers, and the measures being taken to rein in the technology and use it for good.
How does voice cloning work?
Everyone’s voice has a unique character, but the tone, pitch, speech patterns and inflections that make up someone’s voice can be digitally mapped.
Here, for example, is the spectral frequency display of me repeating the phrase “I work for TechFinitive”, as captured by Adobe’s Audition software.
There are slight variations between the two samples, as nobody says the same sentence twice in exactly the same way, but you can also see a clear pattern. And even with a voice sample as short as this, just a few seconds long, algorithms trained on millions of voice samples can create a virtual model of what my voice would sound like when saying practically anything.
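To give a rough sense of what a spectral frequency display actually measures, here is a minimal Python sketch. It is not how Audition or any voice-cloning service works internally; it simply slices a signal into short frames and works out how much energy sits at each frequency in each frame. A synthetic tone stands in for a real voice recording, and all function names (make_tone, frame_spectrum, spectrogram, dominant_frequencies) are hypothetical, chosen purely for illustration.

```python
import cmath
import math


def make_tone(freq_hz, duration_s, sample_rate):
    """Generate a pure sine tone as a stand-in for a recorded voice (assumption)."""
    n = int(duration_s * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def frame_spectrum(frame):
    """Return the magnitude of each frequency bin for one short frame.

    Uses a Hann window and a naive discrete Fourier transform, which is slow
    but needs nothing beyond the standard library.
    """
    n = len(frame)
    windowed = [
        s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
        for i, s in enumerate(frame)
    ]
    magnitudes = []
    for k in range(n // 2 + 1):  # bins from 0 Hz up to half the sample rate
        acc = sum(
            windowed[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        magnitudes.append(abs(acc))
    return magnitudes


def spectrogram(samples, frame_size=256, hop=128):
    """Split the signal into overlapping frames and analyse each one."""
    return [
        frame_spectrum(samples[start:start + frame_size])
        for start in range(0, len(samples) - frame_size + 1, hop)
    ]


def dominant_frequencies(samples, sample_rate, frame_size=256, hop=128):
    """For each frame, report the frequency (in Hz) of the strongest bin."""
    spec = spectrogram(samples, frame_size, hop)
    return [
        max(range(len(mags)), key=mags.__getitem__) * sample_rate / frame_size
        for mags in spec
    ]


if __name__ == "__main__":
    rate = 8000  # samples per second, an arbitrary choice for this sketch
    low = dominant_frequencies(make_tone(250, 0.1, rate), rate)
    high = dominant_frequencies(make_tone(500, 0.1, rate), rate)
    print("Lower tone, strongest frequency per frame:", low)
    print("Higher tone, strongest frequency per frame:", high)
```

A real voice is far messier than a single tone, with many frequencies rising and falling at once, but the same frame-by-frame breakdown is what produces the kind of pattern visible in a spectral display.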
How easy is it to clone a voice?
A doddle. Almost anyone could do it using one of the many AI services that have erupted onto the market over the past year.
Perhaps the best known is ElevenLabs, which caught the public’s attention by demonstrating the power of its voice cloning technology with the video below:
Anyone can now go to the ElevenLabs website, upload a sample of their own voice – or (with permission) someone else’s – and get a convincing replica that can speak any written text that you enter.
Below, for example, is an audio sample I created on ElevenLabs using one of the site’s sample voices:
Rival services such as Altered Studio let you upload an audio recording of your own voice and have those same words spoken – with exactly the same pronunciation, intonation and rhythm as the original recording – in someone else’s voice. That means you could take words spoken by me, a 46-year-old man from the south of England, and have them read in the voice of a 65-year-old woman from California, for example.
That has amazing creative potential, which we’ll come to shortly, but it also creates enormous dangers.
What are the dangers of voice cloning?
In a word: trust. Voice cloning makes it even harder to believe that what we’re seeing and hearing is genuinely the person involved.
In an era when a video on social media can reach millions of people within minutes, that’s extremely dangerous. If a faked video of the US President warning people of an imminent nuclear strike were to go viral, for example, it could have enormous ramifications – especially if it were made to look as if it came from a legitimate news source.
Likewise, a faked undercover audio clip of a politician confessing to a crime or a cover-up could undermine trust in that person and potentially impact election results. It would be naive in the extreme to believe this sort of tactic isn’t already being used.
The entertainment industry is also having its own struggle with voice clones. Earlier this year, a song that used AI to clone the voices of Drake and The Weeknd was taken down after going viral on streaming services, following complaints from the artists’ record label.
The label involved – Universal Music Group – last month announced a partnership with YouTube to tackle the problem of copyright infringement with voice clones. It’s also exploring ways AI could be used to benefit artists.
Our security expert, Davey Winder, provided a guide to voice-cloning scams and how to avoid them in a separate article.
How could voice cloning be used for good?
For people creating video games, movies, adverts or other creative ventures, voice cloning could offer enormous benefits.
Instead of having to employ eight different actors to voice characters in a video game, for example, the studio might use just one or two and rely on cloning to create a variety of characters with distinct voices.
If you’re a small business that can’t afford an actor to narrate your YouTube videos, you could use voice cloning instead – choosing a voice that would resonate with your target market. For example, if you’re a 40-something man designing clothes for teenagers, you might want your promotional videos narrated by a 16-year-old girl. You might pick different accents for adverts in different countries, or even different regions in the same country. That’s entirely possible with voice cloning.
But before you rush off to clone a recording of Brad Pitt’s voice to promote your new online store, remember that cloning well-known personalities could land you with a very expensive legal bill.