AI-generated audio that mimics humans can be so convincing that people can’t tell the difference a quarter of the time – even when they’re trained to identify faked voices, a new study claims.
Researchers at University College London investigated how accurately humans can differentiate between AI-generated audio and real human audio, according to a report in the science and medical journal PLOS ONE. The study was conducted in response to the growing prevalence of “deepfakes” – synthetic video, images or audio manipulated to pass as the real thing.
“Previous literature has highlighted deepfakes as one of the biggest security threats arising from progress in artificial intelligence due to their potential for misuse,” researchers wrote in their paper published this month.
“However, studies investigating human detection capabilities are limited,” the researchers continued, explaining why they set out to measure just how realistic speech deepfakes sound to human listeners.
The research team used a text-to-speech algorithm on two data sets to generate 50 deepfake speech samples. The researchers used both English and Mandarin speech “to understand if listeners used language-specific attributes to detect deepfakes.”
The speech samples were then tested on 529 people who were asked if they believed a sample was an actual human speaking or if the speech was computer-generated.
Participants were able to accurately identify deepfake speech only 73% of the time, and results improved only “slightly” after participants were trained on how to recognize computer-generated audio, according to the study.
“Our findings confirm that humans are unable to reliably detect deepfake speech, whether or not they have received training to help them spot artificial content,” Kimberly Mai, an author of the study, said in a statement.
“It’s also worth noting that the samples that we used in this study were created with algorithms that are relatively old, which raises the question whether humans would be less able to detect deepfake speech created using the most sophisticated technology available now and in the future.”
The study is considered to be the first of its kind to investigate how humans detect deepfake audio in a language other than English.
English- and Mandarin-speaking participants showed roughly the same rate of detection, with English speakers saying they listened for breathing to help determine whether the audio was real or computer-generated. Mandarin speakers said they paid attention to the speaker’s word pace and cadence to correctly identify audio.
“Although there are some differences in the features that English and Mandarin speakers use to detect deepfakes, the two groups share many similarities. Therefore, the threat potential of speech deepfakes is consistent despite the language involved,” the researchers wrote.
The study comes as a “warning” that “humans cannot reliably detect speech deepfakes,” with researchers highlighting that “adversaries are already using speech deepfakes to commit fraud” and that the technology will only become more convincing as AI advances.
“With generative artificial intelligence technology becoming more sophisticated, and with many of these open-source tools available to the public, we are on the cusp of many benefits,” Lewis D. Griffin, a professor of computer science at University College London, said in a university statement, adding that it would be wise for organizations and governments to create strategies to combat abuse of such tools. “However, we must also acknowledge the positive potentials on the horizon.”
Audio deepfakes have already been used repeatedly across the U.S. and Europe to carry out crimes.
The study pointed to a scam in 2019, for example, that left a U.K.-based energy firm roughly $243,000 in the red after a fraudster hopped on the phone with the firm’s CEO and pretended to be the boss of the organization’s Germany-based parent company.
The scammer was able to use AI technology to capture the boss’ slight German accent and “melody” of the man’s voice while demanding the CEO immediately transfer money to a bank account, the Wall Street Journal reported at the time.
Stateside, victims are sounding the alarm on phone scams that often target elderly Americans. Last month, the Federal Trade Commission issued a warning that more scammers were using voice-cloning technologies to trick unsuspecting people into paying money. The criminals can take a soundbite or video of a person that’s posted online, clone the voice and call the person’s loved ones while pretending to be in a dire situation and in need of fast money.
Many victims tell police the voice was so close to that of their loved ones that they did not immediately realize it was a fraud.
Mai told Fox News Digital that the research showed teaching people how to detect AI-generated voices will not “improve detection abilities, so we need to focus on other methods.” She pointed to a few other ways to mitigate the risks of the technology.
“Crowdsourcing and aggregating responses as a fact-checking measure could be helpful for now. We also demonstrate even though humans are not reliable individually, detection performance increases when you aggregate responses (collect lots of decisions together and make a majority decision),” Mai explained.
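Mai’s point about aggregation can be illustrated with a back-of-the-envelope simulation. The sketch below is not from the study; it simply assumes each listener independently spots a deepfake about 73% of the time (the rate reported above) and shows how a simple majority vote across a hypothetical panel of listeners pushes group accuracy higher.

```python
import random

# Illustrative sketch only (not from the study): if each listener
# independently labels a clip correctly ~73% of the time, a majority
# vote over many listeners yields a higher group accuracy.
INDIVIDUAL_ACCURACY = 0.73  # per-listener detection rate reported in the study
NUM_LISTENERS = 15          # hypothetical panel size (assumption)
NUM_CLIPS = 10_000          # number of simulated audio clips

def listener_is_correct() -> bool:
    """Simulate one listener's independent judgment on a single clip."""
    return random.random() < INDIVIDUAL_ACCURACY

correct_majorities = 0
for _ in range(NUM_CLIPS):
    votes = sum(listener_is_correct() for _ in range(NUM_LISTENERS))
    if votes > NUM_LISTENERS / 2:  # the panel's majority decision was right
        correct_majorities += 1

print(f"Individual accuracy: {INDIVIDUAL_ACCURACY:.0%}")
print(f"Majority-vote accuracy over {NUM_LISTENERS} listeners: "
      f"{correct_majorities / NUM_CLIPS:.1%}")
```

Under that (strong) assumption of independent judgments, a 15-listener majority vote comes out right well over 90% of the time, which is the intuition behind using crowdsourced responses as a fact-checking measure.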
“Efforts should also be focused on making automated detectors more resilient to different test audio. In addition, organizations should prioritize implementing other strategies like regulations and policies.”