Voice “deepfakes” are no longer rare and now pose a real risk to citizens and businesses. Today, scraping just a few seconds of public audio is enough to clone a voice and slip it into calls, messages or video calls that imitate family members, employees or managers with disturbing precision. From there, the pattern usually repeats itself: an urgent request for money, a banking incident, or a notice that “can’t wait.” In this context, distinguishing what is authentic is increasingly difficult, and any unexpected communication is starting to be greeted with suspicion.
The gap between “deepfakes” and classic imitations is already enormous. José Antonio Marcos, vice dean of the Business & Tech Faculty at UAX, explains that AI-generated audio reproduces timbre, rhythm and even small emotional nuances, “which makes them particularly credible.” With just 30 seconds of recording, he points out, “a solid clone can be built and hundreds of messages generated in a short time.” This realism has enabled frauds like the one that led the British engineering firm Arup to transfer more than 200 million Hong Kong dollars (about $25 million), or a foiled attempt at Ferrari in which the voice of its CEO was impersonated.
This ability to imitate a voice in such detail is not limited to recorded audio. A more worrying phenomenon has emerged in recent attacks: real-time conversations. Luis Corrons, cybersecurity expert at Gen, comments that AI “can modulate the scammer’s voice as the call progresses until it is almost indistinguishable from the person being spoofed.” The deception is heightened when the fake audio is combined with artificially generated video capable of reproducing gestures and expressions. For Corrons, this mixture of immediacy and natural appearance makes these forgeries very convincing and easy to reproduce.
The naturalness of these forgeries depends largely on how the voice behind them is generated. Miguel López, director for EMEA South at Barracuda Networks, explains that the process has become so accessible that “many recordings are no longer necessary to obtain a reliable model.” Short fragments taken from a WhatsApp message or a video on social media “provide enough material for the AI to reconstruct a convincing sound profile.” With this minimal sample, he says, the system produces the voice in just a few minutes and it can be used even during live calls.
Hard impact
In a business, the damage caused by cloned-voice fraud goes far beyond the money lost in the transfer. Paula Yanes, vice president and head of digital trust at Capgemini Invent, points out that “these incidents result in internal investigations, legal fees and claims that add to the final bill.” Even without public exposure, they erode customer and supplier trust. The internal impact is also felt: operational urgency “is replaced by constant checks, which slows down routine tasks and forces us to work with more rigid structures than usual,” she explains.
For Rafael Palacios, director of the AI Office of Pontificia Comillas University and professor at Comillas ICAI, behind these frauds “there is almost never improvisation.” Although an individual may attempt a single hoax, “successful cases come from organizations that have previously gathered detailed information about the victim to make their message more convincing,” says the expert. This preliminary phase, he adds, may include small contacts intended only to obtain clues about the person, the company or even the bank before the main attack. In his experience, the motivation “is almost always economic, which leads to the increasing professionalization of these groups.”
In corporate environments, these frauds follow a pattern that, despite the realism of cloned voices, is recognizable to those who know where to look. Miguel Ángel Thomas, head of cybersecurity at NTT DATA, explains that attackers “usually create artificial urgency for the action to ‘be done now’ and avoid verification, often from unusual channels, such as unknown numbers, WhatsApp audios or video calls without a camera.” These communications “rely on a rigid, emotionally flat script, with unnatural pauses, micro-latencies, or inconsistent responses when the conversation goes off-script,” says Thomas.
The precision of these imitations makes it increasingly difficult to distinguish them by ear. Eduardo Prieto, general manager of Visa in Spain, comments that “a fake voice can sound perfectly natural even in situations that seem routine.” Still, he says, small signs usually appear, “like a beat that doesn’t quite match or a slight lag when the fraud includes video.”
Within reach
However, the sheer volume of fake material in circulation makes it increasingly difficult to tell apart from the real thing, notes Gastón Fornés, professor at EAE Business School. “One of the challenges is that artificial intelligence has become so cheap that anyone can generate manipulated audio or video files with automatic tools,” he explains.
Recent cases show that these deceptions do not only affect large companies. Gen’s Corrons mentions that “individuals continue to be a common target, from virtual kidnappers who imitate a loved one’s voice to supposed bank agents who request passwords or urgent transfers using audio posted on social networks.”
In the business world, Thomas, from NTT DATA, recalls that finance, treasury and purchasing departments are generally on the front line, especially when communication arrives through unusual channels. López, of Barracuda Networks, adds that “some of the most sophisticated scams have succeeded in triggering multimillion-dollar transfers with fake orders issued in the name of company directors.”
To the ear, a deepfake voice can seem flawless, which is why it is important to pay attention to the small details. UAX’s Marcos notes, as Visa’s Prieto already observed, that AI is capable of reproducing timbre and rhythm with enormous precision but rarely achieves the spontaneity of a real voice.
Protecting yourself involves adjusting habits and internal processes. Capgemini Invent’s Yanes insists on adding “additional verification to any sensitive request and teaching teams to pause before reacting.” López, of Barracuda Networks, suggests reinforcing these habits with multi-factor authentication and simulations that reproduce voice attacks. And Prieto recommends “stopping for a few seconds and confirming the information through another channel.” It remains a simple but decisive way to avoid fraud.
Abnormal patterns
At the same time, artificial intelligence also allows companies to improve their defenses. Prieto says Visa already uses it to detect anomalous patterns in payments and internal access, identifying operations that do not match the user’s usual behavior and blocking them before the fraud is completed.
These systems analyze thousands of signals in real time and can stop attempts that, to the naked eye, would be virtually undetectable. However, experts agree that there is no foolproof protection.
The advance of “deepfakes” and the democratization of tools force us to live in a more uncertain environment, in which verification becomes an essential routine. Technology helps, but it does not replace caution. Stopping, reviewing, and confirming will remain, for some time to come, the most effective way to avoid falling for a deception that seems all too real.