CEO fraud in the age of voice cloning
A familiar voice on the phone no longer proves anything. What stops wire fraud is not a deepfake detector, but a procedure.
We learned to distrust suspicious emails. Nobody learned to distrust a voice.
The usual trap
The most widespread response to the threat of vocal deepfakes comes down to two words: awareness and detection. Train teams to identify a synthetic voice. Deploy audio analysis tools. Learn to spot artefacts, micro-silences, unusual grain. This approach is appealing because it resembles what we did with phishing (hunt for the weak signal, the spelling mistake, the strange address), and that partially worked there.
It does not work here. For two reasons that are not going to change.
The first: clones are already too good to be spotted by ear. The manager at a UK energy firm who wired €220,000 in 2019 had recognised his boss’s German accent, the melody of his voice, his turn of phrase. He had not heard an approximate clone. He had heard a convincing impersonation, complete with the regional accent and idiomatic expressions of the real CEO. A human trained in detection would not have done better.
The second: synthesis quality is improving faster than our ability to identify it. Training someone to spot today’s flaws does not prepare them for next year’s. Worse, that training creates false confidence — the conviction of having a tool that is not one.
The only defence that works does not rely on recognition. It relies on a procedural rule that makes the caller’s voice irrelevant, whether real or synthesised.
Three calls, three lessons
Documented cases of voice cloning show how far the threat has come. Three of them are worth reading together, because they span three generations of the same attack.
2019: the voice alone is enough
In spring 2019, the head of a British subsidiary of an energy company received a call from his boss, the CEO of the German parent company. He recognised the accent, the melody of the voice, the phrasing. The CEO asked him to process an urgent wire transfer to a Hungarian supplier — a confidential acquisition in progress, €220,000, deadline: before end of day. The tone was as always, the urgency was plausible. The transfer went through.
The real CEO had never made that call. The voice had been synthesised from publicly available recordings. This is the first documented case of CEO fraud using voice cloning. Lesson: recognising a voice no longer proves anything.
2024: video with multiple participants removes the last doubts
In January 2024, an accountant at design firm Arup, in Hong Kong, received an email purportedly from the group’s CFO about an urgent, confidential transaction. He found the email suspicious — good instinct. The fraudsters anticipated this doubt and invited him to a video call. The CFO was there, recognisable. Several colleagues flanked him. All were convincing.
All were deepfakes.
The multi-person video call, precisely because it dispelled the suspicion the email had aroused, triggered fifteen separate transfers. Total: $25.6 million. The fraud was only discovered through reconciliation with headquarters, weeks later. Lesson: even a video call with several visible “colleagues” no longer constitutes identity verification. That is exactly the element that turned around an employee who had initially been suspicious.
2024: the question only the real executive could answer
In July 2024, a Ferrari executive received WhatsApp messages and then a voice call from the “CEO” Benedetto Vigna. The voice was convincing, the Southern Italian accent closely reproduced. The message concerned an urgent, confidential matter — the usual pattern.
The executive was doubtful. Not because he had detected an artefact in the voice. Because the request itself struck him as unusual. He asked a question that only the real Vigna could answer: the title of a book recently recommended during a private conversation. The synthetic voice hesitated, tried to deflect, then hung up.
Across the three cases, that is the only defence that worked. Not a deepfake detector. Not voice-recognition training. A verification question drawing on knowledge shared outside the call, improvised on the spot, at zero cost, asked at the right moment. Lesson: what stops fraud is a procedure, not a tool.
Why detection is a dead end
Social engineering augmented by AI has a structural advantage over detection: the attacker can iterate without limit, the target gets only one attempt. A voice-cloning service costs a few dozen pounds per month. The audio sample used to train the model comes from the target’s own interviews, podcasts, LinkedIn videos, and conference talks: data the target has published themselves. The marginal cost of a new attempt, after a failure, is near zero for the attacker.
For the target, the rules are reversed. They cannot call back twenty times to verify — they are under pressure, in a meeting, caught up in the manufactured urgency the attacker has constructed. They cannot pull out a detector when the phone rings. And even if they had one, the false positives and negatives of real-time analysis on a call degraded by phone compression would make the result unusable.
A familiar voice is no longer proof. It has become a marginal cost for the attacker.
This asymmetry is permanent. It will not be resolved by better detection tools, because synthesis tools are improving on the same curve, with more resources. The detection race is lost before it starts. That is not a reason to despair — it is a reason to change terrain.
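The point about false positives can be made concrete with a little arithmetic. The sketch below uses purely illustrative figures, not measurements of any real detector: it assumes one call in 10,000 is cloned, and a hypothetical detector that catches 95% of clones while wrongly flagging 5% of genuine calls.

```python
def share_of_alerts_that_are_real(prevalence: float,
                                  sensitivity: float,
                                  false_positive_rate: float) -> float:
    """Fraction of detector alerts that correspond to an actual cloned voice."""
    true_alerts = prevalence * sensitivity
    false_alerts = (1 - prevalence) * false_positive_rate
    return true_alerts / (true_alerts + false_alerts)


# All three inputs are assumptions chosen for illustration only.
ppv = share_of_alerts_that_are_real(
    prevalence=1 / 10_000,      # 1 call in 10,000 is a clone
    sensitivity=0.95,           # detector catches 95% of clones
    false_positive_rate=0.05,   # and wrongly flags 5% of genuine calls
)
print(f"{ppv:.2%} of alerts are real attacks")
```

Under these assumed figures, roughly 0.2% of alerts would point to a real attack. An employee who sees hundreds of false alarms for each true one will learn to ignore the tool, which is the opposite of the intended effect.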
What works: the protocol
The defence is not in the voice; it is in the procedure. Three rules, applied together, defeat voice cloning regardless of its quality.
The callback on a known channel. Any sensitive request received by phone or message triggers a callback on the number already on file for that contact — not the number that just called, not one provided in the message. This callback takes thirty seconds. It is non-negotiable, even if the caller seems impatient. This rule alone neutralises the great majority of attempts, because an attacker cannot answer on the real executive’s genuine phone number.
The pre-agreed verification question. For high-risk operations — significant transfers, sensitive access, irreversible decisions — a question that only the legitimate contact can answer, established during a previous conversation outside any potentially compromised channel. Not a password sent by email. A shared reference established in person or on a separate encrypted channel. Ferrari had no formal procedure: the executive improvised. A formal procedure ensures everyone applies the same rule, not just those who happen to have good instincts that day.
Mandatory dual authorisation above a threshold. Any transfer or access beyond a defined amount or sensitivity level requires two separate people, on two separate channels, regardless of who made the request and what urgency was cited. This rule applies even — especially — when the request appears to come from the CEO. The executive who publicly champions this procedure is the only one who makes it genuinely enforceable: without that explicit sponsorship, the finance team will bypass it to avoid bothering the boss.
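The three rules above can be sketched as a single gate that a payment workflow would have to pass. This is a minimal illustration, not a reference implementation: the names `TransferRequest`, `missing_checks`, and `DUAL_AUTH_THRESHOLD`, the threshold value, and the choice to exclude the claimed requester from the approvers are all assumptions made for the example.

```python
from dataclasses import dataclass, field

# Hypothetical threshold: the actual amount is for each organisation to set.
DUAL_AUTH_THRESHOLD = 10_000


@dataclass
class TransferRequest:
    amount: float
    claimed_requester: str                      # who the caller says they are
    callback_on_file_number: bool = False       # rule 1 completed?
    verification_question_passed: bool = False  # rule 2 completed?
    # Rule 3: (approver, channel) pairs, e.g. ("alice", "in_person").
    approvals: set[tuple[str, str]] = field(default_factory=set)


def missing_checks(req: TransferRequest, high_risk: bool) -> list[str]:
    """Return the checks still outstanding; an empty list means 'may proceed'."""
    missing = []
    if not req.callback_on_file_number:
        missing.append("callback on the number already on file")
    if high_risk and not req.verification_question_passed:
        missing.append("pre-agreed verification question")
    if req.amount > DUAL_AUTH_THRESHOLD:
        # Assumption: the claimed requester never counts as an approver.
        valid = {(a, c) for a, c in req.approvals if a != req.claimed_requester}
        approvers = {a for a, _ in valid}
        channels = {c for _, c in valid}
        if len(approvers) < 2 or len(channels) < 2:
            missing.append("two approvers on two separate channels")
    return missing


request = TransferRequest(amount=220_000, claimed_requester="ceo")
print(missing_checks(request, high_risk=True))
```

Note what the data structure leaves out: there is no field for how convincing the voice sounded or how urgent the caller claimed to be. Neither can influence the outcome.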
What this means in practice
For you, personally
The same mechanism targets you, at your scale: your bank flagging suspicious activity and asking you to call back “as quickly as possible” on a number provided in the SMS, your child in difficulty calling from an unknown number, your usual supplier whose voice sounds slightly different but whose bank details have changed.
The response is identical in every case. Hang up. Call back on the official number you already have on file — the one on the back of your bank card, the one your child gave you in person. Never the number provided in the message that just arrived.
For family emergency calls, agree on a code word with your household. A simple, memorable word, chosen during an ordinary conversation. Not transmitted by text, not written down anywhere. That word guarantees nothing against an attacker who has already captured it, but it guarantees a great deal against an attempt that has not. And agreeing on it with your family takes five minutes. You can also report attempts to Action Fraud (UK: actionfraud.police.uk) or the FBI Internet Crime Complaint Center (US: ic3.gov).
For you, CISO / CTO / leadership
The problem of AI-augmented voice fraud is not a detection problem — your team cannot analyse every call in real time, and even if it could, the result would not be actionable within the available time window. It is a procedure and governance problem.
1. The dual-authorisation threshold is your first line. Set it in writing, have it approved by the executive, and make sure the finance team understands it applies even when the request appears to come from the CEO. Without that explicit sponsorship, the procedure collapses at the first call from the boss.
2. The callback channel is your second line. Any number used for a sensitive request must be verified against an internal directory, not against the number that just called. A process for updating supplier payment details must include systematic out-of-band verification — that is where the great majority of supplier wire fraud enters.
3. Finance team training focuses on procedure, not detection. “How to recognise a deepfake” is a question without a good answer. “What is the procedure when someone requests an urgent wire transfer and asks me to keep it confidential” is a question with a clear, correct answer. Training is for the second one.
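The second point, verifying supplier payment-detail changes against the internal directory, might look like the sketch below. The directory contents, the supplier identifier, and the function names are hypothetical; the phone numbers come from a range reserved for fictional use.

```python
from typing import Optional

# Hypothetical internal directory: supplier id -> number already on file.
DIRECTORY = {"supplier-001": "+44 20 7946 0000"}


def callback_number(supplier_id: str,
                    number_in_request: Optional[str],
                    directory: dict[str, str]) -> Optional[str]:
    """Return the number to call back.

    number_in_request is accepted only to make explicit that it is never
    used: the callback number always comes from the directory.
    """
    return directory.get(supplier_id)


def may_apply_change(supplier_id: str,
                     confirmed_via_number: str,
                     directory: dict[str, str]) -> bool:
    """Allow a bank-detail change only if it was confirmed on the on-file number."""
    on_file = directory.get(supplier_id)
    return on_file is not None and confirmed_via_number == on_file


# The email announcing new bank details helpfully supplies a "new" phone number.
print(callback_number("supplier-001", "+44 20 7946 0999", DIRECTORY))
```

A supplier missing from the directory gets no callback number at all, so the change stays blocked until someone establishes a trusted contact by another route.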
For you, as the executive
You are the bait. Not by accident. Because you have spent years building your visibility: interviews, podcasts, conference talks, LinkedIn videos. That material, essential for business, is also the training corpus the attacker uses to clone your voice. You cannot stop being visible. But you can make that visibility useless as a means of authorisation.
The key decision is not technical. It is organisational, and only you can make it: no wire transfer, no sensitive access, no irreversible decision is authorised on the strength of a phone call or video alone, regardless of who appears to be asking — including you. This rule must be stated by you, publicly, to your finance team. Not in a policy document nobody reads. In a meeting, saying explicitly: “If someone calls claiming to be me and requests an urgent transfer, hang up and call my assistant on her usual number. Even if it really is me, I won’t be offended.”
The rule you have not publicly championed, your team will quietly circumvent to avoid bothering you. That is exactly what the attacker is counting on.
Actionable checklist
- N1 Written rule: any wire transfer or sensitive access triggers a callback on a number already on file — never the caller's
- N1 The executive publicly champions this procedure — including for requests that appear to come from them
- N1 Dual-authorisation threshold defined and signed off: beyond this amount, two approvals on two separate channels
- N2 Out-of-band verification question pre-agreed for high-risk operations (not transmitted by email)
- N2 Supplier payment-detail update process with systematic out-of-band verification
- N2 Finance team training: not on detection, but on applying the procedure when someone invokes urgency or secrecy
- N2 Internal directory of official numbers, kept up to date, accessible before any verification callback
- N2 Family code word agreed in person for non-professional emergency calls
- N3 Simulated voice-fraud attempt to test procedure under pressure
- N3 Quarterly audit of procedural compliance on transfers above the threshold
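The last checklist item, the quarterly audit, can be sketched as a pass over the transfer log. The record fields (`id`, `amount`, `callback_logged`, `approvers`, `channels`) are assumptions about what such a log might contain, not a real schema.

```python
def audit_transfers(transfers: list[dict],
                    threshold: float) -> list[tuple[str, list[str]]]:
    """List transfers above the threshold that lack evidence of the procedure."""
    findings = []
    for t in transfers:
        if t["amount"] <= threshold:
            continue  # below the dual-authorisation threshold: out of scope
        problems = []
        if not t.get("callback_logged", False):
            problems.append("no callback logged")
        if len(set(t.get("approvers", []))) < 2:
            problems.append("fewer than two approvers")
        if len(set(t.get("channels", []))) < 2:
            problems.append("approvals not on two separate channels")
        if problems:
            findings.append((t["id"], problems))
    return findings


# Hypothetical log entries for illustration.
log = [
    {"id": "T-1", "amount": 50_000, "callback_logged": True,
     "approvers": ["alice", "bob"], "channels": ["email", "in_person"]},
    {"id": "T-2", "amount": 80_000, "callback_logged": False,
     "approvers": ["alice"], "channels": ["phone"]},
    {"id": "T-3", "amount": 900, "callback_logged": False},
]
for transfer_id, problems in audit_transfers(log, threshold=10_000):
    print(transfer_id, problems)
```

An audit like this only sees what was recorded, so it depends on the callback and both approvals being logged at the time they happen.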
Further reading
The sources in the front matter document the three reference cases: the 2019 UK case analysed by Sophos and MIT Sloan, the $25.6 million Arup incident covered by CNN and Fortune, and the foiled Ferrari attempt detailed by Fortune and MIT Sloan Management Review.
CEO fraud via voice cloning fits into a broader vulnerability covered in Exposed executive: a specific threat model, which details the full range of attack vectors targeting executive profiles. The response once an incident has already been triggered is in Field incident response. The procedural framework for high-risk travel, which applies the same logic in a different context, is in Corporate travel policy.
Sources and further reading
- MIT Sloan Management Review — Deepfakes explained [report]
- Sophos — Scammers deepfake CEO's voice, $243,000 transfer (2019) [report]
- CNN — Arup deepfake scam loss, Hong Kong (2024) [report]
- Fortune — Arup $25 million, deepfake CFO call (2024) [report]
- Fortune — Ferrari deepfake CEO attempt foiled by verification question (2024) [report]
- MIT Sloan Management Review — How Ferrari hit the brakes on a deepfake CEO [report]