
Voice Cloning

Generating a synthetic voice that mimics a specific person's vocal identity from a short audio sample.

In common use since 2017

Voice cloning is the AI capability of generating a synthetic voice that mimics a specific person's vocal identity from a short audio sample, sometimes as little as 30 seconds. The technology has matured rapidly over 2023–2026 and is now embedded both in mainstream products (audiobook narration, video dubbing, voice agents) and in deepfake-driven scams that have prompted urgent regulatory and platform-level responses.

The 2026 commercial landscape:

  • ElevenLabs Instant Voice Cloning — clones a voice from about a minute of audio with strong quality; consent verification required.
  • ElevenLabs Professional Voice Cloning — hours of clean audio; broadcast-quality clone.
  • Resemble, PlayHT, Speechify — commercial competitors with similar feature sets.
  • OpenVoice, XTTS v2, F5-TTS — open-source options for self-hosting.
  • Hume AI, Cartesia — focused on emotional and conversational quality.
  • Microsoft VALL-E (research) — extreme few-shot capability (a few seconds of reference audio); not publicly released because of misuse risk.

Legitimate use cases that ship in production:

  • Audiobook narration — authors voice their own books at scale; deceased authors' estates re-voice classics.
  • Video dubbing — translate a video into 30+ languages while preserving the original speaker's voice and emotion (ElevenLabs Dubbing Studio is the leading product here).
  • Personal voice assistants — your own AI that talks back in your voice for personal productivity.
  • Accessibility — voice banking for people with degenerative speech conditions; preserve a personal voice before it is lost.
  • Game and animation production — consistent character voicing across long projects without re-recording.
  • Real-time translation in calls — you speak English, and the other side hears your cloned voice in their language.

The dark side:

  • Phone scams — the "I am your child / boss / spouse, I need money urgently" voice deepfake call. The FBI and FTC have flagged this as one of the fastest-growing fraud vectors of 2024–2026.
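  • Political disinformation — fake voice clips of candidates and officials; multiple high-profile incidents, including the AI-generated Biden robocall sent to New Hampshire voters before the January 2024 primary.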
  • Defamation and harassment — fake voice clips of private individuals.
  • Bypassing voice authentication — banks and call centres are rapidly retiring voice biometrics as a primary authentication factor.
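Voice biometric systems commonly reduce a caller's audio to a fixed-length speaker embedding and accept the caller when that embedding is close enough to an enrolled one. The sketch below assumes that design and uses cosine similarity as the score; the four-dimensional embeddings and the 0.8 threshold are hypothetical illustration values (real embeddings have hundreds of dimensions and come from a trained speaker encoder, which is not shown). It illustrates why a convincing clone is a problem: it lands near the genuine speaker and passes the same check.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify_speaker(enrolled: list[float], probe: list[float], threshold: float = 0.8) -> bool:
    """Accept the caller if the probe embedding is close enough to the enrolled one.

    The 0.8 threshold is a hypothetical value; real systems tune it on labelled data.
    """
    return cosine_similarity(enrolled, probe) >= threshold


# Hypothetical 4-dimensional embeddings; real ones come from a trained speaker encoder.
enrolled_voice = [0.90, 0.10, 0.30, 0.20]
genuine_call = [0.85, 0.15, 0.28, 0.22]
cloned_call = [0.88, 0.12, 0.31, 0.19]   # a convincing clone lands near the real speaker
other_caller = [0.10, 0.90, 0.20, 0.40]

for name, probe in [("genuine", genuine_call), ("clone", cloned_call), ("other", other_caller)]:
    score = cosine_similarity(enrolled_voice, probe)
    print(f"{name}: similarity={score:.3f} accepted={verify_speaker(enrolled_voice, probe)}")
```

Nothing in the similarity score separates a live speaker from a high-quality clone, which is why this check on its own cannot serve as primary authentication.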

The 2026 mitigation stack:

  • Consent verification — ElevenLabs requires recorded consent statements before unlocking voice cloning of a non-account-holder voice.
  • Watermarking — providers embed inaudible watermarks (e.g., Google's SynthID, the ElevenLabs watermark) that the provider's own tools can detect.
  • Deepfake detection — tools like Reality Defender, Sensity, and platform-built detectors flag suspected synthetic audio.
  • Provenance standards — C2PA for audio is gaining adoption.
  • Legal frameworks — in the US, Tennessee's ELVIS Act (2024), New York and California state laws on consent for synthetic voices, and emerging federal proposals such as the NO FAKES Act; in the EU, the AI Act's transparency requirements.
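The watermarking item can be illustrated with a toy spread-spectrum scheme: a keyed pseudorandom ±1 sequence is added to the samples at low amplitude, and only someone holding the key can correlate it back out. This is an assumption-laden sketch, not how SynthID or the ElevenLabs watermark work (their methods are proprietary and designed to survive compression and editing, which this toy does not attempt); the key values, the 0.005 strength, and the synthetic 440 Hz test tone are all illustrative.

```python
import math
import random


def watermark_sequence(key: int, length: int) -> list[float]:
    """Pseudorandom +1/-1 sequence derived deterministically from a secret key."""
    rng = random.Random(key)
    return [rng.choice((-1.0, 1.0)) for _ in range(length)]


def embed(samples: list[float], key: int, strength: float = 0.005) -> list[float]:
    """Add the keyed sequence to the audio at low amplitude."""
    chips = watermark_sequence(key, len(samples))
    return [s + strength * c for s, c in zip(samples, chips)]


def detect(samples: list[float], key: int, strength: float = 0.005) -> bool:
    """Correlate with the keyed sequence; marked audio scores near `strength`,
    unmarked audio (or the wrong key) scores near zero."""
    chips = watermark_sequence(key, len(samples))
    score = sum(s * c for s, c in zip(samples, chips)) / len(samples)
    return score > strength / 2


# Illustrative host signal: one second of a quiet 440 Hz tone at 48 kHz.
SAMPLE_RATE = 48_000
host = [0.1 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
marked = embed(host, key=1234)

print("marked, correct key:", detect(marked, key=1234))
print("unmarked, correct key:", detect(host, key=1234))
print("marked, wrong key:", detect(marked, key=9999))
```

Because detection needs the key, the scheme mirrors the point that provider watermarks are checked with the provider's own tools; without the key, the added sequence is indistinguishable from low-level noise.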

For a US team building products that use voice cloning in 2026, the operational rules are: explicit consent, clear disclosure, watermarked output, and a refusal posture for any request that could enable fraud or impersonation. Legitimate products are growing fast; products that cut corners on consent and disclosure are increasingly facing legal exposure and platform bans.
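The four operational rules can be expressed as a pre-generation gate that collects every violation before any audio is produced. The sketch below is hypothetical throughout: `CloneRequest`, its fields, `FRAUD_CUES`, and `review` are illustrative names rather than any provider's API, and the keyword match stands in for what a real product would handle with a trained classifier plus human review.

```python
from dataclasses import dataclass


@dataclass
class CloneRequest:
    """Hypothetical clone-generation request; fields are illustrative, not a real API."""
    has_recorded_consent: bool      # rule 1: explicit, recorded consent from the voice owner
    disclosed_as_synthetic: bool    # rule 2: output is clearly labelled as AI-generated
    watermark_enabled: bool         # rule 3: output carries a watermark
    script: str                     # text the cloned voice will speak


# Hypothetical fraud cues; a keyword list is only a stand-in for a real classifier.
FRAUD_CUES = ("wire the money", "gift card", "don't tell anyone", "verify your account")


def review(request: CloneRequest) -> tuple[bool, list[str]]:
    """Return (allowed, reasons); the request is allowed only if no rule is violated."""
    reasons: list[str] = []
    if not request.has_recorded_consent:
        reasons.append("no recorded consent from the voice owner")
    if not request.disclosed_as_synthetic:
        reasons.append("output not disclosed as synthetic")
    if not request.watermark_enabled:
        reasons.append("watermarking disabled")
    # Rule 4: refusal posture for scripts that look like fraud or impersonation.
    lowered = request.script.lower()
    for cue in FRAUD_CUES:
        if cue in lowered:
            reasons.append(f"script matches fraud cue: {cue!r}")
    return (not reasons, reasons)


audiobook = CloneRequest(True, True, True, "Chapter one. The river was quiet that morning.")
scam_call = CloneRequest(False, False, True, "It's me. Wire the money before noon and don't tell anyone.")

print(review(audiobook))
print(review(scam_call))
```

Returning every failed rule, rather than stopping at the first, gives both reviewers and users a complete explanation for a refusal.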
