Glossary

Voice cloning

Synthesizing a new voice that sounds like a specific real person, typically from a short audio sample.

Voice cloning models learn the timbre, pacing, accent, and emotional range that together form the 'fingerprint' of a target voice. Modern systems can need as little as 10–30 seconds of clean speech to produce a clone that can read arbitrary new text in that voice.

Used responsibly, voice cloning is a creator superpower: localize your own voice into 30+ languages, update evergreen videos without re-recording, or preserve your voice for accessibility. Used carelessly, it is a deepfake risk, so always get consent from the voice owner.


Related terms

Neural voice
A text-to-speech voice generated by a deep neural network, producing more natural intonation and emotion than older concatenative or formant TTS.
AI dubbing
Automatically re-voicing a video into a new language, ideally with matched lip-sync and the original speaker's vocal identity.
AI presenter
A virtual on-camera spokesperson generated by AI — used for explainer videos, product demos, courses, and internal communications.
