vlogme.ai

Glossary

Text-to-speech (TTS)

Technology that converts written text into spoken audio. Modern neural TTS can produce voices that are often hard to tell apart from human recordings, and it is the audio engine behind most talking avatars.

Text-to-speech (TTS) maps a string of text and a target voice to a waveform. Early TTS sounded robotic; modern systems use deep neural networks trained on hundreds of hours of human speech and produce natural prosody, breaths, and emotion.
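
As a rough illustration of that mapping, here is a minimal Python sketch of the interface only: text plus a voice go in, a waveform (a list of audio samples) comes out. The `Voice` class, the `synthesize` function, and the sample rate are hypothetical names and values chosen for this example, not vlogme.ai's API. The body emits simple tone bursts instead of speech, since a real system would run a trained neural model at that step.

```python
import math
from dataclasses import dataclass

SAMPLE_RATE = 16_000  # samples per second (assumed value for this sketch)


@dataclass(frozen=True)
class Voice:
    """Hypothetical stand-in for a target voice.

    Real TTS systems typically represent a voice with learned parameters,
    not a single pitch value.
    """
    name: str
    pitch_hz: float


def synthesize(text: str, voice: Voice, samples_per_char: int = 800) -> list[float]:
    """Toy placeholder for a TTS model: (text, voice) -> waveform.

    This is NOT speech synthesis. It emits one tone burst per character
    (silence for whitespace) purely to show the shape of the mapping.
    """
    waveform: list[float] = []
    for ch in text:
        # Whitespace becomes silence; any other character becomes a tone.
        amplitude = 0.0 if ch.isspace() else 0.5
        for n in range(samples_per_char):
            phase = 2 * math.pi * voice.pitch_hz * n / SAMPLE_RATE
            waveform.append(amplitude * math.sin(phase))
    return waveform


if __name__ == "__main__":
    demo_voice = Voice(name="demo", pitch_hz=220.0)
    audio = synthesize("hi there", demo_voice)
    print(f"{len(audio)} samples, {len(audio) / SAMPLE_RATE:.2f} s of audio")
```

The point of the sketch is the signature: whatever happens inside, the output is an ordinary audio waveform that downstream tools can consume.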

On vlogme.ai you can pick any TTS voice, clone your own voice, or upload your own audio. The lip-sync engine then drives the avatar from whichever audio source you choose.

Related terms