Text-to-speech (TTS) maps a string of text and a target voice to a waveform. Early TTS sounded robotic; modern systems use deep neural networks trained on hundreds of hours of human speech and produce natural prosody, breaths, and emotion.
On vlogme.ai you can pick any TTS voice, clone your own voice, or upload your own audio. The lip-sync engine then drives the avatar from whichever audio source you choose.
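To make the idea concrete, here is a minimal sketch of TTS as a function from text and a voice to a waveform, with a lip-sync step that only ever sees the resulting audio. Everything in it is an assumption for illustration: `Waveform`, `toy_tts`, `uploaded_audio`, `drive_avatar`, the voice names, the 16 kHz sample rate, and the 25 fps frame rate are hypothetical and do not describe vlogme.ai's actual API. The "synthesizer" just emits a sine tone per character in place of a neural model.

```python
import math
from dataclasses import dataclass
from typing import List

SAMPLE_RATE = 16_000  # assumed sample rate in Hz


@dataclass
class Waveform:
    """Raw audio: a list of samples plus the rate they were recorded at."""
    samples: List[float]
    sample_rate: int = SAMPLE_RATE

    @property
    def duration(self) -> float:
        """Length of the audio in seconds."""
        return len(self.samples) / self.sample_rate


def toy_tts(text: str, voice: str) -> Waveform:
    """Placeholder for a neural TTS model: (text, voice) -> waveform.

    Emits 0.05 s of sine tone per character. The voice only selects a
    pitch from a hypothetical table; a real model would do far more.
    """
    pitch_hz = {"low": 110.0, "high": 220.0}.get(voice, 165.0)
    n_samples = int(0.05 * SAMPLE_RATE) * len(text)
    samples = [
        math.sin(2 * math.pi * pitch_hz * i / SAMPLE_RATE)
        for i in range(n_samples)
    ]
    return Waveform(samples)


def uploaded_audio(samples: List[float], sample_rate: int) -> Waveform:
    """Wrap user-supplied audio in the same Waveform type."""
    return Waveform(list(samples), sample_rate)


def drive_avatar(audio: Waveform, fps: int = 25) -> int:
    """Stand-in for a lip-sync engine.

    Returns how many video frames are needed to cover the audio. It
    takes a Waveform and nothing else, so the audio's origin is irrelevant.
    """
    return math.ceil(audio.duration * fps)


if __name__ == "__main__":
    spoken = toy_tts("hello", voice="low")
    recorded = uploaded_audio([0.0] * 16_000, 16_000)
    print(drive_avatar(spoken), drive_avatar(recorded))
```

The point of the design is that synthesized and uploaded audio end up as the same type, so the lip-sync step needs no knowledge of where the audio came from.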