Glossary

Lip sync

The frame-by-frame alignment of a face's mouth movements to a target audio track so that the speaker visibly forms the right sounds.

Lip sync is the bridge between the audio track and the face on screen. A lip-sync model takes a speech waveform, extracts the sequence of phonemes (the smallest units of sound, such as /a/, /b/, /th/), maps each phoneme to a corresponding mouth shape (a viseme), and regenerates the face for each video frame so that the mouth shows the viseme being spoken at that moment.
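The phoneme-to-viseme step can be illustrated with a small Python sketch. It assumes the phonemes have already been extracted and timed; the mapping table, the `PhonemeSpan` type, and the `visemes_per_frame` function are hypothetical names made up for this illustration, and the table is far smaller than anything a real system would use. It does not render a face; it only decides which mouth shape each frame should show.

```python
from dataclasses import dataclass

# Hypothetical, heavily simplified phoneme-to-viseme table (illustrative only).
# Several phonemes can share one viseme: /b/, /m/ and /p/ all close the lips.
PHONEME_TO_VISEME = {
    "a": "open",
    "b": "closed",
    "m": "closed",
    "p": "closed",
    "th": "tongue_between_teeth",
    "o": "rounded",
}


@dataclass
class PhonemeSpan:
    phoneme: str
    start: float  # seconds
    end: float    # seconds


def visemes_per_frame(spans, fps, duration):
    """Return one viseme label per video frame, sampled at each frame's midpoint."""
    n_frames = round(duration * fps)
    frames = []
    for i in range(n_frames):
        t = (i + 0.5) / fps  # midpoint of frame i, in seconds
        label = "rest"  # default when no phoneme is active (silence)
        for span in spans:
            if span.start <= t < span.end:
                label = PHONEME_TO_VISEME.get(span.phoneme, "rest")
                break
        frames.append(label)
    return frames


# Assumed example timings for the sounds /b/, /a/, /th/ followed by silence.
spans = [
    PhonemeSpan("b", 0.00, 0.08),
    PhonemeSpan("a", 0.08, 0.20),
    PhonemeSpan("th", 0.20, 0.28),
]
frames = visemes_per_frame(spans, fps=25, duration=0.4)
print(frames)
```

At 25 frames per second, the 0.4-second clip yields ten labels. Frames that fall after the last phoneme get the "rest" shape, which is how this sketch keeps the mouth still during silence.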

Good lip sync isn't just mouth shape. It also covers jaw drop, tongue visibility, lip rounding, and natural micro-pauses between words. Bad lip sync looks "rubbery" or shows the mouth moving while the audio is silent.


Related terms

Talking avatar
A digital character — usually built from a single photo — whose lips, jaw, and expressions are animated by AI to match a chosen voice or script.
AI presenter
A virtual on-camera spokesperson generated by AI — used for explainer videos, product demos, courses, and internal communications.
AI dubbing
Automatically re-voicing a video into a new language, ideally with matched lip-sync and the original speaker's vocal identity.
