Glossary

Photo-to-video

Generating a moving video from a single still photograph — most commonly by animating the subject's face to speak.

Photo-to-video models analyze the face in your image, extract a 3D-aware representation, and then re-render the face frame by frame to match an audio track. The original lighting, background, and identity are preserved while the mouth, jaw, eyes, and small head movements are synthesized.
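To make the "frame by frame" part concrete, here is a minimal sketch of how an audio track could be divided so that each video frame has its own slice of sound to match. The function name `frame_audio_windows` and its parameters are hypothetical, chosen for illustration only; real photo-to-video systems may segment audio differently, for example with overlapping windows.

```python
def frame_audio_windows(num_samples, sample_rate, fps):
    """Split an audio track into one window of samples per video frame.

    Hypothetical helper for illustration: returns a list of
    (start_sample, end_sample) pairs, one per frame. Integer arithmetic
    keeps the windows contiguous, with no gaps or overlaps.
    """
    if num_samples < 0 or sample_rate <= 0 or fps <= 0:
        raise ValueError("num_samples must be >= 0; sample_rate and fps must be > 0")

    # Number of whole video frames covered by the audio.
    num_frames = num_samples * fps // sample_rate

    windows = []
    for i in range(num_frames):
        start = i * sample_rate // fps
        end = (i + 1) * sample_rate // fps
        windows.append((start, end))
    return windows


# One second of 16 kHz audio rendered at 25 frames per second:
# each of the 25 frames is driven by 640 audio samples.
windows = frame_audio_windows(num_samples=16000, sample_rate=16000, fps=25)
```

A renderer would then synthesize the mouth, jaw, and eye positions for frame `i` from the audio in `windows[i]`, while leaving the lighting, background, and identity from the original photo unchanged.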

The technique works on portraits, AI-generated faces, historical photos, and even illustrations. Two caveats: very low-resolution photos, and faces partially covered by hair, hands, or glasses, tend to produce worse results.

Related terms

Talking avatar
A digital character — usually built from a single photo — whose lips, jaw, and expressions are animated by AI to match a chosen voice or script.
Lip sync
The frame-by-frame alignment of a face's mouth movements to a target audio track so that the speaker visibly forms the right sounds.
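
One simple way to reason about lip-sync alignment is to compare a per-frame "mouth openness" signal with a per-frame audio loudness signal and find the shift at which they agree best. The sketch below uses a plain, unnormalized cross-correlation for this; it illustrates the idea under stated assumptions and is not a description of how any particular lip-sync model works. The name `best_lag` and both input signals are hypothetical.

```python
def best_lag(mouth_open, audio_energy, max_lag):
    """Estimate the offset, in frames, between mouth motion and audio.

    Hypothetical helper for illustration. Both inputs are sequences with
    one number per video frame. A returned lag of L means mouth frame i
    lines up best with audio frame i + L, so L == 0 means the two
    signals are already in sync.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, m in enumerate(mouth_open):
            j = i + lag
            if 0 <= j < len(audio_energy):
                score += m * audio_energy[j]
        # Keep the first lag that achieves the highest correlation score.
        if score > best_score:
            best, best_score = lag, score
    return best


# The mouth opens at frame 1 but the sound arrives at frame 3,
# so the audio trails the mouth by two frames.
mouth = [0, 1, 0, 0, 0, 0, 0, 0]
audio = [0, 0, 0, 1, 0, 0, 0, 0]
offset = best_lag(mouth, audio, max_lag=3)
```

Because the score is not normalized, long or noisy signals would need a more careful measure; the point here is only that "in sync" can be checked as a measurable frame offset.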