Photo-to-video models analyze the face in your image, extract a 3D-aware representation, and then re-render the face frame by frame to match an audio track. The original lighting, background, and identity are preserved while the mouth, jaw, eyes, and small head movements are synthesized.
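To make "frame by frame" concrete, here is a minimal sketch of the bookkeeping that pairs each output frame with a slice of the audio track. It assumes 25 fps video and 16 kHz audio, both illustrative values rather than settings from any particular model, and the function name `frame_audio_windows` is hypothetical. The face analysis and rendering are learned models and are not shown.

```python
def frame_audio_windows(duration_s: float, fps: int = 25, sample_rate: int = 16_000):
    """Return (frame_index, start_sample, end_sample) for every video frame.

    fps and sample_rate are assumed defaults for illustration only.
    """
    total_samples = round(duration_s * sample_rate)
    # Integer ceiling: enough frames to cover every audio sample.
    n_frames = (total_samples * fps + sample_rate - 1) // sample_rate

    windows = []
    for i in range(n_frames):
        start = i * sample_rate // fps
        # Clamp the last window so it never runs past the end of the audio.
        end = min((i + 1) * sample_rate // fps, total_samples)
        windows.append((i, start, end))
    return windows


if __name__ == "__main__":
    windows = frame_audio_windows(2.0)
    print(len(windows), windows[0], windows[-1])
```

Under these assumptions, a renderer would synthesize the mouth and jaw for frame `i` from the audio samples in `start:end`, which is what keeps the lips in step with the sound.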
The approach works on portraits, AI-generated faces, historical photos, and even illustrations. Two caveats: very low-resolution photos, and faces partially hidden by hair, hands, or glasses, tend to produce worse results.
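The two conditions above can be turned into a simple pre-flight check before uploading a photo. This is only a sketch: the thresholds (512 px on the short side, 90% of the face visible) are hypothetical values chosen for illustration, and `visible_face_fraction` is assumed to come from whatever face detector you already use.

```python
from typing import List

# Hypothetical thresholds for illustration only; tune them for your tool.
MIN_SHORT_SIDE_PX = 512
MIN_VISIBLE_FACE = 0.9


def preflight_warnings(width: int, height: int, visible_face_fraction: float) -> List[str]:
    """Return human-readable warnings for a candidate source photo."""
    warnings = []
    if min(width, height) < MIN_SHORT_SIDE_PX:
        warnings.append("low resolution")
    if visible_face_fraction < MIN_VISIBLE_FACE:
        warnings.append("face partially occluded")
    return warnings


if __name__ == "__main__":
    print(preflight_warnings(1024, 1024, 1.0))
    print(preflight_warnings(200, 300, 0.6))
```

An empty list means neither condition was triggered; it does not guarantee a good result.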