What VlogMe makes
A 9:16 talking-avatar video, built from a still portrait, a voice, and a line of text — the format TikTok, Reels and YouTube Shorts were made for.
You give us a face, a voice, and a script. We render the avatar speaking it with natural mouth movement and emotion. A single avatar can speak one line, or you can chain several photos together to build a story.
Between any two lines you can drop a short generative b-roll clip — either layered on top while the avatar keeps talking (overlay) or as a wordless cutaway that replaces the avatar for a few seconds (chain). That's the whole creative model.
Credits & plans
One simple meter: roughly one credit per second of finished video. No hidden surcharges for voices, effects, or b-roll.
- 1 credit ≈ 1 second of finished video.
- Minimum charge per render is 10 credits.
- B-roll clips count toward the same total based on their seconds.
- The free plan gives you 60 credits to try the full pipeline.
Your balance lives in the top-right of every page. Plan limits and monthly refills live on Pricing.
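For budgeting a project, the meter can be approximated in a few lines. This is a minimal sketch, not the billing code. It assumes exactly one credit per finished second, with b-roll seconds included in the total you pass in, and rounds up to a whole credit (the page only says "roughly"). It then applies the 10-credit minimum. The function names are hypothetical.

```python
import math

# Figures stated on this page.
MIN_CREDITS_PER_RENDER = 10
FREE_PLAN_CREDITS = 60


def estimate_credits(finished_seconds: float) -> int:
    """Approximate cost of one render.

    Assumption: 1 credit per second of finished video (pass the total,
    b-roll clip seconds included), rounded up to a whole credit. The real
    meter is only "roughly" per-second.
    """
    if finished_seconds < 0:
        raise ValueError("duration cannot be negative")
    return max(MIN_CREDITS_PER_RENDER, math.ceil(finished_seconds))


def renders_affordable(balance: int, finished_seconds: float) -> int:
    """How many renders of a given length a credit balance covers."""
    return balance // estimate_credits(finished_seconds)


print(estimate_credits(6))                        # 10: the minimum applies
print(estimate_credits(24.3))                     # 25 under the round-up assumption
print(renders_affordable(FREE_PLAN_CREDITS, 8))   # 6 short test renders
```

Because of the 10-credit floor, the free plan's 60 credits cover at most six renders, however short each one is.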
Your first video in 3 minutes
Five steps from a blank page to a finished MP4 you can download or post. Everything below this chapter is optional depth.
- Open Create — the big button in the header.
- Drop a portrait photo. Front-facing, eyes visible, single face in frame, even lighting. JPG or PNG, at least 512px on the short side.
- Pick a voice. The picker shows gender, accent and a one-line description. Click the preview to hear a sample.
- Type one line of speech. A few words up to a couple of sentences works well for the first try. Avoid all-caps and emoji — the voice reads them literally.
- Press Create. A render takes 30–90 seconds. The page polls until the video is ready and shows a preview you can download or share.
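If you prepare portraits in bulk, you can check the format and size rule from the photo step before uploading. The sketch below uses only the Python standard library. It reads the pixel size from PNG and JPEG headers and applies the 512px short-side minimum. It cannot judge framing, lighting or face count. `image_size` and `portrait_problems` are hypothetical helper names, not part of VlogMe.

```python
import struct

MIN_SHORT_SIDE = 512  # from the upload guidance: at least 512px on the short side
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def image_size(data: bytes) -> tuple[str, int, int]:
    """Return (format, width, height) for PNG or JPEG bytes."""
    if data.startswith(PNG_SIGNATURE):
        # IHDR is always the first chunk; width and height sit at bytes 16..24.
        if len(data) < 24:
            raise ValueError("truncated PNG header")
        width, height = struct.unpack(">II", data[16:24])
        return "png", width, height
    if data.startswith(b"\xff\xd8"):
        i = 2
        while i + 4 <= len(data):
            if data[i] != 0xFF:
                raise ValueError("corrupt JPEG marker stream")
            marker = data[i + 1]
            if marker == 0xFF:  # padding byte before a marker
                i += 1
                continue
            (seg_len,) = struct.unpack(">H", data[i + 2:i + 4])
            # Start-of-frame markers (C0..CF minus DHT, JPG, DAC) hold the size.
            if 0xC0 <= marker <= 0xCF and marker not in (0xC4, 0xC8, 0xCC):
                if i + 9 > len(data):
                    raise ValueError("truncated JPEG frame header")
                height, width = struct.unpack(">HH", data[i + 5:i + 9])
                return "jpeg", width, height
            i += 2 + seg_len  # skip the marker bytes plus the segment
        raise ValueError("no JPEG frame header found")
    raise ValueError("not a PNG or JPEG file")


def portrait_problems(data: bytes) -> list[str]:
    """List reasons a file fails the format/size rule (empty list = OK)."""
    try:
        _, width, height = image_size(data)
    except ValueError as exc:
        return [str(exc)]
    short = min(width, height)
    if short < MIN_SHORT_SIDE:
        return [f"short side is {short}px; upload at least {MIN_SHORT_SIDE}px"]
    return []
```

To check a file on disk, open it in binary mode and pass its bytes, for example `portrait_problems(open("portrait.jpg", "rb").read())`.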
Common photo mistakes
The renderer cares more about the photo than the script. Avoid these six and you'll be ahead of most first-time users.
- Cropped chin or forehead. The avatar needs the full face, hairline to jaw.
- Sunglasses or heavy shadows over the eyes. Mouth tracking still works, but eye contact dies.
- Multiple faces in frame. Pick a photo with just your subject — bystanders confuse the detector.
- Profile shots. Up to ~15° turn is fine; pure side profiles break lip-sync.
- Tiny face in a huge background. Crop to head-and-shoulders before uploading.
- Heavy beauty filters. Filters that smooth skin to plastic also smooth out mouth landmarks.
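For the tiny-face case, the fix is a crop around the face. Here is a minimal sketch. It assumes you already have a face bounding box from a detector of your choice, since none is named here. The padding multiples are illustrative guesses, not VlogMe requirements. `Box`, `head_and_shoulders` and `crop_is_large_enough` are hypothetical names.

```python
from dataclasses import dataclass

MIN_SHORT_SIDE = 512  # upload minimum from "Your first video in 3 minutes"


@dataclass
class Box:
    x: int  # left edge in pixels
    y: int  # top edge in pixels
    w: int
    h: int


def head_and_shoulders(face: Box, img_w: int, img_h: int,
                       side: float = 1.0, above: float = 0.5,
                       below: float = 1.5) -> Box:
    """Grow a face box into a head-and-shoulders crop, clamped to the image.

    Padding is in multiples of the face size. The defaults are guesses:
    some room above the hairline, more below the jaw for the shoulders.
    """
    left = max(0, int(face.x - side * face.w))
    top = max(0, int(face.y - above * face.h))
    right = min(img_w, int(face.x + face.w + side * face.w))
    bottom = min(img_h, int(face.y + face.h + below * face.h))
    return Box(left, top, right - left, bottom - top)


def crop_is_large_enough(crop: Box) -> bool:
    """False means the face was too small in the source to crop usefully."""
    return min(crop.w, crop.h) >= MIN_SHORT_SIDE
```

If the crop falls below the 512px minimum, the face occupied too few pixels in the original. A closer photo will serve better than upscaling.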
Multi-scene videos
Add several photos and switch between them by starting a line with the photo number. Each photo becomes a new scene with its own voice.
Every avatar in the timeline is numbered. A line that starts with @image1, @image2, etc. opens a new scene, and the matching photo speaks everything that follows until the next switch.
@image1
Hey, I'm Alex — welcome to the channel.
@image2
And I'm Sam. Today we're doing a live walkthrough.
Two photos, two scenes, two voices. You assign a voice per photo in the editor sidebar; the script itself only references images by number.
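The switching rule is simple enough to mirror in a script-prep tool. You could use one to count scenes, or to check which photos a script references, before pasting it into the editor. Here is a minimal sketch. It makes two assumptions this page does not specify: a tag may be followed by speech on the same line, and speech before the first tag is an error. `Scene` and `split_scenes` are hypothetical names.

```python
import re
from dataclasses import dataclass, field

# A line that starts with @imageN opens a new scene for photo N.
SCENE_TAG = re.compile(r"^@image(\d+)\b\s*(.*)$")


@dataclass
class Scene:
    image: int
    lines: list[str] = field(default_factory=list)


def split_scenes(script: str) -> list[Scene]:
    """Group script lines into scenes keyed by photo number."""
    scenes: list[Scene] = []
    for raw in script.splitlines():
        line = raw.strip()
        if not line:
            continue
        tag = SCENE_TAG.match(line)
        if tag:
            scenes.append(Scene(int(tag.group(1))))
            if tag.group(2):  # speech on the tag line (assumed allowed)
                scenes[-1].lines.append(tag.group(2))
        elif scenes:
            scenes[-1].lines.append(line)  # the same photo keeps speaking
        else:
            raise ValueError("script should open with an @imageN line")
    return scenes


script = """@image1
Hey, I'm Alex, welcome to the channel.
@image2
And I'm Sam. Today we're doing a live walkthrough."""

for scene in split_scenes(script):
    print(scene.image, scene.lines)
```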
B-roll inserts — overlay vs chain
Two flavors of inline video clip, both written with curly braces and a seconds suffix. The difference is what happens to the voice.
Overlay — voice keeps talking
A clip generated from another photo is layered on top of the avatar. The avatar's voice keeps playing underneath. Use this when a visual illustrates what's being said.
@image1
Our new espresso machine pulls a perfect shot {@image2 close-up of espresso pouring, steam rising}:3 every single morning.
Chain — voice pauses
Without a photo reference inside the braces, the clip replaces both the video and the voice for that many seconds. Speech pauses, then resumes. Use this for a wordless mid-line cutaway.
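Both forms follow one rule. An @imageN reference inside the braces makes an overlay, and without one the clip is a chain. Here is a minimal parser sketch. It assumes whole-number seconds, as in the examples, and that the photo reference comes first inside the braces. It also totals how much a line's chain clips lengthen the video, because by this page's description only chains pause the speech. `Insert`, `parse_line` and `added_seconds` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional

# {prompt}:N -> chain clip;  {@imageK prompt}:N -> overlay clip from photo K
BROLL = re.compile(r"\{([^{}]*)\}:(\d+)")
IMAGE_REF = re.compile(r"\s*@image(\d+)\b\s*(.*)", re.DOTALL)


@dataclass
class Insert:
    kind: str                    # "overlay" (voice continues) or "chain" (voice pauses)
    seconds: int
    prompt: str
    image: Optional[int] = None  # source photo, overlays only


def parse_line(line: str) -> tuple[list[str], list[Insert]]:
    """Split one script line into spoken fragments and b-roll inserts."""
    speech, inserts, pos = [], [], 0
    for m in BROLL.finditer(line):
        speech.append(line[pos:m.start()])
        body, seconds = m.group(1), int(m.group(2))
        ref = IMAGE_REF.match(body)
        if ref:
            inserts.append(Insert("overlay", seconds, ref.group(2).strip(), int(ref.group(1))))
        else:
            inserts.append(Insert("chain", seconds, body.strip()))
        pos = m.end()
    speech.append(line[pos:])
    return [s.strip() for s in speech if s.strip()], inserts


def added_seconds(inserts: list[Insert]) -> int:
    """Chains replace video and voice, so they add length; overlays do not."""
    return sum(i.seconds for i in inserts if i.kind == "chain")


speech, inserts = parse_line(
    "Our new espresso machine pulls a perfect shot "
    "{@image2 close-up of espresso pouring, steam rising}:3 every single morning."
)
print(speech)
print(inserts)
```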
@image1
Watch this. {camera pans to reveal a snowy mountain at sunrise}:4 …Worth getting up early for.
Pauses, music, captions
Three quality knobs that turn a draft into something worth posting.
- Pauses in speech — drop [pause] inside the line (square brackets, an ElevenLabs voice tag). Never use {pause}; curly braces are reserved for b-roll.
- Background music — toggle it on in the editor sidebar; we duck the music automatically under speech.
- Auto captions — burned into the 9:16 export when enabled. Timing follows speech exactly, including under overlay inserts.
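A pre-flight check can catch common script slips before a render. It looks for `{pause}` in curly braces, plus the emoji, asterisks and all-caps words that the voice reads literally, per the first-video steps and the FAQ. This is a heuristic sketch, not the renderer's own validation. Emoji detection uses Unicode's "other symbol" category at or above U+2600. All-caps means four or more capital letters in a row, so short acronyms pass and long ones get flagged. `lint_line` is a hypothetical name.

```python
import re
import unicodedata

BROLL = re.compile(r"\{[^{}]*\}:\d+")   # b-roll prompts are not spoken
ALL_CAPS = re.compile(r"\b[A-Z]{4,}\b")  # heuristic: 4+ capitals in a row


def lint_line(line: str) -> list[str]:
    """Warn about things the voice may read literally (heuristics only)."""
    warnings = []
    if re.search(r"\{\s*pause\s*\}", line, re.IGNORECASE):
        warnings.append("use [pause], not {pause}: curly braces are reserved for b-roll")
    spoken = BROLL.sub(" ", line)  # drop b-roll inserts before checking speech
    if "*" in spoken:
        warnings.append("asterisks may be read aloud")
    if any(ord(ch) >= 0x2600 and unicodedata.category(ch) == "So" for ch in spoken):
        warnings.append("emoji may be read aloud")
    caps = ALL_CAPS.findall(spoken)
    if caps:
        warnings.append("all-caps words may be read literally: " + ", ".join(caps))
    return warnings


for line in ["Hold on [pause] here it comes.", "This is AMAZING 😀 {pause}"]:
    print(line, "->", lint_line(line))
```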
Publishing to socials
Post once from the studio to every platform — now or on a schedule.
From Social you can connect TikTok, Instagram, YouTube and others, then post immediately or schedule for later. Each connected account stores a refresh token; revoke it from the same page anytime.
Troubleshooting & FAQ
The questions we answer most often. If you don't see yours here, drop us a note.
- My render hangs at 99%
- The avatar's mouth doesn't match the voice
- The voice reads my emoji and asterisks out loud
- I ran out of credits mid-project
- Can I generate videos from code or have ChatGPT do it for me?
Still stuck?
We read every message.
Drop a note in feedback and we'll write back — usually within a day.