vlogme.ai

VlogMe Manual

From a single photo to a talking-avatar reel — in plain English.

Nine short chapters that explain what VlogMe does and how to use it well: no code, no API, no jargon. Building an integration? See the API & MCP reference.

Chapters
  1. What VlogMe makes
  2. Credits & plans
  3. Your first video in 3 minutes
  4. Common photo mistakes
  5. Multi-scene videos
  6. B-roll inserts
  7. Pauses, music, captions
  8. Publishing to socials
  9. Troubleshooting & FAQ
01
Concept

What VlogMe makes

A 9:16 talking-avatar video, built from a still portrait, a voice, and a line of text — the format TikTok, Reels and YouTube Shorts were made for.

You give us a face, a voice, and a script. We render the avatar speaking it with natural mouth movement and emotion. A single avatar can speak one line, or you can chain several photos together to build a story.

Between any two lines you can drop a short generative b-roll clip — either layered on top while the avatar keeps talking (overlay) or as a wordless cutaway that replaces the avatar for a few seconds (chain). That's the whole creative model.

  • Step 01, Photo: front-facing portrait
  • Step 02, Voice: pick from the catalog
  • Step 03, Video: 9:16 talking avatar
02
Account

Credits & plans

One simple meter: roughly one credit per second of finished video. No hidden surcharges for voices, effects, or b-roll.

  • 1 credit ≈ 1 second of finished video.
  • Minimum charge per render is 10 credits.
  • B-roll clips count toward the same total based on their seconds.
  • The free plan gives you 60 credits to try the full pipeline.

Your balance lives in the top-right of every page. Plan limits and monthly refills live on Pricing.
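
If you like to see the meter as arithmetic, here is a minimal sketch of the rules above. It is not how VlogMe computes your bill: the function name is hypothetical, rounding up to a whole credit is an assumption, and it takes one reading of the b-roll rule by adding every b-roll second on top of the speech seconds.

```python
import math

MIN_CREDITS_PER_RENDER = 10  # minimum charge stated in this chapter


def estimate_credits(speech_seconds, broll_seconds=0.0):
    """Rough credit estimate for one render (hypothetical helper).

    Assumptions, not confirmed by the manual:
    - fractional seconds round up to the next whole credit
    - b-roll seconds are simply added to speech seconds
    """
    total_seconds = speech_seconds + broll_seconds
    return max(MIN_CREDITS_PER_RENDER, math.ceil(total_seconds))
```

Under these assumptions a 4-second test clip still costs the 10-credit minimum, so the free plan's 60 credits cover at most six renders.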

03
Quickstart

Your first video in 3 minutes

Five steps from a blank page to a finished MP4 you can download or post. Everything below this chapter is optional depth.

  1. Open Create — the big button in the header.
  2. Drop a portrait photo. Front-facing, eyes visible, single face in frame, even lighting. JPG or PNG, at least 512px on the short side.
  3. Pick a voice. The picker shows gender, accent and a one-line description. Click the preview to hear a sample.
  4. Type one line of speech. A few words up to a couple of sentences works well for the first try. Avoid all-caps and emoji — the voice reads them literally.
  5. Press Create. A render takes 30–90 seconds. The page polls until the video is ready and shows a preview you can download or share.
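
The file requirements in step 2 can be checked before you upload. The sketch below is an illustration only: the function name is hypothetical, you supply the pixel dimensions yourself, and treating ".jpeg" as equivalent to ".jpg" is an assumption. It cannot judge framing or lighting.

```python
from pathlib import Path

ALLOWED_SUFFIXES = {".jpg", ".jpeg", ".png"}  # ".jpeg" is an assumption
MIN_SHORT_SIDE = 512  # pixels, from step 2


def photo_problems(filename, width, height):
    """Return a list of file-level problems; an empty list means none found."""
    problems = []
    if Path(filename).suffix.lower() not in ALLOWED_SUFFIXES:
        problems.append("file must be JPG or PNG")
    if min(width, height) < MIN_SHORT_SIDE:
        problems.append("short side must be at least 512px")
    return problems
```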
04
Reference

Common photo mistakes

The renderer cares more about the photo than the script. Avoid these six and you'll be ahead of most first-time users.

  • Cropped chin or forehead. The avatar needs the full face, hairline to jaw.
  • Sunglasses or heavy shadows over the eyes. Mouth tracking still works, but eye contact dies.
  • Multiple faces in frame. Pick a photo with just your subject — bystanders confuse the detector.
  • Profile shots. Up to ~15° turn is fine; pure side profiles break lip-sync.
  • Tiny face in a huge background. Crop to head-and-shoulders before uploading.
  • Heavy beauty filters. Filters that smooth skin to plastic also smooth out mouth landmarks.
05
Scripting

Multi-scene videos

Add several photos and switch between them by starting a line with that photo's tag, such as @image1. Each photo becomes a new scene with its own voice.

Every avatar in the timeline is numbered. A line that starts with @image1, @image2, etc. opens a new scene, and the matching photo speaks everything that follows until the next switch.

script
@image1
Hey, I'm Alex — welcome to the channel.

@image2
And I'm Sam. Today we're doing a live walkthrough.

Two photos, two scenes, two voices. You assign a voice per photo in the editor sidebar; the script itself only references images by number.
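
For readers who draft scripts outside the editor, the switching rule can be sketched as a few lines of Python. This is a hypothetical helper, not VlogMe's own parser. It assumes that text before the first tag belongs to photo 1 and that a tag may share a line with speech, neither of which the manual states.

```python
import re

# A scene tag is "@image" plus a number at the very start of a line.
SCENE_TAG = re.compile(r"^@image(\d+)\b\s*(.*)$")


def split_scenes(script):
    """Split a script into (photo_number, spoken_text) pairs."""
    scenes = []
    for raw_line in script.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        tag = SCENE_TAG.match(line)
        if tag:
            # A tag opens a new scene; any text after it is spoken.
            scenes.append((int(tag.group(1)), []))
            line = tag.group(2)
            if not line:
                continue
        if not scenes:
            # Assumption: untagged opening text is spoken by photo 1.
            scenes.append((1, []))
        scenes[-1][1].append(line)
    return [(number, " ".join(parts)) for number, parts in scenes]
```

Lines that begin with a curly brace, such as an overlay insert, do not start with a tag, so they stay inside the current scene.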

06
Scripting

B-roll inserts — overlay vs chain

Two flavors of inline video clip, both written with curly braces and a seconds suffix. The difference is what happens to the voice.

Overlay — voice keeps talking

A clip generated from another photo is layered on top of the avatar. The avatar's voice keeps playing underneath. Use this when a visual illustrates what's being said.

script
@image1
Our new espresso machine pulls a perfect shot {@image2 close-up of espresso pouring, steam rising}:3 every single morning.

Chain — voice pauses

Without a photo reference inside the braces, the clip replaces both the video and the voice for that many seconds. Speech pauses, then resumes. Use this for a wordless mid-line cutaway.

script
@image1
Watch this. {camera pans to reveal a snowy mountain at sunrise}:4 …Worth getting up early for.

Overlay (voice continues): the avatar keeps speaking for the whole line while the b-roll plays on top.
Chain (voice pauses): avatar, then silent b-roll, then avatar again.
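
The two flavors can be told apart mechanically, which helps when checking a long script. The sketch below is a hypothetical helper rather than VlogMe's parser; it assumes the seconds suffix is a whole number and that a photo reference, when present, comes first inside the braces.

```python
import re
from typing import NamedTuple, Optional

# "{ ... }:N" where N is the clip length in seconds (assumed to be an integer).
INSERT = re.compile(r"\{([^{}]+)\}:(\d+)")
# A photo reference at the start of the braces marks an overlay.
PHOTO_REF = re.compile(r"^@image(\d+)\s+(.*)$", re.S)


class Insert(NamedTuple):
    kind: str              # "overlay" or "chain"
    image: Optional[int]   # photo number for overlays, None for chains
    prompt: str            # description of the clip
    seconds: int


def find_inserts(line):
    """List every b-roll insert in a line of script, in order."""
    found = []
    for match in INSERT.finditer(line):
        body = match.group(1).strip()
        seconds = int(match.group(2))
        ref = PHOTO_REF.match(body)
        if ref:
            found.append(Insert("overlay", int(ref.group(1)), ref.group(2).strip(), seconds))
        else:
            found.append(Insert("chain", None, body, seconds))
    return found
```

Run on the two examples in this chapter, it reports one overlay of 3 seconds tied to photo 2 and one chain of 4 seconds with no photo.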
07
Polish

Pauses, music, captions

Three quality knobs that turn a draft into something worth posting.

  • Pauses in speech — drop [pause] inside the line (square brackets, an ElevenLabs voice tag). Never {pause} — curly braces are reserved for b-roll.
  • Background music — toggle on in the editor sidebar; we duck the music automatically under speech.
  • Auto captions — burned into the 9:16 export when enabled. Timing follows speech exactly, including under overlay inserts.
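
The pause rule is easy to get wrong, so a quick check on each line before rendering can save a credit or two. The sketch below is a rough heuristic with hypothetical names. It also flags the emoji, asterisks and all-caps words that this manual warns about elsewhere; the emoji test relies on the Unicode "Symbol, other" category, which catches most emoji along with a few other symbols.

```python
import re
import unicodedata


def lint_line(text):
    """Return warnings for one line of script; an empty list means it looks clean."""
    warnings = []
    if re.search(r"\{\s*pause\s*\}", text, re.IGNORECASE):
        warnings.append("use [pause], not {pause}")
    if "*" in text:
        warnings.append("remove asterisks")
    # Heuristic: most emoji fall in Unicode category "So" (Symbol, other).
    if any(unicodedata.category(ch) == "So" for ch in text):
        warnings.append("remove emoji")
    # Heuristic: four or more capital letters in a row counts as shouting.
    if re.search(r"\b[A-Z]{4,}\b", text):
        warnings.append("avoid all-caps words")
    return warnings
```

Short acronyms of up to three letters pass the all-caps check on purpose; longer ones will be flagged and are yours to judge.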
08
Distribution

Publishing to socials

Post once from the studio to every platform — now or on a schedule.

From Social you can connect TikTok, Instagram, YouTube and others, then post immediately or schedule for later. Each connected account stores a refresh token; revoke it from the same page anytime.

09
Help

Troubleshooting & FAQ

The questions we answer most often. If you don't see yours here, drop us a note.

My render hangs at 99%
A muxer step is finishing. Wait one more poll cycle; if the render stays at 99% for more than two minutes, it auto-fails and refunds your credits, and you can re-submit.

The avatar's mouth doesn't match the voice
Almost always the photo. Re-read chapter 4: front-facing, no sunglasses, single face, no heavy filters.

The voice reads my emoji and asterisks out loud
It does. Strip emoji, **markdown** and ALL-CAPS from the script before rendering.

I ran out of credits mid-project
Partial renders aren't charged. Buy a top-up on Pricing or wait for your monthly refill.

Can I generate videos from code or have ChatGPT do it for me?
Yes. The REST API and the MCP server cover both, and everything is documented in the API & MCP reference.

Still stuck?

We read every message.

Drop a note in feedback and we'll write back — usually within a day.
