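What can I build with the vlogme.ai API?

Any application that needs to generate talking-avatar videos programmatically: personalized sales videos, AI tutors, news anchors, product explainers, NPC dialogue, and localized dubbing at scale. A single POST request takes a portrait URL, a script, and a voice ID as input and starts an asynchronous render that produces a finished MP4.

How does the MCP server work with Claude Code, Codex, and Cursor?

VlogMe exposes a Streamable HTTP MCP endpoint at /api/mcp. Add it once with the CLI snippet on this page, and your agent can call list_voices, generate_video, and get_video natively, with no glue code and no extra SDK.

How are credits billed for API renders?

The same way as in the web app: roughly 1 credit per second of finished video, with a minimum of 10. Credits are deducted when you POST /videos and refunded automatically if the render fails or is deleted before it completes.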
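As a rough illustration of the billing rule above, the following sketch computes a ballpark figure locally. The function name and the choice to round partial seconds up are assumptions; the server-side estimate (GET /videos/estimate) remains the authoritative number.

```python
import math

MIN_CREDITS = 10         # minimum charge quoted in the answer above
CREDITS_PER_SECOND = 1   # approximate rate quoted in the answer above


def rough_credit_estimate(duration_seconds: float) -> int:
    """Approximate credits for a finished video of the given length.

    Rounding partial seconds up is an assumption, not documented behavior.
    """
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    return max(MIN_CREDITS, math.ceil(duration_seconds * CREDITS_PER_SECOND))


print(rough_credit_estimate(4))     # a short clip is billed at the minimum
print(rough_credit_estimate(42.3))  # longer clips scale with duration
```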

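Are there rate limits?

The default limits are generous (hundreds of renders per day on the Basic plan, thousands on Pro). If you need higher throughput for a launch or a bulk migration, email support@vlogme.ai and we will raise your limit.

Do webhooks support retries and signature verification?

Yes. Every webhook carries an X-Vlogme-Signature header (an HMAC-SHA256 of the raw body, keyed with your token). Failed deliveries are retried with exponential backoff for 24 hours.

Which plans include API access?

The API and MCP are available on all paid plans, starting with Basic. Accounts without an active subscription can only use the web editor.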

AI Video Generation API and MCP | VlogMe Developers

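01 What the AI video generation API can do
02 Run VlogMe Avatar on Replicate
03 Authentication
04 Quick start
05 REST endpoints
06 Create an AI video
07 Script grammar
08 Webhook
09 MCP server
10 MCP client configuration
11 End-to-end AI agent example
12 Agent Skill
13 Errors and retries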

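Overview

What the AI video generation API can do

Create AI video projects and talking-head avatars programmatically, sharing the same account, credits, and job history as the web app.

Backend services, automations, and product integrations can use REST. Use MCP when an AI agent needs to discover tools, estimate costs, and submit jobs on its own.

The public v1 API is asynchronous: save the job ID, then get the result by polling or through a signed webhook. Paid generation requires an eligible VlogMe plan.

Estimate the cost before any credits are deducted.
Generate a talking-head video from a portrait photo plus text and a voice, or directly from an audio recording.
Read progress and fetch time-limited signed download links.
Retry paid requests safely with an Idempotency-Key.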

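Hosted model

Run VlogMe Avatar on Replicate

If you only need an "image + audio" endpoint, you can also use the public VlogMe Avatar bridge hosted on Replicate.

This suits projects that already manage infrastructure and billing through Replicate. The full feature set, including credits, history, REST, and MCP, is still provided through the VlogMe API.

The hosted bridge returns a vertical talking-head avatar MP4 with subtitles enabled by default.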

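Configuration

Authentication

Every protected request requires a Bearer token. A token is shown only once, when it is created, so store it in a secrets manager.

Create tokens under Settings → API. Never put a token in browser code, mobile app bundles, public repositories, or client-side logs.

The REST base URL is https://vlogme.ai/api/public/v1. If a token may have leaked, rotate it immediately.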

http

Authorization: Bearer vlm_live_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

url

https://vlogme.ai/api/public/v1
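One way to keep the token out of source code is to read it from the environment at call time. The sketch below assumes the token is exported as VLOGME_TOKEN (the variable name used by the curl examples on this page); the helper name auth_headers is hypothetical.

```python
import os

BASE_URL = "https://vlogme.ai/api/public/v1"


def auth_headers(env=os.environ) -> dict:
    """Build the Authorization header from the VLOGME_TOKEN environment variable."""
    token = env.get("VLOGME_TOKEN", "").strip()
    if not token:
        # Fail early instead of sending an unauthenticated request.
        raise RuntimeError("VLOGME_TOKEN is not set; create a token in Settings -> API")
    return {"Authorization": f"Bearer {token}"}
```

Passing the environment as a parameter keeps the helper easy to test and avoids reading the secret at import time.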

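Get started

Quick start

Create a token, estimate the request, submit the generation, and poll until the job reaches a terminal state.

The example sends a portrait photo URL, a script, a voice_id, and an aspect ratio. Replace them with assets that VlogMe's servers can reach.

A successful POST returns 202 Accepted immediately. Poll about every ten seconds, or pass a webhook_url.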

bash

curl -X POST https://vlogme.ai/api/public/v1/videos \
  -H "Authorization: Bearer $VLOGME_TOKEN" \
  -H "Idempotency-Key: video-request-001" \
  -H "Content-Type: application/json" \
  -d '{
    "portrait_url": "https://example.com/portrait.jpg",
    "aspect_ratio": "16:9",
    "script": "Welcome to the demo.",
    "voice_id": "EXAVITQu4vr4xnSDxMaL",
    "webhook_url": "https://yourapp.com/webhooks/vlogme"
  }'
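The polling half of the quick start can be sketched as a small loop. The terminal status names below are assumptions (check the OpenAPI spec for the real values), and fetch_video stands for whatever function performs GET /videos/:id in your client; it is injected so the loop can be tested without network access.

```python
import time
from typing import Callable

# Assumed terminal status names; the real values are listed in the OpenAPI spec.
TERMINAL_STATUSES = {"completed", "failed", "canceled"}


def poll_until_terminal(
    fetch_video: Callable[[], dict],
    interval_seconds: float = 10.0,
    max_attempts: int = 90,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_video() until the job reports a terminal status."""
    for attempt in range(max_attempts):
        video = fetch_video()
        if video.get("status") in TERMINAL_STATUSES:
            return video
        if attempt < max_attempts - 1:
            # Wait between polls; the docs suggest roughly ten seconds.
            sleep(interval_seconds)
    raise TimeoutError(f"video not finished after {max_attempts} polls")
```

Bounding the number of attempts keeps a stuck job from polling forever; a webhook_url avoids the loop entirely.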

REST

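REST endpoints

A compact JSON API with an OpenAPI 3.1 spec, stable error codes, and rate-limit response headers.

The machine-readable spec lives at /api/public/v1/openapi.json; use it to generate typed clients or import it into Postman or Insomnia.

Every response includes X-Request-Id. The asynchronous POST /videos also returns the polling URL in the Location header.

| Method | Path             | Description                                     |
| ------ | ---------------- | ----------------------------------------------- |
| GET    | /me              | User ID, plan, and credit balance               |
| GET    | /voices          | voice_id values available for speech synthesis  |
| POST   | /videos          | Start an asynchronous generation job            |
| GET    | /videos/:id      | Read status and the signed download link        |
| GET    | /videos          | Paginated list of recent generations            |
| GET    | /videos/estimate | Estimate required credits without charging      |
| DELETE | /videos/:id      | Delete or cancel an eligible job                |
| GET    | /health          | Unauthenticated liveness check                  |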

bash

curl https://vlogme.ai/api/public/v1/videos/$VIDEO_ID \
  -H "Authorization: Bearer $VLOGME_TOKEN"
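The X-Request-Id and Location headers mentioned in this section can be pulled out of a response with a few lines. This is a minimal sketch: whether Location is relative or absolute is not stated here, so the helper (a hypothetical name) resolves it against the base URL either way.

```python
from urllib.parse import urljoin

BASE_URL = "https://vlogme.ai/api/public/v1/"


def polling_info(headers: dict) -> dict:
    """Extract the request ID and an absolute polling URL from response headers."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    location = lowered.get("location")
    return {
        "request_id": lowered.get("x-request-id"),
        # urljoin leaves absolute URLs untouched and resolves relative ones.
        "poll_url": urljoin(BASE_URL, location) if location else None,
    }
```

Logging request_id alongside the job ID makes support requests easier to trace later.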

REST

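Create an AI video

Provide a portrait photo plus either script + voice_id or an audio asset. Video generation runs asynchronously.

Supply the photo as portrait_url or portrait_base64. Supply speech as script + voice_id, or as audio_url / audio_base64. Optional fields include aspect_ratio, emotion_preset, live_subtitles, title, and webhook_url.

Every paid POST must include a unique Idempotency-Key. Retrying with the same key returns the original job and does not charge again.

Aspect ratio: defaults to 9:16; 16:9 and 1:1 are also supported.
A top-level inserts field adds B-roll in overlay or cut mode.
The project background audio setting audio_mode can be auto, prompt, asset, or off.
The 202 response includes id, status, credits_charged, estimated_seconds, and warnings.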

bash

curl -X POST https://vlogme.ai/api/public/v1/videos \
  -H "Authorization: Bearer $VLOGME_TOKEN" \
  -H "Idempotency-Key: video-request-001" \
  -H "Content-Type: application/json" \
  -d '{
    "portrait_url": "https://example.com/portrait.jpg",
    "aspect_ratio": "16:9",
    "script": "Welcome to the demo.",
    "voice_id": "EXAVITQu4vr4xnSDxMaL",
    "webhook_url": "https://yourapp.com/webhooks/vlogme"
  }'
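Client-side validation can catch an invalid field combination before a paid request is sent. The sketch below checks the combinations described in this section and derives an Idempotency-Key by hashing the payload. Both the strict "exactly one source" checks and the hash-based key scheme are assumptions for illustration; a hash-derived key also means two intentionally identical requests would be treated as one.

```python
import hashlib
import json

ASPECT_RATIOS = {"9:16", "16:9", "1:1"}


def build_video_request(**fields) -> tuple:
    """Validate field combinations and return (payload, idempotency_key)."""
    portraits = [k for k in ("portrait_url", "portrait_base64") if fields.get(k)]
    if len(portraits) != 1:
        raise ValueError("provide exactly one of portrait_url or portrait_base64")

    has_tts = bool(fields.get("script") and fields.get("voice_id"))
    has_audio = bool(fields.get("audio_url") or fields.get("audio_base64"))
    if has_tts == has_audio:
        raise ValueError("provide either script + voice_id or an audio source")

    # 9:16 is the documented default aspect ratio.
    fields.setdefault("aspect_ratio", "9:16")
    if fields["aspect_ratio"] not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect_ratio: {fields['aspect_ratio']}")

    # Hypothetical key scheme: the same payload always yields the same key,
    # so a transport-level retry reuses it automatically.
    canonical = json.dumps(fields, sort_keys=True, separators=(",", ":"))
    key = "video-" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:32]
    return fields, key
```

The returned key would go in the Idempotency-Key header and the payload in the JSON body of POST /videos.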

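Script

Script grammar

A machine-readable DSL describes scene switches, B-roll, audio inserts, pauses, and vocal performance tags.

Use @imageN to switch the speaking scene, curly braces for overlay/chain B-roll, and @audioN to insert audio; ElevenLabs performance tags such as [shocked] are also supported.

The downloadable spec preserves the contract language intended for tools and LLMs. MCP clients can read the same spec through script_grammar_help.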

SCRIPT-GRAMMAR.md

# VlogMe Script Grammar — canonical contract

> Single source of truth for two contracts:
>
> **A. Frontend ↔ Backend (script grammar)** — what humans and AI write.
> **B. Backend ↔ Worker (payload)** — the structured JSON the render fleet consumes.
>
> The frontend never builds a worker payload directly. `flatToPayloads()` in
> `src/lib/scenes/flat-script.ts` is the single bridge between A and B.

---

## A. Script grammar (frontend ↔ backend)

A VlogMe script is plain text driving a short-form render. The project-level
`aspect_ratio` decides the frame (`9:16`, `16:9`, or `1:1`); the script grammar
is identical for all three formats. There are **four explicit beat forms**, plus
a plain-text continuation line for the current avatar. The shape of the token
decides the kind — never the card role.

| #   | Form                   | Meaning                                                                |
| --- | ---------------------- | ---------------------------------------------------------------------- |
| 1   | `@imageN <text>`       | Avatar speaks `<text>`, anchored to photo slot N.                      |
| 2   | `@imageN { <prompt> }` | Standalone video clip generated from photo N. Prompt is in the braces. |
| 3   | `{ @imageM <prompt> }` | Overlay clip on top of the current avatar, using photo M as 1st frame. |
| 4   | `{ <prompt> }`         | Continue — extends the previous clip, no reference image of its own.   |

Plus one media token: `@audioN` on its own line drops audio clip N on the
current avatar.

Plain text on its own line continues speech for the current avatar. This is
mainly used after an overlay beat:

```text
@image1 So I open the laptop.
{ @image3 slow push-in on the glowing keyboard }:3
And the cursor starts moving on its own.
```

### Rules

- Whitespace, blank lines and indentation are **insignificant** — only the
  token shape matters.
- Avatar text (form 1) preserves its newlines verbatim into the worker
  payload's `script` field. Multi-line dialogue stays multi-line end-to-end.
- The current avatar = the most recent avatar speech tag. Forms 3 and 4 anchor
  to that avatar/beat.
- Form 4 requires a preceding rendered frame/current avatar to continue from.
- Form 3 starts directly with `@imageN`: `{ @image3 fade out }`.
- In form 4, the optional keyword `continue` is stripped:
  `{ continue sparks fly }` works the same as `{ sparks fly }`.
- `@imageN-K` (K in 0..100): cross-fade transition code applied when the
  avatar switches (form 1 only).

### Duration override

Any form 2 / 3 / 4 may carry an explicit clip duration in seconds:

- `{ ... }:D` — after the closing brace.
- `@imageN { ... }:D` — same.

Without `:D` the clip uses the system default (3s), capped at 5s.

### Brace body options

Forms 2 / 3 / 4 can use either a plain prompt or an advanced pipe body:

```text
@image2 { slow push-in on a mountain lake }:3
{ @image3 v:glowing OPEN sign, slow push-in | n:watermark, text | s:soft city hum | t5 }:3
{ v:camera keeps drifting upward | am:off }:2
```

Supported segments:

- `v:<visual prompt>` — main visual prompt.
- `n:<negative prompt>` — visual negative prompt.
- `s:<sound prompt>` / `an:<negative sound>` — generated audio prompt fields.
- `@audioM` — uploaded audio clip under this video beat.
- `am:auto|prompt|asset|off` — audio mode.
- `ag:<db>` — audio gain.
- `tK` — transition code, 0..100.
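To make the segment list concrete, here is a minimal sketch of a pipe-body parser. It is not the production parser: the output key names are hypothetical, and it expects the brace body with any leading `@imageM` reference already removed.

```python
import re

AUDIO_MODES = {"auto", "prompt", "asset", "off"}
# Hypothetical output keys for each documented segment prefix.
FIELD_NAMES = {"v": "visual", "n": "negative", "s": "sound",
               "an": "negative_sound", "am": "audio_mode", "ag": "audio_gain_db"}


def parse_brace_body(body: str) -> dict:
    """Split a brace body on '|' and map each segment to a named field."""
    result: dict = {}
    for segment in body.split("|"):
        segment = segment.strip()
        if not segment:
            continue
        if m := re.fullmatch(r"@audio(\d+)", segment):
            result["audio_clip"] = int(m.group(1))
        elif m := re.fullmatch(r"t(\d+)", segment):
            result["transition"] = int(m.group(1))
        elif m := re.fullmatch(r"(an|am|ag|v|n|s):\s*(.*)", segment, re.DOTALL):
            result[FIELD_NAMES[m.group(1)]] = m.group(2).strip()
        else:
            # A segment without a known prefix is treated as the plain prompt.
            result["visual"] = segment
    if result.get("transition", 0) > 100:
        raise ValueError("transition code must be in 0..100")
    if "audio_mode" in result and result["audio_mode"] not in AUDIO_MODES:
        raise ValueError(f"unknown audio mode: {result['audio_mode']}")
    if "audio_gain_db" in result:
        result["audio_gain_db"] = float(result["audio_gain_db"])
    return result
```

A plain prompt such as `slow push-in on a mountain lake` comes back as a single visual field, so both body styles share one code path.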

### Examples

```text
@image1 Hey, let me show you something.

@image1 So I open the laptop.
{ @image3 slow push-in on the glowing keyboard }:3
And the cursor starts moving on its own.

@image2 { dolphin breaches in slow motion against the sunrise }:4
{ camera keeps drifting up, sky turns deeper orange }:2

@image1 [shocked] What the hell is happening?
@audio1
```

Breakdown:

- Line 1: avatar 1 speaks one sentence.
- Line 3: avatar 1 speaks again and becomes the current avatar.
- Line 4: overlay over the current avatar using photo 3 (3s).
- Line 5: plain text, so avatar 1 continues speaking.
- Line 7: standalone video from photo 2 (4s).
- Line 8: continue clip extending the previous shot (2s, no reference photo).
- Line 10: avatar 1 again, with an ElevenLabs `[shocked]` audio tag.
- Line 11: drops audio clip 1 on top of the avatar.
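The token shapes in the example can be told apart with a handful of regular expressions. The sketch below is an illustration only, not the parser in `src/lib/scenes/flat-script.ts`: it assumes each beat fits on one line, accepts decimal durations (the examples only show whole seconds), and uses made-up kind names.

```python
import re

DURATION = r"(?::(\d+(?:\.\d+)?))?"  # optional ":D" suffix, in seconds
STANDALONE = re.compile(r"@image(\d+)\s*\{([^{}]*)\}" + DURATION)      # form 2
OVERLAY = re.compile(r"\{\s*@image(\d+)\s+([^{}]*)\}" + DURATION)      # form 3
CONTINUE = re.compile(r"\{\s*(?:continue\s+)?([^{}]*)\}" + DURATION)   # form 4
AVATAR = re.compile(r"@image(\d+)(?:-(\d+))?\s+([^{}]+)")              # form 1
AUDIO = re.compile(r"@audio(\d+)")


def _seconds(value):
    return float(value) if value is not None else None


def classify_line(line: str) -> tuple:
    """Return (kind, details) for one script line, or raise ValueError."""
    text = line.strip()
    if not text:
        return ("blank", {})
    if m := STANDALONE.fullmatch(text):
        return ("standalone", {"image": int(m[1]), "prompt": m[2].strip(),
                               "seconds": _seconds(m[3])})
    if m := OVERLAY.fullmatch(text):
        return ("overlay", {"image": int(m[1]), "prompt": m[2].strip(),
                            "seconds": _seconds(m[3])})
    if m := CONTINUE.fullmatch(text):
        # The optional leading keyword "continue" is stripped by the pattern.
        return ("continue", {"prompt": m[1].strip(), "seconds": _seconds(m[2])})
    if m := AUDIO.fullmatch(text):
        return ("audio", {"clip": int(m[1])})
    if (m := AVATAR.fullmatch(text)) and "@image" not in m[3]:
        return ("avatar", {"image": int(m[1]),
                           "transition": int(m[2]) if m[2] else None,
                           "text": m[3].strip()})
    # Mid-line tags, inline overlays and stray braces are rejected.
    if any(tok in text for tok in ("@image", "@audio", "{", "}")):
        raise ValueError(f"not a valid beat: {text!r}")
    return ("plain_text", {"text": text})
```

Running `classify_line` over each line of the example above yields avatar, overlay, plain-text, standalone, continue and audio beats in the order the breakdown describes.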

### Legacy back-compat

The old "bare `@imageN` lets the card's role decide" form is still accepted
by the parser. Specifically: an `@imageN` line **without** braces falls back
to `card.role` + `card.effectInsertMode` to pick avatar / video / overlay /
continue. This keeps the timeline editor — which currently edits cards in
the bare form — working unchanged while we migrate the UI to render explicit
brace cards.

When AI generates a script, it should prefer the **four explicit forms**.
Parser/serializer round-trips currently normalize back to the legacy bare form
for the timeline UI; that will change in a follow-up.

### Forbidden

- Mid-line `@imageN` outside of a brace. Each tag is on its own line OR
  inside a `{ … }` block.
- Inline overlay inside avatar speech, e.g. `@image1 words {@image2 ...}:3`.
- Bare `@imageN` video instructions from new AI output. Generate
  `@imageN { ... }:D` instead.
- Nested braces.
- Anything else not listed in §A.

---

## B. Worker payload (backend ↔ worker)

The worker consumes `ScenePayload[]` (defined in `src/lib/scenes/scenes.ts`).
The frontend never builds these — `flatToPayloads()` is the only producer.

Each scene has an `input_type` discriminator:

| `input_type`    | Comes from script form | Key fields                                                                                                                   |
| --------------- | ---------------------- | ---------------------------------------------------------------------------------------------------------------------------- |
| `text`          | Form 1                 | `image_path`, `script` (avatar speech), `voice_id`, `transition`                                                             |
| `audio`         | `@audioN`              | `image_path` (current avatar), `audio_path`, `audio_duration_seconds`                                                        |
| `ltx`           | Form 2                 | `image_path` (source photo), `script` (visual prompt), `audio_duration_seconds`                                              |
| `video_overlay` | Form 3                 | `image_path` (anchor avatar), `overlay_source_image_path` (photo M), `overlay_visual_prompt`, `overlay_min_duration_seconds` |
| `video_chain`   | Form 4                 | `chain_from_previous: true`, `overlay_source_image_path`, `overlay_visual_prompt`                                            |

Insert variants (overlay / cut) for nested timing inside an avatar audio
chunk follow `docs/public-api-inserts-contract.md`. That doc is the contract
between the frontend builder pipeline and the public render API.

The worker payload shape is **stable** — adding new script grammar features
must NOT require worker changes. Map them in `flatToPayloads()` first.

---

## Pipeline (who owns what)

```text
WishesField                            ← user types brief + uploads media
   │
   ▼
buildScriptFromBrief (server)          ← Gemini generates canonical script
   │
   ▼
useFlatScriptBridge                    ← parses script, mirrors timeline ↔ Code
   │
   ▼
TimelineBlocks                         ← user-editable cards
   │
   ▼
flatToPayloads()                       ← THE bridge: script → ScenePayload[]
   │
   ▼
prepare-render (server)                ← per-scene asset staging, validation
   │
   ▼
render fleet workers                   ← Hunyuan / overlay / avatar
```

**Key principle**: the script grammar is the _only_ thing the user (or AI)
authors. Everything downstream — payload shape, worker inputs, asset paths,
voice IDs, transition codes — is derived. Adding new script behavior =
extend `flatToPayloads()`; don't ask AI or users to write payload JSON.

REST

Webhook

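When you pass a webhook_url, VlogMe sends an event on completion or failure and retries transient delivery failures.

Use the separate whsec_ secret from Settings → API to verify X-Vlogme-Signature over timestamp + raw_body. Reject timestamps more than five minutes old, and deduplicate by X-Vlogme-Event-Id.

Return any 2xx within five seconds. Network errors and 5xx responses are retried with backoff; 4xx responses are treated as permanent rejections. If a signed link has expired, GET the video again.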

python

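import hashlib, hmac, time

ts = request.headers["X-Vlogme-Timestamp"]
secret = b"whsec_..."  # Settings -> API
signed = ts.encode() + b"." + raw_body  # raw_body: the unparsed request bytes
expected = "sha256=" + hmac.new(secret, signed, hashlib.sha256).hexdigest()
assert hmac.compare_digest(expected, request.headers["X-Vlogme-Signature"])
assert abs(time.time() - int(ts)) < 300  # reject timestamps older than five minutes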
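The checks above can be wrapped in one reusable function that also handles deduplication by X-Vlogme-Event-Id. This is a sketch under a few assumptions: headers arrive as a plain dict with the exact header casing shown, the in-memory set stands in for a persistent store, and the function name and return values are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional

MAX_AGE_SECONDS = 300  # five minutes, as recommended above


def check_webhook(secret: str, headers: dict, raw_body: bytes,
                  seen_event_ids: set, now: Optional[float] = None) -> str:
    """Return "ok", "duplicate", or "invalid" for one webhook delivery."""
    now = time.time() if now is None else now
    timestamp = headers.get("X-Vlogme-Timestamp", "")
    signature = headers.get("X-Vlogme-Signature", "")
    event_id = headers.get("X-Vlogme-Event-Id", "")

    # Signature covers "<timestamp>.<raw body>", keyed with the whsec_ secret.
    signed = timestamp.encode("utf-8") + b"." + raw_body
    digest = hmac.new(secret.encode("utf-8"), signed, hashlib.sha256).hexdigest()
    if not hmac.compare_digest("sha256=" + digest, signature):
        return "invalid"

    # Reject stale or malformed timestamps to limit replay attacks.
    if not timestamp.isdigit() or abs(now - int(timestamp)) >= MAX_AGE_SECONDS:
        return "invalid"

    # Retried deliveries carry the same event ID; process each event once.
    if not event_id or event_id in seen_event_ids:
        return "duplicate"
    seen_event_ids.add(event_id)
    return "ok"
```

A handler would typically answer 2xx for both "ok" and "duplicate" so that retries stop, and reject "invalid" deliveries with a 4xx.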

MCP

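MCP server

Connect AI agents over native Streamable HTTP, using either a VlogMe token or interactive OAuth.

The MCP tools are thin wrappers around REST v1 and share its data structures, credit formula, and stable error codes. Updates only add fields, so clients should ignore unknown fields.

The tools cover voices, balance, estimates, portraits, generation, status, cancellation, and history. Authorized internal accounts can also handle work items.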

endpoint

POST  https://mcp.vlogme.ai/api/mcp

script_grammar_help
list_voices
get_balance
estimate_credits
list_portraits
generate_video
get_video
cancel_video
list_my_videos
list_bugs
get_bug
report_bug
update_bug_status

MCP

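MCP client configuration

Claude Code, Cursor, and Codex support Streamable HTTP; older stdio-only clients can use mcp-remote.

Pick one authentication method, either interactive OAuth or an API token in an environment variable, and do not mix them.

After changing the MCP configuration, start a new client session so the tools are rediscovered.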

bash

# OAuth
codex mcp add vlogme --url https://mcp.vlogme.ai/api/mcp
codex mcp login vlogme --scopes mcp:full,mcp:work_items

# or API token
export VLOGME_TOKEN=vlm_live_xxxxxxxxxxxx
codex mcp add vlogme --url https://mcp.vlogme.ai/api/mcp --bearer-token-env-var VLOGME_TOKEN

json

{
  "mcpServers": {
    "vlogme": {
      "url": "https://mcp.vlogme.ai/api/mcp",
      "headers": { "Authorization": "Bearer YOUR_TOKEN_HERE" }
    }
  }
}

MCP

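End-to-end AI agent example

A natural-language request is turned into a visible, auditable sequence of tool calls.

The agent can list voices, estimate the cost, ask for confirmation, call generate_video, and monitor the job with get_video until it reaches a terminal state.

You still need a reachable portrait photo, an active plan, and enough credits. MCP and REST share the same plan limits.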

prompt

Create a 16:9 talking-avatar video from this portrait.
Use a warm, natural voice. Estimate the credits first and ask before generating.
After approval, monitor the job and return the final download URL.
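The tool-call sequence an agent might produce for this prompt can be sketched in plain code. The tool names come from the MCP server section; everything else is an assumption for illustration: the argument and result shapes, the terminal status names, picking the first listed voice, and the call_tool and confirm callables, which stand in for your MCP client and your approval step.

```python
import time
from typing import Callable, Optional

TERMINAL_STATUSES = {"completed", "failed", "canceled"}  # assumed names


def run_video_flow(call_tool: Callable[[str, dict], dict],
                   confirm: Callable[[dict], bool],
                   request: dict,
                   poll_interval: float = 10.0,
                   max_polls: int = 60,
                   sleep: Callable[[float], None] = time.sleep) -> Optional[dict]:
    """Estimate, ask for approval, generate, then poll until a terminal status."""
    request = dict(request)
    if "voice_id" not in request:
        voices = call_tool("list_voices", {})
        # Assumed result shape; a real agent would choose a voice by description.
        request["voice_id"] = voices["voices"][0]["voice_id"]

    estimate = call_tool("estimate_credits", request)
    if not confirm(estimate):
        return None  # nothing was charged because generate_video never ran

    job = call_tool("generate_video", request)
    for _ in range(max_polls):
        video = call_tool("get_video", {"id": job["id"]})
        if video["status"] in TERMINAL_STATUSES:
            return video
        sleep(poll_interval)
    raise TimeoutError("job did not reach a terminal status")
```

Keeping the approval step between estimate_credits and generate_video is what makes the run auditable: no paid call happens without an explicit yes.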

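Agent

Agent Skill

A short operating guide makes the agent estimate first, ask for confirmation, and handle asynchronous job states correctly.

It explains when to use VlogMe, the call order, how to preserve idempotency, and when to stop polling.

Never write a real token into the Skill file; keep it in the client environment or the OAuth store.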

markdown

---
name: vlogme-video
description: Estimate and generate VlogMe video jobs through MCP.
---
1. Validate the portrait and requested format.
2. Call estimate_credits before a paid generation.
3. Ask for approval with the estimate.
4. Use a stable idempotency key for transport retries.
5. Poll get_video until a terminal status.

Reference

Errors and retries

Errors have the shape { error: { code, message } }. Programs should branch on the stable code rather than on the message, which may be translated.

Include the X-Request-Id when contacting support. A 429 includes Retry-After; fix authentication and input errors first rather than retrying unconditionally.

The OpenAPI spec contains the full schemas and error list. Successful responses return the rate-limit limit, remaining, and reset values.

| code                 | HTTP | Description                                             |
| -------------------- | ---- | ------------------------------------------------------- |
| missing_token        | 401  | { error: { code: "missing_token", message } }           |
| invalid_token        | 401  | { error: { code: "invalid_token", message } }           |
| token_expired        | 401  | { error: { code: "token_expired", message } }           |
| plan_required        | 403  | { error: { code: "plan_required", message } }           |
| insufficient_credits | 402  | { error: { code: "insufficient_credits", message } }    |
| invalid_input        | 400  | { error: { code: "invalid_input", message } }           |
| invalid_asset        | 400  | { error: { code: "invalid_asset", message } }           |
| invalid_json         | 400  | { error: { code: "invalid_json", message } }            |
| not_found            | 404  | { error: { code: "not_found", message } }               |
| method_not_allowed   | 405  | { error: { code: "method_not_allowed", message } }      |
| already_started      | 409  | { error: { code: "already_started", message } }         |
| billing_conflict     | 409  | { error: { code: "billing_conflict", message } }        |
| rate_limited         | 429  | { error: { code: "rate_limited", message } }            |
| internal_error       | 500  | { error: { code: "internal_error", message } }          |
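A retry policy consistent with the table might look like the sketch below. It honors Retry-After on 429, backs off exponentially on 5xx, and refuses to retry other 4xx errors. The backoff constants and the function name are assumptions, and only the delta-seconds form of Retry-After is handled.

```python
from typing import Optional


def retry_delay(status: int, headers: dict, attempt: int,
                base: float = 1.0, cap: float = 60.0) -> Optional[float]:
    """Seconds to wait before retrying, or None if a retry would not help."""
    if status == 429:
        retry_after = str(headers.get("Retry-After", "")).strip()
        if retry_after.isdigit():
            # The server told us exactly how long to wait.
            return float(retry_after)
    if status == 429 or status >= 500:
        # Exponential backoff (1s, 2s, 4s, ...) capped at `cap` seconds.
        return min(cap, base * 2 ** attempt)
    # 400/401/402/403/404/405/409: fix the request instead of retrying.
    return None
```

When a paid POST is retried after a 5xx or a network error, sending the same Idempotency-Key keeps the retry from being charged twice.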

An AI video generation API for code, AI agents, and LLMs.
