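What can you do with the vlogme.ai API?

Anything that needs programmatic talking-avatar video generation: personalized sales videos, AI tutors, news anchors, product explainers, NPC dialogue, localized narration at scale, and more. A single POST request takes a portrait URL, a script, and a voice ID, and the result is a finished MP4.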

How does the MCP server work with Claude Code, Codex, and Cursor?

VlogMe exposes a Streamable HTTP MCP endpoint at /api/mcp. Add it once with the CLI snippet on this page, and your agent can call list_voices, generate_video, and get_video natively. No glue code or extra SDK is needed.

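How are credits billed for API renders?

The same way as in the web app: roughly 1 credit per second of finished video, with a minimum of 10 credits. Credits are charged when you POST to /videos and are refunded automatically if the render fails or you delete the job before it completes.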
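
As a rough illustration of the billing rule above, the sketch below assumes the charge is the video length in seconds rounded up, with a floor of 10 credits. The rounding mode, the constant names, and the function name are assumptions made for this example. The rule is only stated as "roughly 1 credit per second", so treat the result as an approximation and use the estimate endpoint for real numbers.

```python
import math

MIN_CREDITS = 10          # minimum charge stated above
CREDITS_PER_SECOND = 1.0  # "roughly 1 credit per second"; the exact rate is an assumption


def approximate_credits(duration_seconds: float) -> int:
    """Approximate the credits for a finished video of the given length.

    Rounding up is an assumption made for this sketch.
    """
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    return max(MIN_CREDITS, math.ceil(duration_seconds * CREDITS_PER_SECOND))
```

Under these assumptions a 4-second clip still costs the 10-credit minimum, while a 42.2-second clip comes to 43 credits.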

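Are there rate limits?

The default limits are generous (hundreds of requests per day on Basic, thousands per day on Pro). If you need higher throughput for a launch or a bulk migration, email support@vlogme.ai and the limit will be raised.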

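Do webhooks support retries and signature verification?

Yes. Every webhook carries an X-Vlogme-Signature header (an HMAC-SHA256 of the raw body, with the token as the key). Failed deliveries are retried with exponential backoff for 24 hours.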

Which plans include API access?

The API and MCP are available on every paid plan from Basic up. Accounts without an active subscription can use only the web editor.

AI Video Generation API and MCP | VlogMe Developers

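All 13 sections

01 What you can do with the AI video generation API
02 Run VlogMe Avatar on Replicate
03 Authentication
04 Quick start
05 REST endpoints
06 Generate an AI video
07 Script grammar
08 Webhooks
09 MCP server
10 MCP client setup
11 End-to-end AI agent example
12 Agent skill
13 Errors and retries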

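Overview

What you can do with the AI video generation API

Generate AI video projects and talking avatars programmatically, using the same account, credits, and history as the web app.

Use REST for backends and automation. Use MCP when an AI agent needs to discover tools, estimate cost, and run generations.

The public v1 API is asynchronous: store the job ID, then confirm completion by polling or through a signed webhook. Generation requires an eligible plan.

Estimate the cost before spending credits.
Generate an avatar video from a portrait photo plus either text and a voice, or recorded audio.
Retrieve progress and time-limited signed URLs.
Make retries of paid requests safe with Idempotency-Key.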

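Hosted

Run VlogMe Avatar on Replicate

If you need a simple image-plus-audio endpoint, the public VlogMe Avatar bridge is also available on Replicate.

This is convenient if you already manage infrastructure and billing through Replicate. For the full integration, including credits, history, REST, and MCP, use the VlogMe API.

The hosted bridge returns a vertical talking-avatar MP4 with subtitles included by default.

Run Avatar on Replicate | GitHub sample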

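Setup

Authentication

Send a Bearer token with every protected request. The token is shown only once, when it is created.

Create it under Settings → API and keep it in a secret manager. Never put it in browser code, mobile apps, public repositories, or client logs.

The REST base URL is https://vlogme.ai/api/public/v1. Rotate the token if it may have leaked.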

http

Authorization: Bearer vlm_live_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

url

https://vlogme.ai/api/public/v1
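
A minimal Python sketch of the same header, assuming the token is stored in an environment variable named VLOGME_TOKEN (the name used in the curl examples on this page). The helper name auth_headers is hypothetical and not part of any VlogMe SDK.

```python
import os

BASE_URL = "https://vlogme.ai/api/public/v1"


def auth_headers(env=os.environ) -> dict:
    """Build the Authorization header from the VLOGME_TOKEN environment variable."""
    token = env.get("VLOGME_TOKEN", "").strip()
    if not token:
        # Fail early instead of sending an unauthenticated request.
        raise RuntimeError("VLOGME_TOKEN is not set")
    return {"Authorization": f"Bearer {token}"}
```

Reading the token from the environment at call time keeps it out of source files and makes rotation a configuration change.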

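Getting started

Quick start

Implement the flow in this order: create a token, estimate the cost, start the render, then check the job until it reaches a final state.

The example sends a portrait URL, a script, a voice_id, and an aspect ratio. Replace the placeholders with assets that VlogMe's servers can fetch.

A successful POST returns 202 Accepted immediately. Check the job about every 10 seconds, or specify a webhook_url.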

bash

curl -X POST https://vlogme.ai/api/public/v1/videos \
  -H "Authorization: Bearer $VLOGME_TOKEN" \
  -H "Idempotency-Key: video-request-001" \
  -H "Content-Type: application/json" \
  -d '{
    "portrait_url": "https://example.com/portrait.jpg",
    "aspect_ratio": "16:9",
    "script": "Welcome to the demo.",
    "voice_id": "EXAVITQu4vr4xnSDxMaL",
    "webhook_url": "https://yourapp.com/webhooks/vlogme"
  }'

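REST

REST endpoints

A JSON API with an OpenAPI 3.1 specification, stable error codes, and rate-limit headers.

You can generate typed clients from /api/public/v1/openapi.json or import it into Postman or Insomnia.

Every response includes X-Request-Id, and the asynchronous POST /videos also returns the polling URL in the Location header.

GET     /me               User ID, plan, and credit balance
GET     /voices           voice_id values available for speech synthesis
POST    /videos           Start an asynchronous render
GET     /videos/:id       Status and signed download URL
GET     /videos           Paginated list of recent generations
GET     /videos/estimate  Estimate the required credits without charging
DELETE  /videos/:id       Delete or cancel the target job
GET     /health           Health check, no authentication required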

bash

curl https://vlogme.ai/api/public/v1/videos/$VIDEO_ID \
  -H "Authorization: Bearer $VLOGME_TOKEN"
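
The polling step can be sketched in Python as follows. The terminal status names ("completed", "failed", "canceled") are assumptions for illustration; take the real values from the OpenAPI document. The fetch function is injected so the loop can be exercised without network access, and fetch_video shows one possible implementation using only the standard library.

```python
import json
import os
import time
import urllib.request

BASE_URL = "https://vlogme.ai/api/public/v1"
# Assumed terminal statuses; check the OpenAPI document for the real values.
TERMINAL_STATUSES = {"completed", "failed", "canceled"}


def fetch_video(video_id: str) -> dict:
    """GET /videos/:id with the Bearer token from VLOGME_TOKEN."""
    request = urllib.request.Request(
        f"{BASE_URL}/videos/{video_id}",
        headers={"Authorization": f"Bearer {os.environ['VLOGME_TOKEN']}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def wait_for_video(video_id, fetch=fetch_video, interval=10.0, max_polls=90, sleep=time.sleep):
    """Poll until the job reports a terminal status or max_polls is reached."""
    for _ in range(max_polls):
        job = fetch(video_id)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        sleep(interval)
    raise TimeoutError(f"video {video_id} did not finish after {max_polls} polls")
```

The 10-second default interval follows the polling cadence suggested in the quick start. The poll cap is an arbitrary choice that keeps a stuck job from looping forever.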

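REST

Generate an AI video

Send a portrait photo together with either script plus voice_id or an audio asset. Rendering proceeds asynchronously.

Use portrait_url or portrait_base64 for the portrait. For the audio, send script plus voice_id, or audio_url / audio_base64. You can also set aspect_ratio, emotion_preset, live_subtitles, title, and webhook_url.

Paid POST requests require a unique Idempotency-Key. Resending with the same key returns the original job without charging twice.

Aspect ratio: 9:16 (default), 16:9, or 1:1.
Add overlay or cut B-roll with the top-level inserts field.
audio_mode for background sound: auto, prompt, asset, or off.
202 response: id, status, credits_charged, estimated_seconds, warnings.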

bash

curl -X POST https://vlogme.ai/api/public/v1/videos \
  -H "Authorization: Bearer $VLOGME_TOKEN" \
  -H "Idempotency-Key: video-request-001" \
  -H "Content-Type: application/json" \
  -d '{
    "portrait_url": "https://example.com/portrait.jpg",
    "aspect_ratio": "16:9",
    "script": "Welcome to the demo.",
    "voice_id": "EXAVITQu4vr4xnSDxMaL",
    "webhook_url": "https://yourapp.com/webhooks/vlogme"
  }'
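
One way to keep the Idempotency-Key stable across transport retries is to derive it from the request body, so that resending the same payload reuses the same key. This is a sketch under that assumption: the key format and the helper names are hypothetical, and a random key stored next to your own job record works just as well. Because the key is a hash of the payload, two intentionally identical renders would share a key, so pass a nonce when that matters.

```python
import hashlib
import json
import urllib.request

BASE_URL = "https://vlogme.ai/api/public/v1"


def idempotency_key(payload: dict, nonce: str = "") -> str:
    """Derive a deterministic key from the canonical JSON form of the payload."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256((nonce + canonical).encode("utf-8")).hexdigest()
    return f"video-{digest[:32]}"


def build_create_request(payload: dict, token: str, nonce: str = "") -> urllib.request.Request:
    """Build (but do not send) the POST /videos request."""
    return urllib.request.Request(
        f"{BASE_URL}/videos",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Idempotency-Key": idempotency_key(payload, nonce),
            "Content-Type": "application/json",
        },
    )
```

Sending the request is a single urllib.request.urlopen call. Rebuilding it from the same payload after a timeout yields the same key, which is what lets the server return the original job.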

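Script

Script grammar

A machine-readable DSL specifies scene changes, B-roll, audio inserts, pauses, and voice-acting tags.

@imageN switches the speaker scene, braces mark overlay/chain B-roll, @audioN inserts an audio asset, and ElevenLabs tags such as [shocked] direct the delivery.

The downloadable specification is kept in its original contract language for tools and LLMs. In MCP, the same specification is available from script_grammar_help.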

SCRIPT-GRAMMAR.md

# VlogMe Script Grammar — canonical contract

> Single source of truth for two contracts:
>
> **A. Frontend ↔ Backend (script grammar)** — what humans and AI write.
> **B. Backend ↔ Worker (payload)** — the structured JSON the render fleet consumes.
>
> The frontend never builds a worker payload directly. `flatToPayloads()` in
> `src/lib/scenes/flat-script.ts` is the single bridge between A and B.

---

## A. Script grammar (frontend ↔ backend)

A VlogMe script is plain text driving a short-form render. The project-level
`aspect_ratio` decides the frame (`9:16`, `16:9`, or `1:1`); the script grammar
is identical for all three formats. There are **four explicit beat forms**, plus
a plain-text continuation line for the current avatar. The shape of the token
decides the kind — never the card role.

| #   | Form                   | Meaning                                                                |
| --- | ---------------------- | ---------------------------------------------------------------------- |
| 1   | `@imageN <text>`       | Avatar speaks `<text>`, anchored to photo slot N.                      |
| 2   | `@imageN { <prompt> }` | Standalone video clip generated from photo N. Prompt is in the braces. |
| 3   | `{ @imageM <prompt> }` | Overlay clip on top of the current avatar, using photo M as 1st frame. |
| 4   | `{ <prompt> }`         | Continue — extends the previous clip, no reference image of its own.   |

Plus one media token: `@audioN` on its own line drops audio clip N on the
current avatar.

Plain text on its own line continues speech for the current avatar. This is
mainly used after an overlay beat:

```text
@image1 So I open the laptop.
{ @image3 slow push-in on the glowing keyboard }:3
And the cursor starts moving on its own.
```

### Rules

- Whitespace, blank lines and indentation are **insignificant** — only the
  token shape matters.
- Avatar text (form 1) preserves its newlines verbatim into the worker
  payload's `script` field. Multi-line dialogue stays multi-line end-to-end.
- The current avatar = the most recent avatar speech tag. Forms 3 and 4 anchor
  to that avatar/beat.
- Form 4 requires a preceding rendered frame/current avatar to continue from.
- Form 3 starts directly with `@imageN`: `{ @image3 fade out }`.
- In form 4, the optional keyword `continue` is stripped:
  `{ continue sparks fly }` works the same as `{ sparks fly }`.
- `@imageN-K` (K in 0..100): cross-fade transition code applied when the
  avatar switches (form 1 only).

### Duration override

Any form 2 / 3 / 4 may carry an explicit clip duration in seconds:

- `{ ... }:D` — after the closing brace.
- `@imageN { ... }:D` — same.

Without `:D` the clip uses the system default (3s), capped at 5s.

### Brace body options

Forms 2 / 3 / 4 can use either a plain prompt or an advanced pipe body:

```text
@image2 { slow push-in on a mountain lake }:3
{ @image3 v:glowing OPEN sign, slow push-in | n:watermark, text | s:soft city hum | t5 }:3
{ v:camera keeps drifting upward | am:off }:2
```

Supported segments:

- `v:<visual prompt>` — main visual prompt.
- `n:<negative prompt>` — visual negative prompt.
- `s:<sound prompt>` / `an:<negative sound>` — generated audio prompt fields.
- `@audioM` — uploaded audio clip under this video beat.
- `am:auto|prompt|asset|off` — audio mode.
- `ag:<db>` — audio gain.
- `tK` — transition code, 0..100.
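
To make the segment list concrete, here is a small Python sketch that splits a brace body into its fields. It is an illustration only, not the parser in `src/lib/scenes/flat-script.ts`. The output key names are hypothetical, `ag:` is assumed to carry a plain number, and an unprefixed segment is assumed to be the visual prompt.

```python
import re

AUDIO_MODES = {"auto", "prompt", "asset", "off"}
# Segment prefix -> key in the returned dict (key names are hypothetical).
PREFIXES = {"v": "visual", "n": "negative", "s": "sound", "an": "negative_sound",
            "am": "audio_mode", "ag": "audio_gain_db"}


def parse_brace_body(body: str) -> dict:
    """Parse the text between the braces of a form 2 / 3 / 4 beat."""
    result = {}
    body = body.strip()
    # Form 3 bodies start with the overlay photo reference, e.g. "@image3 ...".
    ref = re.match(r"@image(\d+)\s+", body)
    if ref:
        result["image"] = int(ref.group(1))
        body = body[ref.end():]
    for segment in (part.strip() for part in body.split("|")):
        if not segment:
            continue
        prefixed = re.match(r"(an|am|ag|v|n|s):(.*)$", segment, re.S)
        if prefixed:
            key, value = PREFIXES[prefixed.group(1)], prefixed.group(2).strip()
            if key == "audio_mode" and value not in AUDIO_MODES:
                raise ValueError(f"unknown audio mode: {value}")
            result[key] = float(value) if key == "audio_gain_db" else value
        elif re.fullmatch(r"t\d{1,3}", segment):
            code = int(segment[1:])
            if code > 100:
                raise ValueError("transition code must be in 0..100")
            result["transition"] = code
        elif re.fullmatch(r"@audio\d+", segment):
            result["audio"] = int(segment[len("@audio"):])
        else:
            # Assumption: an unprefixed segment is the plain visual prompt.
            result["visual"] = segment
    return result
```

For the second line of the example above, this yields image 3, a visual prompt, a negative prompt, a sound prompt, and transition code 5.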

### Examples

```text
@image1 Hey, let me show you something.

@image1 So I open the laptop.
{ @image3 slow push-in on the glowing keyboard }:3
And the cursor starts moving on its own.

@image2 { dolphin breaches in slow motion against the sunrise }:4
{ camera keeps drifting up, sky turns deeper orange }:2

@image1 [shocked] What the hell is happening?
@audio1
```

Breakdown:

- Line 1: avatar 1 speaks one sentence.
- Line 3: avatar 1 speaks again.
- Line 4: overlay over the current avatar using photo 3 (3s).
- Line 5: avatar 1 continues speaking.
- Line 7: standalone video from photo 2 (4s).
- Line 8: continue clip extending the previous shot (2s, no reference photo).
- Line 10: avatar 1 again, with an ElevenLabs `[shocked]` audio tag.
- Line 11: drops audio clip 1 on top of the avatar.
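
The "token shape decides the kind" rule can be illustrated with a line classifier. This is a Python sketch for readers, not the production parser: the kind names and dictionary keys are hypothetical, decimal durations are assumed to be allowed, and the legacy bare `@imageN` form is deliberately rejected.

```python
import re

DURATION = r"(?::(?P<dur>\d+(?:\.\d+)?))?"  # optional ":D" suffix, in seconds
PATTERNS = [
    # Order matters: the more specific brace forms are tried first.
    ("video", re.compile(r"@image(?P<n>\d+)\s*\{(?P<body>[^{}]*)\}" + DURATION)),        # form 2
    ("overlay", re.compile(r"\{\s*@image(?P<n>\d+)\s+(?P<body>[^{}]*)\}" + DURATION)),   # form 3
    ("continue", re.compile(r"\{(?P<body>[^{}]*)\}" + DURATION)),                        # form 4
    ("audio", re.compile(r"@audio(?P<n>\d+)")),                                          # media token
    ("avatar", re.compile(r"@image(?P<n>\d+)(?:-(?P<k>\d{1,3}))?\s+(?P<body>[^{}]+)")),  # form 1
]


def classify_beat(line: str) -> dict:
    """Classify one script line by token shape; raise ValueError on forbidden shapes."""
    line = line.strip()
    for kind, pattern in PATTERNS:
        match = pattern.fullmatch(line)
        if not match:
            continue
        beat = {"kind": kind}
        for key, value in match.groupdict().items():
            if value is None:
                continue
            if key == "body":
                beat[key] = value.strip()
            elif key == "dur":
                beat[key] = float(value)
            else:
                beat[key] = int(value)
        if kind == "continue":
            # The optional keyword "continue" is stripped in form 4.
            beat["body"] = re.sub(r"^continue\b\s*", "", beat["body"])
        if kind == "avatar" and re.search(r"@(image|audio)\d", beat["body"]):
            raise ValueError(f"mid-line media tag outside a brace: {line!r}")
        return beat
    if re.search(r"[{}]|@image\d|@audio\d", line):
        raise ValueError(f"not one of the explicit forms: {line!r}")
    return {"kind": "text", "body": line}  # plain continuation line
```

Callers would skip blank lines before classifying, since whitespace is insignificant. Inline overlays inside avatar speech and nested braces fall through to the ValueError branch, matching the Forbidden list.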

### Legacy back-compat

The old "bare `@imageN` lets the card's role decide" form is still accepted
by the parser. Specifically: an `@imageN` line **without** braces falls back
to `card.role` + `card.effectInsertMode` to pick avatar / video / overlay /
continue. This keeps the timeline editor — which currently edits cards in
the bare form — working unchanged while we migrate the UI to render explicit
brace cards.

When AI generates a script, it should prefer the **four explicit forms**.
Parser/serializer round-trips currently normalize back to the legacy bare form
for the timeline UI; that will change in a follow-up.

### Forbidden

- Mid-line `@imageN` outside of a brace. Each tag is on its own line OR
  inside a `{ … }` block.
- Inline overlay inside avatar speech, e.g. `@image1 words {@image2 ...}:3`.
- Bare `@imageN` video instructions from new AI output. Generate
  `@imageN { ... }:D` instead.
- Nested braces.
- Anything else not listed in §A.

---

## B. Worker payload (backend ↔ worker)

The worker consumes `ScenePayload[]` (defined in `src/lib/scenes/scenes.ts`).
The frontend never builds these — `flatToPayloads()` is the only producer.

Each scene has an `input_type` discriminator:

| `input_type`    | Comes from script form | Key fields                                                                                                                   |
| --------------- | ---------------------- | ---------------------------------------------------------------------------------------------------------------------------- |
| `text`          | Form 1                 | `image_path`, `script` (avatar speech), `voice_id`, `transition`                                                             |
| `audio`         | `@audioN`              | `image_path` (current avatar), `audio_path`, `audio_duration_seconds`                                                        |
| `ltx`           | Form 2                 | `image_path` (source photo), `script` (visual prompt), `audio_duration_seconds`                                              |
| `video_overlay` | Form 3                 | `image_path` (anchor avatar), `overlay_source_image_path` (photo M), `overlay_visual_prompt`, `overlay_min_duration_seconds` |
| `video_chain`   | Form 4                 | `chain_from_previous: true`, `overlay_source_image_path`, `overlay_visual_prompt`                                            |

Insert variants (overlay / cut) for nested timing inside an avatar audio
chunk follow `docs/public-api-inserts-contract.md`. That doc is the contract
between the frontend builder pipeline and the public render API.

The worker payload shape is **stable** — adding new script grammar features
must NOT require worker changes. Map them in `flatToPayloads()` first.

---

## Pipeline (who owns what)

```text
WishesField                            ← user types brief + uploads media
   │
   ▼
buildScriptFromBrief (server)          ← Gemini generates canonical script
   │
   ▼
useFlatScriptBridge                    ← parses script, mirrors timeline ↔ Code
   │
   ▼
TimelineBlocks                         ← user-editable cards
   │
   ▼
flatToPayloads()                       ← THE bridge: script → ScenePayload[]
   │
   ▼
prepare-render (server)                ← per-scene asset staging, validation
   │
   ▼
render fleet workers                   ← Hunyuan / overlay / avatar
```

**Key principle**: the script grammar is the _only_ thing the user (or AI)
authors. Everything downstream — payload shape, worker inputs, asset paths,
voice IDs, transition codes — is derived. Adding new script behavior =
extend `flatToPayloads()`; don't ask AI or users to write payload JSON.

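REST

Webhooks

When you pass webhook_url, VlogMe sends an event on success or failure and retries on transient delivery errors.

Verify X-Vlogme-Signature, which is computed over the timestamp and the raw body, with the dedicated whsec_ secret from Settings → API. Reject timestamps older than 5 minutes and deduplicate by X-Vlogme-Event-Id.

Return a 2xx response within 5 seconds. Network failures and 5xx responses are retried with backoff; a 4xx is treated as a permanent rejection. If the download URL has expired, GET the video again.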

python

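import hashlib
import hmac
import time

def verify_webhook(headers, raw_body: bytes, secret: str) -> bool:  # secret: whsec_... from Settings -> API
    ts = headers["X-Vlogme-Timestamp"]
    signed = ts.encode() + b"." + raw_body  # "<timestamp>.<raw body>"
    expected = "sha256=" + hmac.new(secret.encode(), signed, hashlib.sha256).hexdigest()
    fresh = abs(time.time() - int(ts)) < 300  # reject timestamps older than 5 minutes
    return fresh and hmac.compare_digest(expected, headers["X-Vlogme-Signature"])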
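
The delivery rules for deduplication and fast acknowledgement can be sketched as a small framework-free handler. It assumes the signature has already been verified, that the payload is JSON, and that an in-memory set is an acceptable stand-in for a durable store of seen event IDs. The function and parameter names are hypothetical.

```python
import json


def handle_webhook(headers: dict, raw_body: bytes, seen_event_ids: set, enqueue) -> int:
    """Return the HTTP status to send back for one verified webhook delivery."""
    event_id = headers.get("X-Vlogme-Event-Id")
    if not event_id:
        return 400  # this sketch treats a missing event ID as a malformed request
    if event_id in seen_event_ids:
        return 200  # duplicate delivery: acknowledge without processing again
    try:
        event = json.loads(raw_body)
    except ValueError:
        return 400
    # Hand the event to a background queue so the 2xx goes out within 5 seconds.
    enqueue(event)
    seen_event_ids.add(event_id)
    return 200
```

Acknowledging duplicates with 200 stops further retries, and deferring the real work to a queue keeps the response inside the 5-second window.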

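MCP

MCP server

Connect AI agents over Streamable HTTP, authenticating with a VlogMe token or interactive OAuth.

The MCP tools are thin wrappers around REST v1 and share its data formats, credit calculation, and error codes. Ignore unknown fields so that additive changes do not break your client.

Tools cover voices, balance, estimates, portraits, generation, status, cancellation, and history. Authorized internal accounts can also handle work items.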

endpoint

POST  https://mcp.vlogme.ai/api/mcp

script_grammar_help
list_voices
get_balance
estimate_credits
list_portraits
generate_video
get_video
cancel_video
list_my_videos
list_bugs
get_bug
report_bug
update_bug_status
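
For readers curious about what travels over the wire, the sketch below builds the JSON-RPC 2.0 message that MCP defines for invoking a tool, together with headers for a token-authenticated Streamable HTTP POST. It is illustrative only: a real session starts with the MCP initialize handshake, which an MCP client or SDK normally handles, and the empty argument object for list_voices is an assumption. The helper names are hypothetical.

```python
import json


def tools_call_request(tool: str, arguments: dict, request_id: int = 1) -> bytes:
    """Encode a JSON-RPC 2.0 `tools/call` message for an MCP server."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message).encode("utf-8")


def mcp_headers(token: str) -> dict:
    """Headers for a Streamable HTTP POST authenticated with a VlogMe token."""
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        # Streamable HTTP servers may answer with JSON or an SSE stream.
        "Accept": "application/json, text/event-stream",
    }
```

In practice an agent client such as Claude Code, Cursor, or Codex produces these messages for you once the server is registered.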

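MCP

MCP client setup

Claude Code, Cursor, and Codex support Streamable HTTP. Older stdio-only clients can connect through mcp-remote.

Choose either interactive OAuth or an API token in an environment variable, and do not mix the two.

After changing the MCP configuration, start a new client session so the tools are rediscovered.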

bash

# OAuth
codex mcp add vlogme --url https://mcp.vlogme.ai/api/mcp
codex mcp login vlogme --scopes mcp:full,mcp:work_items

# or API token
export VLOGME_TOKEN=vlm_live_xxxxxxxxxxxx
codex mcp add vlogme --url https://mcp.vlogme.ai/api/mcp --bearer-token-env-var VLOGME_TOKEN

json

{
  "mcpServers": {
    "vlogme": {
      "url": "https://mcp.vlogme.ai/api/mcp",
      "headers": { "Authorization": "Bearer YOUR_TOKEN_HERE" }
    }
  }
}

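MCP

End-to-end AI agent example

A natural-language request becomes a sequence of tool calls that you can inspect.

The agent can list voices, estimate the cost, ask the user for confirmation, call generate_video, and then call get_video until the job reaches a final state.

You need a fetchable portrait photo, an eligible plan, and enough credits. MCP is subject to the same per-plan limits as REST.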

prompt

Create a 16:9 talking-avatar video from this portrait.
Use a warm, natural voice. Estimate the credits first and ask before generating.
After approval, monitor the job and return the final download URL.
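
The tool order an agent would follow for this prompt can be written down as a plain Python function. This is a sketch for reasoning about the flow, not an agent implementation: the tools are injected as callables keyed by MCP tool name, and the argument shapes, result fields (id, status), and terminal status names are all assumptions.

```python
TERMINAL = {"completed", "failed", "canceled"}  # assumed status names


def run_video_request(tools: dict, request: dict, pick_voice, approve,
                      max_polls: int = 60, wait=lambda: None) -> dict:
    """List voices, estimate, ask for approval, generate, then poll to a terminal status.

    `tools` maps MCP tool names to callables. `pick_voice` chooses a voice_id
    from the voice list and `approve` receives the estimate and returns a bool.
    """
    voices = tools["list_voices"]({})
    request = dict(request, voice_id=pick_voice(voices))
    estimate = tools["estimate_credits"](request)
    if not approve(estimate):
        # Nothing has been charged at this point.
        return {"status": "declined", "estimate": estimate}
    job = tools["generate_video"](request)
    for _ in range(max_polls):
        job = tools["get_video"]({"id": job["id"]})
        if job.get("status") in TERMINAL:
            return job
        wait()
    raise TimeoutError("job did not reach a terminal status")
```

Keeping the approval step between the estimate and generate_video is the point of the prompt: no credits are spent until the user has seen the cost.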

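Agent

Agent skill

Hand the agent a short set of operating rules for estimates, approval, and asynchronous status handling.

Describe when to use VlogMe, the order of tool calls, how to preserve idempotency, and when to stop polling.

Never put a real token in the skill file. Manage it in the client environment or an OAuth store.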

markdown

---
name: vlogme-video
description: Estimate and generate VlogMe video jobs through MCP.
---
1. Validate the portrait and requested format.
2. Call estimate_credits before a paid generation.
3. Ask for approval with the estimate.
4. Use a stable idempotency key for transport retries.
5. Poll get_video until a terminal status.

Reference

Errors and retries

Errors have the shape { error: { code, message } }. Branch on the stable code, not on the message, which may be translated.

Quote the X-Request-Id when you contact support. A 429 response includes Retry-After. Do not retry authentication or input errors unconditionally; fix the request instead.

The OpenAPI document lists every schema and error. Successful responses include the rate-limit limit, remaining, and reset values.

code                  HTTP  Response body
missing_token         401   { error: { code: "missing_token", message } }
invalid_token         401   { error: { code: "invalid_token", message } }
token_expired         401   { error: { code: "token_expired", message } }
plan_required         403   { error: { code: "plan_required", message } }
insufficient_credits  402   { error: { code: "insufficient_credits", message } }
invalid_input         400   { error: { code: "invalid_input", message } }
invalid_asset         400   { error: { code: "invalid_asset", message } }
invalid_json          400   { error: { code: "invalid_json", message } }
not_found             404   { error: { code: "not_found", message } }
method_not_allowed    405   { error: { code: "method_not_allowed", message } }
already_started       409   { error: { code: "already_started", message } }
billing_conflict      409   { error: { code: "billing_conflict", message } }
rate_limited          429   { error: { code: "rate_limited", message } }
internal_error        500   { error: { code: "internal_error", message } }
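
A retry policy built on the stable codes might look like the following Python sketch. Which codes count as retryable is an assumption made here (rate_limited and internal_error, plus any 5xx). The backoff parameters are arbitrary, and only the seconds form of Retry-After is handled.

```python
RETRYABLE_CODES = {"rate_limited", "internal_error"}  # assumption for this sketch


def retry_delay(status: int, body: dict, headers: dict, attempt: int,
                base: float = 1.0, cap: float = 60.0):
    """Return the seconds to wait before retrying, or None when a retry will not help."""
    code = (body.get("error") or {}).get("code")
    retryable = status == 429 or status >= 500 or code in RETRYABLE_CODES
    if not retryable:
        # Authentication, input, billing and similar errors need a fix, not a retry.
        return None
    retry_after = headers.get("Retry-After", "")
    if retry_after.isdigit():
        return float(retry_after)  # the server's own hint wins
    return min(cap, base * 2 ** attempt)  # exponential backoff, capped
```

When retrying a paid POST, resend it with the same Idempotency-Key so that a request which actually succeeded is not charged twice.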
