Vidu Video and Image Generation Skill

Generate AI videos and images with Vidu via vidu-cli — text-to-image, text-to-video, image-to-video, start-end frame, reference-based generation, and material elements, up to 1080p/2K/4K.

Execution model: use vidu-cli

Run everything through the vidu-cli command-line tool. Parameters are passed as CLI flags, not as raw JSON bodies.

Environment variables

VIDU_TOKEN (required) — Vidu API token
VIDU_BASE_URL (optional) — Default https://service.vidu.cn (mainland China); use https://service.vidu.com for overseas
VIDU_DEBUG (optional) — Set to 1 to print full response body to stderr for debugging
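
As a rough illustration of the variables above, a wrapper script might resolve them before invoking vidu-cli. This is only a sketch: the function name `resolve_vidu_env` and the returned dictionary shape are hypothetical; only the variable names and the two URLs come from the list above.

```python
DEFAULT_BASE_URL = "https://service.vidu.cn"    # mainland China (documented default)
OVERSEAS_BASE_URL = "https://service.vidu.com"  # documented overseas endpoint


def resolve_vidu_env(env):
    """Return the settings a wrapper would pass on (hypothetical helper).

    `env` is any mapping of environment variables, e.g. os.environ.
    Raises ValueError when the required token is missing.
    """
    token = (env.get("VIDU_TOKEN") or "").strip()
    if not token:
        raise ValueError("VIDU_TOKEN is required")
    return {
        "VIDU_TOKEN": token,
        # Fall back to the documented default when VIDU_BASE_URL is unset or empty.
        "VIDU_BASE_URL": env.get("VIDU_BASE_URL") or DEFAULT_BASE_URL,
        # Debug output is enabled only by the exact value "1".
        "VIDU_DEBUG": env.get("VIDU_DEBUG") == "1",
    }
```

In a real script this would typically be called as `resolve_vidu_env(os.environ)`, with `VIDU_BASE_URL` set to the overseas endpoint when needed.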

Stdout contract

Every command prints one line of JSON to stdout.
Success: {"ok": true, "trace_id": "...", ...} — exit code 0
Failure: {"ok": false, "error": {"type": "...", "http_status": ..., "code": "...", "message": "..."}} — exit code 1
trace_id appears on API-backed responses for support/debugging.
CRITICAL: Never guess why an error happened. Copy fields from error exactly. Full shapes and edge cases: references/parameters.md.
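
The contract above can be consumed with a few lines of Python. The sketch below assumes only what is stated here: one JSON line on stdout, with `ok` and, on failure, an `error` object. The helper names `parse_cli_stdout` and `run_vidu` are hypothetical and not part of vidu-cli.

```python
import json
import subprocess


def parse_cli_stdout(stdout_text):
    """Parse the single JSON line and return (ok, payload).

    On success the payload is the whole response object; on failure it is
    the `error` object, passed through unchanged.
    """
    data = json.loads(stdout_text.strip())
    if data.get("ok") is True:
        return True, data
    return False, data.get("error", {})


def run_vidu(args):
    """Run vidu-cli with the given argument list and parse its stdout.

    Requires vidu-cli on PATH; the exit code is not inspected here because
    the JSON line already carries the ok flag.
    """
    completed = subprocess.run(
        ["vidu-cli", *args], capture_output=True, text=True
    )
    return parse_cli_stdout(completed.stdout)
```

A caller would branch on the returned flag and report the error object field by field rather than interpreting it.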

Error type values

http_error — API 4xx/5xx (http_status, code, message)
network_error — Connection failure or timeout
parse_error — Response is not valid JSON
client_error — Local issues (missing token, bad path, validation)
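
To report an error without interpreting it, the fields can simply be echoed in a fixed order. A minimal sketch follows; the function name `format_error` and the output wording are hypothetical, while the four type values and field names are the ones listed above.

```python
KNOWN_ERROR_TYPES = ("http_error", "network_error", "parse_error", "client_error")


def format_error(error):
    """Render the error object verbatim, field by field (hypothetical helper).

    Fields that are absent from the object are skipped rather than guessed.
    """
    fields = ("type", "http_status", "code", "message")
    parts = [f"{name}={error[name]!r}" for name in fields if name in error]
    return "vidu-cli error: " + ", ".join(parts)
```

The helper deliberately adds no explanation of its own, in line with the rule that error causes must not be inferred.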

Main commands

Command	Purpose
`vidu-cli upload <image_path>`	Upload image → `upload_id`, `ssupload_uri`
`vidu-cli task submit --type ... --prompt ... [options]`	Submit task → `task_id`. `--image`: local path, URL, or `ssupload:?id=...` (auto-upload). `--video`: local path or `ssupload:?id=...` (character2video + 3.2_a only, auto-upload; URLs not supported). `--audio`: local path or `ssupload:?id=...` (character2video + 3.2_a only; URLs not supported).
`vidu-cli task get <task_id> [--output/-o <dir>]`	Query task → `state`, `type`, `model`; use `--output` to download media on success
`vidu-cli task compose --timeline <json> [--width N --height N] [--schedule-mode <mode>]`	Compose video from timeline → `task_id`. Query with `task get`. Supports `--schedule-mode` (auto-detected if omitted). MUST read references/compose.md before building the timeline JSON — do not guess the schema.
`vidu-cli task lip-sync --video <path> --text <text> [options]`	Lip-sync with text-to-speech → `task_id`. Supports `--schedule-mode` (auto-detected if omitted).
`vidu-cli task lip-sync --video <path> --audio <path>`	Lip-sync with audio file → `task_id`. Supports `--schedule-mode` (auto-detected if omitted).
`vidu-cli task lip-sync-voices`	List available lip-sync voices (90+, Chinese/English/Cantonese/Cartoon etc.)
`vidu-cli task tts --prompt ... --voice-id ...`	Text-to-speech → `task_id`. Supports `--schedule-mode` (auto-detected if omitted).
`vidu-cli task tts-voices`	List available TTS voices (300+, 20+ languages)
`vidu-cli task cost --type ... --model-version ... --duration ...`	Query credit cost for video/image tasks (estimate before submitting)
`vidu-cli task tts-cost --text ... --voice-id ...`	Query credit cost for TTS tasks (priced by character count; `--text` required)
`vidu-cli task lip-sync-cost --duration ... --voice-id ...`	Query credit cost for lip-sync tasks (`--voice-id` defaults to `English_Aussie_Bloke` if omitted)
`vidu-cli quota pass`	Query claw-pass daily quota status
`vidu-cli quota credit`	Query user credit balance
`vidu-cli element create --name ... --image ... [--description ...] [--style ...]`	Create reference element (check → preprocess → create). Returns `id`, `version`.
`vidu-cli element check --name ...`	Check name availability
`vidu-cli element list [--keyword kw]`	List personal elements
`vidu-cli element search --keyword kw`	Search community elements

Smart input handling

--image (task submit, element create):

Local path → auto-upload (auto-compress when file is larger than 10MB)
http(s) URL → download then upload
ssupload:?id=... → use as-is

--video and --audio (task submit, character2video + 3.2_a only):

Local path → auto-upload
ssupload:?id=... → use as-is
http(s) URL → not supported (rejected with error)
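
The rules above can be checked before calling the CLI. The sketch below is an assumption-laden illustration: the helper names are hypothetical, and matching on the `ssupload:` prefix is a simplification of the documented `ssupload:?id=...` form. The CLI performs its own validation regardless.

```python
def classify_media_input(value):
    """Classify a flag value as 'ssupload', 'url', or 'local_path'."""
    if value.startswith("ssupload:"):
        return "ssupload"
    if value.startswith(("http://", "https://")):
        return "url"
    return "local_path"


def is_accepted(flag, value):
    """Return True when the input kind is allowed for the given flag.

    --image accepts all three kinds; --video and --audio reject URLs.
    """
    kind = classify_media_input(value)
    if flag == "--image":
        return True
    if flag in ("--video", "--audio"):
        return kind != "url"
    raise ValueError(f"unsupported flag: {flag}")
```

A pre-check like this only saves a round trip; a rejected `--video` URL would otherwise surface as an error from the CLI.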

Key Capabilities

text-to-image — Text-only image generation
text-to-video — Text-only video generation
image-to-video — One image + text → video
head-tail-image-to-video — Start + end frames + text
reference-to-image — Images + materials: 1–7 total; text prompt required; can be images-only, materials-only, or mixed; images-only needs no element create
reference-to-video — Same rule: 1–7 total; text prompt required; with 3.2_a model, also supports --video input (max 3, local files validated for size/dimensions/duration)
lip-sync — Drive video mouth movement with text-to-speech or audio file
text-to-speech — Convert text to speech audio via task tts
video-compose — Compose multi-track timeline (video/audio/subtitle/effect) into a single exported video via task compose
create-references — element create (single command)
search-community-references — element search
query-task — task get [--output <dir>]
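
The counting rule for the reference-based capabilities can be expressed as a small check. This is a sketch only: the function name and return shape are hypothetical, and it assumes `--video` inputs do not count toward the 1–7 total, which the text above does not state either way.

```python
def check_reference_inputs(images, materials, prompt, videos=(), model_version=None):
    """Return a list of problems; an empty list means the inputs look valid."""
    problems = []
    total = len(images) + len(materials)
    if not 1 <= total <= 7:
        problems.append(f"images + materials must total 1-7, got {total}")
    if not prompt.strip():
        problems.append("text prompt is required")
    if videos:
        # --video is documented only for the 3.2_a model, with at most 3 inputs.
        if model_version != "3.2_a":
            problems.append("--video requires the 3.2_a model")
        if len(videos) > 3:
            problems.append(f"at most 3 videos allowed, got {len(videos)}")
    return problems
```

Images-only, materials-only, and mixed inputs all pass as long as the combined count stays within range.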

Setup

npm install -g vidu-cli@latest (requires Node.js >=14; postinstall auto-downloads the platform binary)
Obtain a VIDU_TOKEN, for example from the Vidu console.
Set VIDU_TOKEN environment variable (required); set VIDU_BASE_URL if not using default region.
Verify: vidu-cli task submit --help

Data usage and privacy (summary)

Content you send (prompts, images, task settings) goes to Vidu’s API. Confirm this meets your privacy and IP needs. Prefer least-privilege tokens for testing. Terms: https://www.vidu.com/terms (overseas), https://www.vidu.cn/terms (mainland China).

Async workflow (short)

Vidu generation is asynchronous: task submit → task_id → poll task get <task_id> until terminal state.
Model nicknames: Q1 → 3.0, Q2 → 3.1, Q2 Pro → 3.1_pro, Q3 → 3.2; Omni Video Pro, 全能Video Pro, and Q3-A all map to 3.2_a (with 3.2_a, character2video supports --audio and --video; duration -1 or 4–15s).
Additional variants: 全能Image 2 (GPT-Image 2) → 3.2_image_2, for multimodal visual generation with strong text-rendering accuracy; 全能Q3 Fast → 3.2_fast_m; 全能Q3 Pro → 3.2_pro_m. See references/parameters.md for the complete per-task model version list.
Task-type summaries, task support matrix, copy-paste CLI examples, prompt tips, and element create/list/search details are in references/parameters.md.
Task lifecycle, retries, and polling guidance: references/errors_and_retry.md.
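
The submit-then-poll pattern can be sketched as a loop around any function that performs `task get`. Everything beyond the pattern itself is an assumption: `poll_until_terminal` is a hypothetical name, the interval and attempt limit are placeholders (the actual guidance is in references/errors_and_retry.md), and the terminal states are taken to be `success` and `failed`.

```python
import time

TERMINAL_STATES = ("success", "failed")


def poll_until_terminal(get_task, task_id, interval_s=5.0, max_attempts=120,
                        sleep=time.sleep):
    """Call get_task(task_id) until the task reaches a terminal state.

    get_task must return the parsed JSON response as a dict. A response with
    ok == False is returned immediately so the caller can report the error.
    """
    for attempt in range(max_attempts):
        response = get_task(task_id)
        if response.get("ok") is False:
            return response
        if response.get("state") in TERMINAL_STATES:
            return response
        if attempt < max_attempts - 1:
            sleep(interval_s)
    raise TimeoutError(f"task {task_id} not finished after {max_attempts} polls")
```

Injecting `get_task` and `sleep` keeps the loop independent of how the CLI is invoked and makes it easy to exercise without waiting.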

Implementation guide

For task submit (generation tasks)

Pick capability → map to --type and options using references/parameters.md (matrix + validation).
Always pass --resolution; default to 1080p unless the user explicitly requests a different supported value.
Prepare inputs: for reference2image / character2video, pass --image and/or --material so that the combined count is 1–7; character2video with 3.2_a also accepts --video (max 3). Optionally use [@name] in the prompt, per references/parameters.md.
(Optional) Query cost before submitting: use task cost, task tts-cost, or task lip-sync-cost to estimate credit usage and check eligibility.
vidu-cli task submit ... → store task_id and trace_id.
- schedule-mode auto-detection: if --schedule-mode is omitted, the CLI queries claw-pass status and uses claw_pass when the user has an active pass, otherwise normal. If submit fails with ClawPassExplicitModeRequired, tell the user their daily claw-pass quota is exhausted. Do not retry automatically; instead suggest re-submitting with --schedule-mode normal to use credits, or waiting for the next quota refresh.
vidu-cli task get <task_id> until success or failed; use --output <dir> to download media on success.
On success, return downloaded_files (if --output was used) or prompt the user to re-run with --output; on task failure, return err_code / err_msg; on a CLI ok: false response, return the error fields verbatim.
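
The steps above can be sketched as an argument builder that always includes `--resolution`. This is an illustration under stated assumptions: `build_submit_args` is a hypothetical helper, it handles a single optional `--image` only, and the authoritative flag list and validation rules are in references/parameters.md.

```python
def build_submit_args(task_type, prompt, resolution="1080p", image=None,
                      schedule_mode=None):
    """Build an argv list for `vidu-cli task submit` (hypothetical helper).

    --resolution is always passed and defaults to 1080p; --schedule-mode is
    left out unless given, so the CLI can auto-detect it.
    """
    args = [
        "vidu-cli", "task", "submit",
        "--type", task_type,
        "--prompt", prompt,
        "--resolution", resolution,
    ]
    if image is not None:
        args += ["--image", image]
    if schedule_mode is not None:
        args += ["--schedule-mode", schedule_mode]
    return args
```

The resulting list can be handed to `subprocess.run`; after a ClawPassExplicitModeRequired failure, the same call could be rebuilt with `schedule_mode="normal"` once the user agrees to spend credits.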

For task compose (video composition)

CRITICAL: Before constructing the --timeline JSON, you MUST read references/compose.md first. The timeline has a specific JSON schema with exact field names, nesting structure, and media_url rules. Do NOT guess the structure — always refer to compose.md for the complete schema, supported fields, and examples.

Read references/compose.md to understand the timeline JSON schema, media_url rules, and limits.
Build the timeline JSON following the exact structure: video_tracks[].video_track_clips[], audio_tracks[].audio_track_clips[], subtitle_tracks[].subtitle_track_clips[], effect_tracks[].effect_track_items[]. Every clip must include timeline_in and timeline_out (the CLI validates this and rejects timelines with missing values).
For media_url: use ssupload:?id=xxx, http URL, or local file path (auto-uploaded by CLI).
For file_url (subtitles): use ssupload:?id=xxx, http URL, or local .srt file path.
vidu-cli task compose --timeline <file_or_json> [--width N --height N] [--schedule-mode <mode>] → returns task_id.
- schedule-mode auto-detection: same as task submit. If --schedule-mode is omitted, the CLI auto-detects the mode from claw-pass status. If compose fails with ClawPassExplicitModeRequired, suggest --schedule-mode normal to use credits instead.
vidu-cli task get <task_id> to poll status, same as other tasks.
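
A local pre-check for the timeline_in / timeline_out requirement might look like the sketch below. It is not a substitute for references/compose.md: it assumes the four track arrays named above sit at the top level of the timeline object, treats effect items like clips, and checks nothing else. The function name `find_clips_missing_times` is hypothetical.

```python
# Track array name -> clip array name, as listed in the steps above.
TRACK_LAYOUT = {
    "video_tracks": "video_track_clips",
    "audio_tracks": "audio_track_clips",
    "subtitle_tracks": "subtitle_track_clips",
    "effect_tracks": "effect_track_items",
}


def find_clips_missing_times(timeline):
    """Return paths of clips that lack timeline_in or timeline_out."""
    missing = []
    for tracks_key, clips_key in TRACK_LAYOUT.items():
        for t_idx, track in enumerate(timeline.get(tracks_key, [])):
            for c_idx, clip in enumerate(track.get(clips_key, [])):
                for field in ("timeline_in", "timeline_out"):
                    # A value of 0 is valid; only absent or null values are flagged.
                    if clip.get(field) is None:
                        missing.append(
                            f"{tracks_key}[{t_idx}].{clips_key}[{c_idx}].{field}"
                        )
    return missing
```

An empty result means only that the two required fields are present; the CLI still validates the full timeline.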

Output to the user

After submit: return task_id and trace_id; state that processing is in progress.
After query: if state is success, return downloaded_files (if --output was used) or the task_id with a note to re-run with --output <dir> to download; if failed, return err_code and err_msg exactly (note: response may still have ok: true while state is failed).
On CLI failure (ok: false): report error.type, http_status, code, message exactly — do not infer causes.
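
The reporting rules above can be condensed into one function over the parsed `task get` response. This is a sketch with assumptions: `summarize_task_get` is a hypothetical name, and `state`, `downloaded_files`, `err_code`, and `err_msg` are assumed to be top-level fields of the response (the exact shapes are in references/parameters.md).

```python
def summarize_task_get(task_id, response):
    """Map a parsed `task get` response to what should be shown to the user."""
    if response.get("ok") is not True:
        # CLI-level failure: pass the error object through untouched.
        return {"status": "cli_error", "error": response.get("error")}
    state = response.get("state")
    if state == "success":
        files = response.get("downloaded_files")
        if files:
            return {"status": "success", "downloaded_files": files}
        return {
            "status": "success",
            "task_id": task_id,
            "note": "re-run task get with --output <dir> to download",
        }
    if state == "failed":
        # ok can be true while the task itself failed, so check state, not ok.
        return {
            "status": "failed",
            "err_code": response.get("err_code"),
            "err_msg": response.get("err_msg"),
        }
    return {"status": "in_progress", "task_id": task_id, "state": state}
```

Checking `state` separately from `ok` is the important design point, since a failed task can arrive inside an otherwise successful CLI response.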

References (bundled)

File	Contents
references/parameters.md	Task matrix, CLI flags, examples, prompt tips, validation
references/errors_and_retry.md	States, retries, polling
references/compose.md	Timeline schema, media_url rules, clip compose examples

Fallback (no Node.js / npm)

If node, npm, or vidu-cli cannot be installed, this skill cannot run. Ask the user to install the latest vidu-cli (npm install -g vidu-cli@latest, Node.js >=14), and point them to references/parameters.md for parameter details.
