Doubao Native Media Skill
This is a native OpenClaw skill. Do not spin up the upstream MCP server unless the user explicitly asks for MCP compatibility.
Use this skill for
- Doubao / 豆包 text-to-image
- image-to-image or multi-reference image generation
- Doubao text-to-video or image-to-video
- querying an async Doubao video task by task_id
- troubleshooting Volcengine Ark endpoint/model issues
Commands
Generate an image
python3 {baseDir}/scripts/doubao_media.py image \
--prompt "A cinematic cyberpunk alley in rain" \
--size 2560x1440
Generate a video
python3 {baseDir}/scripts/doubao_media.py video \
--prompt "A panda astronaut waves on the moon" \
--video-duration 5 \
--fps 24 \
--resolution 1080p
Query a video task
python3 {baseDir}/scripts/doubao_media.py task --task-id your-task-id
Wait for a video task and optionally download the result
python3 {baseDir}/scripts/doubao_media.py wait \
--task-id your-task-id \
--timeout 600 \
--interval 5 \
--download-to ./doubao-result.mp4
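The wait subcommand's --timeout and --interval options suggest a poll-until-done loop. The following is a minimal sketch of that pattern, not the script's actual implementation: the status strings in TERMINAL_STATES, the "status" key, and the fetch_status callable are all hypothetical placeholders for whatever the task query returns.

```python
import time
from typing import Callable, Dict

# Hypothetical terminal states; the real API's status strings may differ.
TERMINAL_STATES = {"succeeded", "failed", "cancelled"}


def wait_for_task(
    fetch_status: Callable[[], Dict],
    timeout: float = 600,
    interval: float = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll fetch_status() until the task reaches a terminal state.

    fetch_status is a stand-in for one task query; it must return a dict
    with a "status" key (an assumption made for this sketch).
    """
    deadline = time.monotonic() + timeout
    while True:
        task = fetch_status()
        if task.get("status") in TERMINAL_STATES:
            return task
        # Give up if the next poll would land past the deadline.
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"task not finished after {timeout} seconds")
        sleep(interval)
```

Passing sleep as a parameter keeps the loop testable without real delays; in normal use the default time.sleep applies.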
Input rules
- Always prefer --endpoint-id when the user has a provisioned Volcengine Ark endpoint.
- Fall back to model names only when endpoint ids are unavailable.
- For video generation, this skill mirrors the upstream behavior and appends --dur, --fps, --rs, and --ratio to the prompt when they are not already present.
- If the user supplies image URLs, pass them through exactly; do not download or re-host unless asked.
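The prompt-suffix rule for video generation can be illustrated roughly as follows. This is a sketch under stated assumptions, not the script's own code: the function name append_video_flags is hypothetical, and so is the idea that each value is written as "--flag value" separated by a space. Only the four flag names (--dur, --fps, --rs, --ratio) and the "skip if already present" behavior come from the rules above.

```python
from typing import Dict, Optional, Union

VIDEO_FLAGS = ("dur", "fps", "rs", "ratio")


def append_video_flags(
    prompt: str, params: Dict[str, Optional[Union[str, int]]]
) -> str:
    """Append --dur/--fps/--rs/--ratio to prompt unless already present.

    params maps the short flag name (without dashes) to its value;
    a missing or None value means "do not add this flag".
    """
    existing_tokens = prompt.split()
    parts = [prompt.strip()]
    for flag in VIDEO_FLAGS:
        token = f"--{flag}"
        value = params.get(flag)
        # Respect a flag the user already wrote into the prompt.
        if value is None or token in existing_tokens:
            continue
        parts.append(f"{token} {value}")
    return " ".join(parts)


print(append_video_flags("A panda astronaut waves on the moon",
                         {"dur": 5, "fps": 24, "rs": "1080p"}))
# prints: A panda astronaut waves on the moon --dur 5 --fps 24 --rs 1080p
```

A flag typed directly into the prompt wins over the command-line value, which matches the "when they are not already present" wording.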
Troubleshooting
- If neither --endpoint-id nor a default endpoint env var exists, the script falls back to the default model env var.
- If the API returns InvalidEndpointOrModel.NotFound, ask the user to verify the Volcengine Ark endpoint authorization first.
- Video generation is async. If generation succeeds, capture task_id and query it later with the task subcommand, or use wait for automatic polling.
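The fallback order described in the first item (explicit endpoint id, then default endpoint env var, then default model env var) might look like the sketch below. The env var names DOUBAO_ENDPOINT_ID and DOUBAO_MODEL are hypothetical placeholders chosen for illustration, as is the function name resolve_target; check references/api-notes.md or the script itself for the real names.

```python
import os
from typing import Mapping, Optional


def resolve_target(
    endpoint_id: Optional[str],
    env: Mapping[str, str] = os.environ,
    endpoint_var: str = "DOUBAO_ENDPOINT_ID",  # hypothetical env var name
    model_var: str = "DOUBAO_MODEL",           # hypothetical env var name
) -> str:
    """Pick the value to send as the request's model/endpoint target.

    Priority: --endpoint-id argument, then the default endpoint env var,
    then the default model env var.
    """
    target = endpoint_id or env.get(endpoint_var) or env.get(model_var)
    if not target:
        raise ValueError(
            "no endpoint id or model configured; pass --endpoint-id "
            f"or set {endpoint_var} / {model_var}"
        )
    return target
```

Taking env as a parameter makes the order easy to verify with a plain dict instead of the process environment.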
References
- Read references/api-notes.md when you need request shapes, defaults, or caveats.