When to Use
- User wants to create an explainer or tutorial video
- User asks to "explain" something in video form
- User wants narrated content with AI-generated visuals
- User says "explainer video", "解说视频", "tutorial video"
When NOT to Use
- User wants audio-only content without visuals (use /speech or /podcast)
- User wants a podcast-style discussion (use /podcast)
- User wants to generate a standalone image (use /image-gen)
- User wants to read text aloud without video (use /speech)
Purpose
Generate explainer videos that combine a single narrator's voiceover with AI-generated visuals. Ideal for product introductions, concept explanations, and tutorials. Supports text-only script generation or full text + video output.
Hard Constraints
- No shell scripts. Construct curl commands from the API reference files listed in Resources
- Always read shared/authentication.md for API key and headers
- Follow shared/common-patterns.md for polling, errors, and interaction patterns
- Always read config following shared/config-pattern.md before any interaction
- Never hardcode speaker IDs — always fetch from the speakers API
- Never save files to ~/Downloads/ — use .listenhub/explainer/ from config
- Explainer uses exactly 1 speaker
- Mode must be info (for Info style) or story (for Story style) — never slides (use the /slides skill instead)
Step -1: API Key Check
Follow shared/config-pattern.md § API Key Check. If the key is missing, stop immediately.
Step 0: Config Setup
Follow shared/config-pattern.md Step 0.
If file doesn't exist — ask location, then create immediately:
mkdir -p ".listenhub/explainer"
echo '{"outputDir":".listenhub","outputMode":"inline","language":null,"defaultStyle":null,"defaultSpeakers":{}}' > ".listenhub/explainer/config.json"
CONFIG_PATH=".listenhub/explainer/config.json"
# (or $HOME/.listenhub/explainer/config.json for global)
Then run Setup Flow below.
If file exists — read config, display summary, and confirm:
Current config (explainer):
Output mode: {inline / download / both}
Language preference: {zh / en / not set}
Default style: {info / story / not set}
Default speaker: {speakerName / not set}
Ask: "Use the saved config?" → Confirm and continue / Reconfigure
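A minimal sketch of reading the saved config back with jq, using a throwaway config written the same way Step 0 creates it (normally the file already exists on disk):

```shell
# Write a sample config the same way Step 0 does (normally it already exists)
mkdir -p ".listenhub/explainer"
CONFIG_PATH=".listenhub/explainer/config.json"
echo '{"outputDir":".listenhub","outputMode":"inline","language":"zh","defaultStyle":"info","defaultSpeakers":{}}' > "$CONFIG_PATH"

# Read each field, falling back to a placeholder when null
CONFIG=$(cat "$CONFIG_PATH")
OUTPUT_MODE=$(echo "$CONFIG" | jq -r '.outputMode // "not set"')
LANGUAGE=$(echo "$CONFIG" | jq -r '.language // "not set"')
STYLE=$(echo "$CONFIG" | jq -r '.defaultStyle // "not set"')
echo "Output mode: $OUTPUT_MODE / Language: $LANGUAGE / Style: $STYLE"
```

The `// "not set"` alternative operator covers both null values and missing keys, which matches the "未设置 / not set" placeholders in the summary above.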
Setup Flow (first run or reconfigure)
Ask these questions in order, then save all answers to config at once:
- outputMode: Follow shared/output-mode.md § Setup Flow Question.
- Language (optional): "Default language?"
  - "Chinese (zh)"
  - "English (en)"
  - "Choose manually each time" → keep null
- Style (optional): "Default style?"
  - "Info — informational presentation"
  - "Story — narrative storytelling"
  - "Choose manually each time" → keep null
After collecting answers, save immediately:
# Follow shared/output-mode.md § Save to Config
NEW_CONFIG=$(echo "$CONFIG" | jq --arg m "$OUTPUT_MODE" '. + {"outputMode": $m}')
echo "$NEW_CONFIG" > "$CONFIG_PATH"
CONFIG=$(cat "$CONFIG_PATH")
Note: defaultSpeakers are saved after generation (see After Successful Generation section).
Interaction Flow
Step 1: Topic / Content
Free text input. Ask the user:
What would you like to explain or introduce?
Accept: topic description, text content, or concept to explain.
Step 2: Language
If config.language is set, pre-fill and show in summary — skip this question.
Otherwise ask:
Question: "What language?"
Options:
- "Chinese (zh)" — Content in Mandarin Chinese
- "English (en)" — Content in English
Step 3: Style
If config.defaultStyle is set, pre-fill and show in summary — skip this question.
Otherwise ask:
Question: "What style of explainer?"
Options:
- "Info" — Informational, factual presentation style
- "Story" — Narrative, storytelling approach
Step 4: Speaker Selection
Follow shared/speaker-selection.md for the full selection flow, including:
- Default from config.defaultSpeakers.{language} (skip this step if set)
- Text table + free-text input
- Input matching and re-prompt on no match
Only 1 speaker is supported for explainer videos.
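As an illustration only — the real response shape is defined in shared/api-speakers.md — rendering the text table from a hypothetical speakers payload could look like:

```shell
# Hypothetical speakers payload; the actual field names come from shared/api-speakers.md
SPEAKERS='{"data":{"speakers":[
  {"speakerId":"cozy-man-english","name":"Cozy Man","language":"en"},
  {"speakerId":"warm-woman-zh","name":"Warm Woman","language":"zh"}]}}'

# Filter to the chosen language and print id + display name, tab-separated
LANGUAGE="en"
TABLE=$(echo "$SPEAKERS" | jq -r --arg lang "$LANGUAGE" '
  .data.speakers[] | select(.language == $lang) | "\(.speakerId)\t\(.name)"')
echo "$TABLE"
```

The user's free-text reply can then be matched against the speakerId and name columns, re-prompting when nothing matches.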
Step 5: Output Type
Question: "What output do you want?"
Options:
- "Text script only" — Generate narration script, no video
- "Text + Video" — Generate full explainer video with AI visuals
Step 6: Confirm & Generate
Summarize all choices:
Ready to generate explainer:
Topic: {topic}
Language: {language}
Style: {info/story}
Speaker: {speaker name}
Output: {text only / text + video}
Proceed?
Wait for explicit confirmation before calling any API.
Workflow
1. Submit (foreground): POST /storybook/episodes with content, speaker, language, mode → extract episodeId
2. Tell the user the task is submitted
3. Poll (background): Run the following exact bash command with run_in_background: true and timeout: 600000. Do NOT use python3, awk, or any other JSON parser — use jq as shown:
EPISODE_ID="<id-from-step-1>"
for i in $(seq 1 30); do
  RESULT=$(curl -sS "https://api.marswave.ai/openapi/v1/storybook/episodes/$EPISODE_ID" \
    -H "Authorization: Bearer $LISTENHUB_API_KEY" 2>/dev/null)
  STATUS=$(echo "$RESULT" | tr -d '\000-\037\177' | jq -r '.data.processStatus // "pending"')
  case "$STATUS" in
    success|completed) echo "$RESULT"; exit 0 ;;
    failed|error) echo "FAILED: $RESULT" >&2; exit 1 ;;
    *) sleep 10 ;;
  esac
done
echo "TIMEOUT" >&2; exit 2
4. When notified, download and present the script:
Read OUTPUT_MODE from config. Follow shared/output-mode.md for behavior.
inline or both: Present the script inline:
Explainer script generated!
"{title}"
View online: https://listenhub.ai/app/explainer/{episodeId}
download or both: Also save the script file:
- Create .listenhub/explainer/YYYY-MM-DD-{episodeId}/
- Write {episodeId}.md from the generated script content
- Present the download path in addition to the above summary.
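A sketch of the save step, assuming the poll result carries the script text under .data.script — that field name is an assumption to confirm against shared/api-storybook.md:

```shell
# Stand-in for the JSON returned by the polling command; field names are assumptions
RESULT='{"data":{"episodeId":"ep123","title":"Demo","script":"Hello from the narrator."}}'
EPISODE_ID=$(echo "$RESULT" | jq -r '.data.episodeId')

# Create the dated job directory and write the script as markdown
DATE=$(date +%Y-%m-%d)
JOB_DIR=".listenhub/explainer/${DATE}-${EPISODE_ID}"
mkdir -p "$JOB_DIR"
echo "$RESULT" | jq -r '.data.script' > "${JOB_DIR}/${EPISODE_ID}.md"
```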
5. If video requested: POST /storybook/episodes/{episodeId}/video (foreground) → poll again (background) using the exact bash command below with run_in_background: true and timeout: 600000. Poll for videoStatus, not processStatus:
EPISODE_ID="<id-from-step-1>"
for i in $(seq 1 30); do
  RESULT=$(curl -sS "https://api.marswave.ai/openapi/v1/storybook/episodes/$EPISODE_ID" \
    -H "Authorization: Bearer $LISTENHUB_API_KEY" 2>/dev/null)
  STATUS=$(echo "$RESULT" | tr -d '\000-\037\177' | jq -r '.data.videoStatus // "pending"')
  case "$STATUS" in
    success|completed) echo "$RESULT"; exit 0 ;;
    failed|error) echo "FAILED: $RESULT" >&2; exit 1 ;;
    *) sleep 10 ;;
  esac
done
echo "TIMEOUT" >&2; exit 2
6. When notified, download and present the result:
Read OUTPUT_MODE from config. Follow shared/output-mode.md for behavior.
inline or both: Display the video URL and audio URL as clickable links.
Present:
Explainer video generated!
Video URL: {videoUrl}
Audio URL: {audioUrl}
Duration: {duration}s
Credits used: {credits}
download or both: Also download the audio file.
DATE=$(date +%Y-%m-%d)
JOB_DIR=".listenhub/explainer/${DATE}-{episodeId}"
mkdir -p "$JOB_DIR"
curl -sS -o "${JOB_DIR}/{episodeId}.mp3" "{audioUrl}"
Present the download path in addition to the above summary.
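Extracting the fields shown above from the completed poll result might look like the following sketch — the .data nesting and exact field names are assumptions to verify against shared/api-storybook.md:

```shell
# Stand-in for the completed video poll response; field paths are assumptions
RESULT='{"data":{"videoUrl":"https://cdn.example.com/v.mp4","audioUrl":"https://cdn.example.com/a.mp3","duration":95,"credits":12}}'

VIDEO_URL=$(echo "$RESULT" | jq -r '.data.videoUrl')
AUDIO_URL=$(echo "$RESULT" | jq -r '.data.audioUrl')
DURATION=$(echo "$RESULT" | jq -r '.data.duration')
CREDITS=$(echo "$RESULT" | jq -r '.data.credits')
echo "Duration: ${DURATION}s, credits used: ${CREDITS}"
```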
After Successful Generation
Update config with the choices made this session:
NEW_CONFIG=$(echo "$CONFIG" | jq \
--arg lang "{language}" \
--arg style "{info/story}" \
--arg speakerId "{speakerId}" \
'. + {"language": $lang, "defaultStyle": $style, "defaultSpeakers": (.defaultSpeakers + {($lang): [$speakerId]})}')
echo "$NEW_CONFIG" > "$CONFIG_PATH"
Estimated times:
- Text script only: 2-3 minutes
- Text + Video: 3-5 minutes
API Reference
- Speaker list:
shared/api-speakers.md - Speaker selection guide:
shared/speaker-selection.md - Episode creation:
shared/api-storybook.md - Polling:
shared/common-patterns.md§ Async Polling - Config pattern:
shared/config-pattern.md
Composability
- Invokes: speakers API (for speaker selection); may invoke /speech for voiceover
- Invoked by: content-planner (Phase 3)
Example
User: "Create an explainer video introducing Claude Code"
Agent workflow:
- Topic: "Claude Code introduction"
- Ask language → "English"
- Ask style → "Info"
- Fetch speakers, user picks "cozy-man-english"
- Ask output → "Text + Video"
curl -sS -X POST "https://api.marswave.ai/openapi/v1/storybook/episodes" \
-H "Authorization: Bearer $LISTENHUB_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"sources": [{"type": "text", "content": "Introduce Claude Code: what it is, key features, and how to get started"}],
"speakers": [{"speakerId": "cozy-man-english"}],
"language": "en",
"mode": "info"
}'
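Capturing the response and pulling out the episode ID could be sketched as follows — the .data.episodeId path is an assumption; check shared/api-storybook.md for the real response shape:

```shell
# Stand-in for the curl response above; the field path is an assumption
RESPONSE='{"data":{"episodeId":"ep_abc123"}}'
EPISODE_ID=$(echo "$RESPONSE" | jq -r '.data.episodeId')
echo "Episode ID: $EPISODE_ID"
```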
Poll until text is ready, then generate video if requested.