explainer

When to Use

User wants to create an explainer or tutorial video
User asks to "explain" something in video form
User wants narrated content with AI-generated visuals
User says "explainer video", "解说视频", "tutorial video"

When NOT to Use

User wants audio-only content without visuals (use /speech or /podcast)
User wants a podcast-style discussion (use /podcast)
User wants to generate a standalone image (use /image-gen)
User wants to read text aloud without video (use /speech)

Purpose

Generate explainer videos that combine a single narrator's voiceover with AI-generated visuals. Ideal for product introductions, concept explanations, and tutorials. Supports text-only script generation or full text + video output.

Hard Constraints

Always read config following shared/config-pattern.md before any interaction
Follow shared/cli-patterns.md for execution modes, error handling, and interaction patterns
Always follow shared/cli-authentication.md for auth checks
Never hardcode speaker IDs — always fetch from the speakers CLI when the user wants to change voice
Never save files to ~/Downloads/ or .listenhub/ — save artifacts to the current working directory with friendly topic-based names (see shared/config-pattern.md § Artifact Naming)
Explainer uses exactly 1 speaker
Mode must be info (for Info style) or story (for Story style) — never slides (use /slides skill instead)

<HARD-GATE> Use the AskUserQuestion tool for every multiple-choice step — do NOT print options as plain text. Ask one question at a time. Wait for the user's answer before proceeding to the next step. After all parameters are collected, summarize the choices and ask the user to confirm. Do NOT call any CLI command until the user has explicitly confirmed. </HARD-GATE>

Step -1: CLI Auth Check

Follow shared/config-pattern.md § CLI Auth Check. If the CLI is not installed or the user is not logged in, auto-install and auto-login per shared/cli-authentication.md — never ask the user to run commands manually.

Step 0: Config Setup

Follow shared/config-pattern.md Step 0 (Zero-Question Boot).

If file doesn't exist — silently create with defaults and proceed:

mkdir -p ".listenhub/explainer"
echo '{"outputMode":"inline","language":null,"defaultStyle":null,"defaultSpeakers":{}}' > ".listenhub/explainer/config.json"
CONFIG_PATH=".listenhub/explainer/config.json"
CONFIG=$(cat "$CONFIG_PATH")

Do NOT ask any setup questions. Proceed directly to the Interaction Flow.

If file exists — read config silently and proceed:

CONFIG_PATH=".listenhub/explainer/config.json"
[ ! -f "$CONFIG_PATH" ] && CONFIG_PATH="$HOME/.listenhub/explainer/config.json"
CONFIG=$(cat "$CONFIG_PATH")

Setup Flow (user-initiated reconfigure only)

Only run when the user explicitly asks to reconfigure. Display current settings:

当前配置 (explainer)：
  输出方式：{inline / download / both}
  语言偏好：{zh / en / 未设置}
  默认风格：{info / story / 未设置}
  默认主播：{speakerName / 使用内置默认}

Then ask:

outputMode: Follow shared/output-mode.md § Setup Flow Question.
Language (optional): "默认语言？"
- "中文 (zh)"
- "English (en)"
- "每次手动选择" → keep null
Style (optional): "默认风格？"
- "Info — 信息展示型"
- "Story — 故事叙述型"
- "每次手动选择" → keep null

After collecting answers, save immediately:

NEW_CONFIG=$(echo "$CONFIG" | jq --arg m "$OUTPUT_MODE" '. + {"outputMode": $m}')
echo "$NEW_CONFIG" > "$CONFIG_PATH"
CONFIG=$(cat "$CONFIG_PATH")

Interaction Flow

Step 1: Topic / Content

Free text input. Ask the user:

What would you like to explain or introduce?

Accept: topic description, text content, or concept to explain.

Step 2: Language

If config.language is set, pre-fill and show in summary — skip this question. Otherwise ask:

Question: "What language?"
Options:
  - "Chinese (zh)" — Content in Mandarin Chinese
  - "English (en)" — Content in English
  - "Japanese (ja)" — Content in Japanese

Step 3: Style

If config.defaultStyle is set, pre-fill and show in summary — skip this question. Otherwise ask:

Question: "What style of explainer?"
Options:
  - "Info" — Informational, factual presentation style
  - "Story" — Narrative, storytelling approach

Step 4: Speaker Selection

Follow shared/speaker-selection.md:

If config.defaultSpeakers.{language} is set → use saved speaker silently
If not set → use built-in default from shared/speaker-selection.md for the language
Show the speaker in the confirmation summary (Step 6) — user can change from there if desired
Only show the full speaker list if the user explicitly asks to change voice

Speaker query: see shared/cli-speakers.md for listing and filtering speakers.

Only 1 speaker is supported for explainer videos.

Step 5: Output Type

Question: "What output do you want?"
Options:
  - "Text script only" — Generate narration script, no video
  - "Text + Video" — Generate full explainer video with AI visuals

Step 6: Confirm & Generate

Summarize all choices:

Ready to generate explainer:

  Topic: {topic}
  Language: {language}
  Style: {info/story}
  Speaker: {speaker name}
  Output: {text only / text + video}

  Proceed?

Wait for explicit confirmation before running any CLI command.

Workflow

Run the CLI command with run_in_background: true and timeout: 660000. The CLI blocks until generation completes and returns the final result as JSON:

listenhub explainer create \
  --query "{topic}" \
  --mode {info|story} \
  --lang {en|zh|ja} \
  --speaker "{name}" \
  --speaker-id "{id}" \
  --timeout 600 \
  --json

If the command fails (non-zero exit), check stderr for error details. See shared/cli-patterns.md § Error Handling for exit codes and common errors.

Optional flags (add when applicable):

--source-url "{url}" — if the user provided a reference URL
--skip-audio — if text-only output (no video)
--image-size {2K|4K} — image resolution (default: 2K)
--aspect-ratio {16:9|9:16|1:1} — video aspect ratio (default: 16:9)
--style "{style}" — visual style for AI-generated images

Tell the user the task is submitted. When notified of completion, parse and present result:

Parse the CLI JSON output for key fields:

EPISODE_ID=$(echo "$RESULT" | jq -r '.episodeId')
AUDIO_URL=$(echo "$RESULT" | jq -r '.audioUrl // empty')
VIDEO_URL=$(echo "$RESULT" | jq -r '.videoUrl // empty')
CREDITS=$(echo "$RESULT" | jq -r '.credits // empty')

Read OUTPUT_MODE from config. Follow shared/output-mode.md for behavior.

If text-only output:

inline or both: Present the script inline.

Present:

解说脚本已生成！

「{title}」

在线查看：https://listenhub.ai/app/explainer/{episodeId}

download or both: Also save the script file. Generate a topic slug following shared/config-pattern.md § Artifact Naming.

Save as {slug}-explainer.md in cwd (dedup if exists)
Present the save path in addition to the above summary.

If text + video output:

inline or both: Display video URL and audio URL as clickable links.

Present:

解说视频已生成！

视频链接：{videoUrl}
音频链接：{audioUrl}
消耗积分：{credits}

download or both: Also save files. Generate a topic slug following shared/config-pattern.md § Artifact Naming.

Create {slug}-explainer/ folder (dedup if exists)
Write script.md inside

Download audio:

listenhub download "{audioUrl}" -o "{slug}-explainer/audio.mp3"

Present:

已保存到当前目录：
  {slug}-explainer/
    script.md
    audio.mp3

After Successful Generation

Update config with the choices made this session:

NEW_CONFIG=$(echo "$CONFIG" | jq \
  --arg lang "{language}" \
  --arg style "{info/story}" \
  --arg speakerId "{speakerId}" \
  '. + {"language": $lang, "defaultStyle": $style, "defaultSpeakers": (.defaultSpeakers + {($lang): [$speakerId]})}')
echo "$NEW_CONFIG" > "$CONFIG_PATH"

Estimated times:

Text script only: 2-3 minutes
Text + Video: 5-10 minutes

Resources

CLI authentication: shared/cli-authentication.md
CLI patterns: shared/cli-patterns.md
Speaker query: shared/cli-speakers.md
Speaker selection guide: shared/speaker-selection.md
Config pattern: shared/config-pattern.md
Output mode: shared/output-mode.md

Composability

Invokes: speakers CLI (for speaker selection); may invoke /speech for voiceover
Invoked by: content-planner (Phase 3)

Example

User: "Create an explainer video introducing Claude Code"

Agent workflow:

Topic: "Claude Code introduction"
Ask language → "English"
Ask style → "Info"
Use default speaker "Mars" (cozy-man-english)
Ask output → "Text + Video"

# Run with run_in_background: true, timeout: 660000
listenhub explainer create \
  --query "Introduce Claude Code: what it is, key features, and how to get started" \
  --mode info \
  --lang en \
  --speaker "Mars" \
  --speaker-id "cozy-man-english" \
  --timeout 600 \
  --json

Parse result for episodeId, audioUrl, videoUrl, credits, and present to user.

Safety Notice

Copy this and send it to your AI assistant to learn

When to Use

When NOT to Use

Purpose

Hard Constraints

Step -1: CLI Auth Check

Step 0: Config Setup

Setup Flow (user-initiated reconfigure only)

Interaction Flow

Step 1: Topic / Content

Step 2: Language

Step 3: Style

Step 4: Speaker Selection

Step 5: Output Type

Step 6: Confirm & Generate

Workflow

After Successful Generation

Resources

Composability

Example

Source Transparency

Related Skills

listenhub

tts

asr

podcast