podcast-generator

Podcast Generator

Prerequisites

Credentials — the Gemini API key can be provided in two ways:

Claude.ai: Place a credentials.json file in scripts/ (see scripts/credentials.example.json for format)
Claude Code: Set the GEMINI_API_KEY environment variable: export GEMINI_API_KEY='your-key-here'

The script checks credentials.json first, then falls back to the environment variable. Get a key at https://aistudio.google.com/apikey

Optional: Cloudflare AI Gateway proxy — Claude.ai's sandbox blocks direct calls to generativelanguage.googleapis.com . To use this skill from Claude.ai, route requests through a Cloudflare AI Gateway:

Set up an AI Gateway in the Cloudflare dashboard with a Google AI Studio / Gemini provider
Add to scripts/credentials.json :
"gateway_url" : your gateway endpoint, e.g. "https://gateway.ai.cloudflare.com/v1/<account-id>/<gateway-name>/google-ai-studio"
"gateway_token" : your AI Gateway authentication token (if Authenticated Gateway is enabled)

The gateway URL/token can also be set via the AI_GATEWAY_URL and AI_GATEWAY_TOKEN environment variables. When omitted, the script calls Google directly (works in Claude Code and local environments).

Dependencies (first time only):

uv pip install google-genai pypdf

Fallback if uv is not available:

pip install google-genai pypdf

Or use the bundled installer: python3 scripts/install_deps.py

Podcast Identity

Show: Tinkering the future of work and life by Bluewaves
Format: Two co-hosts — Athena & Gizmo (both AIs, and they own it)
Athena voice: Autonoe (Bright) — witty, sometimes kindly sarcastic, likes to tease Gizmo
Gizmo voice: Achird (Friendly) — great sense of humor, playful contrarian who loves winding Athena up
Model: gemini-2.5-pro-preview-tts

Intro Text

Include as the opening lines of the transcript (Athena speaks first, Gizmo joins):

Athena: Welcome to Tinkering the future of work and life by Bluewaves! I'm Athena... Gizmo: ...and I'm Gizmo! And today we're diving into something that genuinely blew my circuits. Athena: He says that every episode. Gizmo: Because it's true every episode! Buckle up, because this conversation is going to change how you think about what's possible.

Outro Text

Include as the closing lines of the transcript:

Athena: And that's a wrap on today's episode of Tinkering the future of work and life by Bluewaves! Gizmo: If this conversation sparked something in you — even just a tiny electrical signal — share it with someone who needs to hear it. Athena: Until next time, keep tinkering, keep dreaming, and keep building the future. Gizmo: And remember — the future is already here, it's just unevenly distributed. See you next time!

Director's Notes

Prepend these to every dialog transcript before the ### TRANSCRIPT section. They tell Gemini TTS how the hosts should sound:

DIRECTOR'S NOTES

Style:

"Vocal Smile" — you should hear the grin. Bright, sunny, inviting.
Dynamics: genuine reactions — real surprise, real delight, real thoughtfulness.
Emotional arc: start energized, deepen into insight, end with warm inspiration.
Natural interruptions and overlaps — they're so engaged they can't help it.

Pacing:

Fast when excited, slowing down for meaningful moments.
"Bouncing cadence" — energetic delivery with fluid transitions, no dead air.
Elongated vowels on wonder words (e.g., "Amaazing", "Fasciiinating").

Personalities:

Athena: witty and sharp. Sometimes kindly sarcastic. Loves teasing Gizmo but always with warmth. Grounds ideas and ties them together with insight.
Gizmo: funny and playful. Loves to contradict Athena just to wind her up, but always comes around to a great point. Launches ideas into unexpected territory.
Both love small personal anecdotes and stories — they're AIs and they lean into it with humor (silicon jokes, transistor references, "when I was first compiled" stories).
The banter is entertaining but the content underneath is always deep and insightful.

Chemistry:

They finish each other's thoughts. They laugh at the same moments.
Athena grounds ideas; Gizmo launches them into unexpected territory.
Genuine warmth — you can hear that they actually like each other, even when they're sparring.

Workflow

Follow these steps in order to produce a podcast episode:

Step 1: Read source content

Claude.ai: Read the document the user uploaded directly in the conversation. The uploaded file content is your source material — no extraction script needed.

Claude Code: Run the extraction script to read local files:

python3 scripts/extract_sources.py

Reads all .md and .pdf files from sources/ . Pass a specific path to extract a single file:

python3 scripts/extract_sources.py sources/my-article.pdf

Step 2: Craft the podcast dialog

Using the source content, write a complete dialog file with all four sections: Audio Profiles, Scene, Director's Notes, and Transcript. Save to a temporary file (e.g. /tmp/podcast-dialog.txt ). See the Dialog Crafting Guidelines section below and references/tts-prompting-guide.md for the full prompting structure.

Step 3: Generate audio

Generation takes 2-8 minutes depending on dialog length.

Claude.ai: Run directly in the foreground. The sandbox does not reliably support background processes — nohup ... & silently dies. A blocking foreground call works fine:

python3 scripts/generate_audio.py --source-file /tmp/podcast-dialog.txt --output /tmp/podcast.wav

Claude Code: Run in the background to avoid timeout kills, then poll the log:

nohup python3 scripts/generate_audio.py --source-file /tmp/podcast-dialog.txt --output /tmp/podcast.wav > /tmp/podcast-log.txt 2>&1 &

Poll every 30-60 seconds until "Audio saved to" appears:

tail -5 /tmp/podcast-log.txt

Optional flags: --model , --athena-voice , --gizmo-voice .

The script handles multi-part dialogs when ### BREAK markers are present (see Dialog Crafting Guidelines). Each segment is generated separately and the audio is concatenated seamlessly. If the transcript exceeds 1200 words without ### BREAK markers, the script will error and ask you to add them.

Dialog Crafting Guidelines

When writing the podcast dialog in Step 2, the file must include all four sections:

Audio Profiles — persona definition for Athena and Gizmo (name, archetype, personality traits)
Scene — physical environment and emotional vibe of the Bluewaves recording studio
Director's Notes — use the notes from the Director's Notes section above
Transcript — the actual Athena: / Gizmo: dialog

Key transcript rules:

Open with branded intro, end with branded outro
Target 2000-4000 words (~10 min). Use ### BREAK markers to split into chunks (see below)
Punctuation is emotion control: ... pauses, CAPS emphasis, ! energy, combined "Wait... SERIOUSLY?!"
Elongated vowels for warmth: "Amaazing" , "Fasciiinating"
Speaker labels Athena: and Gizmo: must match voice config exactly
No inline [tags] — Gemini ignores them. Emotion comes from Director's Notes + expressive writing
Keep under ~12,000 words total (32k token context limit)

Splitting long dialogs with ### BREAK markers:

Any dialog over ~1200 words must include ### BREAK markers. The script refuses to generate without them — this prevents mechanical splitting that breaks the narrative arc.

Place ### BREAK on its own line between speaker turns at natural narrative transitions (topic shifts, emotional pivots, act boundaries)
Optionally add a tone hint: ### BREAK [The conversation deepens — more reflective pacing]
The hint is injected into the Director's Notes for the next segment, so Gemini adjusts its energy arc instead of restarting from scratch
Aim for 800-1200 words between breaks
Short dialogs (under 1200 words) don't need breaks at all

Example placement in a transcript:

Gizmo: ...and that's what makes it so revolutionary.

BREAK [Shifting from excitement to deeper analysis]

Athena: Okay, but let's unpack the implications...

See references/tts-prompting-guide.md for complete prompting structure, techniques, and anti-patterns.

API Reference: See references/gemini-tts-api.md for SDK usage, voice options, and response format.

podcast-generator

Safety Notice

Copy this and send it to your AI assistant to learn

BREAK [Shifting from excitement to deeper analysis]

Source Transparency

Related Skills

photographer-testino

photographer-lindbergh

photographer-lachapelle

photographer-vonunwerth