AudioMind v3: The AI Podcast Studio
AudioMind turns a single sentence into a fully-produced podcast. It handles scripting, ElevenLabs voice narration, AI background music, and server-side audio mixing — all from one Manus command.
No setup required. The public shared backend works out of the box. Just install and start creating.
Quick Start
Install:
clawhub install audiomind
Use immediately (no configuration needed):
"Use AudioMind to create a 3-minute podcast about the future of AI agents."
That's it. AudioMind uses the public shared backend by default — 20 free generations per month, no API key required.
Configuration
| Variable | Required | Description |
|---|---|---|
| AUDIOMIND_BACKEND_URL | Optional | Your own Vercel backend URL. Defaults to the public shared backend. |
| AUDIOMIND_API_KEY | Optional | Pro API key for unlimited generations. Get one at the landing page. |
Free Tier (default): 20 generations/month tracked by IP. No configuration needed.
Pro Tier: Set AUDIOMIND_API_KEY with your Pro key for unlimited access.
Self-hosted: Deploy your own backend from github.com/wells1137/audiomind-backend and set AUDIOMIND_BACKEND_URL to your instance.
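The way the two optional variables combine can be illustrated with a small resolver. This is a minimal sketch, not the skill's actual code: the function name `resolve_config`, the `AudioMindConfig` type, and the placeholder default URL are assumptions made for illustration (the real public backend URL is not stated here). Only the two environment variable names come from the table above.

```python
import os
from typing import Mapping, NamedTuple, Optional

# Placeholder only: the real public shared backend URL is not given in this document.
DEFAULT_BACKEND_URL = "https://public-backend.example.invalid"


class AudioMindConfig(NamedTuple):
    """Hypothetical container for the resolved settings."""
    backend_url: str
    api_key: Optional[str]


def resolve_config(env: Mapping[str, str] = os.environ) -> AudioMindConfig:
    """Resolve settings from the two optional environment variables.

    Unset or blank values fall back to the defaults: the shared backend
    and no API key (free tier).
    """
    backend_url = env.get("AUDIOMIND_BACKEND_URL", "").strip() or DEFAULT_BACKEND_URL
    api_key = env.get("AUDIOMIND_API_KEY", "").strip() or None
    # Drop any trailing slash so endpoint paths can be appended safely.
    return AudioMindConfig(backend_url.rstrip("/"), api_key)
```

With nothing set, `resolve_config({})` falls back to the placeholder default and a `None` key, which corresponds to the free tier. Setting only `AUDIOMIND_BACKEND_URL` corresponds to the self-hosted case.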
How It Works
When you ask Manus to create a podcast, the agent performs these steps automatically:
- Write Script: The agent uses its built-in LLM to write a structured podcast script based on your topic and desired length.
- Generate Narration: POST {BACKEND_URL}/api/workflow/generate_tts with the script. Returns MP3 audio narrated by an ElevenLabs voice.
- Generate Music: POST {BACKEND_URL}/api/workflow/generate_music with a mood/style prompt. Returns a background music MP3.
- Upload Audio: The agent uploads both MP3 files using manus-upload-file to obtain public URLs for the mixing step.
- Mix Final Audio: POST {BACKEND_URL}/api/workflow/mix_audio with { narration_url, music_url }. The backend mixes them with proper levels using ffmpeg and returns the final podcast MP3.
- Deliver: The agent saves and presents the finished podcast to you.
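The three backend calls in the steps above can be sketched as request builders. This is a hedged illustration rather than the agent's actual implementation: the endpoint paths and the `narration_url` / `music_url` fields come from the steps above, while the helper names, the JSON content type, and the `script` and `prompt` field names are assumptions. The requests are only constructed here, not sent.

```python
import json
import urllib.request


def build_post(backend_url: str, step: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST for one workflow step."""
    url = f"{backend_url.rstrip('/')}/api/workflow/{step}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )


def tts_request(backend_url: str, script: str) -> urllib.request.Request:
    # "script" is an assumed field name; the document only says "with the script".
    return build_post(backend_url, "generate_tts", {"script": script})


def music_request(backend_url: str, prompt: str) -> urllib.request.Request:
    # "prompt" is an assumed field name for the mood/style prompt.
    return build_post(backend_url, "generate_music", {"prompt": prompt})


def mix_request(backend_url: str, narration_url: str, music_url: str) -> urllib.request.Request:
    # These two field names are the ones listed in the mixing step.
    return build_post(
        backend_url,
        "mix_audio",
        {"narration_url": narration_url, "music_url": music_url},
    )
```

Each request could then be sent with `urllib.request.urlopen(req)`. How a Pro key is attached to a request (header name or body field) is not specified here, so the sketch leaves it out.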
Example Prompts
- "Create a 5-minute podcast about the history of jazz with a smooth jazz background."
- "Make a daily news briefing about AI developments, formal tone, upbeat intro music."
- "Generate a meditation podcast, 10 minutes, calm narration, ambient soundscape."
- "Produce a tech explainer on quantum computing for a general audience."
Security
All API keys (ElevenLabs) are stored server-side. The skill file contains zero credentials. This architecture passes VirusTotal and ClawHub security scans. See the GitHub repo for the full backend source code.
Changelog
v3.3.0 — Removed the local tools/start_server.sh script entirely (not needed in the v3 architecture). Declared FAL_KEY as an optional environment variable. Resolves all OpenClaw metadata inconsistency warnings.
v3.1.0 — Zero-config install. Public shared backend is now the default. No AUDIOMIND_BACKEND_URL setup required for free tier users.
v3.0.1 — Added openclaw.requires metadata to declare env vars and trusted network endpoints. Resolves OpenClaw security scanner warning.
v3.0.0 — Full architecture rewrite. All commercial logic moved to Vercel backend. ElevenLabs API keys are now server-side only. Passes VirusTotal security scan.