agent-right-brain

Give agents creative abilities using `rawgenai` — speak, listen, generate images/videos/music/sound effects, create multi-speaker dialogue, and manage voices. Use this skill when the user asks to "speak", "talk", "read aloud", "transcribe", "generate an image", "create a picture", "draw", "edit an image", "generate a video", "create a video", "animate", "generate music", "create a song", "generate sound effects", "create dialogue", "design a voice", "clone a voice", or any request involving voice, audio, image, or video creation.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "agent-right-brain" with this command: npx skills add whq25/rawgenai/whq25-rawgenai-agent-right-brain

Agent Right Brain

Use rawgenai <provider> <action> to give agents creative abilities. Always read the chosen provider's reference file before running commands.

Prerequisites

brew install WHQ25/tap/rawgenai

Before using a provider, read its setup guide at references/setup/ to configure credentials.

Input Sources (All Capabilities)

  1. Positional argument: rawgenai <provider> <action> "text" [flags]
  2. File: rawgenai <provider> <action> --file input.txt [flags]
  3. Stdin: echo "text" | rawgenai <provider> <action> [flags]
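The three input routes above can be wrapped in a single helper. This is a minimal sketch: `rawgenai_input` is a hypothetical function name, and using `-` to mean "read from stdin" is this helper's own convention, not a documented rawgenai feature. Only the positional form, `--file`, and stdin come from the list above.

```shell
# Hypothetical helper (not part of rawgenai): picks one of the three input
# routes based on what the caller passes as the third argument.
#   rawgenai_input <provider> <action> <text | path-to-file | -> [flags...]
rawgenai_input() {
  ri_provider=$1
  ri_action=$2
  ri_input=$3
  shift 3
  if [ "$ri_input" = "-" ]; then
    # 3. Stdin: whatever is piped into this function is passed through.
    rawgenai "$ri_provider" "$ri_action" "$@"
  elif [ -f "$ri_input" ]; then
    # 2. File: hand the path to --file.
    rawgenai "$ri_provider" "$ri_action" --file "$ri_input" "$@"
  else
    # 1. Positional argument: pass the text itself.
    rawgenai "$ri_provider" "$ri_action" "$ri_input" "$@"
  fi
}

# Usage (provider, action, and flag chosen only for illustration):
#   rawgenai_input openai tts "Hello there" --speak
#   rawgenai_input openai tts input.txt --speak
#   echo "Hello there" | rawgenai_input openai tts - --speak
```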

General Guidelines

  • On first use of a capability, ask the user to pick a provider, and remember that choice for the rest of the session.
  • All output is JSON. Always show file paths to the user.
  • For async commands (video, some image/audio), run three steps in order: create -> status -> download.
  • If a command fails, try a different provider or inform the user.
  • Write image/video prompts descriptively: subject + action + environment + style + lighting.
  • For TTS: write natural conversational text, not markdown. Use --speak for playback, -o for file.
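Two of the guidelines above lend themselves to small helpers: composing a prompt from the five recommended parts, and falling back to another provider when a command fails. The sketch below is illustrative only. `build_prompt` and `image_with_fallback` are hypothetical names, and the fallback assumes that `rawgenai` exits with a non-zero status on failure, which this document does not state.

```shell
# Join the five prompt parts from the guideline into one descriptive line.
#   build_prompt <subject> <action> <environment> <style> <lighting>
build_prompt() {
  printf '%s %s %s, %s, %s\n' "$1" "$2" "$3" "$4" "$5"
}

# Try each provider in turn and stop at the first one that succeeds.
# Assumption: rawgenai returns a non-zero exit status when a command fails.
#   image_with_fallback <prompt> <output-file> <provider>...
image_with_fallback() {
  iwf_prompt=$1
  iwf_out=$2
  shift 2
  for iwf_provider in "$@"; do
    if rawgenai "$iwf_provider" image "$iwf_prompt" -o "$iwf_out"; then
      # Guideline: always show file paths to the user.
      echo "saved: $iwf_out (provider: $iwf_provider)"
      return 0
    fi
    echo "provider $iwf_provider failed, trying the next one" >&2
  done
  echo "all providers failed" >&2
  return 1
}

# Usage:
#   prompt=$(build_prompt "a red fox" "leaping over a log" \
#     "in a snowy forest" "watercolor" "soft morning light")
#   image_with_fallback "$prompt" fox.png openai google grok
```

The provider order in the usage line is arbitrary; in practice it would start with the provider the user picked for the session.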

Speak (TTS)

rawgenai <provider> tts "<text>" --speak

| Provider | Command | Best For | Reference |
| --- | --- | --- | --- |
| OpenAI | rawgenai openai tts | General purpose, English | ref |
| Google Gemini | rawgenai google tts | Expressive storytelling, multi-speaker | ref |
| ElevenLabs | rawgenai elevenlabs tts | Most natural voices, 70+ languages | ref |
| Seed | rawgenai seed tts | Chinese, emotion-rich | ref |
| DashScope | rawgenai dashscope tts | Chinese, 10 languages, 49 voices | ref |
| MiniMax | rawgenai minimax tts | Chinese, streaming | ref |
| Kling | rawgenai kling tts | Bilingual zh/en | ref |
| Runway | rawgenai runway audio tts | Async | |
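The TTS guidance (plain conversational text, `--speak` for playback, `-o` for a file) can be sketched as three small functions. All three names are hypothetical. The markdown stripper is deliberately rough: it removes heading markers, list bullets, and emphasis characters, so it will also strip underscores inside ordinary words. The helpers use the `rawgenai <provider> tts` form, so they do not cover Runway, whose command is `rawgenai runway audio tts`.

```shell
# Rough markdown-to-plain-text filter (reads stdin, writes stdout).
# Removes leading "#" heading markers, leading "-"/"*" bullets,
# and any "*", "_", or backtick characters.
strip_markdown() {
  sed -e 's/^#\{1,\} *//' -e 's/^[-*] \{1,\}//' -e 's/[*_`]//g'
}

# Play the speech immediately.
#   speak_now <provider> <text>
speak_now() {
  rawgenai "$1" tts "$2" --speak
}

# Write the speech to a file and show the path to the user.
#   speak_to_file <provider> <text> <output-file>
speak_to_file() {
  rawgenai "$1" tts "$2" -o "$3" && echo "audio file: $3"
}

# Usage:
#   speak_now openai "Your build finished without errors."
#   speak_to_file elevenlabs "Welcome back." welcome.mp3
#   strip_markdown < notes.md | rawgenai openai tts --speak
```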

Listen (STT)

rawgenai <provider> stt <audio-file>

| Provider | Command | Best For | Reference |
| --- | --- | --- | --- |
| OpenAI | rawgenai openai stt | Subtitles (srt/vtt) | ref |
| Google Gemini | rawgenai google stt | Speaker diarization | ref |
| ElevenLabs | rawgenai elevenlabs stt | Large files (3GB), video input | ref |
| DashScope | rawgenai dashscope stt | Chinese, emotion, long audio (12h async) | ref |

Image

rawgenai <provider> image "<prompt>" -o output.png

| Provider | Command | Best For | Reference |
| --- | --- | --- | --- |
| OpenAI | rawgenai openai image | Transparent bg, editing, multi-turn | ref |
| Google Gemini | rawgenai google image | 4K, text in image | ref |
| Grok | rawgenai grok image | Batch (up to 10) | ref |
| Seed | rawgenai seed image | 4K, multi-image fusion | ref |
| DashScope | rawgenai dashscope image | Text rendering, Chinese | ref |
| MiniMax | rawgenai minimax image | Subject reference | ref |
| Kling | rawgenai kling image | Face reference (async) | ref |
| Luma | rawgenai luma image | Creative, reframe (async) | |
| Hunyuan | rawgenai hunyuan image | Chinese (async) | |
| Runway | rawgenai runway image | Cinematic (async) | |

Video

rawgenai <provider> video create "<prompt>" [flags] -> status <id> -> download <id> -o out.mp4

| Provider | Command | Best For | Reference |
| --- | --- | --- | --- |
| OpenAI (Sora) | rawgenai openai video | Remix | ref |
| Google (Veo) | rawgenai google video | 4K, extension | ref |
| Grok | rawgenai grok video | Quick, editing | ref |
| Seed | rawgenai seed video | Audio, wide ratios | ref |
| DashScope | rawgenai dashscope video | Character ref, multi-shot | ref |
| MiniMax (Hailuo) | rawgenai minimax video | Subject ref, director modes | ref |
| Kling | rawgenai kling video | Most advanced, element system | ref |
| Luma | rawgenai luma video | Extension, upscale | |
| Hunyuan | rawgenai hunyuan video | Chinese | |
| Runway | rawgenai runway video | Cinematic, character ref | |
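The create, status, download sequence for video can be sketched as three thin wrappers. The function names are hypothetical, and the sketch assumes that `status` and `download` are subcommands of `video` just as `create` is. The job id has to be read from the JSON that `create` prints; this document does not name the field that holds it, so the sketch does not try to parse it. Provider-specific flags belong in the provider's reference file.

```shell
# Step 1: submit the job. Extra flags are passed through unchanged.
# The JSON printed by this command contains the job id to use in steps 2 and 3.
#   video_create <provider> <prompt> [flags...]
video_create() {
  vc_provider=$1
  vc_prompt=$2
  shift 2
  rawgenai "$vc_provider" video create "$vc_prompt" "$@"
}

# Step 2: check progress. Repeat until the JSON reports the job as finished.
#   video_status <provider> <id>
video_status() {
  rawgenai "$1" video status "$2"
}

# Step 3: fetch the result and show the file path to the user.
#   video_download <provider> <id> <output-file>
video_download() {
  rawgenai "$1" video download "$2" -o "$3" && echo "video file: $3"
}

# Usage (replace <id> with the id from the create output):
#   video_create google "a paper boat drifting down a rainy street, cinematic, dusk light"
#   video_status google <id>
#   video_download google <id> out.mp4
```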

Music

| Provider | Command | Best For | Reference |
| --- | --- | --- | --- |
| ElevenLabs | rawgenai elevenlabs music | Prompt-based, composition plans | ref |
| MiniMax | rawgenai minimax music create | Lyrics-to-music, Chinese | ref |

Sound Effects (SFX)

| Provider | Command | Reference |
| --- | --- | --- |
| ElevenLabs | rawgenai elevenlabs sfx "<prompt>" -o out.mp3 | ref |
| Runway | rawgenai runway audio sfx "<prompt>" | |

Dialogue

Generate multi-speaker dialogue from a JSON script (max 10 voices).

| Provider | Command | Reference |
| --- | --- | --- |
| ElevenLabs | rawgenai elevenlabs dialogue -i script.json -o out.mp3 | ref |

Voice Management

Design, clone, and manage custom voices.

| Provider | Command | Capabilities | Reference |
| --- | --- | --- | --- |
| ElevenLabs | rawgenai elevenlabs voice | list, design, create, preview | ref |
| Kling | rawgenai kling voice | create, status, list, delete | ref |
| MiniMax | rawgenai minimax voice | list, upload, clone, design, delete | ref |
| Seed | rawgenai seed voice-clone | upload, status, order, renew | ref |

Audio Processing

Async: rawgenai runway audio <action> -> status <id> -> download <id> -o out

| Provider | Command | Capability |
| --- | --- | --- |
| Runway | rawgenai runway audio sts | Speech-to-speech (voice conversion) |
| Runway | rawgenai runway audio dubbing | Dub audio to another language |
| Runway | rawgenai runway audio isolation | Isolate voice from background |

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
