speech-build

Generate and transcribe speech with Google's Gemini-TTS and Chirp 3 models. Supports text-to-speech (single- and multi-speaker), Instant Custom Voice, and speech-to-text (transcription and diarization).

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "speech-build" with this command: npx skills add cnemri/google-genai-skills/cnemri-google-genai-skills-speech-build

Speech Skill (TTS & STT)

Use this skill to implement audio generation and transcription workflows using the google-genai and google-cloud-speech SDKs.

Quick Start Setup

from google import genai
from google.genai import types
# For STT: from google.cloud import speech_v2

client = genai.Client()

Reference Materials

Common Workflows

1. Generate Speech (Gemini-TTS)

# Assumes `client` and `types` from Quick Start Setup.
response = client.models.generate_content(
    model="gemini-2.5-flash-preview-tts",
    contents="Hello, world!",
    config=types.GenerateContentConfig(
        response_modalities=["AUDIO"],  # request audio output instead of text
        speech_config=types.SpeechConfig(
            voice_config=types.VoiceConfig(
                # Select one of the prebuilt voices by name
                prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Kore")
            )
        )
    )
)
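
The call above returns audio as raw bytes, not a playable file. Below is a minimal sketch of saving it with the standard-library wave module. It assumes the output is 16-bit mono PCM at 24 kHz and that the bytes are found at response.candidates[0].content.parts[0].inline_data.data; both should be checked against the upstream reference material. The helper name write_wav is illustrative and not part of the skill.

```python
import wave


def write_wav(path, pcm, channels=1, sample_rate=24000, sample_width=2):
    """Wrap raw PCM bytes in a WAV container and write them to `path`."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)  # bytes per sample; 2 means 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(pcm)


# Assumed usage with the `response` from the call above (not executed here):
# pcm = response.candidates[0].content.parts[0].inline_data.data
# write_wav("hello.wav", pcm)
```

The defaults are parameters rather than constants so that a different sample rate or channel count can be passed if the model's output format differs from the assumption.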

2. Transcribe Audio (Chirp 3)

# Requires google-cloud-speech
from google.cloud import speech_v2
# ... (See stt.md for full setup)
response = speech_client.recognize(...)
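
The snippet above leaves the client and request setup to stt.md. As a rough sketch only, not a substitute for that file, a synchronous Chirp 3 request with the google-cloud-speech v2 client might look like the following. The "us" location, the regional endpoint pattern, the "en-US" language code, and the helper names recognizer_path and transcribe are assumptions made for illustration; the project ID is a placeholder you supply.

```python
def recognizer_path(project_id, location="us"):
    """Build the resource name of the default ("_") recognizer."""
    return f"projects/{project_id}/locations/{location}/recognizers/_"


def transcribe(audio_bytes, project_id, location="us", language="en-US"):
    """Send audio bytes to Chirp 3 and return the joined transcript."""
    # Imported lazily so recognizer_path works without the SDK installed.
    from google.api_core.client_options import ClientOptions
    from google.cloud.speech_v2 import SpeechClient
    from google.cloud.speech_v2.types import cloud_speech

    # Assumption: the model is served from a regional endpoint.
    speech_client = SpeechClient(
        client_options=ClientOptions(
            api_endpoint=f"{location}-speech.googleapis.com"
        )
    )
    config = cloud_speech.RecognitionConfig(
        # Let the service detect the audio encoding.
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=[language],
        model="chirp_3",
    )
    request = cloud_speech.RecognizeRequest(
        recognizer=recognizer_path(project_id, location),
        config=config,
        content=audio_bytes,
    )
    response = speech_client.recognize(request=request)
    # Each result holds ranked alternatives; keep the top one.
    return " ".join(
        r.alternatives[0].transcript for r in response.results if r.alternatives
    )
```

Calling transcribe requires Google Cloud credentials and the Speech-to-Text API enabled on the project. Diarization and other features need additional configuration that is not shown here.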
