invoking-gemini

Invokes Google Gemini models for structured outputs, multi-modal tasks, and Google-specific features. Use when users request Gemini, structured JSON output, Google API integration, or cost-effective parallel processing.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "invoking-gemini" with this command: npx skills add oaustegard/claude-skills/oaustegard-claude-skills-invoking-gemini

Invoking Gemini

Delegate tasks to Google's Gemini models when they offer advantages over Claude.

When to Use Gemini

Structured outputs:

  • JSON Schema validation with property ordering guarantees
  • Pydantic model compliance
  • Strict schema adherence (enum values, required fields)

Cost optimization:

  • Parallel batch processing (Gemini 3 Flash is lightweight)
  • High-volume simple tasks
  • Budget-constrained operations

Google ecosystem:

  • Integration with Google services
  • Vertex AI workflows
  • Google-specific APIs

Multi-modal tasks:

  • Image analysis with JSON output
  • Video processing
  • Audio transcription with structure

Available Models

All Gemini 3 models are currently in preview. Use only these — no Gemini 2.x.

Text / Reasoning Models

gemini-3-flash-preview (Default / Recommended):

  • Gemini 3 Flash: Pro-level intelligence at Flash speed and pricing
  • 1M token context window, 64k output
  • Knowledge cutoff: Jan 2025
  • $0.50 input / $3.00 output per 1M tokens
  • Alias: flash

gemini-3.1-pro-preview:

  • Gemini 3.1 Pro: Best for complex tasks requiring broad world knowledge and advanced reasoning across modalities
  • 1M token context window, 64k output
  • Knowledge cutoff: Jan 2025
  • $2.00 / $12.00 per 1M tokens (<200k tokens); $4.00 / $18.00 (>200k tokens)
  • Alias: pro

gemini-3.1-flash-lite-preview:

  • Gemini 3.1 Flash-Lite: Workhorse model for cost-efficiency and high-volume tasks
  • 1M token context window, 64k output
  • Knowledge cutoff: Jan 2025
  • $0.25 (text, image, video), $0.50 (audio) input / $1.50 output per 1M tokens
  • Alias: lite

Image Generation Models

nano-banana-2 (Default image model):

  • Gemini 3.1 Flash Image — high-volume, high-efficiency image generation
  • API model: gemini-3.1-flash-image-preview
  • 128k input context, 32k output
  • $0.25 per 1M text input tokens / $0.067 per image output
  • Alias: image

nano-banana-pro:

  • Gemini 3 Pro Image — highest quality image generation with text rendering and multi-turn editing
  • API model: gemini-3-pro-image-preview
  • 65k input context, 32k output
  • $2.00 per 1M text input tokens / $0.134 per image output
  • Alias: image-pro

See references/models.md for full model details and pricing.

Setup

Prerequisites:

uv pip install requests pydantic
# google-generativeai only needed for direct API fallback:
# uv pip install google-generativeai

Credentials — Option A (recommended): Cloudflare AI Gateway

Requests are routed through Cloudflare AI Gateway, bypassing IP blocks and gaining caching, analytics, and rate limiting.

Create /mnt/project/proxy.env:

CF_ACCOUNT_ID=<your-cloudflare-account-id>
CF_GATEWAY_ID=<your-gateway-name>
CF_API_TOKEN=<your-cf-api-token>
# GOOGLE_API_KEY only needed if not using Cloudflare BYOK:
# GOOGLE_API_KEY=AIzaSy...
  • Get your Cloudflare Account ID: Cloudflare dashboard → right sidebar
  • Create a gateway: Cloudflare dashboard → AI Gateway → Create gateway
  • Generate an API token: https://dash.cloudflare.com/profile/api-tokens
  • Store your Gemini key in the gateway (BYOK): AI Gateway → your gateway → API Keys

Credentials — Option B: Direct Google API (fallback)

If no proxy.env is found, the client falls back to direct Google API access:

Basic Usage

Import the client:

import sys
sys.path.append('/mnt/skills/invoking-gemini/scripts')
from gemini_client import invoke_gemini

# Simple prompt
response = invoke_gemini(
    prompt="Explain quantum computing in 3 bullet points",
    model="gemini-3-flash-preview"
)
print(response)

Structured Output

Use Pydantic models for guaranteed JSON Schema compliance:

from pydantic import BaseModel, Field
from gemini_client import invoke_with_structured_output

class BookAnalysis(BaseModel):
    title: str
    genre: str = Field(description="Primary genre")
    key_themes: list[str] = Field(max_length=5)
    rating: int = Field(ge=1, le=5)

result = invoke_with_structured_output(
    prompt="Analyze the book '1984' by George Orwell",
    pydantic_model=BookAnalysis
)

# result is a BookAnalysis instance
print(result.title)  # "1984"
print(result.genre)  # "Dystopian Fiction"

Advantages over Claude:

  • Guaranteed property ordering in JSON
  • Strict enum enforcement
  • Native schema validation (no prompt engineering)
  • Lower cost for simple extractions

Parallel Invocation

Process multiple prompts concurrently:

from gemini_client import invoke_parallel

prompts = [
    "Summarize the plot of Hamlet",
    "Summarize the plot of Macbeth",
    "Summarize the plot of Othello"
]

results = invoke_parallel(
    prompts=prompts,
    model="gemini-3-flash-preview"
)

for prompt, result in zip(prompts, results):
    print(f"Q: {prompt[:30]}...")
    print(f"A: {result[:100]}...\n")

Use cases:

  • Batch classification tasks
  • Data labeling
  • Multiple independent analyses
  • A/B testing prompts

Error Handling

The client handles common errors:

from gemini_client import invoke_gemini

response = invoke_gemini(
    prompt="Your prompt here",
    model="gemini-3-flash-preview"
)

if response is None:
    print("Error: API call failed")
    # Check project knowledge file for valid google_api_key

Common issues:

  • Missing API key → Add GOOGLE_API_KEY.txt to project knowledge (see Setup above)
  • Invalid model → Raises ValueError
  • Rate limit → Automatically retries with backoff
  • Network error → Returns None after retries

Advanced Features

Custom Generation Config

response = invoke_gemini(
    prompt="Write a haiku",
    model="gemini-3-flash-preview",
    temperature=0.9,
    max_output_tokens=100,
    top_p=0.95
)

Multi-modal Input

# Image analysis with structured output
from pydantic import BaseModel

class ImageDescription(BaseModel):
    objects: list[str]
    scene: str
    colors: list[str]

result = invoke_with_structured_output(
    prompt="Describe this image",
    pydantic_model=ImageDescription,
    image_path="/mnt/user-data/uploads/photo.jpg"
)

See references/advanced.md for more patterns.

Image Generation

Generate images using Gemini's native image models:

from gemini_client import generate_image

# Basic generation
result = generate_image("A watercolor painting of a mountain lake at sunset")
print(result["path"])     # /mnt/user-data/outputs/gemini_image_1740000000.png
print(result["caption"])  # Optional text the model returns alongside the image

Model Selection

# Fast generation (default) — nano-banana-2 → gemini-3.1-flash-image-preview
result = generate_image("A red bicycle", model="nano-banana-2")

# High-fidelity — nano-banana-pro → gemini-3-pro-image-preview
result = generate_image("A red bicycle", model="image-pro")

Custom Output Path

result = generate_image(
    "A logo for a coffee shop called 'Bean There'",
    output_path="/mnt/user-data/outputs/coffee_logo.png"
)

Effective Prompt Patterns

  • Be specific about style: "A watercolor painting of..." vs "A picture of..."
  • Include composition details: "centered, wide angle, high contrast"
  • Specify text rendering: "A poster with the text 'SALE' in bold red letters"
  • Multi-turn editing: Generate once, then refine with follow-up prompts

Return Value

{
    "path": "/mnt/user-data/outputs/gemini_image_1740000000.png",
    "caption": "Optional descriptive text from the model"  # or None
}

Returns None on failure (credentials missing, API error, no image in response).

Comparison: Gemini vs Claude

Use Gemini when:

  • Structured output is primary goal
  • Cost is a constraint
  • Property ordering matters
  • Batch processing many simple tasks

Use Claude when:

  • Complex reasoning required
  • Long context needed (200K tokens)
  • Code generation quality matters
  • Nuanced instruction following

Use both:

  • Claude for planning/reasoning
  • Gemini for structured extraction
  • Parallel workflows with different strengths

Token Efficiency Pattern

Gemini 3 Flash is cost-effective for sub-tasks:

# Claude (you) plans the approach
# Gemini executes structured extractions

data_points = []
for file in uploaded_files:
    # Gemini extracts structured data
    result = invoke_with_structured_output(
        prompt=f"Extract contact info from {file}",
        pydantic_model=ContactInfo
    )
    data_points.append(result)

# Claude synthesizes results
# ... your analysis here ...

Limitations

Not suitable for:

  • Tasks requiring deep reasoning
  • Long context (>1M tokens)
  • Complex code generation
  • Subjective creative writing

Token limits:

  • gemini-3-flash-preview: ~1M input, 64k output
  • gemini-3.1-pro-preview: ~1M input, 64k output (2x pricing above 200k)
  • gemini-3.1-flash-lite-preview: ~1M input, 64k output

Rate limits:

  • Vary by API tier
  • Client handles automatic retry

Examples

See references/examples.md for:

  • Data extraction from documents
  • Batch classification
  • Multi-modal analysis
  • Hybrid Claude+Gemini workflows

Troubleshooting

"No credentials configured":

  • Create /mnt/project/proxy.env with CF_ACCOUNT_ID, CF_GATEWAY_ID, CF_API_TOKEN
  • Or add GOOGLE_API_KEY.txt for direct API access
  • See Setup section above for details

CF Gateway 401/403:

  • Verify your CF_API_TOKEN has AI Gateway permissions
  • Check that gateway authentication is enabled in the Cloudflare dashboard
  • If not using BYOK, add GOOGLE_API_KEY to proxy.env

CF Gateway 429 (rate limited):

  • The client automatically retries with exponential backoff
  • Check your gateway's rate limit settings in Cloudflare dashboard

Import errors:

uv pip install requests pydantic
# For direct API fallback only:
uv pip install google-generativeai

Schema validation failures:

  • Check Pydantic model definitions
  • Ensure prompt is clear about expected structure
  • Add examples to prompt if needed

Cost Comparison

All Gemini 3 models (preview, Jan 2025 cutoff):

ModelInput / 1M tokensOutput / 1M tokens
Gemini 3 Flash (gemini-3-flash-preview)$0.50$3.00
Gemini 3.1 Pro (gemini-3.1-pro-preview)$2.00 (<200k) / $4.00 (>200k)$12.00 / $18.00
Gemini 3.1 Flash-Lite (gemini-3.1-flash-lite-preview)$0.25 text/image/video, $0.50 audio$1.50
Nano Banana 2 / 3.1 Flash Image (gemini-3.1-flash-image-preview)$0.25 text$0.067 per image
Nano Banana Pro / 3 Pro Image (gemini-3-pro-image-preview)$2.00 text$0.134 per image

Strategy: Use Flash-Lite for high-volume simple tasks, 3 Flash for balanced performance, 3.1 Pro for complex reasoning.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

crafting-instructions

No summary provided by upstream source.

Repository SourceNeeds Review
General

fetching-blocked-urls

No summary provided by upstream source.

Repository SourceNeeds Review
General

remembering

No summary provided by upstream source.

Repository SourceNeeds Review
General

browsing-bluesky

No summary provided by upstream source.

Repository SourceNeeds Review