Gemini Imagen

Overview

This skill enables image generation from text prompts using Google's Gemini Imagen API. It provides a reusable script that handles API authentication, request formatting, response processing, and automatic image saving with proper error handling.

When to Use This Skill

Use this skill when the user requests:

Creating or generating images from text descriptions
Visualizing concepts, scenes, or objects through AI-generated imagery
Producing multiple variations of an image concept
Creating images with specific aspect ratios or quality levels

Example requests:

"Generate an image of a sunset over mountains"
"Create a logo concept showing a geometric bird"
"Make me an image of a futuristic city at night in 16:9 ratio"
"Generate 3 variations of a robot painting artwork"

Configuration

API Key Setup

The Gemini API requires an API key for authentication. Obtain a key from Google AI Studio.

Recommended approach: Store the API key as an environment variable:

export GEMINI_API_KEY="your-api-key-here"

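Alternatively, pass the key directly with --api-key when invoking the script. This is less secure on shared machines, because command-line arguments can be visible in process listings and shell history.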
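The lookup order described above can be sketched in a few lines of Python. This is an illustration only: the helper name resolve_api_key is hypothetical, and whether scripts/generate_image.py resolves the key this way is an assumption.

```python
import os
from typing import Optional


def resolve_api_key(cli_value: Optional[str] = None) -> str:
    """Return the API key, preferring an explicit value over the environment.

    Hypothetical helper: the real script may resolve the key differently.
    """
    key = (cli_value or os.environ.get("GEMINI_API_KEY", "")).strip()
    if not key:
        raise RuntimeError("No API key found: set GEMINI_API_KEY or pass --api-key.")
    return key
```

Failing early with a clear message avoids sending a request that the API would reject anyway.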

Python Dependencies

The script requires these Python packages:

requests
HTTP client for API calls
Pillow
Image processing library

These are included in the project's shared virtual environment. Activate it before running:

source .venv/bin/activate # On Windows: .venv\Scripts\activate

Generating Images

Basic Usage

To generate a single image with default settings:

python scripts/generate_image.py "your prompt here" --api-key $GEMINI_API_KEY

The script will:

Send the prompt to the Gemini Imagen API
Receive and decode the generated image(s)
Save images with timestamped filenames (e.g., gemini_image_20231123_142530_1.png)
Display progress and file paths
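To make the send and decode steps concrete, here is a minimal sketch of how a request could be assembled and a response decoded. The endpoint URL, the x-goog-api-key header, and the field names (instances, parameters, sampleCount, aspectRatio, predictions, bytesBase64Encoded) are assumptions based on the publicly documented Imagen REST interface, not taken from scripts/generate_image.py, which may differ.

```python
import base64
from typing import Any, Dict, List, Tuple

# Assumed REST root for the Gemini API; verify against the current documentation.
API_ROOT = "https://generativelanguage.googleapis.com/v1beta/models"


def build_request(
    prompt: str, model: str, api_key: str, aspect_ratio: str = "1:1", num: int = 1
) -> Tuple[str, Dict[str, str], Dict[str, Any]]:
    """Return (url, headers, json_body) for an image generation call."""
    url = f"{API_ROOT}/{model}:predict"
    headers = {"x-goog-api-key": api_key, "Content-Type": "application/json"}
    body = {
        "instances": [{"prompt": prompt}],
        "parameters": {"sampleCount": num, "aspectRatio": aspect_ratio},
    }
    return url, headers, body


def decode_images(response_json: Dict[str, Any]) -> List[bytes]:
    """Extract raw image bytes from the (assumed) base64 response fields."""
    return [
        base64.b64decode(item["bytesBase64Encoded"])
        for item in response_json.get("predictions", [])
    ]
```

With the requests package, the call itself would look like requests.post(url, headers=headers, json=body, timeout=60), followed by decode_images(response.json()).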

Advanced Options

Model Selection

Choose from three quality/speed tiers:

Fast generation (default) - quickest, good quality

--model imagen-4.0-fast-generate-001

Standard generation - balanced speed and quality

--model imagen-4.0-generate-001

Ultra generation - highest quality, slower

--model imagen-4.0-ultra-generate-001

Aspect Ratios

Generate images in different dimensions:

Square (default)

--aspect-ratio 1:1

Portrait orientations

--aspect-ratio 3:4
--aspect-ratio 9:16

Landscape orientations

--aspect-ratio 4:3
--aspect-ratio 16:9

Multiple Images

Generate up to 4 variations in a single request:

--num 4

Output Directory

Specify where to save generated images:

--output ./generated_images
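The options above map naturally onto an argparse definition. The sketch below shows one way to validate them before any request is sent. The option names, model IDs, and aspect ratios come from this document. The default output directory and the exact parser layout are assumptions, since the script's source is not shown here.

```python
import argparse

MODELS = [
    "imagen-4.0-fast-generate-001",   # fast tier (documented default)
    "imagen-4.0-generate-001",        # standard tier
    "imagen-4.0-ultra-generate-001",  # ultra tier
]
ASPECT_RATIOS = ["1:1", "3:4", "9:16", "4:3", "16:9"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented command-line options."""
    parser = argparse.ArgumentParser(description="Generate images from a text prompt.")
    parser.add_argument("prompt", help="Text description of the image")
    parser.add_argument("--api-key", default=None, help="Gemini API key")
    parser.add_argument("--model", choices=MODELS, default=MODELS[0])
    parser.add_argument("--aspect-ratio", choices=ASPECT_RATIOS, default="1:1")
    parser.add_argument("--num", type=int, choices=range(1, 5), default=1)
    parser.add_argument("--output", default=".")  # assumed default directory
    return parser
```

Using choices lets argparse reject an unsupported model, ratio, or image count with a usage message instead of a failed API call.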

Complete Examples

Generate a high-quality landscape image:

python scripts/generate_image.py \
  "Majestic mountain range at golden hour with dramatic clouds" \
  --api-key $GEMINI_API_KEY \
  --model imagen-4.0-ultra-generate-001 \
  --aspect-ratio 16:9 \
  --output ./landscapes

Create multiple logo variations:

python scripts/generate_image.py \
  "Minimalist geometric logo for tech startup, blue and white" \
  --api-key $GEMINI_API_KEY \
  --num 4 \
  --aspect-ratio 1:1 \
  --output ./logo_concepts

Quick social media graphic:

python scripts/generate_image.py \
  "Abstract colorful pattern for social media background" \
  --api-key $GEMINI_API_KEY \
  --aspect-ratio 9:16 \
  --output ./social_media

Workflow Integration

When a user requests image generation:

Extract the prompt from the user's request
Determine parameters based on context:
  Aspect ratio (square for logos, 16:9 for presentations, etc.)
  Number of variations (if user wants options)
  Quality tier (ultra for final outputs, fast for iteration)
Invoke the script with appropriate parameters
Show the generated images to the user and provide file paths
Iterate if needed with refined prompts or different parameters
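The parameter selection in this workflow can be sketched as a small command builder. The function name, the use_case labels, and the default output directory are hypothetical and chosen for illustration. Only the flags, model IDs, and the 4-image cap come from this document.

```python
import sys
from typing import List

# Illustrative mapping from request context to aspect ratio (assumed labels).
ASPECT_BY_USE_CASE = {"logo": "1:1", "presentation": "16:9"}


def build_command(
    prompt: str,
    api_key: str,
    use_case: str = "general",
    variations: int = 1,
    final: bool = False,
    output: str = "./generated_images",
) -> List[str]:
    """Return an argv list for scripts/generate_image.py."""
    # Ultra for final outputs, fast for iteration.
    model = "imagen-4.0-ultra-generate-001" if final else "imagen-4.0-fast-generate-001"
    num = min(max(variations, 1), 4)  # clamp to the documented 1..4 range
    return [
        sys.executable, "scripts/generate_image.py", prompt,
        "--api-key", api_key,
        "--model", model,
        "--aspect-ratio", ASPECT_BY_USE_CASE.get(use_case, "1:1"),
        "--num", str(num),
        "--output", output,
    ]
```

The resulting list can be passed to subprocess.run(cmd, check=True). Passing a list rather than a shell string avoids quoting problems when the prompt contains quotes or spaces.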

Best Practices

Prompt Engineering

Be specific and descriptive: Include details about style, lighting, composition, colors
Specify art style if desired: "digital art", "oil painting", "photorealistic", "minimalist"
Mention important elements: Objects, subjects, background, atmosphere
Include quality keywords: "high detail", "professional", "award-winning"

Example good prompt:

"A serene Japanese garden with cherry blossoms in full bloom, koi pond in foreground, traditional stone lantern, soft morning light, photorealistic style, high detail"

Example basic prompt (works but less controlled):

"Japanese garden"

Model Selection

Fast model: Prototyping, iteration, quick previews, high-volume generation
Standard model: General-purpose images, balanced quality and speed
Ultra model: Final outputs, client presentations, high-stakes visuals

Error Handling

The script handles common errors:

Invalid API keys → Check API key configuration
Network timeouts → Verify internet connection, retry request
Rate limiting → Wait and retry, consider reducing simultaneous requests
Invalid parameters → Review the --model, --aspect-ratio, and --num values
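The "wait and retry" advice for timeouts and rate limiting is commonly implemented as exponential backoff. The sketch below is a generic pattern, not the script's actual behavior. RetryableError is a hypothetical exception standing in for whatever the caller raises on a timeout or an HTTP 429 response.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, rate limits)."""


def with_retries(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on RetryableError with a doubling delay."""
    for attempt in range(attempts):
        try:
            return call()
        except RetryableError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

Errors such as an invalid API key or an unsupported parameter should not be wrapped this way, since retrying them cannot succeed.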

Output Format

Generated images are saved as PNG files with:

Naming convention: gemini_image_YYYYMMDD_HHMMSS_N.png
Timestamp: Second-resolution, so filenames stay unique across runs that start at different seconds
Sequential numbering: When generating multiple images
SynthID watermark: Automatically embedded by Imagen API
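The naming convention can be reproduced with the standard library alone. This sketch writes the decoded bytes directly to disk. The real script is described as processing images with PIL, so treat the function names and the direct write as simplifying assumptions.

```python
from datetime import datetime
from pathlib import Path
from typing import List, Optional


def output_paths(output_dir: str, count: int, now: Optional[datetime] = None) -> List[Path]:
    """Build gemini_image_YYYYMMDD_HHMMSS_N.png paths, with N starting at 1."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return [Path(output_dir) / f"gemini_image_{stamp}_{i}.png" for i in range(1, count + 1)]


def save_images(images: List[bytes], output_dir: str) -> List[Path]:
    """Write each image's bytes to its own file, creating the directory if needed."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    paths = output_paths(output_dir, len(images))
    for path, data in zip(paths, images):
        path.write_bytes(data)
    return paths
```

Computing the timestamp once per batch keeps all images from the same request under a shared prefix, distinguished only by the sequence number.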

Resources

scripts/generate_image.py

The main image generation script that handles:

API authentication and request formatting
Base64 image decoding and PIL processing
Automatic file saving with timestamps
Comprehensive error handling and user feedback
Command-line interface with all customization options

Invoke directly from the command line or integrate into larger workflows.
