gemini-imagegen

Gemini Image Generation

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.


Install skill "gemini-imagegen" with this command: npx skills add moosegoose0701/skill-compose/moosegoose0701-skill-compose-gemini-imagegen


Generate and edit images via Google Gemini's native multimodal image generation.

Model Selection

| Model ID | Codename | Best for | Max resolution |
|---|---|---|---|
| gemini-2.5-flash-image | Nano Banana | Fast drafts, high-volume, low-latency | 1K |
| gemini-3-pro-image-preview | Nano Banana Pro | Studio-quality, text rendering, complex prompts | 4K |

Default: gemini-3-pro-image-preview (Pro) unless speed/cost is a concern.
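The default rule above can be written down as a small helper. This is only a sketch: the function name `pick_model` and its two flags are hypothetical, and the model IDs are copied from the table.

```python
# Model IDs as listed in the table above.
FLASH = "gemini-2.5-flash-image"
PRO = "gemini-3-pro-image-preview"


def pick_model(prioritize_speed=False, needs_4k=False):
    """Return Pro by default; fall back to Flash only when speed/cost matters.

    Hypothetical helper: 4K output is listed only for Pro, so a 4K
    requirement overrides the speed preference.
    """
    if needs_4k:
        return PRO
    return FLASH if prioritize_speed else PRO
```

For example, `pick_model(prioritize_speed=True)` selects the Flash model, while `pick_model()` keeps the Pro default.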

Setup

Install (once):

    pip install google-genai

Create a client:

    from google import genai
    import os

    client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

If GEMINI_API_KEY is missing, instruct the user to set it as an environment variable. Never ask the user to paste the key in chat.
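One way to surface a missing key with a clear instruction, rather than a bare `KeyError`, is sketched below. The helper name `load_api_key` and the wording of the message are assumptions, not part of the skill.

```python
import os


def load_api_key(env=None, name="GEMINI_API_KEY"):
    """Return the API key from the environment, or raise with setup guidance.

    Hypothetical helper: `env` defaults to os.environ and is injectable
    so the check can be exercised without touching the real environment.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set. Set it as an environment variable and rerun. "
            "Do not paste the key into chat."
        )
    return key
```

The returned value can then be passed as `api_key` when constructing the client.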

Text-to-Image

    from google import genai
    from google.genai import types
    import os

    client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

    response = client.models.generate_content(
        model="gemini-3-pro-image-preview",
        contents="A photorealistic cat on a rainbow sofa",
        config=types.GenerateContentConfig(
            response_modalities=["TEXT", "IMAGE"],
        ),
    )

    # Extract and save the first image part
    for part in response.candidates[0].content.parts:
        if part.inline_data is not None:
            with open("output.png", "wb") as f:
                f.write(part.inline_data.data)
            break
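The extraction loop above stops at the first image. A variation that saves every image part and picks the file extension from the part's MIME type might look like the following sketch. The helper name `save_image_parts` and the `EXTENSIONS` mapping are assumptions; the code only relies on each part exposing `inline_data` with `data` and `mime_type`, as in the loop above.

```python
from pathlib import Path

# Assumed mapping from MIME type to file extension; extend as needed.
EXTENSIONS = {"image/png": ".png", "image/jpeg": ".jpg", "image/webp": ".webp"}


def save_image_parts(parts, stem="output", out_dir="."):
    """Write every part that carries inline image data; return the saved paths."""
    saved = []
    for part in parts:
        blob = getattr(part, "inline_data", None)
        if blob is None or not blob.data:
            continue  # text parts carry no inline data
        ext = EXTENSIONS.get(getattr(blob, "mime_type", None), ".png")
        # First image keeps the plain stem; later ones get _2, _3, ...
        suffix = "" if not saved else f"_{len(saved) + 1}"
        path = Path(out_dir) / f"{stem}{suffix}{ext}"
        path.write_bytes(blob.data)
        saved.append(path)
    return saved
```

It would be called as `save_image_parts(response.candidates[0].content.parts)`; an empty return value signals that the response contained no image.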

Aspect Ratio

Set via image_config:

    config=types.GenerateContentConfig(
        response_modalities=["TEXT", "IMAGE"],
        image_config=types.ImageConfig(
            aspect_ratio="16:9",  # for slides / widescreen
        ),
    )

Supported ratios: 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9

Common choices:

  • Slides / presentations → 16:9

  • Social media / portraits → 9:16 or 4:5

  • Square thumbnails → 1:1
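To catch an unsupported ratio before a request is sent, the list and the common choices above can be encoded as a lookup. This is a sketch: the use-case labels (`"slides"`, `"social"`, `"portrait"`, `"thumbnail"`) and the function name are hypothetical.

```python
# Ratios copied from the supported list above.
SUPPORTED_RATIOS = {
    "1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9",
}

# Hypothetical use-case labels mapped to the common choices above.
DEFAULT_RATIO_BY_USE = {
    "slides": "16:9",
    "social": "9:16",
    "portrait": "4:5",
    "thumbnail": "1:1",
}


def resolve_aspect_ratio(value):
    """Accept a use-case label or an explicit ratio; return a supported ratio."""
    ratio = DEFAULT_RATIO_BY_USE.get(value, value)
    if ratio not in SUPPORTED_RATIOS:
        raise ValueError(
            f"Unsupported aspect ratio {value!r}; "
            f"expected one of {sorted(SUPPORTED_RATIOS)}"
        )
    return ratio
```

The result can be passed directly as the `aspect_ratio` argument of `types.ImageConfig`.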

Image Editing (with reference image)

    from google.genai import types
    from pathlib import Path

    # Reuses the client created in Setup.
    ref_bytes = Path("input.jpg").read_bytes()

    response = client.models.generate_content(
        model="gemini-3-pro-image-preview",
        contents=[
            # Pass the raw bytes; Blob.data expects bytes, not a base64 string.
            types.Part(inline_data=types.Blob(mime_type="image/jpeg", data=ref_bytes)),
            types.Part(text="Remove the background and replace with a sunset gradient"),
        ],
        config=types.GenerateContentConfig(
            response_modalities=["TEXT", "IMAGE"],
        ),
    )

Pro supports up to 14 reference images for multi-image composition and up to 5 human reference images for character/identity consistency.
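When composing from several reference images, it can help to check the count and file types before building the request. The sketch below uses only the standard library; the helper name `load_reference_images` is hypothetical, and the limit of 14 is the figure stated above. Each returned `(mime_type, data)` pair would then be wrapped in a `types.Part` as in the editing example.

```python
import mimetypes
from pathlib import Path

MAX_REFERENCE_IMAGES = 14  # Pro limit stated above


def load_reference_images(paths, limit=MAX_REFERENCE_IMAGES):
    """Return a list of (mime_type, raw_bytes) pairs for the given image files."""
    paths = list(paths)
    if len(paths) > limit:
        raise ValueError(f"{len(paths)} reference images given; the limit is {limit}")
    images = []
    for p in paths:
        # Guess the MIME type from the file extension.
        mime, _ = mimetypes.guess_type(str(p))
        if mime is None or not mime.startswith("image/"):
            raise ValueError(f"{p} does not look like an image file")
        images.append((mime, Path(p).read_bytes()))
    return images
```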

Batch Generation (for slides)

When generating multiple images (e.g. one per slide), loop sequentially and save with numbered filenames:

    import os, time
    from google import genai
    from google.genai import types

    client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

    prompts = [...]  # list of prompt strings

    for i, prompt in enumerate(prompts, 1):
        response = client.models.generate_content(
            model="gemini-3-pro-image-preview",
            contents=prompt,
            config=types.GenerateContentConfig(
                response_modalities=["TEXT", "IMAGE"],
                image_config=types.ImageConfig(aspect_ratio="16:9"),
            ),
        )
        # Save the first image part of each response
        for part in response.candidates[0].content.parts:
            if part.inline_data is not None:
                with open(f"slide_{i}.png", "wb") as f:
                    f.write(part.inline_data.data)
                break
        time.sleep(1)  # rate limit courtesy
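The loop above assumes every request yields an image. A variation that records which slides produced nothing, so they can be retried later, is sketched below. The helper `save_batch` and its `generate` callback (a function that takes a prompt and returns image bytes or `None`) are hypothetical; the model call itself is left to the caller.

```python
from pathlib import Path


def save_batch(prompts, generate, out_dir="slides", pause=lambda: None):
    """Generate one image per prompt sequentially.

    Returns (saved_paths, failed_indices), with indices counted from 1.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    width = len(str(len(prompts)))  # zero-pad so filenames sort correctly
    saved, failed = [], []
    for i, prompt in enumerate(prompts, 1):
        data = generate(prompt)
        if data is None:
            failed.append(i)  # no image returned for this prompt
            continue
        path = out / f"slide_{i:0{width}d}.png"
        path.write_bytes(data)
        saved.append(path)
        pause()  # e.g. lambda: time.sleep(1) as a rate limit courtesy
    return saved, failed
```

With fewer than ten prompts the filenames match the `slide_{i}.png` pattern used above; longer batches get zero-padded numbers.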

Error Handling

  • Safety filter block: The model may refuse prompts it deems unsafe. Adjust the prompt to be less ambiguous (remove violent/adult/medical imagery language) and retry.

  • Empty response: If response.candidates is empty or has no image parts, the prompt may be too vague. Add concrete scene details and retry.

  • Rate limit (429): Back off with exponential delay. Default: time.sleep(2 ** attempt).

  • Timeout: Set a reasonable timeout; Pro model may take 10–30s for complex prompts.
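The rate-limit advice above can be sketched as a small retry wrapper. The function name `call_with_backoff` is hypothetical, and the check assumes the raised exception exposes the HTTP status as a `code` attribute; adjust it to whatever error type the SDK version in use actually raises.

```python
import time


def call_with_backoff(fn, max_attempts=5, sleep=time.sleep):
    """Call fn(); on a 429-style error wait 2 ** attempt seconds and retry.

    `sleep` is injectable so the delays can be observed without waiting.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            # Assumption: the error carries the HTTP status in `.code`.
            is_rate_limit = getattr(exc, "code", None) == 429
            if not is_rate_limit or attempt == max_attempts - 1:
                raise  # other errors, or retries exhausted
            sleep(2 ** attempt)  # 1s, 2s, 4s, ...
```

A request would be wrapped as `call_with_backoff(lambda: client.models.generate_content(...))`, keeping the original arguments inside the lambda.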

Prompt Best Practices

  • Structure: scene → subject → style → composition → constraints

  • Always specify art style: "flat vector illustration", "watercolor painting", "3D render", "photorealistic photograph"

  • Include lighting and mood: "soft diffused lighting", "dramatic rim light", "golden hour"

  • For text in images: quote exact text, specify font style and placement

  • For slide illustrations: add "negative space on [side]" to leave room for text overlay

  • Use English prompts even for non-English content (better generation quality)

  • Keep prompts under 500 words; be specific but not verbose
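The structure and length guidance above can be applied mechanically. The following sketch assembles a prompt in the scene → subject → style → composition → constraints order and enforces the 500-word ceiling; the function name and the comma-joined output format are assumptions.

```python
def build_prompt(scene, subject, style, composition="", constraints="", max_words=500):
    """Join the prompt sections in the recommended order, skipping empty ones."""
    sections = [scene, subject, style, composition, constraints]
    prompt = ", ".join(s.strip() for s in sections if s and s.strip())
    word_count = len(prompt.split())
    if word_count > max_words:
        raise ValueError(f"Prompt has {word_count} words; keep it under {max_words}")
    return prompt


example = build_prompt(
    scene="a sunlit kitchen",
    subject="a ceramic teapot on a wooden table",
    style="watercolor painting, soft diffused lighting",
    composition="negative space on the left",
)
```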

Style Consistency for Multi-Image Sets

When generating a series (e.g. slide deck), prepend a style prefix to every prompt:

Style prefix: "flat vector illustration, soft pastel color palette, clean lines, minimal detail, 16:9 widescreen"

Slide 1 prompt: "{style_prefix}, a wide establishing shot of a modern office building at sunrise"

Slide 2 prompt: "{style_prefix}, a close-up of hands typing on a laptop keyboard"

This ensures visual coherence across all generated images.
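Applying the prefix in code avoids copy-paste drift between slides. A minimal sketch, assuming the prompts are plain strings and using a hypothetical helper name:

```python
# Style prefix copied from the example above.
STYLE_PREFIX = (
    "flat vector illustration, soft pastel color palette, "
    "clean lines, minimal detail, 16:9 widescreen"
)


def with_style_prefix(prompts, prefix=STYLE_PREFIX):
    """Prepend the same style prefix to every prompt in the series."""
    return [f"{prefix}, {p}" for p in prompts]


slide_prompts = with_style_prefix([
    "a wide establishing shot of a modern office building at sunrise",
    "a close-up of hands typing on a laptop keyboard",
])
```

The resulting list can be fed directly into the batch loop as `prompts`.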

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
