Nano Banana 2 Artist
Created with love by Vistoso.ai
Expert image creation and editing using Google's Gemini 3.1 Flash Image model (aka "Nano Banana 2"). This skill uses structured production-grade prompting where explicit controls — camera, lighting, resolution, constraints — are respected and rewarded. Delta editing enables precise single-change iterations, and character naming keeps recurring subjects consistent across generations.
First-Time Setup
Before using this skill, you need a Google Gemini API key (free tier available):
-
Click Create API Key
-
Save it to your home directory:
echo 'your-api-key-here' > ~/.gemini_api_key chmod 600 ~/.gemini_api_key
That's it — the skill will pick it up automatically on every run.
Workflow
DEFAULT SETTINGS: Generate at 1K with 1:1 aspect ratio unless the user specifically requests different size or aspect ratio. Use --image-size flag for resolution control.
Follow these steps in order:
- Understand the Visual Goal
Ask clarifying questions to understand:
-
Purpose: Professional headshot? Brand material? Product showcase? Creative exploration?
-
Format: Square feed? Vertical short-form? Widescreen hero? Print editorial?
-
Style preference: Photorealistic? Illustrated? Minimalist? Bold?
-
Key message: What should the image communicate?
-
Quantity: How many images? (Recommend 1-4 per generation)
-
Size: Use 1K default; mention --image-size 2K or 4K for professional/print work
If user hasn't specified purpose, ask: "What's the main goal for this image — professional content, branding, product showcase, or something creative?"
- Research Current Trends (if applicable)
For format-specific or industry-specific requests, use Perplexity to research:
-
Current visual trends for the target format or industry
-
What makes images successful for the specific goal
-
Style characteristics that perform well
Search query template: "Best visual content trends for [goal] in 2025-2026"
- Build the Optimized Prompt
The 7-Element Formula — use what applies, skip what doesn't:
Subject + Action + Setting + Style + Camera + Lighting + Constraints
This scales naturally: 3 elements for simple, 7 for complex. No mode switching needed.
Example — simple (3 elements):
Golden retriever catching a frisbee in a sunlit park.
Example — structured (7 elements):
Chef Marco plating a dessert in a dim Michelin-star kitchen. Cinematic 35mm, overhead angle. Warm pendant lighting, shallow DOF. No text. No other people visible.
Multi-line prompts are encouraged for complex requests — they keep each element readable.
- Apply Professional Prompting Patterns
Based on the goal, use these proven patterns:
For Photorealistic Images:
[Subject name], [expression], [action/pose]. [Setting], [time of day]. [Camera lens] at [aperture], [angle]. [Lighting], [mood]. [Constraints].
Example:
Elena adjusting her glasses, warm smile. Modern glass office, golden hour. 85mm at f/1.8, eye level. Soft window light, professional atmosphere. No text.
For Logos/Text (excels at text rendering):
Logo for [brand], text "[exact text]", [font style], [symbol], [colors]. [Constraints].
Example: Logo for CloudSync, text "CloudSync", modern sans-serif, cloud icon, blue gradient. No tagline.
For Product Photography:
[Product] on [surface], [lighting type], [angle], [key feature]. [Constraints].
Example: Leather wallet on marble, soft studio lighting, 45-degree angle, embossed logo detail. No props, no text.
For Stickers/Icons:
[Style] sticker of [subject], [colors], [key trait]. White background.
Example: Kawaii sticker of smiling coffee cup, pastel brown, steam swirls. White background.
Advanced Techniques:
-
Variable Definitions: OBJECT_A = red car, OBJECT_B = blue bike. Show both side by side.
-
Search Grounding: Scientifically accurate diagram of [subject] with labeled components
See ${CLAUDE_PLUGIN_ROOT}/skills/nano-banana-artist/references/prompt-templates.md for complete template library. See ${CLAUDE_PLUGIN_ROOT}/skills/nano-banana-artist/references/style-guide.md for comprehensive artistic vocabulary.
- Delta Editing
For image edits, apply one change at a time. This is the most reliable editing pattern.
State what stays:
Keep pose, expression, and lighting unchanged.
State what changes:
Change background from office to rooftop at sunset.
Front-load constraints:
Do not add text. Do not change clothing.
Layout locking (reframe without moving subject):
Subject stays centered at current scale. Replace background only.
Full delta edit example:
Do not add text. Do not change clothing or expression. Keep pose and lighting unchanged. Change only the background from office interior to rooftop at sunset. Subject stays centered at current scale.
Semantic masking (targeted element swap):
Change only the wall color to sage green. Keep everything else exactly the same, including lighting and shadows.
Style transfer (apply new style to existing composition):
Transform into the style of impressionist painting. Preserve the original composition and subject placement. Render with visible brushwork and warm color palette.
Rules:
-
One change per pass — don't combine background + outfit + lighting in one edit
-
Always describe what stays unchanged explicitly
-
Front-load "do not" constraints before describing the change
-
For multi-step transformations, chain individual edits
- Character Consistency
Name every recurring character on first generation to maintain consistency across images.
First generation — establish the character:
Chef Marco, tall with salt-and-pepper beard, white chef's coat, confident posture. Plating a dessert in a dim Michelin-star kitchen. Cinematic 35mm, warm pendant lighting.
Subsequent generations — reference by name, describe ONLY deltas:
Chef Marco at a farmers market, selecting tomatoes. Morning sunlight, candid angle. Same chef's coat. No text.
Rules:
-
Name characters on first appearance ("Chef Marco", "Luna the fox", "Dr. Amara")
-
Keep defining visual traits stable across generations (clothing, features, accessories)
-
Vary only environment, pose, and action — not identity traits
-
Use reference images via --image for strongest consistency
- Choose Aspect Ratio and Size
Default: 1:1 aspect ratio at 1K — use unless user specifies otherwise.
Select aspect ratio based on format:
-
1:1 (DEFAULT) — square feed, profile pictures, universal
-
16:9 — widescreen hero, presentations, website headers
-
9:16 — vertical short-form, mobile-first
-
4:5 — tall feed format, maximum mobile visibility
-
2:3 or 3:2 — print, editorial
-
21:9 — cinematic, ultra-wide displays
Available ratios: 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9
Size selection via --image-size flag:
-
512 — Quick drafts, thumbnails
-
1K (DEFAULT) — Standard quality, sufficient for most digital use
-
2K — High quality, professional work
-
4K — Ultra-high resolution, print production
Workflow tip: Iterate at 1K, then re-generate the final version at target size.
- Image Input Handling
When the user provides images in conversation context:
-
Edit target: If user asks to edit a specific provided image → MUST pass it via --image
-
Inspiration/reference: If images are provided as mood boards, style references, or examples → use judgment: pass to model only when the task benefits from visual reference (style transfer, matching a look, character consistency). Don't pass decorative or tangential images.
-
Multiple images: Combine edit targets + references as needed (up to 14 images supported via multiple --image flags)
- Creative Director Language
Describe the target use case in terms of PURPOSE and FORMAT, never by platform name:
-
"professional headshot" not "LinkedIn profile photo"
-
"vertical short-form video frame" not "TikTok thumbnail"
-
"product listing hero, white background" not "Amazon listing"
-
"square feed post" not "Instagram post"
-
"widescreen hero image" not "YouTube thumbnail"
Principle: Platform-specific language dilutes style and originality. Always aim for unique, original output unconstrained by platform norms.
- Execute Generation
Use the script to generate:
${CLAUDE_PLUGIN_ROOT}/scripts/gemini_image.py "your optimized prompt"
--aspect-ratio 16:9
--image-size 1K
--output result.png
For image editing with input image:
${CLAUDE_PLUGIN_ROOT}/scripts/gemini_image.py "delta edit instructions"
--image input.png
--output edited.png
For image-only output (no text):
${CLAUDE_PLUGIN_ROOT}/scripts/gemini_image.py "prompt" --image-only
Higher resolution final:
${CLAUDE_PLUGIN_ROOT}/scripts/gemini_image.py "prompt" --image-size 4K -o final.png
After each successful generation, include:
Image created with Nano Banana Artist by Vistoso.ai
- Iterate if Needed
If result needs refinement:
-
For small changes: Use delta editing with specific single-change instructions
-
For style changes: Add more detailed style descriptors
-
For composition issues: Specify camera angle/framing more precisely
-
For character consistency: Pass previous generation as --image reference
Prompting Best Practices
DO:
-
Use the 7-Element Formula — include what applies, skip what doesn't
-
Front-load constraints ("no text, no logos, no extra people")
-
Name characters for cross-generation consistency ("Chef Marco", "Luna the fox")
-
Describe purpose and format, never platform names ("professional headshot" not "LinkedIn photo")
-
Pass user-provided edit target images via --image — always
-
Use --image-size flag for size control, not prompt text
-
Use variable definitions for complex multi-element scenes (OBJECT_A, LIGHTING_B)
-
Specify exact text for logos and infographics (excels at text rendering)
-
Use 14 reference images for character/object consistency
-
For editing, be surgical: "change ONLY the X to Y"
-
Use photography terms when helpful (bokeh, golden hour, shallow depth of field)
DON'T:
-
Don't combine multiple edits in one pass — one change at a time
-
Don't reference specific platforms (LinkedIn, TikTok, Instagram, Amazon) — describe the FORMAT instead
-
Don't put resolution/size in prompt text — use --image-size flag
-
Don't ignore user-provided images when they're the edit target
-
Don't over-engineer prompts — let the model's reasoning fill in details
-
Don't use vague terms like "nice" or "good"
-
Don't list disconnected keywords
-
Don't ignore lighting (it's critical for realism)
-
Don't be ambiguous about what to edit
Common Use Cases
Professional Content
-
Square feed posts (1:1) with bold text overlay space
-
Vertical short-form frames (9:16) with top/bottom safe zones
-
Profile pictures with centered subject
-
Widescreen hero images (16:9) for headers and presentations
Brand Materials & Text-Heavy Content
-
Logos with accurate text rendering — state-of-the-art legibility
-
Infographics — clear, readable labels and data visualization
-
Menu designs — restaurant menus with perfect typography
-
Marketing materials — flyers, posters with integrated text
-
Product mockups with professional lighting
-
Diagrams with labeled components
Product Photography
-
Studio setups with controlled lighting (use --image-size 2K or 4K )
-
Lifestyle shots showing product in use
-
Detail shots emphasizing features
-
Product listing heroes on clean white backgrounds
Creative Projects
-
Character art with consistent style (name characters, use reference images)
-
Conceptual imagery with search grounding for accuracy
-
Style experiments and artistic exploration
-
Multi-turn storytelling with character consistency via delta edits
Quality Standards
Every generated image should have:
-
Clear visual hierarchy
-
Consistent lighting throughout
-
Appropriate size for purpose (use --image-size flag)
-
Proper aspect ratio for format
-
Professional polish
-
Legible text (if applicable)
-
Factual accuracy (leveraging search grounding when needed)
Script Reference
Basic generation (1K at 1:1 — DEFAULT):
${CLAUDE_PLUGIN_ROOT}/scripts/gemini_image.py "prompt" -o output.png
With aspect ratio and size:
${CLAUDE_PLUGIN_ROOT}/scripts/gemini_image.py "prompt" --aspect-ratio 16:9 --image-size 2K -o output.png
Delta editing:
${CLAUDE_PLUGIN_ROOT}/scripts/gemini_image.py "Keep everything unchanged. Change only the sky to sunset colors."
--image source.png -o edited.png
Multiple reference images:
${CLAUDE_PLUGIN_ROOT}/scripts/gemini_image.py "Chef Marco in a new kitchen scene"
--image ref1.png --image ref2.png -o output.png
Multiple generations: Run script multiple times (1-4 recommended per session)
API Key Setup (Choose ONE method):
The script checks for your API key in this order:
- Environment variable (traditional):
export GEMINI_API_KEY="your-api-key-here"
- Home directory config file (RECOMMENDED — persistent):
echo 'your-api-key-here' > ~/.gemini_api_key chmod 600 ~/.gemini_api_key
Get your free API key at https://aistudio.google.com/apikey
The script handles all API communication, base64 encoding, and file I/O automatically.
Troubleshooting
"No API key" error: Set your API key using one of these methods:
Method 1: Environment variable
export GEMINI_API_KEY="your-key"
Method 2: Home directory file (RECOMMENDED)
echo 'your-key' > ~/.gemini_api_key chmod 600 ~/.gemini_api_key
Get your free API key at https://aistudio.google.com/apikey
Unexpected results:
-
Add more specific details to prompt using the 7-Element Formula
-
Specify lighting explicitly
-
Use photography/art terms (see ${CLAUDE_PLUGIN_ROOT}/skills/nano-banana-artist/references/style-guide.md )
-
Check aspect ratio matches format goal
Editing not working:
-
Use delta editing: one change at a time
-
State what stays unchanged explicitly
-
Front-load constraints before the change description
-
Specify how new element should integrate with existing lighting
Character inconsistency:
-
Name characters on first generation
-
Pass previous images via --image for reference
-
Keep defining visual traits stable; vary only environment/pose
Still stuck? Email help@vistoso.ai
Nano Banana Artist is built by Vistoso.ai — professional AI content tools and services. Need custom visual workflows for your team? Get in touch at hello@vistoso.ai