Image Generation (AI SDK)
Official API-based image generation via AI SDK. Supports OpenAI (DALL-E, GPT Image) and Google (Imagen, Gemini multimodal).
Script Directory
Important: All scripts are located in the scripts/ subdirectory of this skill.
Agent Execution Instructions:
-
Determine this SKILL.md file's directory path as SKILL_DIR
-
Script path = ${SKILL_DIR}/scripts/<script-name>.ts
-
Replace all ${SKILL_DIR} in this document with the actual path
Script Reference:
Script Purpose
scripts/main.ts
CLI entry point for image generation
Quick Start
Basic generation (auto-detect provider)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png
With aspect ratio
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A landscape" --image landscape.png --ar 16:9
High quality (2k)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png --quality 2k
Specific provider
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png --provider openai
From prompt files
npx -y bun ${SKILL_DIR}/scripts/main.ts --promptfiles system.md content.md --image out.png
With reference images (Google multimodal only)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Make blue" --image out.png --ref source.png
Commands
Basic Image Generation
Generate with prompt
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A sunset over mountains" --image sunset.png
Shorthand
npx -y bun ${SKILL_DIR}/scripts/main.ts -p "A cute robot" --image robot.png
Aspect Ratios
Common ratios: 1:1, 16:9, 9:16, 4:3, 3:4, 2.35:1
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A portrait" --image portrait.png --ar 3:4
Or specify exact size
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Banner" --image banner.png --size 1792x1024
Reference Images (Google Multimodal)
Image editing with reference
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Make it blue" --image blue.png --ref original.png
Multiple references
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Combine these styles" --image out.png --ref a.png b.png
Quality Presets
Normal quality (default)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png --quality normal
High quality (2k resolution)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png --quality 2k
Output Formats
Plain output (prints saved path)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png
JSON output
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png --json
Options
Option Description
--prompt <text> , -p
Prompt text
--promptfiles <files...>
Read prompt from files (concatenated)
--image <path>
Output image path (required)
--provider google|openai
Force provider (default: google)
--model <id> , -m
Model ID
--ar <ratio>
Aspect ratio (e.g., 16:9 , 1:1 , 4:3 )
--size <WxH>
Size (e.g., 1024x1024 )
--quality normal|2k
Quality preset (default: normal)
--ref <files...>
Reference images (Google multimodal only)
--n <count>
Number of images
--json
JSON output
--help , -h
Show help
Environment Variables
Variable Description Default
OPENAI_API_KEY
OpenAI API key
GOOGLE_API_KEY
Google API key
OPENAI_IMAGE_MODEL
OpenAI model gpt-image-1.5
GOOGLE_IMAGE_MODEL
Google model gemini-3-pro-image-preview
OPENAI_BASE_URL
Custom OpenAI endpoint
GOOGLE_BASE_URL
Custom Google endpoint
Load Priority: CLI args > process.env
<cwd>/.baoyu-skills/.env ~/.baoyu-skills/.env
Provider & Model Strategy
Auto-Selection
-
If --provider specified → use it
-
If only one API key available → use that provider
-
If both available → default to Google (multimodal LLMs more versatile)
API Selection by Model Type
Model Category API Function Example Models
Google Multimodal generateText
gemini-2.0-flash-exp-image-generation
Google Imagen experimental_generateImage
imagen-3.0-generate-002
OpenAI experimental_generateImage
gpt-image-1 , dall-e-3
Available Models
Google:
-
gemini-3-pro-image-preview
-
Default, multimodal generation
-
gemini-2.0-flash-exp-image-generation
-
Gemini 2.0 Flash
-
imagen-3.0-generate-002
-
Imagen 3
OpenAI:
-
gpt-image-1.5
-
Default, GPT Image 1.5
-
gpt-image-1
-
GPT Image 1
-
dall-e-3
-
DALL-E 3
Quality Presets
Preset OpenAI Google Use Case
normal
1024x1024 Default Covers, illustrations
2k
2048x2048 "2048px" in prompt Infographics, slides
Aspect Ratio Handling
-
Multimodal LLMs: Embedded in prompt (e.g., "... aspect ratio 16:9" )
-
Image-only models: Uses aspectRatio or size parameter
-
Common ratios: 1:1, 16:9, 9:16, 4:3, 3:4, 2.35:1
Examples
Generate Cover Image
npx -y bun ${SKILL_DIR}/scripts/main.ts
--prompt "A minimalist tech illustration with blue gradients"
--image cover.png --ar 2.35:1 --quality 2k
Generate Social Media Post
npx -y bun ${SKILL_DIR}/scripts/main.ts
--prompt "Instagram post about coffee"
--image post.png --ar 1:1
Edit Image with Reference
npx -y bun ${SKILL_DIR}/scripts/main.ts
--prompt "Change the background to sunset"
--image edited.png --ref original.png --provider google
Batch Generation from Prompt File
Create prompt file with detailed instructions
npx -y bun ${SKILL_DIR}/scripts/main.ts
--promptfiles style-guide.md scene-description.md
--image scene.png
Error Handling
-
Missing API key: Clear error with setup instructions
-
Generation failure: Auto-retry once, then error
-
Invalid aspect ratio: Warning, proceed with default
-
Reference images with image-only model: Warning, ignore refs
Extension Support
Custom configurations via EXTEND.md.
Check paths (priority order):
-
.baoyu-skills/baoyu-image-gen/EXTEND.md (project)
-
~/.baoyu-skills/baoyu-image-gen/EXTEND.md (user)
If found, load before workflow. Extension content overrides defaults.