GPT Image 2 — High-quality AI image generation
Powered by ClawdChat; calls OpenAI gpt-image-2 through the Uno tool gateway.
What this skill does
Two command-line invocations against the public ClawdChat tool gateway:
| Tool slug | Purpose | Cost |
|---|---|---|
| gpt-image-2.gpt_image2_submit | Submit a generation job, returns job_id immediately (async) | 300 credits / call |
| gpt-image-2.gpt_image2_result | Poll job status / fetch image URL when ready | 0 credits |
The bundled bin/uno.py handles authentication, HTTP transport, and rate-limiting. No external skill dependency required — install this skill and you're ready.
Credentials & permissions (read before first use)
- Credential type: ClawdChat API key (Bearer token).
- Where it lives: ~/.uno/credentials.json. Managed entirely by the bundled bin/uno.py; this skill never opens, prints, or copies it.
- How it was obtained: run python bin/uno.py login (see Setup below). The bundled script drives a ClawdChat OAuth device-code flow and stores the resulting token.
- What it authorises: calling the ClawdChat tool gateway as the logged-in user. Each gpt_image2_submit deducts 300 credits from that account.
- Network egress: the user's prompt text and any reference_image_urls are sent to the ClawdChat gateway over HTTPS. Do not paste private, confidential, or personally identifying content into the prompt unless you are comfortable with the gateway's data handling; see https://clawdchat.cn for the data policy.
- OAuth scope: mcp:gpt-image-2, scoped specifically to gpt-image-2 tool calls. Only gpt-image-2.gpt_image2_submit and gpt-image-2.gpt_image2_result are invoked; no other gateway tools are called.
- Login output: the login --poll step returns the new API key once in its JSON response (standard device-code OAuth confirmation). Treat that terminal output as a secret; do not log or share it. bin/uno.py immediately persists the key to ~/.uno/credentials.json and does not print it again.
- Logging out / revoking: run python bin/uno.py logout to delete ~/.uno/credentials.json and end the local session.
Cost transparency & confirmation rule
Every gpt_image2_submit call costs the logged-in account real credits. The agent must:
- Show the user the planned prompt, size, style, and number of images before the first call.
- Ask for explicit confirmation when the user has not already approved a generation in the current turn.
- For multi-image batches (n > 1) or retries, treat each submission as a separate spending event and confirm again unless the user has pre-authorised the batch.
- On error responses, surface the error to the user instead of silently retrying.
Polling via gpt_image2_result is free; only submit spends credits.
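A minimal sketch of such a gate, assuming an interactive session; the confirm_generation helper and its wording are illustrative, not part of the skill:

def confirm_generation(prompt, size="1024x1024", style=None, n=1):
    # Illustrative helper: show the planned call before any credits are spent.
    print(f"Planned generation: n={n}, size={size}, style={style}")
    print(f"Prompt: {prompt}")
    print("Each gpt_image2_submit call deducts 300 credits.")
    return input("Proceed? [y/N] ").strip().lower() == "y"

Only call gpt_image2_submit when this returns True; polling needs no such gate.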
Setup
No external skill dependency. After clawhub install gpt-image2 the layout is:
gpt-image2/
├── SKILL.md
└── bin/
    └── uno.py   # bundled CLI, no extra install needed
All commands below run from inside the gpt-image2/ folder.
Check if already logged in
python bin/uno.py whoami --compact
- Returns user info (name, email, credits) → credentials valid, skip login.
- Returns {"error": "Not logged in"} → proceed to login below.
Log in
python bin/uno.py login --start
This prints a device code and a URL like https://clawdtools.uno/device?code=XXXX.
Open that URL in a browser. If not yet signed in to clawdtools.uno, the page redirects to ClawdChat SSO automatically and returns to the Authorise screen. Click "Authorise".
Then poll for completion:
python bin/uno.py login --poll DEVICE_CODE
Or run python bin/uno.py login (blocking, polls automatically).
Credential file (~/.uno/credentials.json) is written by bin/uno.py and reused on subsequent runs.
Generating an image — full async flow
A single 1024×1024 image typically takes ~150 s, longer than the default MCP 60 s timeout. Always use the submit → poll-result pattern.
Step 1 — submit
python bin/uno.py call gpt-image-2.gpt_image2_submit --compact \
--args '{"prompt":"A shiba inu under cherry blossoms, sunny afternoon","size":"1024x1024","style":"ghibli_anime"}'
Response (already flattened — no need to unwrap content[0].text):
{"success": true, "data": {"status": "pending", "job_id": "0b84b8f0f0c8", "estimated_seconds": 150}, "meta": {"latency_ms": 120, "credits_used": 300}}
Record data.job_id.
Step 2 — poll for result
python bin/uno.py call gpt-image-2.gpt_image2_result --compact --timeout 70 \
--args '{"job_id":"0b84b8f0f0c8","wait_seconds":50}'
wait_seconds=50 makes the server-side wait 50 s (within the 60 s MCP envelope); --timeout 70 adds a small client buffer.
Repeat the call until data.status is one of:
- done: image ready, URLs in data.items[].url.
- error: generation failed, message in data.error.
- pending / running: call again immediately. Do not add a client-side sleep; the server already waited 50 s on your behalf.
Three to five iterations (~150–250 s total) is normal.
Reference Python loop
Use subprocess.run with an argument list to safely pass arbitrary prompt text without shell-injection risk:
import json, subprocess, sys

UNO = "bin/uno.py"

def uno(args):
    # Invoke the bundled CLI and parse the JSON it prints to stdout.
    r = subprocess.run(["python", UNO] + args, capture_output=True, text=True)
    return json.loads(r.stdout)

prompt = "Van Gogh starry night"
resp = uno(["call", "gpt-image-2.gpt_image2_submit", "--compact",
            "--args", json.dumps({"prompt": prompt, "style": "oil_painting_vangogh"})])
if not resp.get("success"):
    print("Submit failed:", resp.get("error"), resp.get("hint"), file=sys.stderr)
    sys.exit(1)
job_id = resp["data"]["job_id"]

for _ in range(6):
    r = uno(["call", "gpt-image-2.gpt_image2_result", "--compact", "--timeout", "70",
             "--args", json.dumps({"job_id": job_id, "wait_seconds": 50})])
    status = r["data"]["status"]
    if status == "done":
        print(json.dumps(r, ensure_ascii=False, indent=2))
        break
    if status == "error":
        print("Error:", r["data"].get("error"), file=sys.stderr)
        sys.exit(1)
    # pending / running: poll again right away; the server already waited 50 s.
else:
    # Loop exhausted without a result; the job can be re-polled later with job_id.
    print("Still running; re-poll later with job_id", job_id, file=sys.stderr)
Parameters
| Field | Meaning | Values |
|---|---|---|
| prompt | Image description (required, any language) | free text |
| size | Image dimensions | 1024x1024 (default), 1024x1536 (portrait), 1536x1024 (landscape), auto |
| n | Number of images to generate | 1–4 (default 1) |
| style | Built-in style preset | one of the 20 keys below |
| reference_image_urls | Reference images (image-to-image) | URL string, comma-separated for multiple |
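Example of a fuller argument payload, reusing the uno() helper from the reference loop above; the prompt and reference URL are placeholders, and n > 1 still requires the confirmation step:

args = {
    "prompt": "A corgi astronaut planting a flag on the moon",
    "size": "1536x1024",     # landscape
    "n": 2,                  # two candidate images in one job
    "style": "pixar_3d",
    "reference_image_urls": "https://example.com/corgi.jpg",  # comma-separate multiple URLs
}
resp = uno(["call", "gpt-image-2.gpt_image2_submit", "--compact",
            "--args", json.dumps(args)])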
20 built-in style presets
| key | description |
|---|---|
| ghibli_anime | Studio Ghibli / hand-drawn anime |
| pixar_3d | Pixar / Disney 3D animation |
| claymation | Stop-motion claymation (Laika / Aardman) |
| lego_brick | LEGO bricks |
| popmart_figurine | Blind-box / Pop Mart figurine |
| isometric_game | Isometric 2.5D game scene |
| cinematic_photo | Cinematic photorealism (35mm) |
| polaroid_film | Polaroid film snapshot |
| watercolor_ink | Watercolour / East-Asian ink wash |
| oil_painting_vangogh | Van Gogh impasto oil painting |
| cyberpunk_neon | Cyberpunk neon nightscape |
| vintage_infographic | Retro infographic / data poster |
| movie_poster | Movie poster (large title + still) |
| flat_vector | Flat-vector illustration / banner |
| pixel_8bit | Pixel art (8/16-bit) |
| papercraft_layered | Layered papercraft |
| exploded_diagram | Exploded technical diagram |
| dreamcore_liminal | Dreamcore / liminal space |
| knolling_flatlay | Top-down knolling / flat-lay |
| botanical_engraving | Botanical engraving / antique illustration |
Where this model shines (vs Midjourney / Flux / SD)
- Accurate text rendering — poster headlines, infographics, menu typography, meme captions: written into the image as specified.
- Strong prompt following — multi-element scenes, ordering and spatial relationships obeyed.
- Subject preservation in image-to-image — faces, brands, and characters stay consistent across reference images.
- Wide style coverage — Ghibli, Pixar, claymation, LEGO, Pop Mart, botanical engraving etc. all handled.
Agent guidance
- Tell the user up-front that one image takes ~150 s.
- The gpt_image2_result tool already sleeps 50 s server-side; never add an extra client-side sleep between polls.
- Use --timeout 70 for result calls (50 s server wait + buffer).
- Pass the user's prompt verbatim, including non-English text.
- Reference images: combine reference_image_urls with a style preset for "restyle while keeping the subject".
- Posters / infographics / menus: lean on the text-rendering strength.
- If submit returns success=false, surface the error / hint fields to the user.
- If the loop exhausts (~600 s) and status is still running, tell the user the job can be re-polled later with the same job_id (see the sketch below).
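When a job outlives the session, the job_id alone is enough to resume it. A minimal sketch, reusing the uno() helper and job_id from the reference loop; the state-file name is illustrative:

import json
from pathlib import Path

STATE = Path("pending_job.json")  # illustrative place to stash the job id

# Right after submit: remember the job so a later session can resume it.
STATE.write_text(json.dumps({"job_id": job_id}))

# Later session: re-poll with the same job_id; polling spends no credits.
job_id = json.loads(STATE.read_text())["job_id"]
r = uno(["call", "gpt-image-2.gpt_image2_result", "--compact", "--timeout", "70",
         "--args", json.dumps({"job_id": job_id, "wait_seconds": 50})])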
Response shape
{
  "success": true,
  "data": {"status": "...", "job_id": "...", "items": [{"url": "..."}]},
  "meta": {"latency_ms": 120, "credits_used": 300}
}
Read data.status, data.job_id, data.items[].url directly.
Errors:
{"success": false, "error": "...", "hint": "..."}