GPT Image 2 — High-quality AI image generation
Powered by ClawdChat; calls OpenAI gpt-image-2 through the Uno tool gateway.
What this skill does
Two command-line invocations against the public ClawdChat tool gateway:
| Tool slug | Purpose | Cost |
|---|---|---|
| gpt-image-2.gpt_image2_submit | Submit a generation job, returns job_id immediately (async) | 300 credits / call |
| gpt-image-2.gpt_image2_result | Poll job status / fetch image URL when ready | 0 credits |
The bundled bin/uno.py handles authentication, HTTP transport, and rate-limiting. No external skill dependency required — install this skill and you're ready.
Credentials & permissions (read before first use)
- Credential type: ClawdChat API key (Bearer token).
- Where it lives: ~/.uno/credentials.json. Managed entirely by the bundled bin/uno.py; this skill never opens, prints, or copies it.
- How it was obtained: run python bin/uno.py login (see Setup below). The bundled script drives a ClawdChat OAuth device-code flow and stores the resulting token.
- What it authorises: calling the ClawdChat tool gateway as the logged-in user. Each gpt_image2_submit deducts 300 credits from that account.
- Network egress: the user's prompt text and any reference_image_urls are sent to the ClawdChat gateway over HTTPS. Do not paste private, confidential, or personally identifying content into the prompt unless you are comfortable with the gateway's data handling; see https://clawdchat.cn for the data policy.
- OAuth scope: mcp:gpt-image-2, scoped specifically to gpt-image-2 tool calls. Only gpt-image-2.gpt_image2_submit and gpt-image-2.gpt_image2_result are invoked; no other gateway tools are called.
- Login output: the login --poll step returns the new API key once in its JSON response (standard device-code OAuth confirmation). Treat that terminal output as a secret; do not log or share it. bin/uno.py immediately persists the key to ~/.uno/credentials.json and does not print it again.
- Logging out / revoking: run python bin/uno.py logout to delete ~/.uno/credentials.json and end the local session.
Cost transparency & confirmation rule
Every gpt_image2_submit call costs the logged-in account real credits. The agent must:
- Show the user the planned prompt, size, style, and number of images before the first call.
- Ask for explicit confirmation when the user has not already approved a generation in the current turn.
- For multi-image batches (n > 1) or retries, treat each submission as a separate spending event and confirm again unless the user has pre-authorised the batch.
- On error responses, surface the error to the user instead of silently retrying.
Polling via gpt_image2_result is free; only submit spends credits.
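A minimal sketch of such a gate, assuming an interactive session; the confirm_generation helper and its wording are illustrative, not part of the skill:

def confirm_generation(prompt, size="1024x1024", style=None, n=1):
    # Illustrative helper: show the planned call before any credits are spent.
    print(f"Planned generation: n={n}, size={size}, style={style}")
    print(f"Prompt: {prompt}")
    print("Each gpt_image2_submit call deducts 300 credits.")
    return input("Proceed? [y/N] ").strip().lower() == "y"

Only call gpt_image2_submit when this returns True; polling needs no such gate.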
Setup
No external skill dependency. After clawhub install gpt-image2 the layout is:
gpt-image2/
├── SKILL.md
└── bin/
    └── uno.py   # bundled CLI, no extra install needed
All commands below run from inside the gpt-image2/ folder.
Check if already logged in
python bin/uno.py whoami --compact
- Returns user info (name, email, credits) → credentials valid, skip login.
- Returns {"error": "Not logged in"} → proceed to login below.
Log in
python bin/uno.py login --start
This prints a device code and a URL like https://clawdtools.uno/device?code=XXXX.
Open that URL in a browser. If not yet signed in to clawdtools.uno, the page redirects to ClawdChat SSO automatically and returns to the Authorise screen. Click "Authorise".
Then poll for completion:
python bin/uno.py login --poll DEVICE_CODE
Or run python bin/uno.py login (blocking, polls automatically).
Credential file (~/.uno/credentials.json) is written by bin/uno.py and reused on subsequent runs.
Generating an image — full async flow
A single 1024×1024 image typically takes ~150 s, longer than the default MCP 60 s timeout. Always use the submit → poll-result pattern.
Step 1 — submit
python bin/uno.py call gpt-image-2.gpt_image2_submit --compact \
--args '{"prompt":"A shiba inu under cherry blossoms, sunny afternoon","size":"1024x1024","style":"ghibli_anime"}'
Response (already flattened — no need to unwrap content[0].text):
{"success": true, "data": {"status": "pending", "job_id": "0b84b8f0f0c8", "estimated_seconds": 150}, "meta": {"latency_ms": 120, "credits_used": 300}}
Record data.job_id.
Step 2 — poll for result
python bin/uno.py call gpt-image-2.gpt_image2_result --compact --timeout 70 \
--args '{"job_id":"0b84b8f0f0c8","wait_seconds":50}'
wait_seconds=50 makes the server-side wait 50 s (within the 60 s MCP envelope); --timeout 70 adds a small client buffer.
Repeat the call until data.status is one of:
- done: image ready, URLs in data.items[].url.
- error: generation failed, message in data.error.
- pending / running: call again immediately. Do not add a client-side sleep; the server already waited 50 s on your behalf.
Three to five iterations (~150–250 s total) is normal.
Reference Python loop
Use subprocess.run with an argument list to safely pass arbitrary prompt text without shell-injection risk:
import json, subprocess, sys

UNO = "bin/uno.py"

def uno(args):
    # Invoke the bundled CLI and parse the JSON it prints to stdout.
    r = subprocess.run(["python", UNO] + args, capture_output=True, text=True)
    return json.loads(r.stdout)

prompt = "Van Gogh starry night"
resp = uno(["call", "gpt-image-2.gpt_image2_submit", "--compact",
            "--args", json.dumps({"prompt": prompt, "style": "oil_painting_vangogh"})])
if not resp.get("success"):
    print("Submit failed:", resp.get("error"), resp.get("hint"), file=sys.stderr)
    sys.exit(1)
job_id = resp["data"]["job_id"]

for _ in range(6):
    r = uno(["call", "gpt-image-2.gpt_image2_result", "--compact", "--timeout", "70",
             "--args", json.dumps({"job_id": job_id, "wait_seconds": 50})])
    status = r["data"]["status"]
    if status == "done":
        print(json.dumps(r, ensure_ascii=False, indent=2))
        break
    if status == "error":
        print("Error:", r["data"].get("error"), file=sys.stderr)
        sys.exit(1)
    # pending / running: poll again right away; the server already waited 50 s.
else:
    # Loop exhausted without a result; the job can be re-polled later with job_id.
    print("Still running; re-poll later with job_id", job_id, file=sys.stderr)
Parameters
| Field | Meaning | Values |
|---|---|---|
| prompt | Image description (required, any language) | free text |
| size | Image dimensions | 1024x1024 (default), 1024x1536 (portrait), 1536x1024 (landscape), auto |
| n | Number of images to generate | 1–4 (default 1) |
| style | Built-in style preset | one of the 20 keys below |
| reference_image_urls | Reference images (image-to-image) | URL string, comma-separated for multiple |
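Example of a fuller argument payload, reusing the uno() helper from the reference loop above; the prompt and reference URL are placeholders, and n > 1 still requires the confirmation step:

args = {
    "prompt": "A corgi astronaut planting a flag on the moon",
    "size": "1536x1024",     # landscape
    "n": 2,                  # two candidate images in one job
    "style": "pixar_3d",
    "reference_image_urls": "https://example.com/corgi.jpg",  # comma-separate multiple URLs
}
resp = uno(["call", "gpt-image-2.gpt_image2_submit", "--compact",
            "--args", json.dumps(args)])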
20 built-in style presets
| key | description |
|---|---|
| ghibli_anime | Studio Ghibli / hand-drawn anime |
| pixar_3d | Pixar / Disney 3D animation |
| claymation | Stop-motion claymation (Laika / Aardman) |
| lego_brick | LEGO bricks |
| popmart_figurine | Blind-box / Pop Mart figurine |
| isometric_game | Isometric 2.5D game scene |
| cinematic_photo | Cinematic photorealism (35mm) |
| polaroid_film | Polaroid film snapshot |
| watercolor_ink | Watercolour / East-Asian ink wash |
| oil_painting_vangogh | Van Gogh impasto oil painting |
| cyberpunk_neon | Cyberpunk neon nightscape |
| vintage_infographic | Retro infographic / data poster |
| movie_poster | Movie poster (large title + still) |
| flat_vector | Flat-vector illustration / banner |
| pixel_8bit | Pixel art (8/16-bit) |
| papercraft_layered | Layered papercraft |
| exploded_diagram | Exploded technical diagram |
| dreamcore_liminal | Dreamcore / liminal space |
| knolling_flatlay | Top-down knolling / flat-lay |
| botanical_engraving | Botanical engraving / antique illustration |
Where this model shines (vs Midjourney / Flux / SD)
- Accurate text rendering — poster headlines, infographics, menu typography, meme captions: written into the image as specified.
- Strong prompt following — multi-element scenes, ordering and spatial relationships obeyed.
- Subject preservation in image-to-image — faces, brands, and characters stay consistent across reference images.
- Wide style coverage — Ghibli, Pixar, claymation, LEGO, Pop Mart, botanical engraving etc. all handled.
Agent guidance
- Tell the user up-front that one image takes ~150 s.
- The gpt_image2_result tool already sleeps 50 s server-side; never add an extra client-side sleep between polls.
- Use --timeout 70 for result calls (50 s server wait + buffer).
- Pass the user's prompt verbatim, including non-English text.
- Reference images: combine reference_image_urls with a style preset for "restyle while keeping the subject".
- Posters / infographics / menus: lean on the text-rendering strength.
- If submit returns success=false, surface the error / hint fields to the user.
- If the loop exhausts (~600 s) and status is still running, tell the user the job can be re-polled later with the same job_id (see the sketch below).
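When a job outlives the session, the job_id alone is enough to resume it. A minimal sketch, reusing the uno() helper and job_id from the reference loop; the state-file name is illustrative:

import json
from pathlib import Path

STATE = Path("pending_job.json")  # illustrative place to stash the job id

# Right after submit: remember the job so a later session can resume it.
STATE.write_text(json.dumps({"job_id": job_id}))

# Later session: re-poll with the same job_id; polling spends no credits.
job_id = json.loads(STATE.read_text())["job_id"]
r = uno(["call", "gpt-image-2.gpt_image2_result", "--compact", "--timeout", "70",
         "--args", json.dumps({"job_id": job_id, "wait_seconds": 50})])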
Response shape
{
  "success": true,
  "data": {"status": "...", "job_id": "...", "items": [{"url": "..."}]},
  "meta": {"latency_ms": 120, "credits_used": 300}
}
Read data.status, data.job_id, data.items[].url directly.
Errors:
{"success": false, "error": "...", "hint": "..."}