ERNIE-Image Visual Promptsmith
Use this community skill to craft ERNIE-Image prompts and generate images through the AI Studio ERNIE-Image-Turbo endpoint. It is not official Baidu or ERNIE-Image software.
Decide the Mode
- Generate immediately when the user asks to generate, draw, create, make an image, or uses equivalent Chinese generation wording.
- Return prompt-only guidance when the user asks to optimize, rewrite, improve, or review a prompt.
- Ask one concise question only if an exact visible text string, language, or required aspect ratio is missing and guessing would likely break the result.
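The mode decision above can be sketched as a small keyword check. This is only an illustration: the keyword tuples, the function name `decide_mode`, and the rule that prompt-only wording takes precedence are assumptions, not part of the skill's bundled script.

```python
import re

# Hypothetical keyword lists taken from the verbs named in the rules above.
GENERATE_WORDS = ("generate", "draw", "create", "make an image")
PROMPT_ONLY_WORDS = ("optimize", "rewrite", "improve", "review")


def _mentions(text: str, words: tuple) -> bool:
    """True if any keyword appears as a whole word or phrase in text."""
    return any(re.search(rf"\b{re.escape(w)}\b", text) for w in words)


def decide_mode(request: str) -> str:
    """Return 'prompt_only', 'generate', or 'unknown' for a user request."""
    text = request.lower()
    # Assumption: prompt-only wording wins, so a request to "improve" a
    # prompt never triggers an API call even if it also says "generate".
    if _mentions(text, PROMPT_ONLY_WORDS):
        return "prompt_only"
    if _mentions(text, GENERATE_WORDS):
        return "generate"
    return "unknown"
```

A real implementation would also need the Chinese wording and the clarifying-question case, which this sketch leaves out.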
API Endpoint
- Base: https://aistudio.baidu.com/llm/lmapi/v3
- Submit: POST /images/generations
- Full URL: https://aistudio.baidu.com/llm/lmapi/v3/images/generations
- Auth header: Authorization: bearer <BAIDU_AISTUDIO_API_KEY>
- Platform header: X-Client-Platform: aistudio
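As a rough sketch, the endpoint details above map to a URL and a header dictionary like this. The helper name `build_request_target` is hypothetical, and the `Content-Type` header is an assumption based on the JSON payload; only the URL and the two named headers come from this document.

```python
API_BASE = "https://aistudio.baidu.com/llm/lmapi/v3"
SUBMIT_PATH = "/images/generations"


def build_request_target(api_key: str):
    """Return (url, headers) for the submit call described above."""
    url = API_BASE + SUBMIT_PATH
    headers = {
        # Scheme word is lowercase "bearer", as written in this document.
        "Authorization": f"bearer {api_key}",
        "X-Client-Platform": "aistudio",
        # Assumption: the body is JSON, so a JSON content type is sent.
        "Content-Type": "application/json",
    }
    return url, headers
```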
API Key
- Required environment variable: BAIDU_AISTUDIO_API_KEY
- Get a key: https://aistudio.baidu.com/account/accessToken
- If the key is missing, do not call the API. Tell the user to set BAIDU_AISTUDIO_API_KEY.
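One way to enforce the missing-key rule is to fail before any network call. The sketch below assumes a hypothetical helper `load_api_key`; the bundled script may handle this differently.

```python
import os

ENV_VAR = "BAIDU_AISTUDIO_API_KEY"
KEY_URL = "https://aistudio.baidu.com/account/accessToken"


def load_api_key(env=None) -> str:
    """Return the API key, or raise with instructions if it is missing."""
    env = os.environ if env is None else env
    key = (env.get(ENV_VAR) or "").strip()
    if not key:
        # Stop here so the API is never called without a key.
        raise RuntimeError(
            f"{ENV_VAR} is not set. Get a key at {KEY_URL} and set {ENV_VAR}."
        )
    return key
```

Passing a plain dictionary as `env` makes the check easy to exercise without touching the real environment.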
Triggers
- Chinese examples: ERNIE image: <prompt>, Wenxin image: <prompt>, generate image: <prompt>, or equivalent Chinese wording for image generation.
- English examples: ernie image: <prompt>, generate image: <prompt>, create image: <prompt>.
- Treat text after the colon as the raw user prompt, improve it, choose a preset, then generate.
- If the user asks to optimize, rewrite, improve, or review a prompt, return prompt-only guidance and do not call the API.
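The colon rule for triggers can be sketched as follows. The prefix tuple and the function name `split_trigger` are illustrative assumptions; the sketch covers only the English-form prefixes listed above and ASCII colons.

```python
# Hypothetical prefix list built from the trigger examples above.
TRIGGER_PREFIXES = ("ernie image", "wenxin image", "generate image", "create image")


def split_trigger(message: str):
    """Return the raw prompt after a known trigger prefix, else None."""
    # Split on the first colon only, so colons inside the prompt survive.
    head, sep, tail = message.partition(":")
    raw_prompt = tail.strip()
    if sep and head.strip().lower() in TRIGGER_PREFIXES and raw_prompt:
        return raw_prompt
    return None
```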
Prompt Workflow
- Classify the image style: photorealistic, anime/manga, text-in-image, concept art, abstract/artistic, layout/composition, poster, ecommerce, infographic, comic/storyboard, UI screenshot style, or character-consistent visual.
- Preserve immutable constraints: exact in-image text, language, subject count, character identity, spatial relationships, size, style, and forbidden elements.
- Build the core prompt in five parts: subject -> action/context -> style -> lighting -> quality.
- For layout-sensitive requests, append composition -> exact text -> spatial placement.
- Keep in-image writing short when possible. Turn paragraphs into titles, labels, badges, or numbered lines.
- For text rendering, put exact wording in quotes and specify placement, font weight, alignment, color, background contrast, and whitespace.
- Choose a preset from auto, text-poster, infographic, comic, product, ui, photo, concept, or abstract.
- Before generation, state:
Final Prompt: <prompt>
Preset: <preset>
use_pe: <true or false>
Size: <size>
Reason: <why these settings fit ERNIE-Image>
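The five-part prompt order and the pre-generation statement could be assembled along these lines. Both function names are hypothetical, and joining the parts with commas is an assumption; the workflow above fixes only the order of the parts and the five labeled lines.

```python
def build_core_prompt(subject, action, style, lighting, quality, layout=()):
    """Join parts in the order subject, action/context, style, lighting, quality.

    Optional layout parts (composition, exact text, spatial placement)
    are appended after the core five.
    """
    parts = [subject, action, style, lighting, quality, *layout]
    # Assumption: parts are comma-separated; empty parts are skipped.
    return ", ".join(p.strip() for p in parts if p and p.strip())


def format_summary(prompt, preset, use_pe, size, reason):
    """Render the five labeled lines stated before generation."""
    return "\n".join([
        f"Final Prompt: {prompt}",
        f"Preset: {preset}",
        f"use_pe: {'true' if use_pe else 'false'}",
        f"Size: {size}",
        f"Reason: {reason}",
    ])
```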
Generation Workflow
Use the bundled Python script. Prefer python3; on Windows use python or py if needed.
python3 {baseDir}/scripts/generate.py --prompt "<FINAL_PROMPT>" --preset <preset>
For exact text, bilingual labels, UI, flowcharts, signs, comics, or already detailed prompts, pass --no-use-pe.
python3 {baseDir}/scripts/generate.py --prompt "<FINAL_PROMPT>" --preset text-poster --no-use-pe
The script prints IMAGE_URL:<url> for URL responses and MEDIA:<absolute_path> for each saved image. Return the saved media path to the user.
If BAIDU_AISTUDIO_API_KEY is missing, tell the user to get a key from https://aistudio.baidu.com/account/accessToken and set BAIDU_AISTUDIO_API_KEY.
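When the script is launched from Python rather than a shell, the commands above might be built and read back as in this sketch. The helper names are hypothetical. It uses `sys.executable` instead of a literal `python3`, which is a substitution that sidesteps the python3/python/py choice. Only the `--prompt`, `--preset`, and `--no-use-pe` flags and the `IMAGE_URL:` and `MEDIA:` prefixes come from this document.

```python
import sys
from pathlib import Path


def build_generate_argv(base_dir, prompt, preset="auto", no_use_pe=False):
    """Build an argument list for scripts/generate.py (no shell quoting needed)."""
    script = Path(base_dir) / "scripts" / "generate.py"
    argv = [sys.executable, str(script), "--prompt", prompt, "--preset", preset]
    if no_use_pe:
        argv.append("--no-use-pe")
    return argv


def parse_script_output(stdout: str):
    """Collect IMAGE_URL: and MEDIA: lines from the script's output."""
    urls, media = [], []
    for line in stdout.splitlines():
        if line.startswith("IMAGE_URL:"):
            urls.append(line[len("IMAGE_URL:"):].strip())
        elif line.startswith("MEDIA:"):
            media.append(line[len("MEDIA:"):].strip())
    return urls, media
```

The list can be passed to `subprocess.run(argv, capture_output=True, text=True)`, and the resulting `stdout` handed to `parse_script_output`. Passing a list rather than a single command string keeps quotes inside the prompt from being interpreted by a shell.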
Submit Payload
{
  "model": "ERNIE-Image-Turbo",
  "prompt": "<FINAL_PROMPT>",
  "n": 1,
  "response_format": "url",
  "size": "1024x1024",
  "seed": 42,
  "use_pe": true,
  "num_inference_steps": 8,
  "guidance_scale": 1.0
}
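A minimal sketch of building that payload in Python follows. The function name `build_payload` and the empty-prompt check are assumptions; the field names and values mirror the payload above.

```python
import json


def build_payload(prompt, size="1024x1024", seed=42, use_pe=True, n=1):
    """Return the submit payload as a dict, using the values shown above."""
    if not prompt.strip():
        # Assumption: an empty prompt is rejected locally before any call.
        raise ValueError("prompt must not be empty")
    return {
        "model": "ERNIE-Image-Turbo",
        "prompt": prompt,
        "n": n,
        "response_format": "url",
        "size": size,
        "seed": seed,
        "use_pe": use_pe,
        "num_inference_steps": 8,
        "guidance_scale": 1.0,
    }


# Serialize as UTF-8 JSON so Chinese prompt text is sent unescaped.
body = json.dumps(build_payload("a red fox in snow"), ensure_ascii=False).encode("utf-8")
```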
Download and Output
- response_format=url returns image URLs in data[]; the script prints IMAGE_URL:<url>.
- The script downloads each URL immediately and saves the image locally.
- The script prints MEDIA:<absolute_path> for OpenClaw/ClawHub auto-attach.
- URLs may expire; the local file remains available after download.
- Output names are generated as ernie-image-<timestamp>-<index>.<ext>.
- Do not pass user-controlled filenames to shell commands.
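The naming rule can be illustrated with a generator that never takes a filename from the user. The timestamp format, the extension allowlist, and the fallback to png are all assumptions here; this document specifies only the ernie-image-<timestamp>-<index>.<ext> pattern.

```python
from datetime import datetime

# Assumption: a small allowlist of image extensions; anything else becomes png.
ALLOWED_EXTS = {"png", "jpg", "jpeg", "webp"}


def output_name(index: int, ext: str, now=None) -> str:
    """Build ernie-image-<timestamp>-<index>.<ext> from trusted parts only."""
    ext = ext.lower().lstrip(".")
    if ext not in ALLOWED_EXTS:
        ext = "png"
    # Assumption: timestamp is rendered as YYYYMMDD-HHMMSS.
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    return f"ernie-image-{stamp}-{int(index)}.{ext}"
```

Because every component is generated or checked against the allowlist, the resulting name contains no user-controlled characters.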
Defaults
- Model: ERNIE-Image-Turbo
- Preset: auto
- Count: 1
- Response format: url
- Seed: 42
- The text-poster, infographic, comic, product, and ui presets default to use_pe=false.
- The photo, concept, and abstract presets default to use_pe=true.
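The preset defaults above amount to a lookup table, sketched here. The function name `resolve_use_pe` is hypothetical, and the value used for the auto preset is an assumption borrowed from the sample payload, since this section does not state it.

```python
# use_pe defaults per preset, as listed above.
PRESET_USE_PE = {
    "text-poster": False,
    "infographic": False,
    "comic": False,
    "product": False,
    "ui": False,
    "photo": True,
    "concept": True,
    "abstract": True,
}

# Assumption: "auto" follows the sample payload, which sets use_pe to true.
AUTO_USE_PE = True


def resolve_use_pe(preset: str, no_use_pe: bool = False) -> bool:
    """Return the effective use_pe value; an explicit --no-use-pe always wins."""
    if no_use_pe:
        return False
    if preset == "auto":
        return AUTO_USE_PE
    if preset not in PRESET_USE_PE:
        raise ValueError(f"unknown preset: {preset}")
    return PRESET_USE_PE[preset]
```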
Negative Prompt Rules
- Do not add text, letters, typography, Chinese text, or English text as negatives when the user wants readable writing.
- Prefer precise negatives: distorted text, misspelled words, duplicated letters, unreadable typography, warped layout, cropped title, low contrast, blurry details, inconsistent panels, artifacts.
- The API does not expose a separate negative prompt field in this skill. Express exclusions as natural language constraints inside the prompt, such as "avoid cluttered background" or "no visible watermark".
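Since exclusions travel inside the prompt, one possible helper filters out the blanket text negatives and appends the rest as a natural-language clause. The function name `add_exclusions` and the "Avoid:" phrasing are illustrative assumptions.

```python
# Blanket negatives that would suppress wanted writing, from the rule above.
TEXT_NEGATIVES = {"text", "letters", "typography", "chinese text", "english text"}


def add_exclusions(prompt: str, exclusions, wants_readable_text: bool) -> str:
    """Append exclusions to the prompt as a natural-language constraint."""
    kept = []
    for item in exclusions:
        item = item.strip()
        if not item:
            continue
        # Drop blanket text negatives when readable writing is required.
        if wants_readable_text and item.lower() in TEXT_NEGATIVES:
            continue
        kept.append(item)
    if not kept:
        return prompt
    return f"{prompt.rstrip('. ')}. Avoid: {', '.join(kept)}."
```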
Retry Strategy
- Text errors: reduce the amount of visible text, quote exact words once, add stronger placement and contrast, then use --no-use-pe.
- Layout errors: simplify object count, name each region, use grid/split-screen/foreground/background terms, then keep the same seed.
- Weak style: add camera/lens, art movement, medium, color temperature, material texture, and lighting direction.
- Cluttered image: remove secondary elements, add negative space, use "avoid cluttered background", and switch to a simpler preset if needed.
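The retry rules can be kept as a lookup from failure type to prompt adjustments, sketched below. The dictionary keys and the function name `retry_argv` are hypothetical. Only the text case changes a command-line flag, because that is the only flag change the rules above name.

```python
# Prompt-side adjustments per failure type, condensed from the list above.
RETRY_STEPS = {
    "text": ["reduce visible text", "quote exact words once",
             "strengthen placement and contrast"],
    "layout": ["simplify object count", "name each region",
               "use grid/split-screen/foreground/background terms",
               "keep the same seed"],
    "style": ["add camera/lens", "add art movement and medium",
              "add color temperature, material texture, lighting direction"],
    "clutter": ["remove secondary elements", "add negative space",
                'add "avoid cluttered background"'],
}


def retry_argv(kind: str, argv):
    """Return a copy of argv for the retry; text errors add --no-use-pe."""
    if kind not in RETRY_STEPS:
        raise ValueError(f"unknown failure type: {kind}")
    new_argv = list(argv)
    if kind == "text" and "--no-use-pe" not in new_argv:
        new_argv.append("--no-use-pe")
    return new_argv
```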
References
- Read references/api.md for parameters, command examples, and endpoint mapping.
- Read references/prompt-architecture.md for ERNIE-Image prompt templates.
- Read references/examples.md for acceptance-style examples.