# ComfyUI Workflow Builder
Translates natural-language requests into executable ComfyUI workflow JSON. Always validates against the inventory before generating.
## Workflow Generation Process

### Step 1: Understand the Request
Parse the user's intent into:
- Output type: Image, video, or audio
- Source material: Text-only, reference image(s), existing video
- Identity method: None, zero-shot (InstantID/PuLID), LoRA, Kontext
- Quality level: Draft (fast iteration) vs production (maximum quality)
- Special requirements: ControlNet, inpainting, upscaling, lip-sync
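The parsed fields can be sketched as a small structure (a hypothetical helper; the field names are illustrative and not part of ComfyUI):

```python
from dataclasses import dataclass, field

# Hypothetical container for a parsed request; field names are illustrative.
@dataclass
class ParsedRequest:
    output_type: str       # "image" | "video" | "audio"
    source: str            # "text" | "reference_image" | "video"
    identity_method: str   # "none" | "instantid" | "pulid" | "lora" | "kontext"
    quality: str           # "draft" | "production"
    extras: list = field(default_factory=list)  # e.g. ["controlnet", "upscale"]

req = ParsedRequest("image", "reference_image", "instantid", "production", ["upscale"])
```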
### Step 2: Check Inventory

Read `state/inventory.json` to determine:
- Available checkpoints → select best match for task
- Available identity models → determine which methods are possible
- Available ControlNet models → enable pose/depth control if available
- Custom nodes installed → verify all required nodes exist
- VRAM available → optimize settings accordingly
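A minimal sketch of the inventory check, assuming `inventory.json` exposes flat `"nodes"` and `"models"` name lists (the real schema may differ):

```python
import json

def check_inventory(path, required_nodes, required_models):
    """Return (missing_nodes, missing_models) against the inventory cache.

    Assumes the inventory JSON has flat "nodes" and "models" name lists;
    the actual schema may differ.
    """
    with open(path) as f:
        inv = json.load(f)
    missing_nodes = [n for n in required_nodes if n not in inv.get("nodes", [])]
    missing_models = [m for m in required_models if m not in inv.get("models", [])]
    return missing_nodes, missing_models
```

Both lists must come back empty before workflow generation proceeds.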
### Step 3: Select Pipeline Pattern

Based on the request and inventory, choose from:
| Pattern | When | Key Nodes |
|---|---|---|
| Text-to-Image | Simple generation | Checkpoint → CLIP → KSampler → VAE |
| Identity-Preserved Image | Character consistency | + InstantID/PuLID/IP-Adapter |
| LoRA Character | Trained character | + LoRA Loader |
| Image-to-Video (Wan) | High-quality video | Diffusion Model → Wan I2V → Video Combine |
| Image-to-Video (AnimateDiff) | Fast video, motion control | + AnimateDiff Loader + Motion LoRAs |
| Talking Head | Character speaks | Image → Video → Voice → Lip-Sync |
| Upscale | Enhance resolution | Image → UltimateSDUpscale → Save |
| Inpainting | Edit regions | Image + Mask → Inpaint Model → KSampler |
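The table's dispatch logic might look like this (a simplified sketch covering only a few patterns; real selection also weighs inventory and VRAM):

```python
def select_pattern(output_type, source, identity_method):
    """Map a parsed request to a pipeline pattern name.

    Illustrative only: covers a subset of the patterns in the table above.
    """
    if output_type == "video":
        return "image-to-video" if source == "reference_image" else "text-to-video"
    if identity_method == "lora":
        return "lora-character"
    if identity_method in ("instantid", "pulid"):
        return "identity-preserved-image"
    return "text-to-image"
```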
### Step 4: Generate Workflow JSON

ComfyUI workflow (API) format:

```json
{
  "{node_id}": {
    "class_type": "{NodeClassName}",
    "inputs": {
      "{param_name}": "{value}",
      "{connected_param}": ["{source_node_id}", {output_index}]
    }
  }
}
```
Rules:
- Node IDs are strings (typically "1", "2", "3", ...)
- Connected inputs use the array format `["source_node_id", output_index]`; the output index is a 0-based integer
- Filenames must match exactly what is in the inventory
- Seed values: use a large random integer, or a fixed seed for reproducibility
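These rules can be captured in a small builder helper (a sketch; `node()` and `random_seed()` are hypothetical names, not ComfyUI API):

```python
import secrets

def node(class_type, **inputs):
    """Build one API-format node entry. Connected inputs are passed as
    ["source_node_id", output_index] pairs; literal values pass through."""
    return {"class_type": class_type, "inputs": inputs}

def random_seed():
    # Large random integer for variety; fix the seed for reproducibility.
    return secrets.randbelow(2**32)

workflow = {
    "1": node("CheckpointLoaderSimple", ckpt_name="flux1-dev.safetensors"),
    "2": node("CLIPTextEncode", text="a red fox", clip=["1", 1]),  # CLIP is output 1
}
```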
### Step 5: Validate

Before presenting to the user, confirm:
- Every `class_type` exists in the inventory's node list
- Every model filename exists in the inventory's model list
- All required connections are present (no dangling inputs)
- The VRAM estimate does not exceed available VRAM
- The resolution is compatible with the chosen model (512 for SD1.5, 1024 for SDXL/FLUX)
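The dangling-input check, for example, is straightforward to automate over the API-format dict (a sketch):

```python
def find_dangling_links(workflow):
    """Return (node_id, param) pairs whose connection references a
    node id that does not exist in the workflow."""
    dangling = []
    for nid, entry in workflow.items():
        for param, value in entry.get("inputs", {}).items():
            # Connections are ["source_node_id", output_index] pairs.
            if isinstance(value, list) and len(value) == 2:
                src = value[0]
                if src not in workflow:
                    dangling.append((nid, param))
    return dangling
```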
### Step 6: Output

- Online mode: queue via the comfyui-api skill
- Offline mode: save the JSON to `projects/{project}/workflows/` with a descriptive name
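Both modes can be sketched in one dispatcher; the online branch posts to ComfyUI's `/prompt` HTTP endpoint (default port 8188), the offline branch writes the JSON file (the `dispatch` helper and its `root` parameter are illustrative):

```python
import json
import pathlib
import urllib.request

def dispatch(workflow, online, project, name, host="127.0.0.1:8188", root="projects"):
    """Online: POST the workflow to a running ComfyUI server's /prompt endpoint.
    Offline: write the JSON under {root}/{project}/workflows/{name}.json."""
    if online:
        data = json.dumps({"prompt": workflow}).encode()
        req = urllib.request.Request(
            f"http://{host}/prompt", data=data,
            headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    out = pathlib.Path(root) / project / "workflows" / f"{name}.json"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(workflow, indent=2))
    return str(out)
```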
## Workflow Templates

### Basic Text-to-Image (FLUX)
```json
{
  "1": {
    "class_type": "CheckpointLoaderSimple",
    "inputs": {"ckpt_name": "flux1-dev.safetensors"}
  },
  "2": {
    "class_type": "CLIPTextEncode",
    "inputs": {"text": "{positive_prompt}", "clip": ["1", 1]}
  },
  "3": {
    "class_type": "CLIPTextEncode",
    "inputs": {"text": "{negative_prompt}", "clip": ["1", 1]}
  },
  "4": {
    "class_type": "EmptyLatentImage",
    "inputs": {"width": 1024, "height": 1024, "batch_size": 1}
  },
  "5": {
    "class_type": "KSampler",
    "inputs": {
      "seed": 42,
      "steps": 25,
      "cfg": 3.5,
      "sampler_name": "euler",
      "scheduler": "normal",
      "denoise": 1.0,
      "model": ["1", 0],
      "positive": ["2", 0],
      "negative": ["3", 0],
      "latent_image": ["4", 0]
    }
  },
  "6": {
    "class_type": "VAEDecode",
    "inputs": {"samples": ["5", 0], "vae": ["1", 2]}
  },
  "7": {
    "class_type": "SaveImage",
    "inputs": {"filename_prefix": "output", "images": ["6", 0]}
  }
}
```
### With Identity Preservation (InstantID + IP-Adapter)

Extends the basic template by adding:
- A load-reference-image node
- InstantID Model Loader + Apply InstantID
- IPAdapter Unified Loader + Apply IPAdapter
- FaceDetailer post-processing

See `references/workflows.md` for complete node settings.
### Video Generation (Wan I2V)

Uses a different loader chain:
- Load Diffusion Model (`UNETLoader`), not a checkpoint loader
- Wan I2V Conditioning
- EmptySD3LatentImage (with frame count)
- Video Combine (VHS)

See `references/workflows.md`, Workflow 4, for complete settings.
## VRAM Estimation
| Component | Approximate VRAM |
|---|---|
| FLUX FP16 | 16GB |
| FLUX FP8 | 8GB |
| SDXL | 6GB |
| SD1.5 | 4GB |
| InstantID | +4GB |
| IP-Adapter | +2GB |
| ControlNet (each) | +1.5GB |
| Wan 14B | 20GB |
| Wan 1.3B | 5GB |
| AnimateDiff | +3GB |
| FaceDetailer | +2GB |
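The table can be encoded as a lookup for the Step 2 VRAM check (the rough planning numbers from above, not measurements; key names are illustrative):

```python
# Rough VRAM figures in GB, from the table above: base models vs add-ons.
BASE_GB = {"flux-fp16": 16, "flux-fp8": 8, "sdxl": 6, "sd15": 4,
           "wan-14b": 20, "wan-1.3b": 5}
ADDON_GB = {"instantid": 4, "ip-adapter": 2, "controlnet": 1.5,
            "animatediff": 3, "facedetailer": 2}

def estimate_vram(base, addons=()):
    """Sum rough VRAM needs for a base model plus add-ons.

    A planning heuristic only; actual usage varies with resolution,
    batch size, and offloading settings."""
    return BASE_GB[base] + sum(ADDON_GB[a] for a in addons)
```

Compare the result against the available VRAM from the inventory before queueing.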
## Common Mistakes to Avoid

- Wrong output index: `CheckpointLoaderSimple` outputs `[model, clip, vae]` at indices `[0, 1, 2]`
- CFG too high for InstantID: use 4-5, not the default 7-8
- Wrong resolution for the model: FLUX/SDXL = 1024, SD1.5 = 512
- Missing VAE: FLUX needs an explicit VAE (`ae.safetensors`)
- Wrong model in the wrong loader: diffusion models need `UNETLoader` (Load Diffusion Model), not `CheckpointLoaderSimple`
## Reference Files

- `references/workflows.md` - Detailed node-by-node templates
- `references/models.md` - Model files and paths
- `references/prompt-templates.md` - Model-specific prompts
- `state/inventory.json` - Current inventory cache