comfyui-character-gen

Expert ComfyUI workflow builder for consistent AI character generation across image, video, and voice. Triggers on requests to create ComfyUI workflows, generate character-consistent images/videos, train LoRAs, use identity preservation methods (InfiniteYou, FLUX Kontext, PuLID Flux II, InstantID, IP-Adapter), create talking head videos, clone voices, or any multi-modal character generation pipeline. Covers FLUX Kontext, InfiniteYou, Wan 2.2 MoE, FramePack (6GB VRAM breakthrough), AnimateDiff V3, F5-TTS, Chatterbox, TTS Audio Suite, and all major 2024-2025 models.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

To install this skill, send the following command to your AI assistant:

npx skills add mckruz/comfyui-expert/mckruz-comfyui-expert-comfyui-character-gen

ComfyUI Character Generation Expert

Build production-ready ComfyUI workflows for consistent character generation across image, video, and voice modalities.

Quick Decision: Which Approach?

Starting from reference images (like 3D renders)? → InfiniteYou (state-of-the-art 2025) or InstantID + IP-Adapter (proven, lower VRAM)

Need highest identity fidelity? → FLUX.2 (NEW 2026: up to 10 ref images) or PuLID Flux II (no model pollution)

Want iterative editing without retraining? → FLUX Kontext (context-aware, maintains consistency across edits)

Creating video content? → LTX-2 (NEW 2026: 4K production-ready), Wan 2.2 MoE (film-level), or FramePack (60-sec on 6GB!)

Need voice for character? → TTS Audio Suite (unified platform, 23 languages) or F5-TTS Cross-Lingual (NEW 2026)

Core Workflow Patterns

Pattern 1: Zero-Shot Character Generation (No Training)

Best for: Quick iteration, 3D-to-photorealism conversion, limited reference images

Load Reference Face → InstantID + IP-Adapter FaceID → ControlNet Pose → KSampler → FaceDetailer → Upscale

Critical settings:

  • CFG: 4-5 (prevents burning with InstantID)
  • Resolution: 1016×1016 (avoids watermark artifacts)
  • IP-Adapter weight: 0.6-0.8
  • InstantID noise injection: 35% to negative

See references/workflows.md for complete node configurations.
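As a quick sanity check, the critical settings above can be expressed as a plain Python dict with a range validator. This is a sketch: the keys mirror the bullet list, not the actual ComfyUI node input names.

```python
# Pattern 1 settings as a plain dict; keys mirror the bullet list above,
# not actual ComfyUI_InstantID/IPAdapter node input names.
PATTERN_1_SETTINGS = {
    "cfg": 4.5,                           # 4-5 prevents burning with InstantID
    "width": 1016,                        # 1016x1016 avoids watermark artifacts
    "height": 1016,
    "ip_adapter_weight": 0.7,             # recommended range 0.6-0.8
    "instantid_noise_to_negative": 0.35,  # 35% noise injection to negative
}

def validate(settings: dict) -> list:
    """Return warnings for values outside the recommended ranges."""
    warnings = []
    if not 4 <= settings["cfg"] <= 5:
        warnings.append("CFG outside 4-5: risk of burned faces with InstantID")
    if not 0.6 <= settings["ip_adapter_weight"] <= 0.8:
        warnings.append("IP-Adapter weight outside 0.6-0.8")
    return warnings

print(validate(PATTERN_1_SETTINGS))  # -> []
```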

Pattern 2: LoRA + Identity Methods (Maximum Consistency)

Best for: Production work, character series, video generation base

Train LoRA → Load LoRA + Checkpoint → Add InstantID/PuLID → Generate → FaceDetailer → ReActor (optional) → Upscale

Training requirements:

  • 15-30 images, varied poses/expressions/lighting
  • Unique trigger word (e.g., "sage_character")
  • See references/lora-training.md for full parameters
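For orientation, a character LoRA config in Kohya sd-scripts style looks roughly like this. The values are common community defaults, not the exact parameters from references/lora-training.md; treat them as a starting point only.

```python
# Illustrative LoRA training config (Kohya sd-scripts style keys).
# Values are common community defaults for character LoRAs -- assumptions,
# NOT the parameters from references/lora-training.md.
lora_config = {
    "pretrained_model": "flux1-dev.safetensors",  # assumed base checkpoint
    "trigger_word": "sage_character",             # unique token, as above
    "resolution": 1024,
    "network_dim": 32,          # LoRA rank
    "network_alpha": 16,
    "learning_rate": 1e-4,
    "max_train_steps": 2000,
    "batch_size": 1,
}

def steps_per_image(config: dict, num_images: int) -> float:
    """Rough sanity check: for a 15-30 image dataset, aim for roughly
    60-150 steps per image before suspecting over/undertraining."""
    return config["max_train_steps"] / num_images

print(steps_per_image(lora_config, 20))  # 100.0
```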

Pattern 3: Video Generation Pipeline

Best for: Talking heads, character animation, promotional content

Generate/Load Hero Image → Wan 2.1 I2V OR AnimateDiff → FaceDetailer per frame → Frame Interpolation → Video Combine

Model selection:

  • Wan 2.1 14B: Best quality, 24GB+ VRAM, slower
  • Wan 2.1 1.3B: 8GB VRAM, good quality, faster
  • AnimateDiff Lightning: Fastest, best for iteration
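The model selection above can be sketched as a small helper. VRAM thresholds are taken from this list; real requirements vary with resolution and frame count, and the Lightning fallback floor is an assumption.

```python
# Hedged helper mapping available VRAM to the video models listed above.
def pick_video_model(vram_gb: float, prioritize_quality: bool = True) -> str:
    if vram_gb >= 24 and prioritize_quality:
        return "Wan 2.1 14B"        # best quality, slower
    if vram_gb >= 8:
        return "Wan 2.1 1.3B"       # good quality on consumer GPUs
    if vram_gb >= 6:
        return "FramePack"          # 60-sec videos on 6GB
    return "AnimateDiff Lightning"  # fastest; assumed lowest-footprint fallback

print(pick_video_model(24))  # Wan 2.1 14B
print(pick_video_model(8))   # Wan 2.1 1.3B
```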

Pattern 4: Talking Head with Voice

Best for: Character dialogue, presentations, social content

Two approaches available:

Approach 1 (Image → Talking Head):
Character Portrait → Generate Audio → SadTalker/LivePortrait → CodeFormer Enhancement → Final Video

Approach 2 (Video → Add Voice):
Existing Video → Generate Audio → Wav2Lip Lip-Sync → CodeFormer Enhancement → Final Video

See references/talking-head-workflows.md for complete workflows and references/voice-synthesis.md for voice creation options.

Model Recommendations (2026 Updated)

Image Generation

| Use Case | Model | Notes |
|---|---|---|
| Best photorealism | FLUX.1-dev | Slow but superior quality |
| Multi-reference consistency | FLUX.2 | NEW 2026: Up to 10 ref images, strong identity preservation |
| Fast iteration | RealVisXL V5.0 | Good balance of speed/quality |
| Character editing | FLUX Kontext | Context-aware, maintains consistency across edits |
| Iterative refinement | FLUX Kontext Pro/Max | 8x faster than GPT-Image (API) |

Identity Preservation (2026 State-of-Art)

| Method | Best For | VRAM | Notes |
|---|---|---|---|
| FLUX.2 | Multi-reference consistency | 24GB+ | NEW 2026: Up to 10 ref images, branded content |
| InfiniteYou | Highest identity match | 24GB | ICCV 2025 Highlight, SIM/AES variants |
| FLUX Kontext | Iterative editing | 12-32GB | Built-in consistency, no retraining |
| PuLID Flux II | Dual characters, no pollution | 24-40GB | Contrastive alignment solves model pollution |
| AuraFace | Commercial identity encoding | 12GB | NEW 2026: Open-source ArcFace alternative |
| InstantID | Style transfer, 3D→realistic | 12GB | Maintenance mode but still excellent |
| IP-Adapter FaceID | Speed, lower VRAM | 6GB+ | Good baseline approach |

Video Generation

| Model | Quality | Speed | VRAM | Notes |
|---|---|---|---|---|
| LTX-2 | ★★★★★ | Medium | 16GB+ | NEW 2026: First open-source 4K audio+video, production-ready |
| Wan 2.2 MoE | ★★★★★ | Slow | 24GB+ | Film-level aesthetics, first+last frame control |
| FramePack | ★★★★★ | Medium | 6GB | 60-sec videos, VRAM-invariant breakthrough |
| Wan 2.1 1.3B | ★★★★ | Medium | 8GB+ | Consumer-friendly |
| AnimateDiff V3 | ★★★ | Fast | 8GB | Motion/camera LoRAs, infinite length |

Voice/TTS

| Tool | License | Quality | Features |
|---|---|---|---|
| TTS Audio Suite | Multi | ★★★★★ | Unified platform, 23 languages, emotion control |
| F5-TTS | MIT | ★★★★ | Zero-shot from <15 sec samples, Cross-Lingual 2026 |
| Chatterbox | MIT | ★★★★★ | Paralinguistic tags ([laugh], [sigh]), 4 voices |
| IndexTTS-2 | MIT | ★★★★ | 8-emotion vector control |
| ElevenLabs | Commercial | ★★★★★ | Production quality (API) |

Essential Custom Nodes

Install via ComfyUI-Manager:

ComfyUI-Manager              # Must install first
ComfyUI_IPAdapter_plus       # IP-Adapter and FaceID
ComfyUI_InstantID            # InstantID workflow
ComfyUI-Impact-Pack          # FaceDetailer
ComfyUI-ReActor              # Face swapping
ComfyUI-AnimateDiff-Evolved  # Video generation
ComfyUI-VideoHelperSuite     # Video I/O
comfyui_controlnet_aux       # Pose/depth preprocessors
ComfyUI_UltimateSDUpscale    # Tiled upscaling
ComfyUI-Frame-Interpolation  # Smooth video
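If you prefer a manual install, the clone step can be scripted. This sketch deliberately takes repo URLs as input rather than hardcoding them, because most of these nodes live under different GitHub owners; take the canonical URLs from ComfyUI-Manager's node database (the Impact Pack URL in the example is the known real repo).

```python
from pathlib import Path

# Build `git clone` commands for a manual custom-node install.
# Prefer ComfyUI-Manager, which also resolves Python dependencies.
def clone_commands(custom_nodes_dir: str, repos: dict) -> list:
    """Map {node_name: repo_url} to git clone commands into custom_nodes."""
    return [
        f"git clone {url} {Path(custom_nodes_dir) / name}"
        for name, url in repos.items()
    ]

cmds = clone_commands(
    "ComfyUI/custom_nodes",
    {"ComfyUI-Impact-Pack": "https://github.com/ltdrdata/ComfyUI-Impact-Pack"},
)
print(cmds[0])
```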

RTX 50 Series Optimization (NEW 2026)

The RTX 5090's 32GB of VRAM runs most workflows without optimization. ComfyUI v0.8.1 adds major RTX 50 Series enhancements:

Launch flags: --highvram --fp8_e4m3fn-unet

NEW v0.8.1 features:

  • NVFP4/NVFP8 precision formats: 3x faster performance, 60% VRAM reduction on RTX 50 Series
  • Weight streaming: uses system RAM when VRAM is exhausted, enabling larger models on mid-range GPUs

What this unlocks on RTX 50 Series:

  • Tiled VAE for 8K+ upscaling
  • Batching 4× 1024×1024 generations in parallel
  • Running Wan 2.2 14B + LTX-2 natively
  • FP8 quantization for FLUX (50% VRAM reduction)
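A back-of-envelope check of the quantization claims, assuming FLUX.1-dev's roughly 12B parameters and counting weight memory only (activations, text encoders, and VAE buffers are extra):

```python
# Weight memory scales with bytes per parameter: FP8 halves FP16 usage
# (the "50% VRAM reduction" for FLUX) and 4-bit formats halve it again.
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "nvfp4": 0.5}

def weight_gb(params_billion: float, dtype: str) -> float:
    # 1e9 params * N bytes/param = N GB per billion parameters
    return params_billion * BYTES_PER_PARAM[dtype]

flux_params = 12  # FLUX.1-dev is ~12B parameters
print(weight_gb(flux_params, "fp16"))  # 24.0
print(weight_gb(flux_params, "fp8"))   # 12.0
```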

Workflow Generation Process

When building a workflow for a user:

  1. Clarify the goal: Image only? Video? With voice? What's the source material?

  2. Select the pipeline pattern from above based on requirements

  3. Generate the workflow following node configurations in references/workflows.md

  4. Include model downloads with exact filenames and paths from references/models.md

  5. Provide parameter recommendations specific to their hardware/use case
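Step 3 in practice usually means queueing an API-format workflow (exported via "Save (API Format)" in the UI) against a running ComfyUI server. This sketch uses ComfyUI's built-in /prompt HTTP endpoint:

```python
import json
import urllib.request

# Queue an API-format workflow against a local ComfyUI server.
# The payload shape matches ComfyUI's /prompt endpoint; the workflow
# dict itself comes from "Save (API Format)" in the UI.
def build_payload(workflow: dict, client_id: str = "character-gen") -> bytes:
    return json.dumps({"prompt": workflow, "client_id": client_id}).encode("utf-8")

def queue_prompt(workflow: dict, host: str = "127.0.0.1:8188") -> dict:
    req = urllib.request.Request(
        f"http://{host}/prompt",
        data=build_payload(workflow),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:  # requires a running server
        return json.loads(resp.read())
```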

Reference Files

  • references/research-2025.md - Latest 2024-2025 techniques: InfiniteYou, FLUX Kontext, PuLID Flux II, Wan 2.2 MoE, FramePack (6GB VRAM breakthrough), TTS Audio Suite, memory optimization discoveries
  • references/models.md - Complete model list with HuggingFace/Civitai links, file paths, and compatibility notes
  • references/workflows.md - Detailed node-by-node workflow templates for each pattern
  • references/lora-training.md - LoRA training guide with Kohya/AI-Toolkit parameters
  • references/voice-synthesis.md - Voice cloning, TTS, and lip-sync pipeline details
  • references/talking-head-workflows.md - Complete talking head workflows: Image→Talking Head (SadTalker, LivePortrait) and Video→Add Voice (Wav2Lip) with production scripts
  • references/evolution.md - Update sources, changelog, and user-specific learnings

Skill Evolution

This skill is designed to evolve. When helping the user:

Before starting a workflow:

  • Check if new models have dropped that might be better (search HuggingFace/Civitai if uncertain)
  • Consider if user's past successes/failures inform the approach

After completing a workflow:

  • Note what worked well or poorly for future reference
  • If user discovers better settings, update the relevant reference file

Proactive updates:

  • When the user mentions a new model or technique, research and integrate it
  • Periodically suggest checking for updates to key dependencies

See references/evolution.md for monitoring sources and update protocols.

Example: 3D Render to Photorealistic Character

For converting stylized 3D renders (like game/VN characters) to photorealistic images:

Recommended approach: InstantID + IP-Adapter FaceID on FLUX

1. Load 3D render reference (best quality, front-facing)
2. Apply InstantID (extracts identity + facial keypoints)
3. Apply IP-Adapter FaceID Plus V2 (weight 0.7)
4. Use FLUX.1-dev checkpoint
5. Prompt: "photorealistic portrait, detailed skin texture, natural lighting, [character description]"
6. CFG: 4-5, Steps: 25-30
7. FaceDetailer pass (denoise 0.35)
8. Upscale with 4x-UltraSharp

This converts the stylized look to photorealism while preserving the core identity features.
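The recipe above maps onto an API-format workflow roughly like this. "KSampler" and its input names are ComfyUI core; the FaceDetailer entry is heavily abbreviated (Impact Pack's node takes many more inputs), and the node IDs and links are placeholders.

```python
# Sketch of the 3D-render-to-photoreal recipe as API-format nodes.
workflow_fragment = {
    "3": {
        "class_type": "KSampler",
        "inputs": {
            "seed": 42,
            "steps": 28,        # recipe: 25-30
            "cfg": 4.5,         # recipe: 4-5
            "sampler_name": "euler",
            "scheduler": "normal",
            "denoise": 1.0,
            "model": ["1", 0],      # link to checkpoint + identity stack
            "positive": ["2", 0],
            "negative": ["2", 1],
            "latent_image": ["4", 0],
        },
    },
    "7": {
        "class_type": "FaceDetailer",  # Impact Pack; inputs abbreviated
        "inputs": {"denoise": 0.35},   # recipe step 7
    },
}
assert 4 <= workflow_fragment["3"]["inputs"]["cfg"] <= 5
```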


Related Skills

Related by shared tags or category signals:

  • comfyui-workflow-builder
  • comfyui-api
  • comfyui-video-pipeline
  • comfyui-prompt-engineer