LLM Council Skill
LIBRARY-FIRST PROTOCOL (MANDATORY)
Before writing ANY code, you MUST check:
Step 1: Library Catalog
-
Location: .claude/library/catalog.json
-
If match >70%: REUSE or ADAPT
Step 2: Patterns Guide
-
Location: .claude/docs/inventories/LIBRARY-PATTERNS-GUIDE.md
-
If pattern exists: FOLLOW documented approach
Step 3: Existing Projects
-
Location: D:\Projects*
-
If found: EXTRACT and adapt
Decision Matrix
Match Action
Library >90% REUSE directly
Library 70-90% ADAPT minimally
Pattern exists FOLLOW pattern
In project EXTRACT
No match BUILD (add to library after)
Purpose
Run 3-stage multi-model consensus for critical decisions where:
-
Single-model hallucination risk is unacceptable
-
Multiple perspectives improve decision quality
-
High-stakes choices need validation
Architecture (Karpathy Pattern)
STAGE 1: COLLECT +---> Claude ---> Response A | Query --+---> Gemini ---> Response B | +---> Codex ----> Response C
STAGE 2: RANK Each model reviews others (anonymized) Produces rankings with rationale
STAGE 3: SYNTHESIZE Chairman aggregates rankings Produces final answer with consensus score
When to Use
Perfect For:
-
Architecture decisions
-
Technology selection
-
Critical bug triage
-
Security assessment
-
High-risk deployments
-
Contentious design choices
Don't Use When:
-
Simple, low-risk decisions
-
Time-critical responses
-
Single correct answer exists
-
Cost is a concern (3x API usage)
Usage
Basic Council
/llm-council "Should we use microservices or monolith for this system?"
With Threshold
/llm-council "Which auth approach is best?" --threshold 0.75
With Chairman Override
/llm-council "Architecture decision" --chairman gemini
Command Pattern
bash scripts/multi-model/llm-council.sh "<query>" "<threshold>" "<chairman>"
Configuration
Parameter Default Description
threshold 0.67 Minimum consensus score
chairman claude Model that synthesizes final answer
models [claude, gemini, codex] Participating models
Consensus Scoring
-
0.80: Strong consensus - proceed with confidence
-
0.67-0.80: Moderate consensus - consider minority views
-
<0.67: Weak consensus - escalate to human review
Memory Integration
Results stored to Memory-MCP:
-
Key: multi-model/council/decisions/{query_id}
-
Tags: WHO=llm-council, WHY=consensus-decision
Output Format
{ "query": "Original question", "final_answer": { "synthesis": "Combined answer...", "chairman": "claude" }, "consensus_score": 0.85, "responses": { "claude": "...", "gemini": "...", "codex": "..." }, "rankings": [ {"model": "A", "rank": 1, "rationale": "..."} ] }
Failure Modes
Deadlock (No Consensus)
-
All models disagree
-
Consensus < threshold
-
Action: Store for human review
Model Unavailable
-
One model times out
-
Action: Continue with 2 models (2/3 quorum)
Chairman Failure
-
Synthesis fails
-
Action: Fallback to highest-ranked response
Integration Examples
Architecture Decision
const decision = await runCouncil( "Microservices vs Monolith for our scale?", { threshold: 0.75 } );
if (decision.consensus_score >= 0.75) { proceed(decision.final_answer); } else { escalateToHuman(decision); }
Security Assessment
const assessment = await runCouncil( "Is this authentication approach secure?", { threshold: 0.80 } ); // Higher threshold for security decisions
Sources
-
LLM Council by Andrej Karpathy
-
VentureBeat Analysis