multi-model-research

Orchestrate multiple frontier LLMs (Claude, GPT-5.1, Gemini 3.0 Pro, Perplexity Sonar, Grok 4.1) for comprehensive research, using the LLM Council pattern with peer review and chairman synthesis.

Safety Notice

This listing is imported from the skills.sh public index metadata. Review the upstream SKILL.md and repository scripts before running.

Install skill "multi-model-research" with this command: npx skills add krishagel/geoffrey/krishagel-geoffrey-multi-model-research

Multi-Model Research Agent

Implements Karpathy's LLM Council pattern for superior research through parallel queries, peer review, and chairman synthesis.

Architecture

Geoffrey/Claude (Native Council Member):

  • Routes simple vs complex queries
  • Calls external API orchestrator (research.py)
  • Provides my own research response
  • Conducts peer review phase
  • Requests GPT-5.1 synthesis (chairman)
  • Saves final report to Obsidian

Python External API Orchestrator:

  • Fetches responses from GPT-5.1, Gemini 3.0 Pro, Perplexity Sonar, Grok 4.1
  • Returns JSON with all external responses
  • I (Claude) handle all orchestration and synthesis
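The parallel fan-out the script performs can be sketched roughly as follows. This is a minimal sketch with stubbed model calls standing in for the real network requests; the function names and JSON shape are illustrative assumptions, not the actual research.py API (which uses httpx.AsyncClient against each provider's endpoint):

```python
import asyncio
import json

# Hypothetical stand-in for a provider call; the real script would issue
# an httpx.AsyncClient request to the vendor's API here.
async def query_model(name: str, query: str) -> dict:
    await asyncio.sleep(0)  # placeholder for the network round-trip
    return {"model": name, "response": f"[{name} answer to: {query}]"}

async def gather_responses(query: str, models: list[str]) -> str:
    # Fan out to all external models in parallel, then bundle the
    # results as a single JSON payload for the caller to read.
    results = await asyncio.gather(*(query_model(m, query) for m in models))
    return json.dumps({"query": query, "responses": results})

payload = asyncio.run(
    gather_responses("What is RAG?", ["gpt", "gemini", "perplexity", "grok"])
)
```

The key design point is that the script only fetches and bundles; ranking and synthesis stay with the council orchestrator.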

When to Use This Skill

Use multi-model research when:

  • Complex analysis needed - Multiple perspectives valuable
  • Factual verification critical - Cross-model validation
  • Comprehensive coverage required - No single model sufficient
  • Current information essential - Perplexity provides web grounding
  • Contested topics - Benefit from diverse model perspectives

Simple vs Council Mode

Simple Mode (Perplexity only):

  • Factual lookups
  • Current events
  • Quick research with citations
  • Completes in <15 seconds

Council Mode (Full council):

  • Comparative analysis
  • Deep research
  • Multiple perspectives needed
  • Strategic questions
  • Completes in <90 seconds
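A routing heuristic along these lines could back the simple-vs-council decision. The trigger words below are assumptions chosen for illustration, not the actual routing rules in config.yaml:

```python
# Hypothetical trigger terms suggesting a comparative or strategic query.
COUNCIL_TRIGGERS = ("compare", "versus", " vs ", "trade-off",
                    "strategy", "analyze", "evaluate")

def route(query: str) -> str:
    """Return "council" for comparative/strategic queries, else "simple"."""
    q = query.lower()
    return "council" if any(t in q for t in COUNCIL_TRIGGERS) else "simple"
```

In practice the routing decision is made by the native council member, so a keyword list like this would only be a fallback.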

Workflow

Simple Query

User: "What are the latest developments in quantum computing?"
     ↓
I decide: Simple query (factual, current)
     ↓
I call: uv run scripts/research.py --query "..." --models perplexity
     ↓
I read: JSON response from Perplexity
     ↓
I format: Markdown report with citations
     ↓
I save: To Obsidian Geoffrey/Research folder
     ↓
I return: Summary to user with Obsidian link

Council Query

User: "Compare the AI strategies of OpenAI, Anthropic, and Google"
     ↓
I decide: Council query (comparative, complex)
     ↓
I call: uv run scripts/research.py --query "..." --models gpt,gemini,perplexity,grok
     ↓
I read: JSON with all external responses
     ↓
I provide: My own (Claude) research response
     ↓
I conduct: Peer review (each model ranks others)
     ↓
I request: GPT-5.1 chairman synthesis
     ↓
I format: Comprehensive markdown report
     ↓
I save: To Obsidian Geoffrey/Research folder
     ↓
I return: Summary with Obsidian link
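The peer-review step in the workflow above might assemble its prompt roughly like this. This is a sketch: the anonymized-ranking format is an assumption about how prompts/peer_review.md is structured, not its actual contents:

```python
def build_peer_review_prompt(query: str, responses: dict) -> tuple:
    # Label responses A, B, C, ... so each reviewer ranks peers without
    # knowing which model wrote what (assumed, per the Council pattern).
    labeled = [(f"Response {chr(65 + i)}", model, text)
               for i, (model, text) in enumerate(sorted(responses.items()))]
    body = "\n\n".join(f"## {label}\n{text}" for label, _, text in labeled)
    prompt = (f"Research question: {query}\n\n"
              "Rank the anonymized responses below by accuracy, depth, "
              "and insight.\n\n" + body)
    # Keep a key so rankings can be mapped back to real model names.
    key = {label: model for label, model, _ in labeled}
    return prompt, key
```

Each council member would receive this prompt, and the chairman synthesis would then see both the rankings and the de-anonymization key.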

Output Format

All research reports saved to Obsidian include:

  • Executive Summary (2-3 paragraphs)
  • Key Findings (organized by theme, inline citations)
  • Confidence Assessment (what's certain vs debated)
  • References Section (all sources with URLs and dates)

Citations use numeric format: [1], [2], etc.
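Numbering citations in first-seen order can be done with a small helper. This is a sketch that assumes sources arrive as dicts with a "url" key, which is a hypothetical shape, not the script's documented output:

```python
def number_citations(sources: list) -> tuple:
    # Assign [1], [2], ... to each distinct URL in first-seen order.
    refs = {}
    for s in sources:
        refs.setdefault(s["url"], len(refs) + 1)
    # Render the References section lines in numeric order.
    references = "\n".join(f"[{n}] {url}" for url, n in refs.items())
    return refs, references
```

The returned mapping can then be used to rewrite inline markers so the same source always resolves to the same number across the report.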

Technical Details

Python Script:

cd skills/multi-model-research
uv run scripts/research.py --query "Your question" --models perplexity --output /tmp/responses.json

Config:

  • config.yaml - Model settings, routing rules
  • prompts/system_prompts.yaml - Per-model system prompts
  • prompts/peer_review.md - Peer review template
  • prompts/chairman_synthesis.md - GPT-5.1 synthesis template

Dependencies:

  • httpx (async HTTP client)
  • pyyaml (config parsing)
  • python-dotenv (env vars)
  • python-frontmatter (Obsidian frontmatter)

API Keys Required:

  • OPENAI_API_KEY (GPT-5.1)
  • GEMINI_API_KEY (Gemini 3.0 Pro)
  • PERPLEXITY_API_KEY (Sonar Pro)
  • XAI_API_KEY (Grok 4.1)

All keys are configured in the ~/.env file.
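A quick startup check for those four keys might look like this. It is a sketch: the real skill loads ~/.env via python-dotenv, while this helper only inspects whichever mapping it is given (os.environ by default):

```python
import os

REQUIRED_KEYS = ("OPENAI_API_KEY", "GEMINI_API_KEY",
                 "PERPLEXITY_API_KEY", "XAI_API_KEY")

def missing_keys(env=None):
    # Report any required key that is absent or empty, so the skill can
    # fail fast before issuing parallel API calls.
    env = os.environ if env is None else env
    return [k for k in REQUIRED_KEYS if not env.get(k)]
```

Running this before council mode avoids burning a partial round of API calls when one provider's key is missing.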

Examples

Simple Research:

User: "What is RAG in AI?"

I route to: Simple mode (Perplexity)
Output: Concise explanation with current examples and citations
Time: ~10 seconds

Council Research:

User: "Compare serverless vs containers for production ML workloads"

I route to: Council mode (all 4 external + me)
Process:
  1. GPT-5.1: Provides comprehensive technical comparison
  2. Gemini 3.0: Analyzes cost and performance trade-offs
  3. Perplexity: Current industry trends and case studies
  4. Grok 4.1: Developer sentiment from X/Twitter
  5. Claude (me): Synthesize with nuanced analysis
  6. Peer review: Each model ranks others
  7. GPT-5.1 (chairman): Final synthesis

Output: Multi-perspective analysis with citations
Time: ~60 seconds

Limitations

  • Cost: Council mode uses 4-5 API calls per query
  • Latency: Council mode takes 60-90 seconds
  • API Limits: Rate limits may throttle parallel requests
  • Citation Quality: Non-Perplexity models require URL extraction
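The URL-extraction step for non-Perplexity responses could be approximated with a regex pass. This is a rough sketch; production citation handling would likely need more care with markdown links and edge-case punctuation:

```python
import re

# Match http(s) URLs, stopping at whitespace and common closing delimiters.
URL_RE = re.compile(r"https?://[^\s)\]>\"']+")

def extract_urls(text: str) -> list:
    # Strip common trailing punctuation, then de-duplicate while
    # preserving first-seen order.
    urls = [u.rstrip(".,;") for u in URL_RE.findall(text)]
    return list(dict.fromkeys(urls))
```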

Future Enhancements

  • Streaming responses during deliberation
  • Cost tracking and budget limits
  • Query history and versioning
  • Custom model weights based on topic
  • Integration with Geoffrey's knowledge base

This skill implements Karpathy's LLM Council pattern released November 22, 2025.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

  • knowledge-manager (Research)
  • research (Research)
  • morning-briefing (General)

No summaries are provided by the upstream source for these skills.