MAKER Framework Skill

Transform unreliable single-model inference into robust, verifiable reasoning through maximal decomposition, parallel consensus voting, and systematic error filtering.

When to Use MAKER

High-value triggers:

Tasks requiring >90% accuracy (medical, legal, financial)
Multi-step reasoning where errors compound (with per-step accuracy p, an n-step chain succeeds with probability p^n)
Verification-critical outputs (code, calculations, facts)
Ambiguous tasks benefiting from diverse perspectives

Skip MAKER for:

Single-fact retrieval (no decomposition benefit)
Creative tasks where diversity is desirable
Time-critical responses (voting adds latency)

Core Architecture

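MAKER rests on three pillars. Decomposition runs once per task; within each subtask, red-flag filtering is applied to every agent output before the votes are counted: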

Task → [Pillar 1: Decompose] → DAG of subtasks
     → [Pillar 2: Vote]      → Parallel execution + consensus
     → [Pillar 3: Filter]    → Red-flag invalid outputs
     → Validated Result

Pillar 1: Maximal Agentic Decomposition (MAD)

Decompose complex tasks into atomic, independently-executable subtasks forming a DAG.

Decomposition principles:

Each subtask has a single, well-defined objective
Subtasks receive explicit input/output schemas
Dependencies form a directed acyclic graph (no cycles)
Maximize width (parallelism) over depth (sequential)

Tool: maker_build_dag
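
A minimal sketch of the decomposition principles above, using Python's standard graphlib module rather than maker_build_dag itself (whose signature is not shown here). The function name build_dag and the subtask dictionaries are illustrative assumptions. It groups subtasks into levels, so the width of each level is the available parallelism:

```python
from graphlib import TopologicalSorter


def build_dag(subtasks):
    """Return subtask ids grouped into levels that can run in parallel."""
    # Map each subtask id to the set of ids it depends on.
    graph = {t["id"]: set(t["dependencies"]) for t in subtasks}
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    levels = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # nodes whose dependencies are done
        levels.append(ready)
        sorter.done(*ready)
    return levels


# Hypothetical subtasks: "a" and "b" are independent, "c" needs both.
subtasks = [
    {"id": "a", "dependencies": []},
    {"id": "b", "dependencies": []},
    {"id": "c", "dependencies": ["a", "b"]},
]
levels = build_dag(subtasks)
print(levels)  # [['a', 'b'], ['c']]
```

Two levels for three subtasks means "a" and "b" can be voted on concurrently, which is the "width over depth" preference in practice.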

Pillar 2: First-to-Ahead-by-k Voting

Execute each subtask with m parallel agents; accept when one result leads by k votes.

Configuration by criticality:

Level	m	k	Confidence
low	3	1	~70%
medium	5	2	~85%
high	7	3	~95%
critical	11	5	~99%

Tool: maker_vote, maker_get_config
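
The acceptance rule can be sketched as follows. This is an illustration, not the maker_vote implementation: the function name is hypothetical, and it assumes outputs are tallied in arrival order and compared by exact equality.

```python
from collections import Counter


def first_to_ahead_by_k(outputs, k):
    """Scan outputs in arrival order.

    Returns (winner, votes_used) as soon as one answer leads the
    runner-up by k votes, or (None, len(outputs)) if that never happens.
    """
    counts = Counter()
    for used, output in enumerate(outputs, start=1):
        counts[output] += 1
        ranked = counts.most_common(2)
        top = ranked[0][1]
        runner_up = ranked[1][1] if len(ranked) > 1 else 0
        if top - runner_up >= k:
            return ranked[0][0], used  # early termination
    return None, len(outputs)


print(first_to_ahead_by_k(["x", "x", "y", "x", "x"], k=2))  # ('x', 2)
print(first_to_ahead_by_k(["x", "y", "x", "y"], k=2))       # (None, 4)
```

In the first call the vote stops after two outputs, which is where the cost saving from early termination comes from.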

Pillar 3: Red-Flagging System

Discard outputs exhibiting error indicators before voting.

Red flag types:

Length exceeded (overly long responses tend to signal uncertainty)
Format violation (schema mismatch)
Placeholder detected ([TODO], [N/A])
Uncertainty markers ("possibly", "might be")

Tool: maker_red_flag
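
A rough sketch of the four checks listed above. The function name, the max_chars default, and the flag labels are assumptions chosen for illustration; the actual thresholds and return shape of maker_red_flag may differ.

```python
import json
import re

PLACEHOLDERS = ("[TODO]", "[N/A]")
UNCERTAINTY = re.compile(r"\b(possibly|might be)\b", re.IGNORECASE)


def red_flag(output, max_chars=500, require_json=False):
    """Return {"is_valid": bool, "flags": [...]} for one agent output."""
    flags = []
    if len(output) > max_chars:  # assumed length limit
        flags.append("length_exceeded")
    if require_json:
        try:
            json.loads(output)
        except json.JSONDecodeError:
            flags.append("format_violation")
    if any(p in output for p in PLACEHOLDERS):
        flags.append("placeholder_detected")
    if UNCERTAINTY.search(output):
        flags.append("uncertainty_marker")
    return {"is_valid": not flags, "flags": flags}


print(red_flag("Edinburgh"))
# {'is_valid': True, 'flags': []}
print(red_flag("It might be Glasgow [TODO]"))
# {'is_valid': False, 'flags': ['placeholder_detected', 'uncertainty_marker']}
```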

Workflow

Standard MAKER Pipeline

1. Decompose task → maker_build_dag
2. For each subtask in topological order:
   a. Generate prompts → maker_generate_prompt (×m)
   b. Execute agents (parallel LLM calls)
   c. Validate outputs → maker_red_flag (each)
   d. Vote on valid outputs → maker_vote
3. Compose results → maker_compose_results
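
The pipeline steps above can be sketched end to end with stand-ins for the MAKER tools. Everything here is hypothetical scaffolding: run_agent and is_valid are caller-supplied callables, agents run sequentially rather than in parallel, and all m votes are counted instead of stopping early.

```python
from collections import Counter
from graphlib import TopologicalSorter


def run_pipeline(subtasks, run_agent, is_valid, m=5, k=2):
    """Run subtasks in dependency order; return {subtask_id: consensus}."""
    graph = {t["id"]: set(t["dependencies"]) for t in subtasks}
    by_id = {t["id"]: t for t in subtasks}
    results = {}
    for task_id in TopologicalSorter(graph).static_order():
        task = by_id[task_id]
        inputs = {dep: results[dep] for dep in task["dependencies"]}
        # Steps 2a-2b: m agent calls (sequential here for simplicity).
        outputs = [run_agent(task, inputs, i) for i in range(m)]
        # Step 2c: drop red-flagged outputs before counting votes.
        valid = [o for o in outputs if is_valid(o)]
        # Step 2d: accept the top answer only if it leads by at least k.
        ranked = Counter(valid).most_common(2)
        if not ranked:
            raise RuntimeError(f"{task_id}: no valid outputs")
        lead = ranked[0][1] - (ranked[1][1] if len(ranked) > 1 else 0)
        if lead < k:
            raise RuntimeError(f"{task_id}: no consensus")
        results[task_id] = ranked[0][0]
    return results  # Step 3 would compose these into a final answer


def fake_agent(task, inputs, i):
    # Stand-in for an LLM call: agent 4 returns a placeholder, the rest agree.
    return "[TODO]" if i == 4 else f"answer-for-{task['id']}"


subtasks = [
    {"id": "t1", "dependencies": []},
    {"id": "t2", "dependencies": ["t1"]},
]
results = run_pipeline(subtasks, fake_agent, lambda o: "[TODO]" not in o)
print(results)  # {'t1': 'answer-for-t1', 't2': 'answer-for-t2'}
```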

Example: Multi-Hop QA

Task: "What is the capital of the country where the inventor of the telephone was born?"

Step 1: Decompose

{
  "subtasks": [
    {"id": "t1", "description": "Identify inventor of telephone", "dependencies": []},
    {"id": "t2", "description": "Determine birthplace of {t1}", "dependencies": ["t1"]},
    {"id": "t3", "description": "Identify capital of {t2}", "dependencies": ["t2"]}
  ]
}

Step 2: Execute with voting (m=5, k=2 for medium criticality)

t1 outputs: ["Alexander Graham Bell", "Alexander Graham Bell", "A.G. Bell", "Alexander Graham Bell", "Bell"] → Normalize → "alexander graham bell" wins with 3 of 5 votes, a lead of 2 over every other answer (the variants "a.g. bell" and "bell" do not match under normalization and count separately)

t2 (with input "Alexander Graham Bell"): → "Edinburgh, Scotland" wins after red-flagging one verbose response

t3 (with input "Scotland", the country from t2): → "Edinburgh" wins unanimously

Step 3: Compose

Final answer: "Edinburgh"

Integration with Reasoning Skills

With hierarchical-reasoning

MAKER complements hierarchical-reasoning by adding reliability to each reasoning level:

Strategic level → MAKER(criticality=high) for key decisions
Tactical level  → MAKER(criticality=medium) for approach validation
Operational     → Direct execution for atomic operations

With knowledge-graph

Use MAKER voting on entity extraction to achieve higher-quality knowledge graphs:

Document → [MAKER: Extract entities (m=5)] → Validated entities
        → [MAKER: Extract relations (m=5)] → Validated relations
        → knowledge-graph merge

Tool Reference

maker_build_dag

Construct DAG from subtask definitions. Validates acyclicity and computes execution order.

maker_red_flag

Apply red-flag validation to agent output. Returns is_valid boolean and flag details.

maker_vote

Execute first-to-ahead-by-k voting. Returns consensus output with confidence score.

maker_compute_reliability

Calculate theoretical system reliability for given (m, k, n) configuration.

maker_get_config

Get recommended (m, k) configuration for criticality level.

maker_compose_results

Combine validated subtask outputs into final result.

maker_generate_prompt

Create optimized micro-agent prompt with constraints and schema.

Configuration Guide

Selecting m and k

Cost-accuracy tradeoff:

Higher m → more reliable but costlier
Higher k → stronger consensus but slower termination
Early termination typically reduces cost by 30-50%

Decision framework:

Start with criticality-based defaults via maker_get_config
Use maker_compute_reliability to validate configuration
Adjust based on empirical accuracy and cost metrics
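
As an illustration of the first step, the criticality defaults from the table under Pillar 2 can be held in a simple lookup. The names CONFIGS, get_config, and worst_case_calls are hypothetical and do not reflect the real maker_get_config signature:

```python
# (m, k) defaults copied from the criticality table under Pillar 2.
CONFIGS = {
    "low": {"m": 3, "k": 1},
    "medium": {"m": 5, "k": 2},
    "high": {"m": 7, "k": 3},
    "critical": {"m": 11, "k": 5},
}


def get_config(criticality):
    """Look up the default (m, k) pair for a criticality level."""
    try:
        return CONFIGS[criticality]
    except KeyError:
        raise ValueError(f"unknown criticality: {criticality!r}") from None


def worst_case_calls(criticality, n_subtasks):
    """Upper bound on agent calls if every subtask uses all m agents."""
    return get_config(criticality)["m"] * n_subtasks


print(get_config("medium"))           # {'m': 5, 'k': 2}
print(worst_case_calls("medium", 3))  # 15
```

The worst-case call count gives a quick cost ceiling to weigh against the accuracy target before running anything.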

Output Schema Design

Well-designed schemas enable format-based red-flagging:

{
  "type": "object",
  "properties": {
    "answer": {"type": "string"},
    "confidence": {"type": "number", "minimum": 0, "maximum": 1}
  },
  "required": ["answer"]
}
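
A hand-written check that mirrors the schema above, as a sketch of format-based red-flagging. It is not a general JSON Schema validator, and the function name check_output is an assumption:

```python
import json


def check_output(raw):
    """Return (ok, reason) for a raw agent output string."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False, "not valid JSON"
    if not isinstance(data, dict):
        return False, "not an object"
    if not isinstance(data.get("answer"), str):
        return False, "missing or non-string 'answer'"
    if "confidence" in data:
        conf = data["confidence"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(conf, bool) or not isinstance(conf, (int, float)):
            return False, "'confidence' is not a number"
        if not 0 <= conf <= 1:
            return False, "'confidence' outside [0, 1]"
    return True, "ok"


print(check_output('{"answer": "Edinburgh", "confidence": 0.9}'))  # (True, 'ok')
print(check_output('{"answer": "Edinburgh", "confidence": 1.5}'))
# (False, "'confidence' outside [0, 1]")
```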

Equivalence Methods

exact: String equality after trim (dates, numbers)
normalized: Lowercase + whitespace normalization (text)
json: Parse and re-serialize for canonical comparison (structured)
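
One possible reading of the three methods, sketched in Python. The function names canonical and equivalent are illustrative; the source does not specify how the voting tool implements them.

```python
import json


def canonical(output, method):
    """Reduce an output to the form used for vote comparison."""
    if method == "exact":
        return output.strip()
    if method == "normalized":
        # Lowercase and collapse runs of whitespace to single spaces.
        return " ".join(output.lower().split())
    if method == "json":
        # Parse and re-serialize with sorted keys and fixed separators.
        return json.dumps(json.loads(output), sort_keys=True, separators=(",", ":"))
    raise ValueError(f"unknown method: {method!r}")


def equivalent(a, b, method):
    return canonical(a, method) == canonical(b, method)


print(equivalent(" 42 ", "42", "exact"))                             # True
print(equivalent("Alexander  Graham Bell", "alexander graham bell",
                 "normalized"))                                      # True
print(equivalent('{"a": 1, "b": 2}', '{ "b": 2, "a": 1 }', "json"))  # True
print(equivalent("A.G. Bell", "Alexander Graham Bell", "normalized"))  # False
```

The last call shows a limit of text normalization: abbreviations and aliases still count as different answers unless a separate resolution step maps them together.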

Performance Characteristics

Reliability improvement (assuming 85% agent accuracy):

Steps	Single Agent	MAKER (m=5, k=2)
1	85.0%	97.1%
3	61.4%	91.5%
5	44.4%	86.2%
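
The Single Agent column follows from compounding a fixed per-step accuracy over the chain, assuming steps fail independently. A short check (the function name is illustrative):

```python
def chain_reliability(per_step, steps):
    """Probability that every step in an n-step chain is correct,
    assuming independent steps with the same per-step accuracy."""
    return per_step ** steps


for n in (1, 3, 5):
    print(n, f"{chain_reliability(0.85, n):.1%}")
# 1 85.0%
# 3 61.4%
# 5 44.4%
```

The MAKER column applies the same compounding to the higher per-step reliability that voting provides.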

Cost multiplier: ~4-6× single agent (with early termination)

Latency: ~2-4× single agent (parallelism offsets voting overhead)

Error Handling

Insufficient valid outputs (red-flagging too aggressive):

Retry with additional agents
Relax red-flag thresholds
Refine subtask prompt

No consensus (high disagreement):

Further decompose the problematic subtask
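Add agents (increase m) so one answer has more opportunity to reach the k-vote lead; raising k alone makes consensus harder to reach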
Escalate to human review

Cycle detected in DAG:

Review dependency structure
Break circular dependencies into sequential steps
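
To review the dependency structure it helps to see which subtasks form the loop. A small sketch using the standard graphlib module (the helper name find_cycle is an assumption, and this is independent of how maker_build_dag reports cycles):

```python
from graphlib import CycleError, TopologicalSorter


def find_cycle(subtasks):
    """Return the list of ids forming a cycle, or None if acyclic."""
    graph = {t["id"]: set(t["dependencies"]) for t in subtasks}
    try:
        TopologicalSorter(graph).prepare()
    except CycleError as err:
        # graphlib puts the cycle in args[1]; first and last node are the same.
        return err.args[1]
    return None


cyclic = [
    {"id": "a", "dependencies": ["b"]},
    {"id": "b", "dependencies": ["a"]},
]
cycle = find_cycle(cyclic)
print(cycle)  # a list such as ['a', 'b', 'a']; the starting node may vary
```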
