# Task Quality KPI Framework

## Overview
The Task Quality KPI Framework provides objective, quantitative metrics for evaluating task implementation quality.
**Key architecture:** KPIs are auto-generated by a hook. You read the results; you do not run scripts.
```
┌─────────────────────────────────────────────────────────────┐
│ HOOK (auto-executes)                                         │
│   Trigger: PostToolUse on TASK-*.md                          │
│   Script:  task-kpi-analyzer.py                              │
│   Output:  TASK-XXX--kpi.json                                │
├─────────────────────────────────────────────────────────────┤
│ SKILL / AGENT (reads output)                                 │
│   Input:  TASK-XXX--kpi.json                                 │
│   Action: Make evaluation decisions                          │
└─────────────────────────────────────────────────────────────┘
```
## Why This Architecture?
| Problem | Solution |
| --- | --- |
| Skills can't execute scripts | Hook auto-runs on file save |
| Subjective `review_status` | Quantitative 0-10 scores |
| "Looks good to me" | Evidence-based evaluation |
| Binary pass/fail | Graduated quality levels |
## KPI File Location
After any task file modification, find KPI data at:
```
docs/specs/[ID]/tasks/TASK-XXX--kpi.json
```
## KPI Categories
```
┌─────────────────────────────────────────────────────────────┐
│ OVERALL SCORE (0-10)                                         │
├─────────────────────────────────────────────────────────────┤
│ Spec Compliance (30%)                                        │
│ ├── Acceptance Criteria Met (0-10)                           │
│ ├── Requirements Coverage (0-10)                             │
│ └── No Scope Creep (0-10)                                    │
├─────────────────────────────────────────────────────────────┤
│ Code Quality (25%)                                           │
│ ├── Static Analysis (0-10)                                   │
│ ├── Complexity (0-10)                                        │
│ └── Patterns Alignment (0-10)                                │
├─────────────────────────────────────────────────────────────┤
│ Test Coverage (25%)                                          │
│ ├── Unit Tests Present (0-10)                                │
│ ├── Test/Code Ratio (0-10)                                   │
│ └── Coverage Percentage (0-10)                               │
├─────────────────────────────────────────────────────────────┤
│ Contract Fulfillment (20%)                                   │
│ ├── Provides Verified (0-10)                                 │
│ └── Expects Satisfied (0-10)                                 │
└─────────────────────────────────────────────────────────────┘
```
## Category Weights

| Category | Weight | Why |
| --- | --- | --- |
| Spec Compliance | 30% | Most important: did we build what was asked? |
| Code Quality | 25% | Technical excellence |
| Test Coverage | 25% | Verification and confidence |
| Contract Fulfillment | 20% | Integration with other tasks |
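The overall score is the weighted sum of the four category scores. A minimal sketch of that arithmetic, with field names mirroring the KPI JSON shown later (the internals of task-kpi-analyzer.py may differ):

```python
# Category weights (percent), matching the table above.
WEIGHTS = {
    "Spec Compliance": 30,
    "Code Quality": 25,
    "Test Coverage": 25,
    "Contract Fulfillment": 20,
}

def overall_score(category_scores: dict[str, float]) -> float:
    """Weighted sum of 0-10 category scores, yielding a 0-10 overall score."""
    total = sum(score * WEIGHTS[name] / 100 for name, score in category_scores.items())
    return round(total, 1)

# Example: Spec Compliance 8.5 contributes 8.5 * 0.30 = 2.55 weighted points,
# matching the weighted_score in the sample KPI file below.
print(overall_score({
    "Spec Compliance": 8.5,
    "Code Quality": 7.8,
    "Test Coverage": 8.0,
    "Contract Fulfillment": 8.6,
}))  # → 8.2
```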
## When to Use

- Reading KPI data for task quality evaluation
- Understanding quality metrics and the scoring breakdown
- Deciding whether to iterate or approve based on quantitative data
- Integrating KPI checks into automated loops (`agents_loop.py`)
- Generating evidence-based evaluation reports
## Instructions

### 1. Reading KPI Data (Primary Use)

**Do not run scripts.** Read the auto-generated file:

```
Read the KPI file: docs/specs/001-feature/tasks/TASK-001--kpi.json
```
### 2. Understanding the Data

The KPI file contains:

```json
{
  "task_id": "TASK-001",
  "evaluated_at": "2026-01-15T10:30:00Z",
  "overall_score": 8.2,
  "passed_threshold": true,
  "threshold": 7.5,
  "kpi_scores": [
    {
      "category": "Spec Compliance",
      "weight": 30,
      "score": 8.5,
      "weighted_score": 2.55,
      "metrics": {
        "acceptance_criteria_met": 9.0,
        "requirements_coverage": 8.0,
        "no_scope_creep": 8.5
      },
      "evidence": [
        "Acceptance criteria: 9/10 checked",
        "Requirements coverage: 8/10"
      ]
    }
  ],
  "recommendations": [
    "Code Quality: Moderate improvements possible"
  ],
  "summary": "Score: 8.2/10 - PASSED"
}
```
### 3. Making Decisions

Use `overall_score` and `passed_threshold`:

```
IF passed_threshold == true:
  → Task meets quality standards
  → Approve and proceed

IF passed_threshold == false:
  → Task needs improvement
  → Check recommendations for specific targets
  → Create fix specification
```
## Integration with Workflow

### In Task Review (evaluator-agent)

Review process:

1. Read the KPI file: `TASK-XXX--kpi.json`
2. Extract `overall_score` and `kpi_scores`
3. Read the task file to validate
4. Generate an evaluation report
5. Decide based on `passed_threshold`
### In agents_loop

```python
# Check whether the KPI file exists
kpi_path = spec_path / "tasks" / f"{task_id}--kpi.json"

if kpi_path.exists():
    kpi_data = json.loads(kpi_path.read_text())
    if kpi_data["passed_threshold"]:
        # Quality threshold met
        advance_state("update_done")
    else:
        # Needs more work
        fix_targets = kpi_data["recommendations"]
        create_fix_task(fix_targets)
        advance_state("fix")
else:
    # KPI not generated yet - the task may not be implemented
    log_warning("No KPI data found")
```
## Multi-Iteration Loop

Instead of a fixed maximum of three retries, iterate until the quality threshold is met:

```
Iteration 1: Score 6.2 → FAILED → Fix: Improve test coverage
Iteration 2: Score 7.1 → FAILED → Fix: Refactor complex functions
Iteration 3: Score 7.8 → PASSED → Proceed
```
Each iteration updates the KPI file automatically on task save.
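A minimal sketch of such a loop. The implementation step is passed in as a callable, since this document doesn't define one; everything else uses only the KPI fields described above:

```python
import json
from pathlib import Path
from typing import Callable

def run_until_threshold(
    kpi_path: Path,
    implement: Callable[[list[str]], None],
    max_iterations: int = 10,
) -> bool:
    """Re-run the implementation step until the KPI file reports a pass."""
    fix_targets: list[str] = []
    for iteration in range(1, max_iterations + 1):
        implement(fix_targets)  # saving TASK-*.md here triggers the KPI hook
        kpi = json.loads(kpi_path.read_text())
        status = "PASSED" if kpi["passed_threshold"] else "FAILED"
        print(f"Iteration {iteration}: Score {kpi['overall_score']} → {status}")
        if kpi["passed_threshold"]:
            return True
        fix_targets = kpi["recommendations"]  # aim the next iteration at the gaps
    return False
```

The `max_iterations` guard is an addition: without an upper bound, a task that can never satisfy the threshold would loop forever.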
## Threshold Guidelines

| Score | Quality Level | Action |
| --- | --- | --- |
| 9.0-10.0 | Exceptional | Approve; document best practices |
| 8.0-8.9 | Good | Approve with minor notes |
| 7.0-7.9 | Acceptable | Approve (if threshold is 7.5) |
| 6.0-6.9 | Below Standard | Request specific improvements |
| < 6.0 | Poor | Significant rework required |
### Recommended Thresholds

| Project Type | Threshold | Rationale |
| --- | --- | --- |
| Production MVP | 8.0 | High quality required |
| Internal Tool | 7.0 | Good enough |
| Prototype | 6.0 | Functional over perfect |
| Critical System | 8.5 | No compromises |
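When a report needs the band label rather than the raw number, the mapping is a simple cascade over the table above (the analyzer may word its own summary differently):

```python
def quality_level(score: float) -> str:
    """Map a 0-10 overall score to the bands in the Threshold Guidelines table."""
    if score >= 9.0:
        return "Exceptional"
    if score >= 8.0:
        return "Good"
    if score >= 7.0:
        return "Acceptable"
    if score >= 6.0:
        return "Below Standard"
    return "Poor"

print(quality_level(8.2))  # → Good
```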
## Metric Details

### Spec Compliance Metrics

#### Acceptance Criteria Met

- Calculates: `(checked_criteria / total_criteria) * 10`
- Source: task file checkbox count
- Example: 9/10 checked = 9.0
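A plausible reconstruction of that checkbox count for markdown task files. The exact regex used by task-kpi-analyzer.py is not documented here, so treat this as illustrative:

```python
import re

def acceptance_criteria_score(task_markdown: str) -> float:
    """(checked_criteria / total_criteria) * 10, or 0 if no checkboxes exist."""
    checked = len(re.findall(r"^\s*- \[[xX]\]", task_markdown, flags=re.MULTILINE))
    unchecked = len(re.findall(r"^\s*- \[ \]", task_markdown, flags=re.MULTILINE))
    total = checked + unchecked
    return round(checked / total * 10, 1) if total else 0.0

print(acceptance_criteria_score("- [x] parse input\n- [x] handle errors\n- [ ] add docs"))
# → 6.7
```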
#### Requirements Coverage

- Calculates: the count of REQ-IDs this task covers
- Source: `traceability-matrix.md`
- Example: 4 requirements covered = 8.0
#### No Scope Creep

- Calculates: `(implemented_files / expected_files) * 10`
- Source: the task's "Files to Create" list vs. the actual files
- Penalizes: missing files or unexpected additions
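A sketch consistent with that description, comparing expected and actual file sets. The formula above only covers missing files; the one-point deduction per unexpected file is an assumption made for illustration:

```python
def scope_creep_score(expected: set[str], actual: set[str]) -> float:
    """Share of expected files implemented, minus a penalty for extra files."""
    if not expected:
        return 10.0
    score = len(expected & actual) / len(expected) * 10
    score -= len(actual - expected)  # assumed penalty: -1 point per extra file
    return round(max(score, 0.0), 1)

print(scope_creep_score({"src/a.py", "src/b.py"}, {"src/a.py", "src/b.py", "src/c.py"}))
# → 9.0
```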
### Code Quality Metrics

#### Static Analysis

- Java: Maven Checkstyle
- TypeScript: ESLint
- Python: ruff
- Score: 10 if the analysis passes, 5 if issues are found
#### Complexity

- Calculates: the share of functions longer than 50 lines
- Score: `10 - (long_functions_ratio * 5)`
- Penalizes: large, complex functions
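A sketch of that formula, assuming `long_functions_ratio` is the fraction of functions exceeding 50 lines; how function lengths are measured is left to the caller:

```python
def complexity_score(function_lengths: list[int], max_lines: int = 50) -> float:
    """10 minus 5x the fraction of functions longer than max_lines."""
    if not function_lengths:
        return 10.0
    long_ratio = sum(1 for n in function_lengths if n > max_lines) / len(function_lengths)
    return round(10 - long_ratio * 5, 1)

print(complexity_score([12, 80, 30, 95]))  # 2 of 4 functions too long → 7.5
```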
#### Patterns Alignment

- Checks: Knowledge Graph patterns
- Source: `knowledge-graph.json`
- Validates: the implementation follows established project patterns
### Test Coverage Metrics

#### Unit Tests Present

- Calculates: `min(10, test_files * 5)`
- 2 test files = maximum score
- Penalizes: missing tests
#### Test/Code Ratio

- Calculates: `(test_count / code_count) * 10`
- 1:1 ratio = 10/10
- Ideal: at least one test file per code file
#### Coverage Percentage

- Source: coverage reports (JaCoCo, lcov, etc.)
- Calculates: `coverage_percent / 10`
- Example: 80% coverage = 8.0
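All three test metrics are plain arithmetic. A combined sketch, capping each sub-score at 10 (the ratio formula implies a cap but does not state one):

```python
def unit_tests_present(test_files: int) -> float:
    return min(10.0, test_files * 5)

def test_code_ratio(test_count: int, code_count: int) -> float:
    return min(10.0, test_count / code_count * 10) if code_count else 0.0

def coverage_score(coverage_percent: float) -> float:
    return coverage_percent / 10

print(unit_tests_present(2), test_code_ratio(3, 4), coverage_score(80))
# → 10.0 7.5 8.0
```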
### Contract Fulfillment Metrics

#### Provides Verified

- Checks: files exist and export the expected symbols
- Source: the task's `provides` frontmatter
- Validates: the contract is satisfied
#### Expects Satisfied

- Checks: dependencies provide the required files/symbols
- Source: the task's `expects` frontmatter
- Validates: prerequisites are met
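A sketch of the "files exist and export expected symbols" check. The substring match is a deliberate simplification; real symbol verification would need language-aware parsing:

```python
from pathlib import Path

def provides_verified(contracts: dict[str, list[str]]) -> float:
    """Fraction of (file, symbol) contract pairs satisfied, scaled to 0-10."""
    checks, satisfied = 0, 0
    for file_path, symbols in contracts.items():
        path = Path(file_path)
        text = path.read_text() if path.exists() else ""
        for symbol in symbols:
            checks += 1
            satisfied += symbol in text  # naive substring match, not real parsing
    return round(satisfied / checks * 10, 1) if checks else 10.0

# Example provides contract: file → symbols it must export.
print(provides_verified({"src/auth.py": ["login", "logout"]}))
```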
## When KPI File is Missing

If `TASK-XXX--kpi.json` doesn't exist, the likely causes are:

- **Task was never modified**: the hook runs on file save
- **Hook failed**: check the Claude Code logs
- **Task is new**: save the file first to trigger the hook
**Do not** try to calculate KPIs manually. The hook runs automatically when:

- the task file is saved (Write tool)
- the task file is edited (Edit tool)
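For orientation, a hypothetical hooks.json entry in the Claude Code PostToolUse style; the project's actual configuration and script path may differ:

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Write|Edit",
        "hooks": [
          { "type": "command", "command": "python scripts/task-kpi-analyzer.py" }
        ]
      }
    ]
  }
}
```

Since PostToolUse matchers select on tool name, the `TASK-*.md` filename filter would presumably live inside task-kpi-analyzer.py itself.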
## Best Practices

### 1. Always Check KPI File Exists

Before evaluating, check that `docs/specs/[ID]/tasks/TASK-XXX--kpi.json` exists. If it is missing:

- The task may not be implemented yet
- Ask the user to save the task file first
### 2. Trust the Metrics

The KPIs are objective. Only override them with documented evidence, such as:

- A critical security issue the metrics don't capture
- A logic error not caught by static analysis
- Exceptional quality the metrics don't measure
### 3. Iterate on Low KPIs

Target specific categories:

```
❌ "Fix code quality issues"

✅ "Improve Code Quality KPI from 5.2 to 7.0:
    - Complexity: Refactor processData() (5→8)
    - Patterns: Add error handling (6→8)"
```
### 4. Track KPI Trends

Monitor quality over time:

```
Sprint 1: Average KPI 6.8
Sprint 2: Average KPI 7.3 (+0.5)
Sprint 3: Average KPI 7.9 (+0.6)
```
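A sketch of computing that average from the generated KPI files, with the glob pattern inferred from the naming convention above:

```python
import json
from pathlib import Path

def average_kpi(tasks_dir: Path) -> float:
    """Mean overall_score across all generated TASK-*--kpi.json files."""
    scores = [
        json.loads(p.read_text())["overall_score"]
        for p in tasks_dir.glob("TASK-*--kpi.json")
    ]
    return round(sum(scores) / len(scores), 1) if scores else 0.0

print(average_kpi(Path("docs/specs/001-feature/tasks")))
```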
## Troubleshooting

### KPI File Not Generated

Check that:

- The hook is enabled in hooks.json
- The task file name matches the pattern `TASK-*.md`
- The file was actually saved (not just viewed)
### KPI Scores Seem Wrong

Validate:

- Check the `evidence` field for data sources
- Verify files exist at the expected paths
- Confirm the required build tools (Maven, npm) are available; some metrics depend on them
### Low Scores Despite Good Code

Possible causes:

- Missing test files
- No coverage report generated
- Acceptance criteria not checked off
- Lint rules too strict

Fix the root cause, not just the score.
## Examples

### Example 1: Reading KPI Data

Read the KPI file to evaluate task quality: `docs/specs/001-feature/tasks/TASK-042--kpi.json`

Based on the data:

- Overall score: 6.8/10 (below threshold)
- Lowest KPI: Test Coverage (5.0/10)
- Recommendation: add unit tests

Decision: REQUEST FIXES, targeting Test Coverage improvement.
### Example 2: Iteration Decision

```
Iteration 1 KPI: Score 6.2 → FAILED
  - Spec Compliance: 7.0 ✓
  - Code Quality:    5.5 ✗
  - Test Coverage:   6.0 ✗

Fix targets:
  - Refactor complex functions (Code Quality)
  - Add test coverage (Test Coverage)

Iteration 2 KPI: Score 7.8 → PASSED ✓
```
### Example 3: agents_loop Integration

```python
# In agents_loop, after the implementation step
kpi_file = spec_dir / "tasks" / f"{task_id}--kpi.json"

if kpi_file.exists():
    kpi = json.loads(kpi_file.read_text())
    if kpi["passed_threshold"]:
        print(f"✅ Task passed quality check: {kpi['overall_score']}/10")
        advance_state("update_done")
    else:
        print(f"❌ Task failed quality check: {kpi['overall_score']}/10")
        print("Recommendations:")
        for rec in kpi["recommendations"]:
            print(f"  - {rec}")
        advance_state("fix")
```
## References

- `evaluator-agent.md`: agent that uses KPI data for evaluation
- `hooks.json`: hook configuration for auto-generation
- `task-kpi-analyzer.py`: hook script (do not execute directly)
- `agents_loop.py`: orchestrator that reads KPI data for decisions