# Task Quality KPI Framework

## Overview
The Task Quality KPI Framework provides objective, quantitative metrics for evaluating task implementation quality.
**Key architecture:** KPIs are auto-generated by a hook. You read the results; you do not run scripts.
```
┌─────────────────────────────────────────────────────────────┐
│ HOOK (auto-executes)                                         │
│   Trigger: PostToolUse on TASK-*.md                          │
│   Script:  task-kpi-analyzer.py                              │
│   Output:  TASK-XXX--kpi.json                                │
├─────────────────────────────────────────────────────────────┤
│ SKILL / AGENT (reads output)                                 │
│   Input:  TASK-XXX--kpi.json                                 │
│   Action: Make evaluation decisions                          │
└─────────────────────────────────────────────────────────────┘
```
## Why This Architecture?
| Problem | Solution |
| --- | --- |
| Skills can't execute scripts | Hook auto-runs on file save |
| Subjective `review_status` | Quantitative 0-10 scores |
| "Looks good to me" | Evidence-based evaluation |
| Binary pass/fail | Graduated quality levels |
## KPI File Location
After any task file modification, find KPI data at:
```
docs/specs/[ID]/tasks/TASK-XXX--kpi.json
```
## KPI Categories
```
┌─────────────────────────────────────────────────────────────┐
│ OVERALL SCORE (0-10)                                         │
├─────────────────────────────────────────────────────────────┤
│ Spec Compliance (30%)                                        │
│ ├── Acceptance Criteria Met (0-10)                           │
│ ├── Requirements Coverage (0-10)                             │
│ └── No Scope Creep (0-10)                                    │
├─────────────────────────────────────────────────────────────┤
│ Code Quality (25%)                                           │
│ ├── Static Analysis (0-10)                                   │
│ ├── Complexity (0-10)                                        │
│ └── Patterns Alignment (0-10)                                │
├─────────────────────────────────────────────────────────────┤
│ Test Coverage (25%)                                          │
│ ├── Unit Tests Present (0-10)                                │
│ ├── Test/Code Ratio (0-10)                                   │
│ └── Coverage Percentage (0-10)                               │
├─────────────────────────────────────────────────────────────┤
│ Contract Fulfillment (20%)                                   │
│ ├── Provides Verified (0-10)                                 │
│ └── Expects Satisfied (0-10)                                 │
└─────────────────────────────────────────────────────────────┘
```
## Category Weights

| Category | Weight | Why |
| --- | --- | --- |
| Spec Compliance | 30% | Most important: did we build what was asked? |
| Code Quality | 25% | Technical excellence |
| Test Coverage | 25% | Verification and confidence |
| Contract Fulfillment | 20% | Integration with other tasks |
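The overall score is the weighted sum of the four category scores. A minimal sketch of that arithmetic, with field names mirroring the KPI JSON shown later (the internals of task-kpi-analyzer.py may differ):

```python
# Category weights (percent), matching the table above.
WEIGHTS = {
    "Spec Compliance": 30,
    "Code Quality": 25,
    "Test Coverage": 25,
    "Contract Fulfillment": 20,
}

def overall_score(category_scores: dict[str, float]) -> float:
    """Weighted sum of 0-10 category scores, yielding a 0-10 overall score."""
    total = sum(score * WEIGHTS[name] / 100 for name, score in category_scores.items())
    return round(total, 1)

# Example: Spec Compliance 8.5 contributes 8.5 * 0.30 = 2.55 weighted points,
# matching the weighted_score in the sample KPI file below.
print(overall_score({
    "Spec Compliance": 8.5,
    "Code Quality": 7.8,
    "Test Coverage": 8.0,
    "Contract Fulfillment": 8.6,
}))  # → 8.2
```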
## When to Use

- Reading KPI data for task quality evaluation
- Understanding quality metrics and the scoring breakdown
- Deciding whether to iterate or approve based on quantitative data
- Integrating KPI checks into automated loops (`agents_loop.py`)
- Generating evidence-based evaluation reports
## Instructions

### 1. Reading KPI Data (Primary Use)

**Do not run scripts.** Read the auto-generated file:

```
Read the KPI file: docs/specs/001-feature/tasks/TASK-001--kpi.json
```
### 2. Understanding the Data

The KPI file contains:

```json
{
  "task_id": "TASK-001",
  "evaluated_at": "2026-01-15T10:30:00Z",
  "overall_score": 8.2,
  "passed_threshold": true,
  "threshold": 7.5,
  "kpi_scores": [
    {
      "category": "Spec Compliance",
      "weight": 30,
      "score": 8.5,
      "weighted_score": 2.55,
      "metrics": {
        "acceptance_criteria_met": 9.0,
        "requirements_coverage": 8.0,
        "no_scope_creep": 8.5
      },
      "evidence": [
        "Acceptance criteria: 9/10 checked",
        "Requirements coverage: 8/10"
      ]
    }
  ],
  "recommendations": [
    "Code Quality: Moderate improvements possible"
  ],
  "summary": "Score: 8.2/10 - PASSED"
}
```
### 3. Making Decisions

Use `overall_score` and `passed_threshold`:

```
IF passed_threshold == true:
  → Task meets quality standards
  → Approve and proceed

IF passed_threshold == false:
  → Task needs improvement
  → Check recommendations for specific targets
  → Create fix specification
```
## Integration with Workflow

### In Task Review (evaluator-agent)

Review process:

1. Read the KPI file: `TASK-XXX--kpi.json`
2. Extract `overall_score` and `kpi_scores`
3. Read the task file to validate
4. Generate an evaluation report
5. Decide based on `passed_threshold`
### In agents_loop

```python
# Check whether the KPI file exists
kpi_path = spec_path / "tasks" / f"{task_id}--kpi.json"

if kpi_path.exists():
    kpi_data = json.loads(kpi_path.read_text())
    if kpi_data["passed_threshold"]:
        # Quality threshold met
        advance_state("update_done")
    else:
        # Needs more work
        fix_targets = kpi_data["recommendations"]
        create_fix_task(fix_targets)
        advance_state("fix")
else:
    # KPI not generated yet - the task may not be implemented
    log_warning("No KPI data found")
```
## Multi-Iteration Loop

Instead of a fixed maximum of three retries, iterate until the quality threshold is met:

```
Iteration 1: Score 6.2 → FAILED → Fix: Improve test coverage
Iteration 2: Score 7.1 → FAILED → Fix: Refactor complex functions
Iteration 3: Score 7.8 → PASSED → Proceed
```
Each iteration updates the KPI file automatically on task save.
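A minimal sketch of such a loop. The implementation step is passed in as a callable, since this document doesn't define one; everything else uses only the KPI fields described above:

```python
import json
from pathlib import Path
from typing import Callable

def run_until_threshold(
    kpi_path: Path,
    implement: Callable[[list[str]], None],
    max_iterations: int = 10,
) -> bool:
    """Re-run the implementation step until the KPI file reports a pass."""
    fix_targets: list[str] = []
    for iteration in range(1, max_iterations + 1):
        implement(fix_targets)  # saving TASK-*.md here triggers the KPI hook
        kpi = json.loads(kpi_path.read_text())
        status = "PASSED" if kpi["passed_threshold"] else "FAILED"
        print(f"Iteration {iteration}: Score {kpi['overall_score']} → {status}")
        if kpi["passed_threshold"]:
            return True
        fix_targets = kpi["recommendations"]  # aim the next iteration at the gaps
    return False
```

The `max_iterations` guard is an addition: without an upper bound, a task that can never satisfy the threshold would loop forever.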
## Threshold Guidelines

| Score | Quality Level | Action |
| --- | --- | --- |
| 9.0-10.0 | Exceptional | Approve; document best practices |
| 8.0-8.9 | Good | Approve with minor notes |
| 7.0-7.9 | Acceptable | Approve (if threshold is 7.5) |
| 6.0-6.9 | Below Standard | Request specific improvements |
| < 6.0 | Poor | Significant rework required |
### Recommended Thresholds

| Project Type | Threshold | Rationale |
| --- | --- | --- |
| Production MVP | 8.0 | High quality required |
| Internal Tool | 7.0 | Good enough |
| Prototype | 6.0 | Functional over perfect |
| Critical System | 8.5 | No compromises |
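When a report needs the band label rather than the raw number, the mapping is a simple cascade over the table above (the analyzer may word its own summary differently):

```python
def quality_level(score: float) -> str:
    """Map a 0-10 overall score to the bands in the Threshold Guidelines table."""
    if score >= 9.0:
        return "Exceptional"
    if score >= 8.0:
        return "Good"
    if score >= 7.0:
        return "Acceptable"
    if score >= 6.0:
        return "Below Standard"
    return "Poor"

print(quality_level(8.2))  # → Good
```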
## Metric Details

### Spec Compliance Metrics

#### Acceptance Criteria Met

- Calculates: `(checked_criteria / total_criteria) * 10`
- Source: task file checkbox count
- Example: 9/10 checked = 9.0
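A plausible reconstruction of that checkbox count for markdown task files. The exact regex used by task-kpi-analyzer.py is not documented here, so treat this as illustrative:

```python
import re

def acceptance_criteria_score(task_markdown: str) -> float:
    """(checked_criteria / total_criteria) * 10, or 0 if no checkboxes exist."""
    checked = len(re.findall(r"^\s*- \[[xX]\]", task_markdown, flags=re.MULTILINE))
    unchecked = len(re.findall(r"^\s*- \[ \]", task_markdown, flags=re.MULTILINE))
    total = checked + unchecked
    return round(checked / total * 10, 1) if total else 0.0

print(acceptance_criteria_score("- [x] parse input\n- [x] handle errors\n- [ ] add docs"))
# → 6.7
```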
#### Requirements Coverage

- Calculates: the count of REQ-IDs this task covers
- Source: `traceability-matrix.md`
- Example: 4 requirements covered = 8.0
#### No Scope Creep

- Calculates: `(implemented_files / expected_files) * 10`
- Source: the task's "Files to Create" list vs. the actual files
- Penalizes: missing files or unexpected additions
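A sketch consistent with that description, comparing expected and actual file sets. The formula above only covers missing files; the one-point deduction per unexpected file is an assumption made for illustration:

```python
def scope_creep_score(expected: set[str], actual: set[str]) -> float:
    """Share of expected files implemented, minus a penalty for extra files."""
    if not expected:
        return 10.0
    score = len(expected & actual) / len(expected) * 10
    score -= len(actual - expected)  # assumed penalty: -1 point per extra file
    return round(max(score, 0.0), 1)

print(scope_creep_score({"src/a.py", "src/b.py"}, {"src/a.py", "src/b.py", "src/c.py"}))
# → 9.0
```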
### Code Quality Metrics

#### Static Analysis

- Java: Maven Checkstyle
- TypeScript: ESLint
- Python: ruff
- Score: 10 if the analysis passes, 5 if issues are found
#### Complexity

- Calculates: the share of functions longer than 50 lines
- Score: `10 - (long_functions_ratio * 5)`
- Penalizes: large, complex functions
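A sketch of that formula, assuming `long_functions_ratio` is the fraction of functions exceeding 50 lines; how function lengths are measured is left to the caller:

```python
def complexity_score(function_lengths: list[int], max_lines: int = 50) -> float:
    """10 minus 5x the fraction of functions longer than max_lines."""
    if not function_lengths:
        return 10.0
    long_ratio = sum(1 for n in function_lengths if n > max_lines) / len(function_lengths)
    return round(10 - long_ratio * 5, 1)

print(complexity_score([12, 80, 30, 95]))  # 2 of 4 functions too long → 7.5
```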
#### Patterns Alignment

- Checks: Knowledge Graph patterns
- Source: `knowledge-graph.json`
- Validates: the implementation follows established project patterns
### Test Coverage Metrics

#### Unit Tests Present

- Calculates: `min(10, test_files * 5)`
- 2 test files = maximum score
- Penalizes: missing tests
#### Test/Code Ratio

- Calculates: `(test_count / code_count) * 10`
- 1:1 ratio = 10/10
- Ideal: at least one test file per code file
#### Coverage Percentage

- Source: coverage reports (JaCoCo, lcov, etc.)
- Calculates: `coverage_percent / 10`
- Example: 80% coverage = 8.0
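All three test metrics are plain arithmetic. A combined sketch, capping each sub-score at 10 (the ratio formula implies a cap but does not state one):

```python
def unit_tests_present(test_files: int) -> float:
    return min(10.0, test_files * 5)

def test_code_ratio(test_count: int, code_count: int) -> float:
    return min(10.0, test_count / code_count * 10) if code_count else 0.0

def coverage_score(coverage_percent: float) -> float:
    return coverage_percent / 10

print(unit_tests_present(2), test_code_ratio(3, 4), coverage_score(80))
# → 10.0 7.5 8.0
```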
### Contract Fulfillment Metrics

#### Provides Verified

- Checks: files exist and export the expected symbols
- Source: the task's `provides` frontmatter
- Validates: the contract is satisfied
#### Expects Satisfied

- Checks: dependencies provide the required files/symbols
- Source: the task's `expects` frontmatter
- Validates: prerequisites are met
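A sketch of the "files exist and export expected symbols" check. The substring match is a deliberate simplification; real symbol verification would need language-aware parsing:

```python
from pathlib import Path

def provides_verified(contracts: dict[str, list[str]]) -> float:
    """Fraction of (file, symbol) contract pairs satisfied, scaled to 0-10."""
    checks, satisfied = 0, 0
    for file_path, symbols in contracts.items():
        path = Path(file_path)
        text = path.read_text() if path.exists() else ""
        for symbol in symbols:
            checks += 1
            satisfied += symbol in text  # naive substring match, not real parsing
    return round(satisfied / checks * 10, 1) if checks else 10.0

# Example provides contract: file → symbols it must export.
print(provides_verified({"src/auth.py": ["login", "logout"]}))
```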
## When KPI File is Missing

If `TASK-XXX--kpi.json` doesn't exist, the likely causes are:

- **Task was never modified**: the hook runs on file save
- **Hook failed**: check the Claude Code logs
- **Task is new**: save the file first to trigger the hook
**Do not** try to calculate KPIs manually. The hook runs automatically when:

- the task file is saved (Write tool)
- the task file is edited (Edit tool)
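For orientation, a hypothetical hooks.json entry in the Claude Code PostToolUse style; the project's actual configuration and script path may differ:

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Write|Edit",
        "hooks": [
          { "type": "command", "command": "python scripts/task-kpi-analyzer.py" }
        ]
      }
    ]
  }
}
```

Since PostToolUse matchers select on tool name, the `TASK-*.md` filename filter would presumably live inside task-kpi-analyzer.py itself.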
## Best Practices

### 1. Always Check KPI File Exists

Before evaluating, check that `docs/specs/[ID]/tasks/TASK-XXX--kpi.json` exists. If it is missing:

- The task may not be implemented yet
- Ask the user to save the task file first
### 2. Trust the Metrics

The KPIs are objective. Only override them with documented evidence, such as:

- A critical security issue the metrics don't capture
- A logic error not caught by static analysis
- Exceptional quality the metrics don't measure
### 3. Iterate on Low KPIs

Target specific categories:

```
❌ "Fix code quality issues"

✅ "Improve Code Quality KPI from 5.2 to 7.0:
    - Complexity: Refactor processData() (5→8)
    - Patterns: Add error handling (6→8)"
```
### 4. Track KPI Trends

Monitor quality over time:

```
Sprint 1: Average KPI 6.8
Sprint 2: Average KPI 7.3 (+0.5)
Sprint 3: Average KPI 7.9 (+0.6)
```
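A sketch of computing that average from the generated KPI files, with the glob pattern inferred from the naming convention above:

```python
import json
from pathlib import Path

def average_kpi(tasks_dir: Path) -> float:
    """Mean overall_score across all generated TASK-*--kpi.json files."""
    scores = [
        json.loads(p.read_text())["overall_score"]
        for p in tasks_dir.glob("TASK-*--kpi.json")
    ]
    return round(sum(scores) / len(scores), 1) if scores else 0.0

print(average_kpi(Path("docs/specs/001-feature/tasks")))
```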
## Troubleshooting

### KPI File Not Generated

Check that:

- The hook is enabled in hooks.json
- The task file name matches the pattern `TASK-*.md`
- The file was actually saved (not just viewed)
### KPI Scores Seem Wrong

Validate:

- Check the `evidence` field for data sources
- Verify files exist at the expected paths
- Confirm the required build tools (Maven, npm) are available; some metrics depend on them
### Low Scores Despite Good Code

Possible causes:

- Missing test files
- No coverage report generated
- Acceptance criteria not checked off
- Lint rules too strict

Fix the root cause, not just the score.
## Examples

### Example 1: Reading KPI Data

Read the KPI file to evaluate task quality: `docs/specs/001-feature/tasks/TASK-042--kpi.json`

Based on the data:

- Overall score: 6.8/10 (below threshold)
- Lowest KPI: Test Coverage (5.0/10)
- Recommendation: add unit tests

Decision: REQUEST FIXES, targeting Test Coverage improvement.
### Example 2: Iteration Decision

```
Iteration 1 KPI: Score 6.2 → FAILED
  - Spec Compliance: 7.0 ✓
  - Code Quality:    5.5 ✗
  - Test Coverage:   6.0 ✗

Fix targets:
  - Refactor complex functions (Code Quality)
  - Add test coverage (Test Coverage)

Iteration 2 KPI: Score 7.8 → PASSED ✓
```
### Example 3: agents_loop Integration

```python
# In agents_loop, after the implementation step
kpi_file = spec_dir / "tasks" / f"{task_id}--kpi.json"

if kpi_file.exists():
    kpi = json.loads(kpi_file.read_text())
    if kpi["passed_threshold"]:
        print(f"✅ Task passed quality check: {kpi['overall_score']}/10")
        advance_state("update_done")
    else:
        print(f"❌ Task failed quality check: {kpi['overall_score']}/10")
        print("Recommendations:")
        for rec in kpi["recommendations"]:
            print(f"  - {rec}")
        advance_state("fix")
```
## References

- `evaluator-agent.md`: agent that uses KPI data for evaluation
- `hooks.json`: hook configuration for auto-generation
- `task-kpi-analyzer.py`: hook script (do not execute directly)
- `agents_loop.py`: orchestrator that reads KPI data for decisions