/review - Comprehensive Code Review

Description

Full-spectrum code review using parallel specialist sub-agents. Each specialist analyzes a specific dimension, findings are aggregated, and /qa-gate produces the final gate decision.

Key Features:

Parallel specialist sub-agents for thorough analysis
Requirements traceability (AC → tests mapping)
Active refactoring when safe
Technical debt identification
Holistic findings aggregation
Automatic gate decision via /qa-gate

Usage

Review a single story by number

/review 3.1.5

Review a single story by file path

/review docs/stories/epic-6-wishlist/wish-2002-add-item-flow.md

Review current branch (no story)

/review --branch

Review all stories in an epic directory (shorthand)

/review epic-6-wishlist

Review all stories in a directory (full path)

/review docs/stories/epic-6-wishlist/

Review directory, only stories with specific status

/review epic-6-wishlist --status=Draft

Quick review (skip deep specialists)

/review 3.1.5 --quick

Review with auto-fix enabled

/review 3.1.5 --fix

Review specific files only

/review --files src/auth/**/*.ts

Skip gate decision (findings only)

/review 3.1.5 --no-gate

Parameters

target - Story number (e.g., 3.1.5 ), story file path, epic directory name (e.g., epic-6-wishlist ), or full directory path
--branch - Review current branch without story reference
--status - Filter stories by status (e.g., Draft , In Progress , Approved ) - only used with directory review
--quick - Run only required checks, skip deep specialists
--fix - Auto-fix issues when safe (refactoring)
--files - Review specific files only
--no-gate - Skip /qa-gate call, return findings only

EXECUTION INSTRUCTIONS

CRITICAL: Use Task tool to spawn parallel sub-agents. Use TodoWrite to track progress.

Phase 0: Initialize & Determine Mode

Auto-detect operation mode based on target argument:

Detection logic:

if target is None or target == "--branch": mode = "branch_review" elif target.endswith(".md"): mode = "single_story" # File path to specific story elif is_directory(target) or target.startswith("epic-"): mode = "directory_review" elif is_story_number(target): # e.g., "3.1.5" or "2002" mode = "single_story" else: # Try to resolve as story number, fall back to directory mode = "single_story"

Mode A: Single Story Review (default)

Triggered by:

Story number (e.g., 3.1.5 , 2002 )
Story file path (e.g., docs/stories/epic-6-wishlist/wish-2002-add-item-flow.md )
--branch flag

Mode B: Directory Review (new)

Triggered by:

Epic directory name (e.g., epic-6-wishlist ) - auto-prepends docs/stories/
Full directory path (e.g., docs/stories/epic-6-wishlist/ )
Any path that resolves to a directory containing .md files

TodoWrite([ { content: "Scan directory for stories", status: "in_progress", activeForm: "Scanning directory" }, { content: "Filter stories by status", status: "pending", activeForm: "Filtering stories" }, { content: "Review each story sequentially", status: "pending", activeForm: "Reviewing stories" }, { content: "Generate summary report", status: "pending", activeForm: "Generating summary" } ])

Directory scanning:

Resolve directory path:
If starts with epic- : prepend docs/stories/ → docs/stories/epic-6-wishlist/
If full path provided: use as-is
If relative path: resolve from working directory
Verify directory exists, error if not
Find all .md files in directory: Glob(pattern: "*.md", path: {DIR_PATH})
Filter out excluded files/directories:
Skip files in _legacy/ subdirectories
Skip IMPLEMENTATION_ORDER.md
Skip README.md
Skip any file starting with EPIC- (epic definition files)
For each remaining file, read and extract:
Frontmatter (if present)
Status field (from frontmatter or status: line in file)
If --status filter provided, only include stories where status matches
Sort stories by filename (natural sort order)
Create todo list with one item per story

Proceed to Phase 0A for single story or Phase 0B for directory.

Phase 0A: Gather Context (Single Story)

For single story review:

TodoWrite([ { content: "Gather review context", status: "in_progress", activeForm: "Gathering context" }, { content: "Run required checks", status: "pending", activeForm: "Running checks" }, { content: "Spawn specialist sub-agents", status: "pending", activeForm: "Spawning specialists" }, { content: "Aggregate findings", status: "pending", activeForm: "Aggregating findings" }, { content: "Run qa-gate", status: "pending", activeForm: "Running qa-gate" }, { content: "Update story file", status: "pending", activeForm: "Updating story" } ])

Gather context:

If story provided, read story file and extract:
Acceptance criteria
Tasks list
File list (if present)
Previous QA results
Get list of changed files: git diff --name-only origin/main
Read CLAUDE.md for project guidelines
Determine review scope (files to analyze)

Risk assessment (determines review depth): Auto-escalate to deep review if:

Auth/payment/security files touched
No tests added
Diff > 500 lines
Previous gate was FAIL or CONCERNS
Story has > 5 acceptance criteria

Proceed to Phase 1.

Phase 0B: Process Directory (Multiple Stories)

For directory review:

stories_to_review = [{story_path}, {story_path}, ...] # From Phase 0

TodoWrite([ { content: "Review {story_1}", status: "in_progress", activeForm: "Reviewing {story_1}" }, { content: "Review {story_2}", status: "pending", activeForm: "Reviewing {story_2}" }, { content: "Review {story_3}", status: "pending", activeForm: "Reviewing {story_3}" },

... one per story

{ content: "Generate summary report", status: "pending", activeForm: "Generating summary" } ])

For each story in stories_to_review:

Read story file and extract:

Story ID/number
Title
Status
Acceptance criteria
Tasks list
Previous review findings (if any)

Run Phases 0A through 7 for this story (see below for modified Phase 7)

Mark todo as completed, move to next story

After all stories processed, proceed to Phase 8B (Summary Report)

CRITICAL: Process stories sequentially, not in parallel. This allows findings to be appended to each story file before moving to the next.

Proceed to Phase 0A for each story, then continue through phases.

Phase 1: Required Checks

Run these first (blocking if they fail):

pnpm test --filter='...[origin/main]' pnpm check-types --filter='...[origin/main]' pnpm lint --filter='...[origin/main]'

If any fail and --fix is set:

Try to auto-fix lint issues: pnpm lint --fix
Re-run checks

If still failing: Report and continue (will affect gate decision)

Phase 2: Spawn Specialist Sub-Agents

CRITICAL: Spawn all specialists in parallel using run_in_background: true

2.1 Requirements Traceability Specialist

Task( subagent_type: "general-purpose", model: "haiku", description: "Requirements traceability", run_in_background: true, prompt: "You are a requirements traceability specialist.

       Story file: {STORY_FILE_PATH}
       Changed files: {CHANGED_FILES}

       For each acceptance criterion in the story:
       1. Find the test(s) that validate it
       2. Document the mapping using Given-When-Then format
       3. Identify any AC without test coverage

       T-SHIRT SIZE ESTIMATE (from requirements perspective):
       Based on the number of acceptance criteria, their complexity, and test coverage gaps:
       - XS: 1-2 simple ACs, all covered
       - S: 3-4 ACs, mostly covered
       - M: 5-7 ACs, some gaps
       - L: 8-10 ACs, significant gaps
       - XL: 11+ ACs, many gaps
       - XXL: 15+ ACs, extensive gaps or highly complex criteria

       Output format:
       ```yaml
       traceability:
         t_shirt_size: XS|S|M|L|XL|XXL
         size_rationale: 'Brief explanation of size estimate'
         covered:
           - ac: 1
             test_file: src/__tests__/auth.test.ts
             test_name: 'should validate login credentials'
             given_when_then: 'Given valid credentials, When login called, Then returns token'
         gaps:
           - ac: 3
             description: 'No test for session timeout handling'
             severity: medium
             suggested_test: 'Add test for session expiry behavior'
       ```"

)

2.2 Code Quality Specialist

Task( subagent_type: "general-purpose", model: "haiku", description: "Code quality review", run_in_background: true, prompt: "You are a code quality specialist.

       Project guidelines: {CLAUDE_MD_CONTENT}
       Changed files: {CHANGED_FILES}

       Analyze for:
       1. Architecture and design patterns
       2. Code duplication
       3. Refactoring opportunities
       4. Best practices adherence
       5. CLAUDE.md compliance (Zod schemas, @repo/ui, @repo/logger, no barrel files)

       T-SHIRT SIZE ESTIMATE (from code quality/complexity perspective):
       Based on code complexity, architectural impact, and refactoring needs:
       - XS: Simple, isolated changes, minimal complexity
       - S: Straightforward implementation, few files touched
       - M: Moderate complexity, some architectural considerations
       - L: Complex logic, multiple components/patterns involved
       - XL: Significant architectural changes, cross-cutting concerns
       - XXL: Major refactoring or system-wide impact

       For each finding:
       - id: QUAL-{NNN}
       - severity: low|medium|high
       - finding: Description
       - file: File path
       - line: Line number
       - suggested_action: How to fix
       - can_auto_fix: true|false

       Output format:
       ```yaml
       code_quality:
         t_shirt_size: XS|S|M|L|XL|XXL
         size_rationale: 'Brief explanation of size estimate'
         complexity_score: 1-10
         findings:
           - id: QUAL-001
             severity: medium
             finding: '...'
             # ... rest of fields
       ```"

)

2.3 Security Specialist

Task( subagent_type: "general-purpose", model: "haiku", description: "Security review", run_in_background: true, prompt: "You are a security specialist.

       Changed files: {CHANGED_FILES}

       Check for:
       - Authentication/authorization issues
       - Injection vulnerabilities (SQL, XSS, command)
       - Sensitive data exposure
       - OWASP Top 10 issues
       - Hardcoded secrets or credentials
       - Insecure dependencies

       T-SHIRT SIZE ESTIMATE (from security risk perspective):
       Based on security surface area, risk level, and required mitigations:
       - XS: No security implications, read-only operations
       - S: Minor security considerations, standard auth checks
       - M: Moderate security requirements, data validation needed
       - L: Significant security impact, auth/authz critical
       - XL: High-risk features (payments, PII, admin functions)
       - XXL: Critical security features requiring extensive hardening

       For each finding:
       - id: SEC-{NNN}
       - severity: low|medium|high
       - finding: Description
       - file: File path
       - line: Line number
       - cwe: CWE reference if applicable
       - suggested_action: How to fix

       Output format:
       ```yaml
       security:
         t_shirt_size: XS|S|M|L|XL|XXL
         size_rationale: 'Brief explanation of size estimate'
         risk_level: low|medium|high|critical
         findings:
           - id: SEC-001
             severity: high
             finding: '...'
             # ... rest of fields
       ```"

)

2.4 Performance Specialist

Task( subagent_type: "general-purpose", model: "haiku", description: "Performance review", run_in_background: true, prompt: "You are a performance specialist.

       Changed files: {CHANGED_FILES}

       Check for:
       - N+1 query patterns
       - Missing database indexes
       - Unnecessary re-renders in React
       - Large bundle imports
       - Missing memoization (useMemo, useCallback, React.memo)
       - Inefficient algorithms
       - Memory leaks

       T-SHIRT SIZE ESTIMATE (from performance optimization perspective):
       Based on performance complexity and optimization requirements:
       - XS: Minimal performance considerations, simple operations
       - S: Basic performance best practices sufficient
       - M: Some optimization needed (memoization, indexes)
       - L: Significant performance work (query optimization, caching)
       - XL: Complex performance requirements (real-time, large datasets)
       - XXL: Critical performance engineering (sub-second SLAs, scale challenges)

       For each finding:
       - id: PERF-{NNN}
       - severity: low|medium|high
       - finding: Description
       - file: File path
       - estimated_impact: Description of performance impact
       - suggested_action: How to fix

       Output format:
       ```yaml
       performance:
         t_shirt_size: XS|S|M|L|XL|XXL
         size_rationale: 'Brief explanation of size estimate'
         performance_risk: low|medium|high
         findings:
           - id: PERF-001
             severity: medium
             finding: '...'
             # ... rest of fields
       ```"

)

2.5 Accessibility Specialist

Task( subagent_type: "general-purpose", model: "haiku", description: "Accessibility review", run_in_background: true, prompt: "You are an accessibility specialist.

       Changed files: {CHANGED_FILES}

       Check for:
       - WCAG 2.1 AA compliance
       - Keyboard navigation support
       - Screen reader compatibility
       - Missing ARIA labels/roles
       - Color contrast issues
       - Focus management
       - Form labels and error messages

       T-SHIRT SIZE ESTIMATE (from accessibility perspective):
       Based on UI complexity and a11y requirements:
       - XS: No UI changes, or simple read-only content
       - S: Basic UI with standard components (buttons, text)
       - M: Interactive UI (forms, modals) requiring a11y attention
       - L: Complex UI (data tables, multi-step flows, drag-drop)
       - XL: Rich interactions (custom widgets, animations, dynamic content)
       - XXL: Highly complex UI requiring extensive a11y engineering

       For each finding:
       - id: A11Y-{NNN}
       - severity: low|medium|high
       - finding: Description
       - file: File path
       - wcag_criterion: WCAG reference (e.g., 1.4.3)
       - suggested_action: How to fix

       Output format:
       ```yaml
       accessibility:
         t_shirt_size: XS|S|M|L|XL|XXL
         size_rationale: 'Brief explanation of size estimate'
         ui_complexity: low|medium|high
         findings:
           - id: A11Y-001
             severity: high
             finding: '...'
             # ... rest of fields
       ```"

)

2.6 Test Coverage Specialist

Task( subagent_type: "general-purpose", model: "haiku", description: "Test coverage analysis", run_in_background: true, prompt: "You are a test coverage specialist.

       Changed files: {CHANGED_FILES}
       Test files: {TEST_FILES}

       Analyze:
       1. Test coverage for changed code
       2. Test quality and maintainability
       3. Edge cases and error scenarios
       4. Mock/stub appropriateness
       5. Test level appropriateness (unit vs integration vs e2e)

       T-SHIRT SIZE ESTIMATE (from testing perspective):
       Based on testing complexity and coverage requirements:
       - XS: Minimal testing needed, simple assertions
       - S: Basic unit tests sufficient (1-2 test files)
       - M: Moderate testing (unit + some integration tests)
       - L: Comprehensive testing (unit + integration + edge cases)
       - XL: Extensive testing (e2e, multiple scenarios, complex mocks)
       - XXL: Critical path requiring exhaustive test coverage

       For each finding:
       - id: TEST-{NNN}
       - severity: low|medium|high
       - finding: Description
       - file: File being tested (or missing tests)
       - suggested_action: What tests to add/improve

       Output format:
       ```yaml
       test_coverage:
         t_shirt_size: XS|S|M|L|XL|XXL
         size_rationale: 'Brief explanation of size estimate'
         test_complexity: low|medium|high
         findings:
           - id: TEST-001
             severity: medium
             finding: '...'
             # ... rest of fields
       ```"

)

2.7 Technical Debt Specialist

Task( subagent_type: "general-purpose", model: "haiku", description: "Technical debt assessment", run_in_background: true, prompt: "You are a technical debt specialist.

       Changed files: {CHANGED_FILES}

       Identify:
       1. Accumulated shortcuts or TODOs
       2. Missing tests
       3. Outdated patterns or dependencies
       4. Architecture violations
       5. Code that should be refactored
       6. Documentation gaps

       T-SHIRT SIZE ESTIMATE (from tech debt/maintenance perspective):
       Based on long-term maintenance burden and tech debt:
       - XS: Clean implementation, no debt added
       - S: Minimal debt, well-documented
       - M: Some shortcuts taken, minor debt
       - L: Notable tech debt or maintenance concerns
       - XL: Significant debt that will require future cleanup
       - XXL: Major tech debt or legacy pattern perpetuation

       For each finding:
       - id: DEBT-{NNN}
       - severity: low|medium|high
       - finding: Description
       - file: File path
       - estimated_effort: small|medium|large
       - suggested_action: How to address

       Output format:
       ```yaml
       technical_debt:
         t_shirt_size: XS|S|M|L|XL|XXL
         size_rationale: 'Brief explanation of size estimate'
         maintenance_burden: low|medium|high
         findings:
           - id: DEBT-001
             severity: low
             finding: '...'
             # ... rest of fields
       ```"

)

Phase 3: Collect Results

Wait for all specialists to complete:

results = { traceability: TaskOutput(task_id: "{traceability_id}"), code_quality: TaskOutput(task_id: "{quality_id}"), security: TaskOutput(task_id: "{security_id}"), performance: TaskOutput(task_id: "{performance_id}"), accessibility: TaskOutput(task_id: "{accessibility_id}"), test_coverage: TaskOutput(task_id: "{coverage_id}"), technical_debt: TaskOutput(task_id: "{debt_id}") }

Phase 4: Aggregate Findings

Combine all findings into unified structure:

review_summary: story: "{STORY_NUM}" reviewed_at: "{ISO-8601}" files_analyzed: {count}

t_shirt_sizing: recommended_size: M # Synthesized from all specialists confidence: high|medium|low specialist_estimates: requirements: { size: M, rationale: "..." } code_quality: { size: L, rationale: "..." } security: { size: S, rationale: "..." } performance: { size: M, rationale: "..." } accessibility: { size: M, rationale: "..." } test_coverage: { size: L, rationale: "..." } technical_debt: { size: M, rationale: "..." } size_breakdown: XS: 0 specialists S: 1 specialist M: 4 specialists L: 2 specialists XL: 0 specialists XXL: 0 specialists synthesis_rationale: | Final size recommendation based on: - Modal size: M (4/7 specialists) - Outliers: Code Quality (L), Test Coverage (L) due to [reason] - Risk factors: [key considerations] - Recommended: M with awareness of testing complexity

checks: tests: { status: PASS|FAIL } types: { status: PASS|FAIL } lint: { status: PASS|FAIL }

findings: total: {count} by_severity: high: {count} medium: {count} low: {count} by_category: security: {count} performance: {count} accessibility: {count} code_quality: {count} test_coverage: {count} technical_debt: {count} requirements: {count}

traceability: ac_total: {count} ac_covered: {count} ac_gaps: {count}

all_findings: - id: SEC-001 category: security severity: high finding: "..." file: "..." suggested_action: "..." # ... all findings sorted by severity

Synthesize T-Shirt Size:

Collect all specialist estimates:

Extract t_shirt_size from each specialist's output
Extract rationale for each estimate

Calculate modal size (most common):

Count occurrences of each size
Identify the most frequent size

Identify outliers:

Flag estimates more than 1 size away from modal
Document why specialists disagree

Apply weighting logic:

Security/Requirements: Higher weight if XL or XXL (risk-based)
Code Quality: Higher weight if architectural complexity
Test Coverage: Consider for effort estimation
Technical Debt: Lower weight unless XXL

Synthesize final recommendation:

Start with modal size
Adjust up if high-risk outliers (Security XL/XXL)
Adjust up if multiple specialists cite complexity
Set confidence based on agreement level:
High: 5+ specialists agree
Medium: 3-4 specialists agree
Low: Wide spread, no clear modal

Document synthesis rationale:

Explain final size choice
Call out key risk factors
Note any caveats or warnings

Deduplicate findings:

Merge similar findings from different specialists
Keep highest severity when duplicated

Phase 5: Auto-Fix (if --fix enabled)

For findings with can_auto_fix: true:

Task( subagent_type: "general-purpose", description: "Apply safe refactoring", prompt: "Apply these safe fixes:

       {FIXABLE_FINDINGS}

       Project guidelines: {CLAUDE_MD_CONTENT}

       For each fix:
       1. Make the change
       2. Run tests to verify
       3. Commit with message: 'refactor: {description}'

       Report what was fixed and what was skipped."

)

Re-run required checks after fixes.

Phase 6: Run QA Gate

Unless --no-gate specified:

Invoke /qa-gate skill with:

Story number (if provided)
Aggregated findings
Check results

The /qa-gate skill will:

Determine gate decision (PASS/CONCERNS/FAIL)
Create gate file at docs/qa/gates/{story}-{slug}.yml
Return gate status

Phase 7: Update Story File

Append Review Findings section to story file:

Check if story file already has a ## Review Findings section:

If yes: Replace it with updated findings
If no: Append to end of file

Review Findings

Review Date: {ISO-8601} Reviewed By: Claude Code Gate: {PASS|CONCERNS|FAIL} (score: {score}/100) Gate File: docs/qa/gates/{story}-{slug}.yml

T-Shirt Size Estimate

Recommended Size: {M} (Confidence: {high|medium|low})

Specialist Breakdown:

Specialist	Size	Rationale
Requirements	M	5-7 ACs with some test gaps
Code Quality	L	Moderate complexity, architectural considerations
Security	S	Standard auth checks, low risk
Performance	M	Some optimization needed (memoization)
Accessibility	M	Interactive UI requiring a11y attention
Test Coverage	L	Comprehensive testing needed (unit + integration)
Tech Debt	M	Minor shortcuts, well-documented

Size Distribution: XS: 0, S: 1, M: 4, L: 2, XL: 0, XXL: 0

Synthesis: Modal size is M (4/7 specialists). Code Quality and Test Coverage flagged as L due to architectural complexity and comprehensive testing requirements. Overall recommendation: M with awareness that testing effort may push toward upper end of estimate.

Summary

Files Analyzed: {count}
Total Findings: {count} (high: {N}, medium: {N}, low: {N})
Traceability: {N}/{M} acceptance criteria have test coverage

Required Checks

Check	Status
Tests	{PASS/FAIL}
Types	{PASS/FAIL}
Lint	{PASS/FAIL}

Requirements Traceability

{If traceability gaps found:}

[REQ-001] {severity}: {finding}
- File: {file or N/A}
- Action: {suggested_action}

{If no gaps:} ✓ All acceptance criteria have test coverage