solo-retro

Post-pipeline retrospective — parse logs, score process quality, find waste patterns, suggest skill/script patches. Use after pipeline completes or when user says "retro", "evaluate pipeline", "what went wrong", "pipeline review", "check pipeline logs".

/retro

This skill is self-contained — follow the phases below instead of delegating to other skills (/review, /audit, /build) or spawning Task subagents. Run all analysis directly.

Post-pipeline retrospective. Parses pipeline logs, counts productive vs wasted iterations, identifies recurring failure patterns, scores the pipeline run, and suggests concrete patches to skills/scripts to prevent the same failures next time.

Live Context

  • Branch: !git branch --show-current 2>/dev/null
  • Recent commits: !git log --oneline -10 2>/dev/null
  • Modified files: !git diff --name-only HEAD~5..HEAD 2>/dev/null | head -20

When to use

After a pipeline completes (or gets cancelled). This is the process quality check — /review checks code quality, /retro checks pipeline process quality.

Can also be used standalone on any project — with or without pipeline logs.

MCP Tools (use if available)

  • session_search(query) — find past pipeline runs and known issues
  • codegraph_explain(project) — understand project architecture context
  • codegraph_query(query) — query code graph for project metadata

If MCP tools are not available, fall back to Glob + Grep + Read.

Phase 1: Locate Artifacts

  1. Detect project from $ARGUMENTS or CWD:

    • If argument provided: use it as project name
    • Otherwise: extract from CWD basename (e.g., ~/projects/my-app -> my-app)
  2. Find pipeline state file: .solo/pipelines/solo-pipeline-{project}.local.md (project-local) or ~/.solo/pipelines/solo-pipeline-{project}.local.md (global fallback)

    • If it exists: pipeline is still running or wasn't cleaned up — read YAML frontmatter for project_root:
    • If not: pipeline completed — use CWD as project root
  3. Verify artifacts exist (parallel reads):

    • Pipeline log: {project_root}/.solo/pipelines/pipeline.log
    • Iter logs: {project_root}/.solo/pipelines/iter-*.log
    • Progress file: {project_root}/.solo/pipelines/progress.md
    • Plan-done directory: {project_root}/docs/plan-done/
    • Active plan: {project_root}/docs/plan/
  4. Determine analysis mode:

    • If pipeline log exists: proceed with full log-based analysis (Phases 2-4)
    • If NO pipeline log: switch to fallback mode (see Fallback Analysis below)
  5. Count iter logs (if they exist): ls {project_root}/.solo/pipelines/iter-*.log | wc -l

    • Report: "Found {N} iteration logs"
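
A minimal shell sketch of this lookup, assuming the layout above ($1 is the optional project argument; paths use the same placeholders as the rest of this skill):

    # Resolve project name: explicit argument, otherwise CWD basename
    project="${1:-$(basename "$PWD")}"

    # Prefer the project-local state file, fall back to the global one
    state=".solo/pipelines/solo-pipeline-${project}.local.md"
    [ -f "$state" ] || state="$HOME/.solo/pipelines/solo-pipeline-${project}.local.md"

    if [ -f "$state" ]; then
      echo "State file found: $state — pipeline running or not cleaned up"
    else
      echo "No state file — pipeline completed, using CWD as project root"
    fi

    # Count iteration logs, if any
    ls .solo/pipelines/iter-*.log 2>/dev/null | wc -l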

Fallback Analysis (No Pipeline Logs)

If no pipeline logs exist, the retro can still provide value by analyzing:

  1. Git history: git log --oneline --since="1 week ago" — commit frequency, patterns, conventional format
  2. Test results: run test suite if configured in CLAUDE.md or package.json
  3. Build status: run build if configured
  4. CLAUDE.md changes: git log --oneline -- CLAUDE.md — how docs evolved
  5. Code quality metrics: file counts, TODO/FIXME density, dead code indicators
  6. Project structure: completeness of docs/, tests/, CI config

Skip Phases 2-4 and proceed directly to Phase 5 (Plan Fidelity) and Phase 6 (Git & Code Quality). Adjust Phase 7 scoring to weight available data more heavily.
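
A minimal sketch of the fallback checks that don't depend on a configured test or build command (plain git and grep, run from the project root):

    # Commit activity over the last week
    git log --oneline --since="1 week ago" | wc -l

    # How CLAUDE.md evolved
    git log --oneline -- CLAUDE.md | head -10

    # Rough TODO/FIXME density (excluding .git)
    grep -rnE 'TODO|FIXME' --exclude-dir=.git . 2>/dev/null | wc -l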

Phase 2: Parse Pipeline Log (quantitative)

Read pipeline.log in full. Parse line-by-line, extracting structured data from log tags:

Log format: [HH:MM:SS] TAG | message

Extract by tag:

| Tag | What to extract |
|---------|-----------------|
| START | Pipeline run boundary — count restarts (multiple START lines = restarts) |
| STAGE | iter N/M \| stage S/T: {stage_id} — iteration count per stage |
| SIGNAL | <solo:done/> or <solo:redo/> — which stages got completion signals |
| INVOKE | Skill invoked — extract skill name, check for wrong names |
| ITER | commit: {sha} \| result: {stage complete\|continuing} — per-iteration outcome |
| CHECK | {stage} \| {path} -> FOUND\|NOT FOUND — marker file checks |
| FINISH | Duration: {N}m — total duration per run |
| MAXITER | Reached max iterations ({N}) — hit iteration ceiling |
| QUEUE | Plan cycling events (activating, archiving) |
| CIRCUIT | Circuit breaker triggered (if present) |
| CWD | Working directory changes |
| CTRL | Control signals (pause/stop/skip) |

Compute metrics:

total_runs = count of START lines
total_iterations = count of ITER lines
productive_iters = count of ITER lines with "stage complete"
wasted_iters = total_iterations - productive_iters
waste_pct = wasted_iters / total_iterations * 100
maxiter_hits = count of MAXITER lines
plan_cycles = count of QUEUE lines with "Cycling"

per_stage = {
  stage_id: {
    attempts: count of STAGE lines for this stage,
    successes: count of ITER lines with "stage complete" for this stage,
    waste_ratio: (attempts - successes) / attempts * 100,
  }
}
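
A minimal grep sketch for the top-level counts, assuming the [HH:MM:SS] TAG | message format above ({project_root} is the usual placeholder; the per-stage breakdown needs a further pass over STAGE and ITER lines):

    LOG="{project_root}/.solo/pipelines/pipeline.log"
    total_runs=$(grep -c '] START |' "$LOG")
    total_iterations=$(grep -c '] ITER |' "$LOG")
    productive_iters=$(grep -c '] ITER |.*stage complete' "$LOG")
    wasted_iters=$(( total_iterations - productive_iters ))
    waste_pct=$(( total_iterations > 0 ? 100 * wasted_iters / total_iterations : 0 ))
    maxiter_hits=$(grep -c '] MAXITER |' "$LOG")
    plan_cycles=$(grep -c '] QUEUE |.*Cycling' "$LOG")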

Phase 3: Parse Progress.md (qualitative)

Read progress.md and scan for error patterns:

  1. Unknown skill errors: grep for Unknown skill: — extract which skill name was wrong
  2. Empty iterations: iterations where "Last 5 lines" show only errors or session header (no actual work done)
  3. Repeated errors: same error appearing in consecutive iterations -> spin-loop indicator
  4. Doubled signals: <solo:done/><solo:done/> in same iteration -> minor noise (note but don't penalize)
  5. Redo loops: count how many times build->review->redo->build cycles occurred

For each error pattern found, record:

  • Pattern name
  • First occurrence (iteration number)
  • Total occurrences
  • Consecutive streak (max)
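
A minimal grep sketch for the first and fourth patterns ({project_root} placeholder as usual; streak detection is easier done while reading the file):

    PROGRESS="{project_root}/.solo/pipelines/progress.md"
    # Which wrong skill names were invoked, and how often
    grep -o 'Unknown skill: [^ ]*' "$PROGRESS" | sort | uniq -c | sort -rn
    # Doubled completion signals — note but don't penalize
    grep -c '<solo:done/><solo:done/>' "$PROGRESS"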

Phase 4: Analyze Iter Logs (sample-based)

Do NOT read all iter logs — there could be 60+. Use smart sampling:

  1. First failed iter per pattern: For each failure pattern found in Phase 3, read the first iter log that shows it

    • Strip ANSI codes when reading: sed 's/\x1b\[[0-9;]*m//g' < iter-NNN-stage.log | head -100
  2. First successful iter per stage: For each stage that eventually succeeded, read the first successful iter log

    • Look for <solo:done/> in the output
  3. Final review iter: Read the last iter-*-review.log (the verdict)

  4. Extract from each sampled log:

    • Tools called (count of tool_use blocks)
    • Errors encountered (grep for Error, error, Unknown, failed)
    • Signal output (<solo:done/> or <solo:redo/> present?)
    • First 5 and last 10 meaningful lines (skip blank lines)
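
A minimal extraction sketch for one sampled log (iter-NNN-stage.log is a placeholder filename; ANSI stripping as shown above):

    f="iter-NNN-stage.log"
    clean=$(sed 's/\x1b\[[0-9;]*m//g' "$f")
    echo "tool_use blocks: $(grep -c 'tool_use' <<<"$clean")"
    echo "error lines:     $(grep -ciE 'error|unknown|failed' <<<"$clean")"
    echo "signal:          $(grep -oE '<solo:(done|redo)/>' <<<"$clean" | tail -1)"
    # First 5 and last 10 meaningful (non-blank) lines
    grep -v '^[[:space:]]*$' <<<"$clean" | head -5
    grep -v '^[[:space:]]*$' <<<"$clean" | tail -10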

Phase 5: Plan Fidelity Check

For each track directory in docs/plan-done/ and docs/plan/:

  1. Read spec.md (if exists):

    • Count acceptance criteria: total number of - [ ] and - [x] checkboxes
    • Calculate: criteria_met = checked / total * 100
  2. Read plan.md (if exists):

    • Count tasks: total number of - [ ] and - [x] checkboxes
    • Count phases (## headers)
    • Check for SHA annotations (<!-- sha:... -->)
    • Calculate: tasks_done = checked / total * 100
  3. Compile per-track summary:

    • Track ID, criteria met %, tasks done %, has SHAs
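
A minimal checkbox-count sketch for one file (the path is a placeholder; run it for each spec.md and plan.md found):

    f="{project_root}/docs/plan-done/{track}/plan.md"
    total=$(grep -cE '^[[:space:]]*- \[( |x)\]' "$f")
    checked=$(grep -cE '^[[:space:]]*- \[x\]' "$f")
    [ "$total" -gt 0 ] && echo "done: $(( 100 * checked / total ))%"
    phases=$(grep -c '^## ' "$f")
    shas=$(grep -c '<!-- sha:' "$f")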

Phase 6: Git & Code Quality (lightweight)

Quick checks only — NOT a full /review:

  1. Commit count and format:

    git -C {project_root} log --oneline | wc -l
    git -C {project_root} log --oneline | head -30
    
    • Count commits with conventional format (feat:, fix:, chore:, test:, docs:, refactor:, build:, ci:, perf:)
    • Calculate: conventional_pct = conventional / total * 100
  2. Committer breakdown:

    git -C {project_root} shortlog -sn --no-merges | head -10
    
  3. Test status (if test command exists in CLAUDE.md or package.json):

    • Run test suite, capture pass/fail count
    • If no test command found, skip and note "no tests configured"
  4. Build status (if build command exists):

    • Run build, capture success/fail
    • If no build command found, skip and note "no build configured"
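
A minimal sketch of the conventional-commit ratio (type list taken from above; scoped types such as feat(api): are also counted):

    total=$(git -C {project_root} log --oneline | wc -l)
    conventional=$(git -C {project_root} log --pretty=%s \
      | grep -cE '^(feat|fix|chore|test|docs|refactor|build|ci|perf)(\(.+\))?!?:')
    [ "$total" -gt 0 ] && echo "conventional: $(( 100 * conventional / total ))%"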

Phase 6.5: Context Degradation Analysis

Check for signs of context window problems during the pipeline run:

  1. Iteration quality curve: Compare early iterations vs late iterations.

    • Did error rates increase over time? (sign of context degradation)
    • Did the agent start repeating itself or losing track of the plan?
  2. Observation masking usage: Check if scratch/ directory exists in project root.

    • If yes: good — agent was offloading large outputs
    • If no but iter logs show >100-line tool outputs: flag as waste source
  3. Plan recitation evidence: In sampled iter logs, check if the agent re-read plan.md at task boundaries.

    • Absent recitation + task drift = context engineering gap
  4. CLAUDE.md bloat: wc -c {project_root}/CLAUDE.md

    • Over 40,000 chars: WARN — attention dilution likely
    • Over 60,000 chars: RED — severe context budget pressure
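
A minimal size check using those thresholds:

    size=$(wc -c < {project_root}/CLAUDE.md)
    if   [ "$size" -gt 60000 ]; then echo "CLAUDE.md: $size chars — RED"
    elif [ "$size" -gt 40000 ]; then echo "CLAUDE.md: $size chars — WARN"
    else echo "CLAUDE.md: $size chars — OK"
    fi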

Add findings to the report under ## Context Health:

## Context Health
- Iteration quality trend: {STABLE / DEGRADING / N/A}
- Observation masking: {USED / NOT USED / N/A}
- Plan recitation: {OBSERVED / ABSENT / N/A}
- CLAUDE.md size: {N} chars — {OK / WARN / BLOATED}

Phase 7: Score & Report

Load scoring rubric from ${CLAUDE_PLUGIN_ROOT}/skills/retro/references/eval-dimensions.md. If plugin root not available, use the embedded weights:

Scoring weights:

  • Efficiency (waste %): 25%
  • Stability (restarts): 20%
  • Fidelity (criteria met): 20%
  • Quality (test pass rate): 15%
  • Commits (conventional %): 5%
  • Docs (plan staleness): 5%
  • Signals (clean signals): 5%
  • Speed (total duration): 5%

Note: In fallback mode (no pipeline logs), redistribute Efficiency and Stability weights to Fidelity, Quality, and Commits.
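
A minimal sketch of the weighted overall score using the embedded weights (the sub-scores here are hypothetical placeholders, each already normalized to 0-10):

    awk 'BEGIN {
      eff=7; stab=9; fid=8; qual=6; com=10; docs=8; sig=9; speed=7   # hypothetical 0-10 sub-scores
      score = 0.25*eff + 0.20*stab + 0.20*fid + 0.15*qual + 0.05*com + 0.05*docs + 0.05*sig + 0.05*speed
      printf "Overall Score: %.1f/10\n", score
    }'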

Generate report at {project_root}/docs/retro/{date}-retro.md:

# Pipeline Retro: {project} ({date})

## Overall Score: {N}/10

## Pipeline Efficiency

| Metric | Value | Rating |
|--------|-------|--------|
| Total iterations | {N} | |
| Productive iterations | {N} ({pct}%) | {emoji} |
| Wasted iterations | {N} ({pct}%) | {emoji} |
| Pipeline restarts | {N} | {emoji} |
| Max-iter hits | {N} | {emoji} |
| Total duration | {time} | {emoji} |

## Per-Stage Breakdown

| Stage | Attempts | Successes | Waste % | Notes |
|-------|----------|-----------|---------|-------|
| scaffold | | | | |
| setup | | | | |
| plan | | | | |
| build | | | | |
| deploy | | | | |
| review | | | | |

## Failure Patterns

### Pattern 1: {name}
- **Occurrences:** {N} iterations
- **Root cause:** {analysis}
- **Wasted:** {N} iterations
- **Fix:** {concrete suggestion with file reference}

### Pattern 2: ...

## Plan Fidelity

| Track | Criteria Met | Tasks Done | SHAs | Rating |
|-------|-------------|------------|------|--------|
| {track-id} | {N}% | {N}% | {yes/no} | {emoji} |

## Code Quality (Quick)

- **Tests:** {N} pass, {N} fail (or "not configured")
- **Build:** PASS / FAIL (or "not configured")
- **Commits:** {N} total, {pct}% conventional format

## Three-Axis Growth

| Axis | Score | Evidence |
|------|-------|----------|
| **Technical** (code, tools, architecture) | {0-10} | {what changed} |
| **Cognitive** (understanding, strategy, decisions) | {0-10} | {what improved} |
| **Process** (harness, skills, pipeline, docs) | {0-10} | {what evolved} |

If only one axis is served — note what's missing.

## Recommendations

1. **[CRITICAL]** {patch suggestion with file:line reference}
2. **[HIGH]** {improvement}
3. **[MEDIUM]** {optimization}
4. **[LOW]** {nice-to-have}

## Suggested Patches

### Patch 1: {file} — {description}

**What:** {one-line description}
**Why:** {root cause reference from Failure Patterns}

\```diff
- old line
+ new line
\```

Rating guide (use these emojis):

  • 🟢 GREEN = excellent
  • 🟡 YELLOW = acceptable
  • 🔴 RED = needs attention

Phase 8: Interactive Patching

After generating the report:

  1. Show summary to user: overall score, top 3 failure patterns, top 3 recommendations

  2. For each suggested patch (if any), use AskUserQuestion:

    • Question: "Apply patch to {file}? {one-line description}"
    • Options: "Apply" / "Skip" / "Show diff first"
  3. If "Show diff first": display the full diff, then ask again (Apply / Skip)

  4. If "Apply": use Edit tool to apply the change directly

  5. After all patches processed:

    • If any patches were applied: suggest committing with fix(retro): {description}
    • Do NOT auto-commit — just suggest the command

Phase 9: CLAUDE.md Revision

After patching, revise the project's CLAUDE.md to keep it lean and useful for future agents.

Steps:

  1. Read CLAUDE.md and check size: wc -c CLAUDE.md
  2. Add learnings from this retro:
    • Pipeline failure patterns worth remembering (avoid next time)
    • New workflow rules or process improvements
    • Updated commands or tooling changes
    • Architecture decisions that emerged during the pipeline run
  3. If over 40,000 characters — trim ruthlessly:
    • Collapse completed phase/milestone histories into one line each
    • Remove verbose explanations — keep terse, actionable notes
    • Remove duplicate info (same thing explained in multiple sections)
    • Remove historical migration notes, old debugging context
    • Remove examples that are obvious from code or covered by skill/doc files
    • Remove outdated troubleshooting for resolved issues
  4. Verify result <= 40,000 characters — if still over, cut least actionable content
  5. Write updated CLAUDE.md, update "Last updated" date

Priority (keep -> cut):

  1. ALWAYS KEEP: Tech stack, directory structure, Do/Don't rules, common commands, architecture decisions
  2. KEEP: Workflow instructions, troubleshooting for active issues, key file references
  3. CONDENSE: Phase histories (one line each), detailed examples, tool/MCP listings
  4. CUT FIRST: Historical notes, verbose explanations, duplicated content, resolved issues

Rules:

  • Never remove Do/Don't sections — critical guardrails
  • Preserve overall section structure and ordering
  • Every line must earn its place: "would a future agent need this to do their job?"
  • Commit the update: git add CLAUDE.md && git commit -m "docs: revise CLAUDE.md (post-retro)"

Phase 10: Factory Critic (optional)

Run this phase only if ${CLAUDE_PLUGIN_ROOT} is available (i.e., solo-factory is installed). Skip if running as a standalone skill without the factory context.

After evaluating the project pipeline, step back and evaluate the factory itself — the skills, scripts, and pipeline logic that produced this result. Be a harsh critic.

What to evaluate:

  1. Read the skills that were invoked in this pipeline run (from INVOKE lines in pipeline.log):

    • For each skill: ${CLAUDE_PLUGIN_ROOT}/skills/{stage}/SKILL.md
    • Did the skill have the right instructions for this project's needs?
    • Did it miss context it should have had?
  2. Read pipeline script signal handling and stage logic:

    • ${CLAUDE_PLUGIN_ROOT}/scripts/solo-dev.sh
    • Were there structural issues (wrong stage order, missing re-exec, broken redo)?
  3. Cross-reference with failure patterns from Phase 3:

    • For each failure: was the root cause in the skill, the script, or the project?
    • Skills that caused waste = factory defects

Score the factory (not the project):

Factory Score: {N}/10

Skill quality:
- {skill}: {score}/10 — {why}
- {skill}: {score}/10 — {why}

Pipeline reliability: {N}/10 — {why}

Missing capabilities:
- {what the factory couldn't do that it should have}

Top factory defects:
1. {defect} → {which file to fix} → {concrete fix}
2. {defect} → {which file to fix} → {concrete fix}

Harness Evolution — think about the bigger picture

After scoring the factory, step back further and think about the harness — the entire system that guides agents (CLAUDE.md, docs/, linters, skills, templates). Ask:

  1. Context engineering: Did the agent have everything it needed in-repo? Or did it struggle because knowledge was missing / scattered / stale?

    • Missing docs -> add to docs/ or CLAUDE.md
    • Stale docs -> flag for doc-gardening
    • Knowledge only in your head -> encode it
  2. Architectural constraints: Did the agent break module boundaries, produce inconsistent patterns, or ignore conventions?

    • Repeated boundary violations -> need a linter or structural test
    • Inconsistent patterns -> need golden principle in CLAUDE.md
    • Data shape errors -> need parse-at-boundary enforcement
  3. Decision traces: What worked well that future agents should reuse? What failed that they should avoid?

    • Good patterns -> capture as precedent in docs or CLAUDE.md
    • Bad patterns -> encode as anti-pattern or lint rule
    • Think: "if another agent hits this same problem tomorrow, what should it find?"
  4. Skill gaps: Which skills need better instructions? Which new skills should exist?

    • Skill that caused waste -> concrete SKILL.md patch
    • Missing capability -> new skill idea for evolution log

Write to evolution log:

Append findings to {project_root}/docs/evolution.md (create if not exists). If ~/.solo/evolution.md exists, append there as well for cross-project tracking.

## {YYYY-MM-DD} | {project} | Factory Score: {N}/10

Pipeline: {stages run} | Iters: {total} | Waste: {pct}%

### Defects
- **{severity}** | {skill/script}: {description}
  - Fix: {concrete file:change}

### Harness Gaps
- **Context:** {what knowledge was missing or stale for the agent}
- **Constraints:** {what boundary violations or inconsistencies occurred}
- **Precedents:** {patterns worth capturing for future agents — good or bad}

### Missing
- {capability the factory lacked}

### What worked well
- {skill/pattern that performed efficiently}

Rules:

  • Be brutally honest — if a skill is broken, say so
  • Every defect must have a concrete fix (file + what to change)
  • Track what works well too — don't regress good patterns
  • Keep entries compact — this file accumulates over time

Signal Output

Output signal: <solo:done/>

Important: /retro always outputs <solo:done/> — it never needs redo. Even if the pipeline run was terrible, the retro itself always completes.

Edge Cases

  • No pipeline.log and no git history: abort with clear message — "No pipeline log or git history found. Nothing to analyze."
  • No pipeline.log but git history exists: switch to fallback mode (see Fallback Analysis)
  • Empty pipeline.log: report "Pipeline log is empty — was the pipeline cancelled before any iteration?"
  • No iter logs: skip Phase 4 sampling, note in report
  • No plan-done: skip Phase 5, note "No completed plans found"
  • No test/build commands: skip those checks in Phase 6, note in report
  • Pipeline still running: warn user — "State file exists, pipeline may still be running. Retro on partial data."

Reference Files

  • ${CLAUDE_PLUGIN_ROOT}/skills/retro/references/eval-dimensions.md — scoring rubric (8 axes, weights)
  • ${CLAUDE_PLUGIN_ROOT}/skills/retro/references/failure-catalog.md — known failure patterns and fixes
