User request: $ARGUMENTS

Analyze a Claude Code session to identify what went well and what could be improved, then suggest high-confidence fixes to skills in this repository.

Input formats:

Session ID (UUID): 184078b7-2609-46e0-a1f2-bb42367a8d34
Session file path: ~/.claude/projects/.../session-id.jsonl
Inline commentary: Text description of what happened

Output: High-confidence issues only with evidence-based suggestions for skill improvements.

Signal quality bar: Only recommend changes that would have prevented specific rework in the session. A fix is high-signal when ALL of:

You can point to exact message numbers where rework occurred
The skill change would have triggered BEFORE that rework
Following the change would have produced correct output initially

Definition - high-signal fix: A skill change that passes the 3/3 counterfactual test (see Phase 5.2).

Phase 1: Parse Input & Setup

1.1 Identify input type

Input Pattern	Type	Action
UUID format (xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)	Session ID	Find and read session file
Path ending in .jsonl	Session file	Read directly
Other text	Commentary	Analyze inline, may reference sessions
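
The input classification above can be sketched as a small shell function. This is a minimal illustration, not part of Claude Code: the name classify_input and its three output labels are assumptions chosen for this example.

```shell
# classify_input is a hypothetical helper name used only for illustration.
# It prints one of: session-id, session-file, commentary.
classify_input() {
  uuid_re='^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'
  if printf '%s\n' "$1" | grep -Eq "$uuid_re"; then
    echo "session-id"
  else
    case "$1" in
      *.jsonl) echo "session-file" ;;
      *)       echo "commentary" ;;
    esac
  fi
}
```

The UUID check runs first so that a bare session ID is never mistaken for commentary; anything that is neither a UUID nor a .jsonl path falls through to commentary.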

1.2 Locate session file (if session ID)

Session files are stored at:

~/.claude/projects/{project-path-encoded}/{session-id}.jsonl

Note: {project-path-encoded} is the project path with its path separators replaced by hyphens (e.g., /home/user/myproject becomes -home-user-myproject). Don't rely on the exact path structure; use find instead.

Use Bash to find:

find ~/.claude/projects -name "{session-id}.jsonl" -type f 2>/dev/null

If file not found: Ask user to provide the session file path directly or check if session ID is correct.
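
A minimal sketch of the lookup with the not-found case made explicit. The function name find_session_file and its optional second argument (a search root, useful for testing) are assumptions for this example.

```shell
# find_session_file is a hypothetical helper. The optional second argument
# overrides the search root, which defaults to ~/.claude/projects.
find_session_file() {
  session_id="$1"
  root="${2:-$HOME/.claude/projects}"
  # Print the first match only; the encoded project directory is not assumed.
  find "$root" -type f -name "${session_id}.jsonl" 2>/dev/null | head -n 1
}
```

An empty result means the file was not found, which is the point at which to ask the user for the path or a corrected session ID.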

1.3 Create analysis log

Path: /tmp/session-analysis-{session-id-short}-{timestamp}.md

Purpose: External memory that persists findings beyond LLM working memory. Write to this file IMMEDIATELY after each discovery—never batch multiple findings into one write.

Session Analysis Log

Session: {id or "inline commentary"}
Started: {timestamp}
Status: IN_PROGRESS

Session Overview

Initial request:
Skills invoked:
Outcome:
Session length:

Pattern Detection

Iterations Found

User Corrections Found

Workflow Deviations Found

Missing Questions Found

Post-Implementation Fixes Found

Skill Comparison

Skills Discovered

Skill: {name}

Potential Issues

Counterfactual Analysis

Final Recommendations
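
One way to create the log and append to it, as a sketch. The helper names init_analysis_log and log_finding, the 8-character short ID, and the timestamp format are all assumptions; the skeleton is abbreviated to the top-level sections of the template above.

```shell
# init_analysis_log and log_finding are hypothetical helper names.
# Usage: init_analysis_log <session-id> [dir]   (prints the log path)
init_analysis_log() {
  short_id=$(printf '%s' "$1" | cut -c1-8)   # assumed: first 8 characters
  log="${2:-/tmp}/session-analysis-${short_id}-$(date +%Y%m%d-%H%M%S).md"
  cat > "$log" <<EOF
Session Analysis Log

Session: $1
Started: $(date -u +%Y-%m-%dT%H:%M:%SZ)
Status: IN_PROGRESS

Session Overview

Pattern Detection

Skill Comparison

Counterfactual Analysis

Final Recommendations
EOF
  printf '%s\n' "$log"
}

# Append one finding immediately; call once per discovery, never in batches.
log_finding() {
  printf '\n%s\n' "$2" >> "$1"
}
```

As a simplification, log_finding appends to the end of the file rather than under a matching section heading. What matters for this workflow is that every finding reaches disk as soon as it is discovered.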

1.4 Create todo list

CRITICAL: Write to log IMMEDIATELY after each finding—never batch writes.

Setup: Create log file, parse session, write overview
Pattern detection: iterations, corrections, deviations, missing questions, post-impl fixes (write each to log)
Skill discovery: extract skills, locate files, write to log
(expand: "Analyze {skill}" for each skill found)
Refresh context: read FULL analysis log
Counterfactual analysis: test each issue, write recommendations
Output final report

Expansion: When skills are discovered, add one todo per skill:

Analyze {skill-name} skill + write findings to log

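Why write-after-each-step matters: By the time you reach synthesis, early findings sit far back in context and are easily overlooked (context rot). Writing each finding to the log externalizes it to a file that persists. The refresh step (reading the full log) then moves ALL findings to the end of context (highest attention zone).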

Phase 2: Parse Session

2.1 Session file structure

Claude Code sessions are JSONL files with these record types:

Type	Contains
user	User messages, message.content field
assistant	Claude responses, tool calls, thinking
system	System events, commands, hooks
file-history-snapshot	File state tracking

2.2 Extract key events

Use jq to parse:

User messages

jq -r 'select(.type == "user") | .message.content' {session-file} 2>/dev/null

Tool calls

jq -r 'select(.type == "assistant") | .message.content | if type == "array" then .[] | select(.type == "tool_use") | .name else empty end' {session-file} 2>/dev/null

Skill invocations

grep -oE '"skill"\s*:\s*"[^"]*"' {session-file} | sort | uniq -c

2.3 Build session overview

Extract and log:

Initial request: First user message (the goal)
Workflow used: Which skills were invoked (/spec, /plan, /implement, etc.)
Workflow skipped: Skills that would typically apply but weren't invoked (see table below)
Outcome: Success, partial, or required rework
Session length: Message count, duration if available

Expected skills by task type (use to detect skipped workflows):

Task Indicators in Request	Expected Skills
"build", "implement", "create feature", "add"	spec → plan → implement
"fix bug", "debug", "not working"	bugfix
"review", "check", "audit"	review (or specific review-*)
"refactor", "improve", "optimize"	plan → implement
Multi-file changes (3+ files likely)	plan before implement

Only flag as "skipped" if evidence suggests the skill would have prevented issues that occurred.
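
The keyword rows of the table above can be approximated with a lookup function. This is a rough sketch: the name expected_skills, the "first matching row wins" ordering, and the space-separated output are assumptions, and the multi-file row is left out because it cannot be detected from the request text alone.

```shell
# expected_skills is a hypothetical helper that mirrors the keyword rows of
# the table. First match wins, checked from most to least specific.
expected_skills() {
  request=$(printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]')
  if printf '%s\n' "$request" | grep -Eq 'fix bug|debug|not working'; then
    echo "bugfix"
  elif printf '%s\n' "$request" | grep -Eqw 'review|check|audit'; then
    echo "review"
  elif printf '%s\n' "$request" | grep -Eqw 'refactor|improve|optimize'; then
    echo "plan implement"
  elif printf '%s\n' "$request" | grep -Eqw 'build|implement|create feature|add'; then
    echo "spec plan implement"
  else
    echo "unknown"
  fi
}
```

Treat the result as a hint for spotting skipped workflows, not as a verdict. Keyword matching misreads many requests, and a skill only counts as skipped if it would have prevented an issue that actually occurred.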

Phase 3: Pattern Detection

Analyze the session for these patterns. Each pattern has evidence requirements.

3.1 Iteration patterns (things that didn't work first time)

Evidence required: Same file edited multiple times, OR error → fix → retry sequence

Look for:

TypeScript errors followed by fixes
Test failures followed by code changes
Lint errors followed by formatting changes
Same function/file edited more than once with different intent (not additive changes, but corrections)
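
To surface candidates for the "same file edited multiple times" signal, a grep-based heuristic can count edit calls per file. This sketch assumes edit tool calls are named Edit, Write, or MultiEdit and carry a "file_path" input field; repeated_edits is a hypothetical name.

```shell
# repeated_edits is a hypothetical helper. Assumed: edit tool calls are named
# Edit, Write, or MultiEdit and include a "file_path" input field.
# Prints "<count><TAB><path>" for every file edited more than once.
repeated_edits() {
  grep -E '"name"[[:space:]]*:[[:space:]]*"(Edit|Write|MultiEdit)"' "$1" \
    | grep -oE '"file_path"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | sed -E 's/.*"([^"]*)"$/\1/' \
    | sort | uniq -c \
    | awk '$1 > 1 { n = $1; sub(/^ *[0-9]+ /, ""); print n "\t" $0 }'
}
```

A file that appears here is only a candidate. Read the edits themselves to decide whether they were corrections or additive changes before logging an iteration.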

Log format:

Iteration: {description}

Files affected: {list}
Attempts: {count}
Root cause: {why it didn't work first time}
Potential skill gap: {what could have prevented this}

Write ALL iteration findings to log file NOW, before proceeding to 3.2.

3.2 User corrections ("no, I meant...")

Evidence required: User message containing correction language

Correction indicators:

"no", "not what I meant", "actually", "instead", "I meant"
"let's go back", "undo", "revert"
"that's wrong", "incorrect"

Log format:

User Correction: {what was corrected}

Original action: {what Claude did}
User feedback: {correction text}
Missing context: {what Claude should have asked/known}

Write ALL correction findings to log file NOW, before proceeding to 3.3.
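
The correction indicators listed in 3.2 can be turned into a first-pass filter. The sketch below assumes the user messages have already been extracted to a text file with one message per line; find_corrections is a hypothetical name.

```shell
# find_corrections is a hypothetical helper. Input: a text file with one user
# message per line. Output: "line-number:message" for each likely correction.
find_corrections() {
  grep -niwE "no|not what i meant|actually|instead|i meant|let's go back|undo|revert|that's wrong|incorrect" "$1"
}
```

Whole-word, case-insensitive matching keeps "no" from matching inside words such as "note", but phrases like "no problem" will still match. Confirm each hit against the surrounding messages before logging it as a correction.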

3.3 Workflow deviations

Evidence required: Expected workflow step skipped or out-of-order

Check for:

Multi-phase skill invoked but phases skipped (read skill to know expected phases)
Skill with prerequisites invoked without those prerequisites (e.g., implementation without planning)
Ordered steps executed out of order
Verification/validation steps skipped before proceeding

How to detect: Compare skill's documented phases against actual session sequence.

Log format:

Workflow Deviation: {what was skipped/reordered}

Skill: {which skill's workflow}
Expected flow: {phases from skill definition}
Actual flow: {what happened in session}
Impact: {did this cause issues later?}

Write ALL deviation findings to log file NOW, before proceeding to 3.4.

3.4 Missing questions

Evidence required: Information discovered during implementation that should have been asked upfront

Look for:

Design decisions made mid-implementation
Assumptions that were later corrected
"The user confirmed..." appearing late in session
Post-implementation "actually, let's change..." patterns

Log format:

Missing Question: {what should have been asked}

Discovered at: {phase where it came up}
Impact: {rework required}
Skill gap: {which skill should have asked this}

Write ALL missing question findings to log file NOW, before proceeding to 3.5.

3.5 Post-implementation fixes

Evidence required: Changes made AFTER "implementation complete" or PR creation

Look for:

Commits/changes after PR URL appears
Refactoring after "done" or "complete" messages
Review findings that required code changes
User requesting changes after seeing "finished"
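
For the "changes after PR URL appears" check, a line-based heuristic is often enough to decide whether a closer look is warranted. This sketch assumes the PR is a GitHub pull request URL and that edit tool calls are named Edit, Write, or MultiEdit; edits_after_pr is a hypothetical name.

```shell
# edits_after_pr is a hypothetical helper. Assumed: the PR appears as a GitHub
# pull request URL, and edit tool calls are named Edit, Write, or MultiEdit.
# Prints the number of edit calls after the first PR URL, or "no-pr".
edits_after_pr() {
  pr_line=$(grep -nE 'https://github\.com/[^/" ]+/[^/" ]+/pull/[0-9]+' "$1" \
    | head -n 1 | cut -d: -f1)
  if [ -z "$pr_line" ]; then
    echo "no-pr"
    return 0
  fi
  # Count edit tool calls on the lines that follow the first PR URL.
  tail -n "+$((pr_line + 1))" "$1" \
    | grep -cE '"name"[[:space:]]*:[[:space:]]*"(Edit|Write|MultiEdit)"' || true
}
```

A non-zero count is a prompt to read those later messages, not a finding in itself. Some post-PR edits are planned follow-ups rather than fixes.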

Log format:

Post-Implementation Fix: {what was fixed}

Original implementation: {what was done}
Fix required: {what changed}
Should have been caught by: {which phase/skill}

Write ALL post-implementation findings to log file NOW, before proceeding to Phase 4.

Phase 4: Skill Comparison

4.1 Discover skills used in session

Step 1: Extract skill invocations from session:

Find all Skill tool invocations (handles both "skill" and skill names in tool calls)

grep -oE '"skill"\s*:\s*"[^"]*"' {session-file} | sort | uniq -c

Find slash command patterns (alphanumeric with hyphens); this scans the whole file, not only user messages, so expect false positives such as path segments

grep -oE '/[a-zA-Z][a-zA-Z0-9-]*' {session-file} | sort | uniq -c

Find Skill tool calls with plugin:skill format

grep -oE 'Skill\s*\(\s*"[^"]+:[^"]+"' {session-file} | sort | uniq -c

Step 2: Find ALL relevant files for each skill.

Option A - If the explore-codebase skill is available:

Invoke the vibe-workflow:explore-codebase skill with: "medium - find all files related to the {skill-name} skill: SKILL.md definition, related agents it spawns, hooks that affect it, and any shared utilities"

Option B - Manual search fallback:

Find skill definition

find . -path "*/skills/{skill-name}/SKILL.md" -type f

Find related agents (look in the same plugin's agents/ folder; {skill-path} is the directory that contains SKILL.md)

PLUGIN_DIR=$(dirname "$(dirname {skill-path})")
ls "$PLUGIN_DIR/agents/" 2>/dev/null

Find hooks that might affect this skill

grep -r "{skill-name}" --include="*.py" */hooks/ 2>/dev/null

This discovers:

The SKILL.md definition itself
Agents the skill spawns (e.g., plan-verifier for /plan)
Hooks that intercept skill behavior (e.g., Stop hook)
Shared utilities the skill depends on
Test files that document expected behavior

Step 3: Log discovered skills with full context:

Skills Used in Session

{skill-name}

SKILL.md: {path}
Related agents: {list from explorer}
Related hooks: {list from explorer}
Invoked: {count} times

Write discovered skills to log file NOW, before proceeding to 4.2.

4.2 Extract actionable rules from each skill

For each skill file, extract:

Rule indicators (look for these patterns):

must, should, never, always → mandatory behaviors
## Phase N: or ### Step N: → workflow phases
questions: or AskUserQuestion → required user prompts
| Condition | Action | tables → decision rules
CRITICAL, IMPORTANT → high-priority rules
- todo templates → expected workflow steps
Acceptance: or Validation: → verification requirements
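
Most of the indicators above can be located mechanically, with line numbers, before reading the skill in full. This is a sketch only: extract_rules is a hypothetical name, and it skips the table and todo-template indicators, which need a human read.

```shell
# extract_rules is a hypothetical helper that lists candidate rule lines from
# a skill file, each prefixed with its line number ("N:text").
extract_rules() {
  grep -nwiE 'must|should|never|always|critical|important|askuserquestion|^#+ *(phase|step) [0-9]+|(acceptance|validation):' "$1"
}
```

The line numbers in the output are what make "{rule with line number}" entries in the log cheap to produce. Discard hits where the keyword is used descriptively rather than as an instruction.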

Extract and log:

Skill: {name}

File: {path}

Mandatory behaviors:

{rule with line number}

Workflow phases:

{phase name} - expected outputs: {list}

Required questions (when applicable):

{question topic}

Verification steps:

{what should be checked}

4.3 Compare documented vs actual

For each skill used in the session:

Aspect	Documented	Actual	Gap?	Impact
Questions asked	{from skill}	{from session}	{Y/N}	{would have prevented X}
Phases followed	{from skill}	{from session}	{Y/N}	{would have caught Y}
Validations run	{from skill}	{from session}	{Y/N}	{would have avoided Z}
Outputs produced	{from skill}	{from session}	{Y/N}	{required later but missing}

Key comparison questions:

Did the skill ask all documented questions? If not, did missing answers cause issues?
Were all phases executed in order? If skipped, did it matter?
Were validations run? If skipped, did bugs slip through?
Did outputs match documented format? If not, did downstream steps suffer?

4.4 Log skill gaps with impact

Skill Gap: {skill name}

Rule violated: {what skill says to do, with line reference}
What happened: {actual behavior in session}
Evidence: {specific session content showing gap}
Impact: {did this cause iteration/correction/rework?}
Counterfactual: {if rule was followed, would outcome differ?}
Confidence: HIGH | MEDIUM | LOW

Only log gaps where:

Evidence is clear (specific session content)
Impact is documented (caused measurable problem)
Counterfactual is plausible (fix would have helped)

Write skill gap findings to log file after analyzing EACH skill—never batch multiple skills into one write.

Phase 5: Synthesize Recommendations

5.1 Refresh context (non-negotiable)

CRITICAL: Use the Read tool to read the FULL analysis log file NOW before any synthesis.

Why this step exists: By this point, findings from Phase 3 (pattern detection) have suffered context rot—they're in the "lost middle" of conversation. The log file contains ALL findings written throughout this workflow. Reading it:

Moves ALL findings to context END (highest attention zone)
Turns synthesis across a long, scattered history (which LLMs handle poorly) into synthesis over dense, recent context (which they handle well)
Restores details that would otherwise be missed

Action: Use Read("/tmp/session-analysis-{session-id-short}-{timestamp}.md") to read the entire log file. Do NOT proceed to 5.2 until this is complete.

5.2 Counterfactual analysis (the high-signal filter)

For each potential issue, answer these three questions with specific evidence:

Test	Question	Evidence Required
T1: Rework identified	Can you cite specific message numbers where rework occurred?	Message #X shows error, #Y shows fix
T2: Intervention point	At which earlier message would the skill change have triggered?	Message #Z is where skill phase X runs
T3: Trigger conditions met	Did the session content at that point contain info that would activate the change?	Quote the text that would trigger it

Scoring:

Score	Criteria	Action
3/3	All three answered with specific evidence	HIGH confidence → Include
2/3	One test lacks specific evidence	MEDIUM confidence → Deferred section
1/3 or 0/3	Multiple tests lack evidence	LOW confidence → Discard (note briefly under Not Actionable)

Example counterfactual:

Issue: Plan skill should ask about time filtering

T1 (Rework): Message #47 adds 90-day filter, #48 user says "yes that's what I meant"
T2 (Intervention): Message #12 is plan phase, "Files to modify" section
T3 (Trigger): Message #5 user wrote "recent refunds" - this would trigger temporal scope question

Score: 3/3 → HIGH confidence
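
The scoring rule reduces to counting how many of the three tests were answered with specific evidence. A minimal sketch, where score_issue and its output labels are assumptions for this example:

```shell
# score_issue is a hypothetical helper. Each argument is 1 if that test
# (T1, T2, T3 in order) was answered with specific evidence, 0 otherwise.
score_issue() {
  score=$(( $1 + $2 + $3 ))
  case "$score" in
    3) echo "3/3 HIGH include" ;;
    2) echo "2/3 MEDIUM deferred" ;;
    *) echo "$score/3 LOW exclude" ;;
  esac
}
```

For the example above, all three tests cite evidence, so score_issue 1 1 1 yields the 3/3 HIGH result. The arithmetic is the easy part; whether each test really has specific evidence still requires quoting the session.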

5.3 Additional disqualifiers

Even with 3/3 counterfactual score, discard if:

Compliance failure: Skill already documents this; it just wasn't followed
One-off context: Unusual situation unlikely to recur (e.g., user typo)
Scope creep: Fix would make skill too complex for marginal benefit
Side effects: Fix would break other documented behavior

5.4 Format recommendations

For each HIGH confidence issue:

Issue: {short title}

Confidence: HIGH
Counterfactual score: 3/3

Evidence

{Quote or describe specific session content}

What Happened

{Brief description of the problem}

Root Cause

{Why the current skill didn't prevent this - be specific about what's missing}

Counterfactual

Iteration avoided: {which rework/correction would NOT have happened}
Intervention point: {exact moment the fix would have triggered}
Trigger condition: {why the fix would have applied to this session}

Suggested Fix

File: {skill file path}
Section: {phase/section name}
Line: ~{approximate line number}

Current behavior: {What the skill does now}

Proposed behavior: {What the skill should do instead}

{code block showing the diff if applicable}

Risk Assessment

Side effects: {could this break other flows? NO/LOW/MEDIUM/HIGH}
Complexity added: {minimal/moderate/significant}
Test approach: {how to verify the fix works}

Phase 6: Output

6.1 Final report structure

Session Analysis: {session id or description}

Summary

Session outcome: {success/partial/rework needed}
Workflow used: {skills invoked}
Iterations observed: {count of retry/fix cycles}
High-signal fixes: {count} (3/3 counterfactual score)

What Went Well

{List of things that worked correctly - be specific about which skill behaviors succeeded}

Iteration Timeline

{Chronological list of corrections/retries observed, with timestamps if available}

{time}: {what was attempted}
{time}: {what went wrong}
{time}: {how it was fixed}

High-Confidence Improvements

Fix 1: {title}

{Full format from 5.4}

Fix 2: {title}

...

Deferred (Partial Counterfactual Match)

{Issues that scored 2/3 - might help but not definitively causal}

Issue	Missing Criterion	Why Deferred
{title}	{which of the 3 failed}	{brief explanation}

Not Actionable

{Issues that scored 1/3 or 0/3, or hit disqualifiers - briefly note for transparency}

Analysis log: {log file path}

6.2 Return report

Output the final report. User can then decide which fixes to implement.

Key Principles

Principle	Rule
Counterfactual-driven	Every fix must pass the 3/3 counterfactual test
Evidence-based	Quote or cite specific session content
Iteration-focused	Primary signal = things that required retry/correction
Skill-focused	Goal is improving skills, not critiquing user or session
Log-driven	Write findings to log as you go, refresh before synthesis

Confidence Criteria (Counterfactual-Based)

Score	Test Results	Action
3/3	Would avoid iteration + Name exact moment + Conditions would trigger	HIGH → Include
2/3	Two of three	MEDIUM → Deferred section
1/3	One of three	LOW → Not Actionable section
0/3	None	LOW → Not Actionable section

Disqualifiers (Even at 3/3)

Disqualifier	Why It Filters Out
Compliance failure	Skill already covers this - LLM didn't follow
One-off context	Unusual situation unlikely to recur
Scope creep	Fix adds complexity disproportionate to benefit
Side effects	Fix would break other documented behaviors

Error Handling

Error	Response
Session file not found	Ask user to provide path directly
Malformed JSONL (parse error)	Skip malformed lines, note in report
Skill file not found	Note as "skill definition unavailable" in comparison
jq not available	Use grep/sed fallback patterns to extract JSON fields
Empty session	Report "Session contains no analyzable content"
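
For the "jq not available" case, a grep/sed pipeline can recover simple string fields. The sketch below counts skill invocations; it assumes skill names appear as a "skill" string field in the raw JSONL, and skills_without_jq is a hypothetical name.

```shell
# skills_without_jq is a hypothetical helper for the "jq not available" case.
# Assumed: skill names appear as a "skill" string field in the raw JSONL.
# Prints "<count> <skill>" per skill, sorted by skill name.
skills_without_jq() {
  grep -oE '"skill"[[:space:]]*:[[:space:]]*"[^"]*"' "$1" \
    | sed -E 's/.*"([^"]*)"$/\1/' \
    | sort | uniq -c | sed -E 's/^ +//'
}
```

This pattern works for flat string values only. It does not handle escaped quotes or nested structures, so mention in the report when findings rest on the fallback rather than on a real JSON parser.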

Never Do

Suggest changes without specific session evidence
Include fixes that score <3/3 in main recommendations
Skip the counterfactual test ("it seems like it would help")
Skip reading the full analysis log before synthesis
Critique user behavior (focus on skill gaps, not user mistakes)
Suggest fixes that would break other documented behavior
Recommend changes to skills that weren't used in the session
