Exhaustive Systems Analysis
Use this skill for full-system correctness work. The job is to map the system, identify the highest-risk behaviors, prove or refute concrete failure hypotheses, and leave behind a report another engineer can act on without re-reading the whole codebase.
Failure Modes To Prevent
- Surface-level audits that scan files without following behavior end-to-end
- False certainty: reporting suspicions as bugs without enough evidence
- Context drift across large audits with many subsystems
- Cosmetic reviews that miss the real correctness and ship-readiness risks
Operating Mode
- Default to
chat-firstoutput. Return findings inline unless the user asks for docs or the audit clearly needs multi-session artifacts. - Switch to
artifact modefor large or resumable audits. Usedocs/audit/or.claude/docs/audit/, matching the repo's existing conventions. - Do not start fixing code while auditing unless the user explicitly asks for fixes. This skill is for diagnosis, proof, and prioritization.
Workflow
0. Calibrate The Audit
Before reading deeply, write a one-screen scope brief using the template in references/templates.md.
Capture:
- system or area under review
- user-visible workflows or contracts that matter most
- likely high-risk surfaces: state, side effects, concurrency, auth, persistence, external integrations
- out-of-scope areas
- output mode:
chat-firstorartifact mode
If the request is broad, narrow it to the modules that can actually change user outcomes or ship readiness.
1. Load Intent Before Code
Read only the materials that establish intended behavior:
README,CLAUDE.md, architecture docs, ADRs- tests that describe user-visible or contract behavior
- recent commits touching the target area
TODO,FIXME,HACK, and "known issues"- incident notes, bug reports, or issue tracker items if available
Extract:
- critical workflows
- external surfaces
- hotspots and recent churn
- manual-only surfaces that cannot be fully verified from code alone
2. Build The Coverage Ledger
Map the system into subsystems before deep analysis. Use the coverage ledger template in references/templates.md.
For each subsystem record:
- name
- entrypoints
- files or directories in scope
- invariants or promised behaviors
- side effects
- risk level
- status:
planned | in_progress | done | follow_up
Prioritize by user impact first, then by side effects, concurrency, privilege, and recent churn. Folder structure alone is not a priority system.
3. Generate Hypotheses Before The Deep Pass
For each high- or medium-risk subsystem, write 2-3 concrete hypotheses before diving in. Good hypotheses are falsifiable and tied to a behavior boundary.
Examples:
- "A failure between write A and write B can leave persisted state inconsistent."
- "The retry path duplicates a side effect because idempotence is not enforced."
- "The docs promise behavior X, but the implementation falls through to Y on invalid input."
Update or discard hypotheses as evidence comes in. This step prevents aimless scanning.
4. Audit One Subsystem At A Time
Read the subsystem end-to-end:
- start at entrypoints and trace the happy path
- trace error paths, cleanup paths, cancellation or shutdown, and retries
- compare implementation to tests, docs, types, and public contracts
- run targeted searches, commands, or tests when they strengthen the evidence
- record exact commands, searches, and scopes when they support a finding
Select only the relevant checklist sections from references/checklists.md. Do not load every checklist if the subsystem only needs one or two.
When subagents are available, assign one bounded subsystem per subagent with disjoint files and ask for:
- hypotheses checked
- findings with exact citations
- coverage gaps
- suggested next verification step
5. Classify Findings With Evidence, Status, And Confidence
Every finding must separate observation from inference.
Required fields:
Severity:Critical | High | Medium | LowStatus:Confirmed | Likely | Needs follow-upConfidence:High | Medium | LowType:Bug | Race condition | Security | Stale docs | Dead code | Design flaw | ReliabilityLocation: exact file path and line or functionImpacted behavior: the user-visible workflow, invariant, or contract at riskObserved evidence: code citation, command output, test result, log, or search resultInference: why that evidence implies the reported problemWhat I checked: searches, tests, docs, commits, or alternate explanations ruled outRecommendation: the smallest credible next actionNext verification step: required when status isNeeds follow-up
Use Confirmed only when the bug is directly demonstrated by code, a failing test, a repro path, or a hard contradiction. Use Likely when the reasoning is strong but not directly reproduced. Use Needs follow-up when something is suspicious but the evidence is incomplete.
6. Run A Convergence Pass
After subsystem reviews:
- deduplicate cross-cutting findings
- re-rank by severity and user impact
- run a final residue sweep for stale docs, deprecated names, orphaned helpers, temp flags, TODO or FIXME clusters, and risky APIs
- record exact residue queries and counts if they matter to the conclusion
- list coverage gaps explicitly instead of pretending the audit was complete where it was not
Evidence Standard
Prefer stronger evidence over more words. From strongest to weakest:
- failing or targeted test
- reproducible path with exact steps
- direct code contradiction with exact citations
- logs, telemetry, or command output
- scoped search results with counts
- static reasoning
Static reasoning alone can still be valuable, but it should usually produce Likely, not Confirmed.
For dead code or stale docs, always show what you searched and why you believe the code or documentation is obsolete. A dead-code claim without a consumer search is incomplete.
Reporting Rules
- Lead with findings, not the methodology recap.
- Prefer user-impacting correctness issues over stylistic cleanup.
- Keep related findings separate unless they share the same root cause.
- If nothing serious is wrong, say so directly and still report residual risk and unverified surfaces.
- Do not write "looks wrong" or "might be an issue" without saying what you checked and what would prove or disprove it.
Use the templates in references/templates.md for:
- scope brief
- coverage ledger
- finding format
- chat-first summary
- artifact-mode audit directory
Session Management
Use single-session mode for small audits. For large audits or when context is tight, create a lightweight control plane:
00-plan.mdfor the scope brief and coverage ledger- one file per subsystem only if the audit is large enough to justify it
SUMMARY.mdfor consolidated findings and fix orderHANDOFF.mdif work will continue later
A good handoff includes:
- what was covered
- what is now believed to be true
- what remains unverified
- current blockers
- exact next steps
Anti-Patterns
- scanning directories without identifying entrypoints or invariants
- reporting every code smell as a finding
- calling something dead code without a consumer search
- calling something a bug without showing the broken behavior or violated contract
- collapsing multiple subsystems into one giant writeup
- hiding uncertainty instead of marking
Needs follow-up
Completion Criteria
The audit is complete when:
- high-risk subsystems have a ledger entry and a final status
- every reported finding has evidence, confidence, and a concrete location
- the final report includes fix order plus unverified surfaces
- cross-cutting and residue findings have been consolidated
- the report is honest about what was not proven