root-cause-analysis

Symptom → hypothesis formation → evidence gathering → elimination → root cause → verified fix.


<when_to_use>

  • Diagnosing system failures or unexpected behavior

  • Investigating incidents or outages

  • Finding the actual cause rather than surface symptoms

  • Preventing recurrence by understanding why the failure happened

  • Any situation where "why did this happen?" needs answering

NOT for: known issues with documented fixes, simple configuration errors, guessing without evidence

</when_to_use>

<discovery_phase>

Core Questions

| Question | Why it matters |
|---|---|
| What's the symptom? | Exact manifestation of the problem |
| When did it start? | First occurrence, patterns in timing |
| Can you reproduce it? | Consistently, intermittently, specific conditions |
| What changed recently? | Deployments, config, dependencies, environment |
| What have you tried? | Previous fix attempts, their results |
| What are the constraints? | Time budget, what can't be modified |

Confidence Thresholds

| Level | State | Action |
|---|---|---|
| 0-2 | Symptom unclear or can't reproduce | Keep gathering info |
| 3 | Good context, some gaps | Can start hypothesis phase |
| 4+ | Clear picture | Proceed to investigation |

At level 3+, transition to hypothesis formation. Below level 3, keep gathering context.
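A minimal sketch of the threshold rule as code, assuming the self-assessed level is a non-negative integer. The upper end of the scale isn't specified here, so anything at 4 or above is treated the same; the function name `next_step` is hypothetical.

```python
def next_step(confidence: int) -> str:
    """Map a discovery confidence level to the action in the thresholds table."""
    if confidence < 0:
        raise ValueError("confidence level cannot be negative")
    if confidence <= 2:
        return "keep gathering info"     # symptom unclear or can't reproduce
    if confidence == 3:
        return "start hypothesis phase"  # good context, some gaps
    return "proceed to investigation"    # 4+: clear picture
```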

</discovery_phase>

<hypothesis_formation>

Quality Criteria

| Good Hypothesis | Weak Hypothesis |
|---|---|
| Testable | Too broad ("something's wrong") |
| Falsifiable | Untestable |
| Specific | Contradicts evidence |
| Plausible | Assumes conclusion |

Multiple Working Hypotheses

Generate 2-4 competing theories:

  • List each hypothesis with supporting/contradicting evidence

  • Rank by likelihood (evidence support, parsimony, testability)

  • Design tests to differentiate between them
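One way to make the ranking explicit, sketched in Python. The `Hypothesis` fields and the ordering used here (net evidence, then fewer assumptions as a stand-in for parsimony, then cheap testability) are illustrative assumptions, not a scoring scheme the method prescribes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Hypothesis:
    statement: str
    supporting: List[str] = field(default_factory=list)     # evidence for
    contradicting: List[str] = field(default_factory=list)  # evidence against
    assumptions: int = 0        # extra assumptions required (fewer = more parsimonious)
    cheap_to_test: bool = True  # can a quick, non-destructive test discriminate it?


def rank(hypotheses: List[Hypothesis]) -> List[Hypothesis]:
    """Most promising first: net evidence, then parsimony, then testability."""
    def key(h: Hypothesis):
        net_evidence = len(h.supporting) - len(h.contradicting)
        # sorted() is ascending, so negate "bigger is better" values;
        # `not cheap_to_test` puts cheap tests (False) ahead of costly ones (True).
        return (-net_evidence, h.assumptions, not h.cheap_to_test)
    return sorted(hypotheses, key=key)


if __name__ == "__main__":
    ranked = rank([
        Hypothesis("Cache returns stale config", ["restart clears it"], assumptions=1),
        Hypothesis("DB pool exhausted", ["timeouts in logs", "spikes at peak load"],
                   ["pool metrics look normal"]),
        Hypothesis("Network partition", [], ["other services healthy"],
                   assumptions=2, cheap_to_test=False),
    ])
    for h in ranked:
        print(h.statement)
```

Hypotheses that tie on every criterion keep their input order, because `sorted` is stable.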

</hypothesis_formation>

<evidence_gathering>

Observation Collection

| Category | What to Gather |
|---|---|
| Error manifestation | Exact symptoms, messages, states |
| Reproduction steps | Minimal sequence triggering issue |
| System state | Logs, variables, config at failure time |
| Environment | Versions, platform, dependencies |
| Timing | When started, frequency, patterns |

Breadcrumb Analysis

Trace backwards from symptom:

  • Last known good state — what was working?

  • First observable failure — when did it break?

  • Changes between — what's different?

  • Root trigger — first thing that went wrong
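The "changes between" step often reduces to diffing two snapshots. A minimal sketch, assuming the last known good state and the first failing state can each be captured as a flat key/value map (versions, config values, environment variables); the `changes_between` helper is hypothetical.

```python
from typing import Dict, Optional, Tuple

Snapshot = Dict[str, str]


def changes_between(last_good: Snapshot,
                    first_bad: Snapshot) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {key: (value_when_good, value_when_bad)} for every key that differs.

    None marks a key missing from one snapshot (added or removed).
    """
    keys = set(last_good) | set(first_bad)
    return {
        k: (last_good.get(k), first_bad.get(k))
        for k in sorted(keys)
        if last_good.get(k) != first_bad.get(k)
    }
```

Each differing key is a candidate root trigger to feed into elimination, not a conclusion in itself.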

</evidence_gathering>

<hypothesis_testing>

Test Design

For each hypothesis:

  • Prediction — if true, what should we observe?

  • Test method — how to verify?

  • Expected result — what confirms/refutes?

  • Time budget — when to move on?
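The four items above can be recorded per hypothesis so the time budget is enforced rather than remembered. A sketch only; the `HypothesisTest` class and its field names are hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class HypothesisTest:
    hypothesis: str
    prediction: str   # if the hypothesis is true, what should we observe?
    method: str       # how we will check
    expected: str     # which result confirms it and which refutes it
    budget_s: float   # seconds to spend before moving on
    started_at: Optional[float] = None

    def start(self) -> None:
        # monotonic() is unaffected by wall-clock adjustments
        self.started_at = time.monotonic()

    def over_budget(self, now: Optional[float] = None) -> bool:
        """True once more than budget_s seconds have passed since start()."""
        if self.started_at is None:
            return False  # not started yet
        now = time.monotonic() if now is None else now
        return now - self.started_at > self.budget_s
```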

Testing Priorities

| Priority | Strategy |
|---|---|
| First | Quick, non-destructive, local tests |
| Second | Most likely causes, common failures |
| Third | Edge cases, rare failures |
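A sketch of the priority order as a sort key. The `CandidateTest` fields and the 0.5 likelihood cut-off for "most likely causes" are assumptions made for illustration.

```python
from typing import List, NamedTuple


class CandidateTest(NamedTuple):
    name: str
    quick: bool
    destructive: bool
    local: bool
    likelihood: float  # rough prior that this test exposes the cause, 0..1


LIKELY = 0.5  # assumed cut-off for "most likely causes"


def tier(t: CandidateTest) -> int:
    if t.quick and t.local and not t.destructive:
        return 1  # first: quick, non-destructive, local
    if t.likelihood >= LIKELY:
        return 2  # second: most likely causes, common failures
    return 3      # third: edge cases, rare failures


def order_tests(tests: List[CandidateTest]) -> List[CandidateTest]:
    # Within a tier, run the more likely test first.
    return sorted(tests, key=lambda t: (tier(t), -t.likelihood))
```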

Execution Loop

Baseline → Single variable change → Observe → Document → Iterate
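This loop is the core of variable isolation: start from a baseline that works, change exactly one variable, observe, and record. A minimal sketch, assuming the system under test can be driven by a config dict and a `fails` predicate you supply (both hypothetical).

```python
from typing import Callable, Dict, List, Tuple

Config = Dict[str, object]


def isolate(baseline: Config, changes: Config,
            fails: Callable[[Config], bool]) -> List[Tuple[str, object, bool]]:
    """Try each change on its own against a known-good baseline.

    Returns one (variable, new_value, failed) record per trial.
    """
    # Baseline: the unchanged configuration must pass, or no result is meaningful.
    if fails(baseline):
        raise RuntimeError("baseline already fails; establish a good baseline first")
    record = []
    for key, value in changes.items():
        trial = {**baseline, key: value}     # single variable change
        failed = fails(trial)                # observe
        record.append((key, value, failed))  # document, then iterate
    return record
```

Changing one variable per trial will miss failures that appear only when two changes interact; if no single change reproduces the failure, consider testing combinations.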

</hypothesis_testing>

<elimination_methodology>

Three core techniques:

| Technique | When to Use |
|---|---|
| Binary Search | Large problem space, ordered changes |
| Variable Isolation | Multiple variables, need to establish causation |
| Process of Elimination | Finite set of possible causes |

See elimination-techniques.md for detailed methods.
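Binary search over an ordered list of changes (commits, deploys, config revisions) needs about log2(n) checks to find the first bad one; for git commits, `git bisect` automates the same idea. A minimal sketch, assuming the failure persists once introduced (every later change also fails), the first item is known good, and the last item is known bad. `first_bad` is a hypothetical helper.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(changes: Sequence[T], is_bad: Callable[[T], bool]) -> T:
    """Return the earliest change for which is_bad() is True.

    Assumes changes are ordered oldest to newest, changes[0] is good,
    changes[-1] is bad, and the failure persists once introduced.
    """
    lo, hi = 0, len(changes) - 1
    if len(changes) < 2 or is_bad(changes[lo]) or not is_bad(changes[hi]):
        raise ValueError("need a known-good first item and a known-bad last item")
    # Invariant: changes[lo] is good and changes[hi] is bad.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(changes[mid]):
            hi = mid
        else:
            lo = mid
    return changes[hi]
```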

</elimination_methodology>

<time_boxing>

| Phase | Duration | Exit Condition |
|---|---|---|
| Discovery | 5-10 min | Questions answered, can reproduce |
| Hypothesis | 10-15 min | 2-4 testable theories ranked |
| Testing | 15-30 min per hypothesis | Confirmed or ruled out |
| Fix | Varies | Root cause addressed |
| Verification | 10-15 min | Fix confirmed, prevention documented |

If a phase runs past twice its estimate → step back, seek a fresh perspective, or escalate.
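A sketch of the 2x rule, assuming each phase's estimate is the upper end of its range in the time-boxing table and that the fix phase has no fixed estimate. The phase keys, return labels, and `timebox_status` name are hypothetical.

```python
# Upper end of each range in the time-boxing table, in minutes (assumed).
PHASE_ESTIMATE_MIN = {
    "discovery": 10,
    "hypothesis": 15,
    "testing": 30,        # per hypothesis
    "verification": 15,
}


def timebox_status(phase: str, elapsed_min: float) -> str:
    estimate = PHASE_ESTIMATE_MIN.get(phase)
    if estimate is None:
        return "no estimate"  # e.g. the fix phase, whose duration varies
    if elapsed_min > 2 * estimate:
        return "step back"    # fresh perspective or escalate
    if elapsed_min > estimate:
        return "over estimate"
    return "on track"
```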

</time_boxing>

<audit_trail>

Log every step:

    [TIME] PHASE: Action → Result

    [10:15] DISCOVERY: Gathered error logs → Found NullPointerException
    [10:22] HYPOTHESIS: User object not initialized
    [10:28] TEST: Added null check logging → Confirmed user is null

Benefits: prevents revisiting the same ground, enables handoff to another investigator, and exposes circular investigation.

See documentation-templates.md for full templates.
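A minimal sketch of a logger that writes entries in the `[TIME] PHASE: Action → Result` format and flags a step that repeats an earlier one, one simple signal of circular investigation. The `AuditTrail` class is hypothetical, and treating a repeated phase/action pair (case-insensitive) as a repeat is an assumption.

```python
from datetime import datetime
from typing import List, Optional, Set, Tuple


class AuditTrail:
    """Append-only investigation log: '[TIME] PHASE: Action → Result'."""

    def __init__(self) -> None:
        self.entries: List[str] = []
        self._seen: Set[Tuple[str, str]] = set()  # (phase, normalized action)

    def log(self, phase: str, action: str, result: Optional[str] = None,
            when: Optional[datetime] = None) -> bool:
        """Append one entry. Returns False if this phase/action was logged before."""
        when = when or datetime.now()
        line = f"[{when:%H:%M}] {phase.upper()}: {action}"
        if result:
            line += f" → {result}"
        self.entries.append(line)
        key = (phase.upper(), action.strip().lower())
        is_new = key not in self._seen
        self._seen.add(key)
        return is_new
```

Repeated steps are still appended rather than dropped; the `False` return is a prompt to ask whether new evidence justifies revisiting that ground.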

</audit_trail>

<common_pitfalls>

Watch for these patterns:

| Trap | Counter |
|---|---|
| "I already looked at that" | Re-examine with fresh evidence |
| "That can't be the issue" | Test anyway; let evidence decide |
| "We need to fix this quickly" | Methodical investigation is usually faster than trial and error |
| Confirmation bias | Actively seek disconfirming evidence |
| Treating correlation as causation | Test the direct causal mechanism |

See pitfalls.md for detailed resistance patterns and recovery.

</common_pitfalls>

<confidence_calibration>

| Level | Indicators |
|---|---|
| High | Consistent reproduction, clear cause-effect, multiple confirmations, fix verified |
| Moderate | Reproduces mostly, strong correlation, single confirmation |
| Low | Inconsistent reproduction, unclear correlation, unverified hypothesis |
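The indicators can be turned into a rough checklist. A sketch only: the numeric cut-offs (a 0.95 reproduction rate for "consistent", 0.5 for "mostly") and the `calibrate` name are assumptions, and qualitative indicators such as a clear cause-effect mechanism still need human judgment.

```python
def calibrate(repro_rate: float, confirmations: int, fix_verified: bool) -> str:
    """Map measurable indicators to a High/Moderate/Low confidence label.

    repro_rate: fraction of attempts that reproduce the failure (0..1).
    Thresholds below are illustrative assumptions, not part of the method.
    """
    if repro_rate >= 0.95 and confirmations >= 2 and fix_verified:
        return "High"      # consistent, multiple confirmations, fix verified
    if repro_rate >= 0.5 and confirmations >= 1:
        return "Moderate"  # reproduces mostly, at least one confirmation
    return "Low"           # inconsistent or unconfirmed
```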

</confidence_calibration>

ALWAYS:

  • Gather sufficient context before hypothesizing

  • Form multiple competing hypotheses

  • Test systematically, one variable at a time

  • Document investigation trail

  • Verify the fix actually addresses the root cause

  • Document findings to prevent recurrence

NEVER:

  • Jump to solutions without diagnosis

  • Trust a single hypothesis without testing alternatives

  • Apply fixes without understanding the cause

  • Skip verification of the fix

  • Repeat the same failed investigation steps

  • Hide uncertainty about the root cause

Deep-dive documentation:

  • elimination-techniques.md — binary search, variable isolation, process of elimination

  • pitfalls.md — cognitive biases and resistance patterns

  • documentation-templates.md — investigation logs and RCA reports

Related skills:

  • debugging-and-diagnosis — code-specific debugging (loads this skill)

  • codebase-analysis — used for code investigation

  • report-findings — presenting investigation results
