Debugging Skill
This skill defines how to systematically debug an issue reported by a developer. It follows a two-phase approach: first Intake (gathering all necessary context), then Diagnosis (structured analysis and actionable output).
Phase 1 — Intake
Before performing any diagnosis, collect the following inputs from the developer. Ask for all of them in a single, structured prompt. Mark optional fields clearly.
Required Inputs
| # | Field | Required | Notes |
|---|---|---|---|
| 1 | Problem Description | ✅ Yes | What went wrong? What was expected vs. actual behaviour? |
| 2 | Error Message / Stack Trace | ⚡ Highly Recommended | Paste the full error. If none, describe what is observed. |
| 3 | Steps to Reproduce | ⚡ Highly Recommended | Exact steps to trigger the issue. Note whether it is consistent or intermittent. |
| 4 | Environment Context | ⚡ Highly Recommended | Environment (dev/staging/prod), OS, runtime version, recent deployments or changes |
| 5 | Attachments | 🔵 Optional | Screenshots, logs, network traces, or config files |
| 6 | Recent Changes | 🔵 Optional | Any recent code, config, infra, or data changes that preceded the issue |
| 7 | Affected Users / Scope | 🔵 Optional | All users? A specific role, region, or data set? |
Intake Prompt Template
Use this template when asking the developer for inputs:
I'm ready to help debug. Please provide the following:

- 📝 Problem Description (required): What went wrong? What did you expect vs. what actually happened?
- 🔴 Error Message / Stack Trace (highly recommended): Paste the full error output. If no error is shown, describe what you observe instead.
- 🔁 Steps to Reproduce (highly recommended): Walk me through how to trigger this issue. Is it consistent or intermittent?
- 🌐 Environment Context (highly recommended):
  - Environment: dev / staging / prod
  - OS / Runtime version (e.g., Node 20, Python 3.11, Java 17)
  - Any recent deployments, config changes, or dependency updates?
- 📎 Attachments (optional): Screenshots, log files, network traces, or relevant configs.
- 🔧 Recent Changes (optional): Any recent code, database, infra, or data changes before this issue appeared?
- 👥 Affected Scope (optional): Who is affected? All users, specific roles, regions, or certain data?
Problem Description Validation Gate
This gate MUST be evaluated before Phase 2 begins. Do not proceed to Diagnosis if the Problem Description fails validation.
What Makes a Valid Problem Description
A valid Problem Description must satisfy all three of the following criteria:
| # | Criterion | Why It Matters |
|---|---|---|
| 1 | Describes a specific observable symptom or failure | Vague inputs like "it's broken" or "it doesn't work" provide zero signal for diagnosis |
| 2 | Contains at least one concrete detail — a feature name, an action taken, a screen/page, an API endpoint, a value, or a behaviour | Without a concrete anchor, there is nothing in the codebase to investigate |
| 3 | Is coherent and in good faith — not gibberish, placeholder text, or clearly random input | Random text wastes investigation effort and produces meaningless results |
Examples
| Input | Valid? | Reason |
|---|---|---|
| "The login button does nothing when I click it on the /auth/login page" | ✅ Yes | Specific symptom + concrete detail (button, page) |
| "Order creation fails with a 500 error after I added the discount field" | ✅ Yes | Symptom + context (500, feature name, recent change) |
| "Something is wrong with the app" | ❌ No | No specific symptom, no concrete detail |
| "It's broken" | ❌ No | Completely vague |
| "asdfghjkl" | ❌ No | Gibberish / not in good faith |
| "help" | ❌ No | Not a problem description |
| "I don't know, just debug it" | ❌ No | No actionable information |
| "The payment module has an issue" | ❌ No | No symptom described; too vague to investigate |
⚠️ A problem description that identifies a module or file alone (e.g., "the payment module") is not sufficient. It must describe what the module is doing wrong.
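As an illustration, the three criteria can be approximated with a lightweight heuristic check. This is a sketch only — the phrase list, vowel-ratio threshold, and anchor patterns below are assumptions for demonstration, not part of the skill, and real validation should rest on judgment rather than regexes alone:

```python
import re

# Hypothetical phrase list — tune for your own intake flow.
VAGUE_PHRASES = {"it's broken", "it doesn't work", "something is wrong", "help", "just debug it"}


def is_valid_problem_description(text: str) -> bool:
    """Heuristic gate for the three validity criteria (illustrative sketch)."""
    normalized = text.strip().lower()

    # Criterion 3: coherent, good-faith input — reject empty or low-vowel gibberish.
    if len(normalized) < 10:
        return False
    letters = [c for c in normalized if c.isalpha()]
    if letters and sum(c in "aeiou" for c in letters) / len(letters) < 0.2:
        return False  # e.g. "asdfghjkl" has almost no vowels

    # Criterion 1: short inputs built around a known vague phrase carry no signal.
    if any(phrase in normalized for phrase in VAGUE_PHRASES) and len(normalized) < 40:
        return False

    # Criterion 2: at least one concrete anchor — a path, a 4xx/5xx code,
    # a date, a dotted file name, or a UI/API noun.
    anchor = r"(/[\w\-/]+|\b[45]\d\d\b|\b\d{4}-\d{2}-\d{2}\b|\b\w+\.\w{2,4}\b|button|page|endpoint|field)"
    return bool(re.search(anchor, normalized))
```

Applied to the examples above, the valid rows pass and the invalid rows fail; a production gate would still want a human-in-the-loop fallback for borderline inputs.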
Validation Flow
1. Developer submits the intake response.
2. Is a Problem Description present? If not, re-prompt (Attempt 1).
3. If present, does it pass all 3 validity criteria? If yes, proceed to Phase 2.
4. If not, re-prompt with specific feedback (Attempt 1, maximum 2).
5. After 2 failed attempts, halt and escalate.
Re-Prompt Rules
- Maximum re-prompts: 2. If the Problem Description is still invalid after 2 correction attempts, halt the session and send the Escalation Message.
- Each re-prompt must be specific — tell the developer exactly which criterion their input failed and give a concrete example of what a better input looks like.
- Do not soften or skip the gate — even if the developer insists, a valid Problem Description is a prerequisite for meaningful diagnosis.
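The gate and its re-prompt budget can be sketched as a small loop. The `validate` and `reprompt` callables are hypothetical placeholders for the intake mechanics: `validate` applies the three criteria, `reprompt` sends the Re-Prompt Template and returns the developer's corrected input:

```python
MAX_REPROMPTS = 2  # per the rule above


def run_validation_gate(description, validate, reprompt):
    """Return a valid Problem Description, or None once the re-prompt budget is spent.

    `validate` and `reprompt` are hypothetical callables supplied by the
    surrounding intake flow; None signals "halt and send the Escalation Message".
    """
    attempts = 0
    while not validate(description):
        if attempts >= MAX_REPROMPTS:
            return None  # 2 failed correction attempts: escalate
        attempts += 1
        description = reprompt(description, attempt=attempts)
    return description
```

Note that the loop never softens the gate: the only exits are a description that passes validation or escalation after two failed corrections.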
Re-Prompt Template
Use this when the Problem Description is invalid:
⚠️ I wasn't able to start the debug session yet because the Problem Description needs a bit more detail.
What I received: "<their input>"
What's missing: <state which criterion failed — e.g., "no specific symptom described" / "no concrete detail" / "input is not a problem description">
What a good Problem Description looks like:
- "The checkout button on /cart throws a 500 error after I enter a promo code."
- "User profile pictures are not loading for accounts created before 2025-01-01."
Please re-describe the problem with at least one specific symptom and one concrete detail (feature, page, endpoint, value, or behaviour). The other fields (error message, steps to reproduce, etc.) remain unchanged — just update the Problem Description.
Attempt [N] of 2.
Escalation Message (after 2 failed attempts)
🚫 Debug session could not be started.
After 2 attempts, the Problem Description provided does not contain enough information to proceed with a meaningful diagnosis. Starting an investigation without a clear problem statement risks producing inaccurate or misleading results.
Next steps:
- Gather more context about what is failing — check logs, error messages, or ask a colleague who observed the issue.
- Restart the debug session once you have a clearer description of the symptom.
If you believe this is a false rejection, please provide the full error message or a screenshot — that alone may be sufficient to establish a valid problem context.
Phase 2 — Diagnosis
Once intake is complete, perform a structured investigation and produce the following outputs.
Step 1 — Understand the Bug Behaviour
Classify the bug before diving in:
| Dimension | Options |
|---|---|
| Reproducibility | Consistent / Intermittent / Unknown |
| Severity | Critical (data loss, outage) / High (feature broken) / Medium (degraded) / Low (cosmetic) |
| Blast Radius | All users / Subset of users / Single user / No user impact (internal/silent) |
| Bug Type | Logic error / Runtime exception / Configuration / Race condition / Data / Integration / Environment |
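For sessions that track state programmatically, the four dimensions can be captured in a small validated record. This is illustrative only — the skill mandates the vocabulary in the table above, not any particular data structure:

```python
from dataclasses import dataclass

# Vocabularies mirror the classification table above.
REPRODUCIBILITY = ("Consistent", "Intermittent", "Unknown")
SEVERITY = ("Critical", "High", "Medium", "Low")


@dataclass(frozen=True)
class BugClassification:
    reproducibility: str
    severity: str
    blast_radius: str
    bug_type: str

    def __post_init__(self):
        # Reject values outside the vocabularies defined by the table.
        if self.reproducibility not in REPRODUCIBILITY:
            raise ValueError(f"unknown reproducibility: {self.reproducibility}")
        if self.severity not in SEVERITY:
            raise ValueError(f"unknown severity: {self.severity}")
```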
Step 2 — Investigate
Perform the following steps using available tools (code search, file reading, log analysis):
1. Read the error message and stack trace — identify the exact file, line, and function where the error originates
2. Trace the call chain leading to the failure point (use the code-exploration skill's L4 Function level if needed)
3. Search for recent changes in the impacted files (`git log`, diff review)
4. Check for environmental factors: config values, missing env vars, dependency version mismatches
5. Check data assumptions: is there any data that the code assumes exists, is non-null, or is in a specific format?
6. Identify all other files or services that call into or are called by the failing code — assess the blast radius
Step 3 — Root Cause Analysis
List all plausible root causes with a likelihood score and supporting evidence.
Format
🔎 Root Cause Analysis
| # | Suspected Root Cause | Likelihood | Evidence / Reasoning |
|---|---|---|---|
| 1 | <description of cause> | 🔴 High (75%) | <what points to this> |
| 2 | <description of cause> | 🟡 Medium (20%) | <what points to this> |
| 3 | <description of cause> | 🟢 Low (5%) | <what points to this> |
Most Likely Root Cause: #1 — <brief summary>
Likelihood Legend
| Indicator | Score Range | Meaning |
|---|---|---|
| 🔴 High | 60–100% | Strong evidence; the stack trace or code directly supports this |
| 🟡 Medium | 30–59% | Plausible; partial evidence or requires validation |
| 🟢 Low | 1–29% | Possible edge case; cannot be ruled out without more data |
⚠️ If total likelihood does not add up to 100%, normalise across all candidates. Always rank by descending likelihood. If only one cause is identified, state "single root cause identified" and assign 100%.
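The normalisation rule can be illustrated with a small helper. The rounding strategy (absorbing drift into the top-ranked candidate) is an assumption — the skill only requires that the totals sum to 100% and the candidates are ranked in descending order:

```python
def normalise_likelihoods(scores):
    """Scale raw likelihood scores to sum to 100, in descending rank order."""
    total = sum(scores)
    if total <= 0:
        raise ValueError("at least one candidate needs a positive score")
    normalised = sorted((round(100 * s / total) for s in scores), reverse=True)
    normalised[0] += 100 - sum(normalised)  # absorb rounding drift at the top
    return normalised
```

For example, raw scores of 60 and 30 (totalling 90) normalise to 67 and 33.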
Step 4 — Impacted File Map
List every file that is directly or indirectly affected by the bug or the fix.
Format
📁 Impacted Files
| File | Role | Impact Type |
|---|---|---|
| `src/orders/orders.service.ts` | Origin of error | 🔴 Direct — fix required here |
| `src/users/users.service.ts` | Called by failing code | 🟡 Indirect — may need defensive update |
| `src/common/filters/http-exception.filter.ts` | Error handler | 🔵 Review — ensure error surfaces correctly |
| `tests/orders/orders.service.spec.ts` | Unit tests | ✅ Update — add test case for this scenario |
Impact Type Legend
| Icon | Type | Meaning |
|---|---|---|
| 🔴 Direct | Fix required | This file contains the defective code |
| 🟡 Indirect | May need update | Depends on or is called by the defective code |
| 🔵 Review | Check only | Peripheral; review to confirm it handles this case |
| ✅ Update | Test/doc update | No logic change, but tests or docs need updating |
Step 5 — Proposed Solutions
Provide at least 2 distinct solutions. Solutions must differ meaningfully (not just syntax variants).
Format for Each Solution
Solution [N] — <Short Title>
Approach: <Describe the approach in 2–4 sentences. What changes, and why does this fix the issue?>
Trade-offs:
| Pro | Con |
|---|---|
| <benefit> | <drawback> |
Key Changes:
- <file>: <what changes>
- <file>: <what changes>
How to Validate: <Step-by-step instructions to confirm the fix works — unit test to write, curl command to run, UI action to perform, etc.>
Estimated Effort: <Low / Medium / High>
Step 6 — Recommendation
State the recommended solution clearly and explain why it is preferred.
Format
✅ Recommended Solution: Solution [N] — <Title>
Why this is recommended: <2–4 sentences justifying the choice. Reference trade-offs, risk, effort, and alignment with existing patterns.>
Implementation Order:
- <First action to take>
- <Second action to take>
- <Verification step>
Step 7 — Preventive Recommendation
After diagnosing and recommending a fix, always suggest how to prevent this class of bug from recurring.
Format
🛡️ Prevention
| Recommendation | Type | Priority |
|---|---|---|
| Add null check for `user.id` before DB query | Code Guard | 🔴 High |
| Write unit test for empty payload edge case | Test Coverage | 🟡 Medium |
| Add schema validation at API boundary | Input Validation | 🟡 Medium |
| Set up alerting for 5xx error rate spike | Observability | 🔵 Low |
Step 8 — Investigation Trail
Maintain a log of what was checked and what was ruled out. This is critical if the debugging session spans multiple iterations or is handed off to another developer.
Format
🧭 Investigation Trail
| Step | What Was Checked | Finding | Status |
|---|---|---|---|
| 1 | Stack trace origin | Error thrown in `orders.service.ts` line 47 | ✅ Confirmed |
| 2 | Recent git changes to `orders.service.ts` | No changes in last 30 days | 🚫 Ruled out |
| 3 | ENV variable `DATABASE_URL` | Present and correct in all environments | 🚫 Ruled out |
| 4 | Null check on user object before `.id` access | Missing null guard — matches error pattern | ✅ Confirmed |
| 5 | Upstream caller `orders.controller.ts` | Passes unvalidated user object | 🟡 Contributing factor |
Full Output Template
Use this skeleton as the final output structure for every debugging session:
🐛 Debug Report — <Short Problem Title>
- Reported: YYYY-MM-DD
- Environment: <dev/staging/prod>
- Severity: <Critical / High / Medium / Low>
- Status: <Investigating / Root Cause Identified / Solution Proposed / Resolved>
📋 Problem Summary
<1–3 sentence summary of the issue, expected vs. actual behaviour, and reproducibility>
🔎 Bug Behaviour
- Reproducibility: <Consistent / Intermittent>
- Blast Radius: <Who/what is affected>
- Bug Type: <Logic / Runtime / Config / Race Condition / Data / Integration / Environment>
🔎 Root Cause Analysis
<Section 3 output>
📁 Impacted Files
<Section 4 output>
💡 Proposed Solutions
Solution 1 — <Title>
<Section 5 format>
Solution 2 — <Title>
<Section 5 format>
✅ Recommended Solution
<Section 6 output>
🛡️ Prevention
<Section 7 output>
🧭 Investigation Trail
<Section 8 output>
File Output (MANDATORY)
Every debug report must be saved as a markdown file before presenting it to the user.
Save Location
<repo-root>/.agent/debugs/
Naming Convention
debug_<YYYYMMDD>_<HHMMSS>_<short-slug>.md
Examples:
- `.agent/debugs/debug_20260311_143022_null-user-id-orders.md`
- `.agent/debugs/debug_20260311_160045_auth-token-expiry.md`
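Generating a compliant filename can be sketched as follows. The slugging rule (lowercase alphanumeric words joined by hyphens, capped at five words) is an assumption for illustration — the skill only mandates the overall `debug_` + timestamp + slug shape seen in the examples:

```python
import re
from datetime import datetime
from typing import Optional


def debug_report_filename(title: str, now: Optional[datetime] = None) -> str:
    """Build a report filename like debug_20260311_143022_null-user-id-orders.md."""
    now = now or datetime.now()
    # Hypothetical slug rule: lowercase alphanumeric words, hyphen-joined, max 5 words.
    slug = "-".join(re.findall(r"[a-z0-9]+", title.lower())[:5])
    return f"debug_{now:%Y%m%d}_{now:%H%M%S}_{slug}.md"
```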
File Header
Every debug file must start with a metadata header:
Debug Report — <Short Problem Title>
- Level: Debug Session
- Reported: YYYY-MM-DD HH:MM:SS
- Environment: <environment>
- Severity: <severity>
- Status: <status>
Quick Decision Guide
- Developer reports a bug? → Run Phase 1 Intake (collect all inputs in one prompt)
- Intake complete? → Run Phase 2 Diagnosis (Steps 1–8 in order)
- Only one root cause found? → Still provide ≥2 solutions (different approaches to fix the same cause)
- Bug is intermittent / no stack trace? → Increase weight on environment, race conditions, and data assumptions
- Bug appeared "out of nowhere" with no code changes? → Prioritise infra/config changes, data anomalies, and dependency updates
- Fix applied but issue persists? → Revisit the investigation trail, promote lower-likelihood root causes, add new findings
- Session handed off to another developer? → Ensure the Investigation Trail (Step 8) is fully up to date before handoff