Deco Incident Debugging
PRIORITY: SPEED - This skill is designed for real-time incident response. Every second counts. Execute the triage workflow immediately and propose solutions as fast as possible.
When to Use This Skill
-
Production incident in progress
-
Site is down, slow, or showing errors
-
Customer escalation requiring immediate response
-
On-call engineer needs AI assistance during war room
-
Post-incident to document root cause and learnings
Quick Start - 60 Second Triage
- GET SYMPTOMS → What error/behavior is the user seeing?
- MATCH LEARNINGS → Search learnings/ for similar patterns
- PROPOSE SOLUTIONS → If match found, apply known fix
- DEEP DIVE → If no match, run diagnostic workflow
- DOCUMENT → Capture new learning if novel issue
Files in This Skill
File Purpose
SKILL.md
This overview and quick reference
triage-workflow.md
Step-by-step fast triage process (interactive)
headless-mode.md
Autonomous investigation without human interaction
learnings-index.md
Categorized index of all past learnings
Operating Modes
Interactive Mode (default)
Use when working alongside an engineer during an incident. The agent asks clarifying questions and collaborates on diagnosis.
Headless Mode
Use when triggered automatically by incident management systems (PagerDuty, Opsgenie, etc.). The agent receives minimal context and autonomously:
-
Extracts keywords from alert message
-
Searches learnings for matching patterns
-
Collects live data (logs, metrics)
-
Correlates findings to known issues
-
Outputs structured diagnosis with proposed fix
See headless-mode.md for full autonomous workflow, input/output formats, and integration examples.
Learnings Database
We have documented learnings from past incidents in learnings/ . These contain:
-
Problem: What went wrong
-
Symptoms: Observable indicators
-
Root Cause: Why it happened (with code examples)
-
Solution: How to fix it (with code examples)
-
How to Debug: Commands and techniques to diagnose
-
Impact: What happens if unfixed
Current Learnings Categories
Category Learnings Key Issues
cache-strategy
2 Missing cache, cookie blocking edge cache
loader-optimization
2 N+1 overfetching, lazy section issues
external-css
1 Lazy sections missing styles
block-config
1 Dangling block references
rich-text
1 Hardcoded domain URLs
ui-bug
1 Invisible clickable areas
responsive
1 Breakpoint inconsistencies
retry-logic
1 Off-by-one retry attempts
safari-bug
1 Image flash on navigation
vtex-routing
1 myvtex vs vtexcommercestable domain
migration
1 Deno 1 to Deno 2 migration
Incident Response Workflow
Phase 1: Rapid Assessment (< 2 minutes)
Ask the engineer these questions:
What is the error message or behavior?
-
Copy exact error text if available
-
Describe what user sees
When did it start?
-
Sudden vs gradual
-
After deployment?
-
Traffic spike?
What is the scope?
-
All users or specific segment?
-
All pages or specific routes?
-
One site or multiple?
What changed recently?
-
Code deployments
-
Config changes
-
Third-party updates
Phase 2: Pattern Matching (< 3 minutes)
Search learnings for known patterns:
Search by symptom keywords
grep -ri "SYMPTOM_KEYWORD" learnings/
Examples:
grep -ri "429" learnings/ # Rate limiting grep -ri "cache" learnings/ # Cache issues grep -ri "slow" learnings/ # Performance grep -ri "not found" learnings/ # Missing resources grep -ri "cookie" learnings/ # Cookie/session issues grep -ri "vtex" learnings/ # VTEX integration grep -ri "lazy" learnings/ # Lazy loading issues
Phase 3: Known Issue? Apply Fix Immediately
If symptom matches a learning:
-
Read the full learning file
-
Verify root cause matches the current symptoms
-
Apply the documented solution
-
Verify the fix works
Phase 4: Unknown Issue? Deep Diagnostic
If no learning matches, run full diagnostics:
Check error logs (HyperDX)
SEARCH_LOGS({ query: "level:error site:SITENAME", limit: 50 })
Check CDN metrics (if slow)
MONITOR_SUMMARY({ sitename: "SITE", hostname: "HOSTNAME" }) MONITOR_TOP_PATHS({ ... }) MONITOR_STATUS_CODES({ ... })
Check for TypeScript errors
deno check --unstable-tsgo --allow-import main.ts
Check for recent changes
git log --oneline -20 git diff HEAD~5
Check block validation
deno run -A https://deco.cx/validate -report validation-report.json
Phase 5: Document New Learning
If this is a novel issue, create a new learning:
[Title]
Category
[category-name]
Problem
[What went wrong]
Symptoms
- [Observable indicator 1]
- [Observable indicator 2]
Root Cause
[Why it happened with code examples]
Solution
[How to fix with code examples]
How to Debug
[Commands and techniques]
Files Affected
[List of file patterns]
Pattern Name
[Short memorable name]
Checklist Item
[One-line check for future audits]
Impact
[What happens if unfixed]
Common Incident Patterns
Rate Limiting (429 Errors)
Symptoms: "Too Many Requests" errors, spiky error rates
Quick Check:
grep -ri "rate limit|429|overfetch" learnings/
Likely Learnings:
-
loader-overfetching-n-plus-problem.md
-
Fetching too much data
-
cache-strategy-standardization-loaders.md
-
Missing cache causing repeated calls
Immediate Actions:
-
Check if loaders have export const cache
-
Check for N+1 query patterns
-
Add stale-while-revalidate caching
Slow Page Load
Symptoms: High TTFB, slow LCP, user complaints about speed
Quick Check:
grep -ri "slow|cache|lazy|performance" learnings/
Likely Learnings:
-
cache-strategy-standardization-loaders.md
-
Missing cache
-
lazy-sections-external-css-loading.md
-
CSS not loading for lazy sections
-
vtex-cookies-prevent-edge-caching.md
-
Edge cache blocked
Immediate Actions:
-
Check cache hit rates with CDN metrics
-
Verify lazy sections have proper cache headers
-
Check for sync loaders blocking render
Missing Content / Blank Sections
Symptoms: Sections not rendering, blank areas on page
Quick Check:
grep -ri "dangling|missing|not found|blank" learnings/
Likely Learnings:
-
dangling-block-references.md
-
Block config points to deleted component
-
duplicate-sections-masked-by-broken-loaders.md
-
Loader errors hidden by duplicates
Immediate Actions:
-
Run block validation: deno run -A https://deco.cx/validate
-
Check browser console for loader errors
-
Verify component files exist
VTEX Integration Issues
Symptoms: Products not loading, cart errors, checkout problems
Quick Check:
grep -ri "vtex" learnings/
Likely Learnings:
-
vtex-domain-routing-myvtex-vs-vtexcommercestable.md
-
Wrong domain
-
vtex-cookies-prevent-edge-caching.md
-
Cookie issues
Immediate Actions:
-
Check VTEX domain configuration
-
Verify API credentials
-
Check for VTEX service status
Visual Bugs / UI Issues
Symptoms: Broken layouts, invisible elements, style issues
Quick Check:
grep -ri "invisible|css|style|responsive|safari" learnings/
Likely Learnings:
-
invisible-clickable-areas-from-empty-links.md
-
Empty links covering content
-
responsive-breakpoint-consistency.md
-
Mobile/desktop inconsistencies
-
safari-image-flash-fix.md
-
Safari-specific image issues
-
lazy-sections-external-css-loading.md
-
Missing styles
Immediate Actions:
-
Check browser dev tools for overlapping elements
-
Inspect CSS loading order
-
Test on affected browser/device
Debugging Commands Reference
Error Investigation
Search error logs
SEARCH_LOGS({ query: "level:error site:SITENAME", limit: 50 })
Group errors by message
GET_LOG_DETAILS({ query: "level:error site:SITENAME", groupBy: ["body", "service"] })
Timeline of errors
QUERY_CHART_DATA({ series: [{ dataSource: "events", aggFn: "count", where: "level:error" }], granularity: "1 hour" })
Performance Investigation
CDN summary
MONITOR_SUMMARY({ sitename: "SITE", hostname: "HOST", startDate: "YYYY-MM-DD", endDate: "YYYY-MM-DD" })
Top paths by requests
MONITOR_TOP_PATHS({ ... })
Cache effectiveness
MONITOR_CACHE_STATUS({ ... })
Error rates
MONITOR_STATUS_CODES({ ... })
Code Investigation
Type errors
deno check --unstable-tsgo --allow-import main.ts
Block validation
deno run -A https://deco.cx/validate
Find missing cache
grep -L "export const cache" loaders/**/*.ts
Recent changes
git log --oneline -20 git diff HEAD~5
Find files with errors
deno check --unstable-tsgo --allow-import main.ts 2>&1 | grep "error:" | sed 's/:.*//g' | sort | uniq -c | sort -rn
Escalation Criteria
Escalate Immediately If:
-
Site completely down (no pages loading)
-
Checkout broken (revenue impact)
-
Data breach suspected
-
Issue affecting multiple customers
-
Fix requires platform changes (not site code)
Can Handle with This Skill:
-
Single site performance issues
-
Configuration problems
-
Code bugs in site repository
-
Cache/loader issues
-
Visual/UI bugs
Post-Incident Checklist
After resolving the incident:
-
Document root cause in learnings/ if novel
-
Create PR with fix
-
Update affected checklists if pattern is common
-
Share learning with team
-
Consider if monitoring should be added
Related Skills
-
deco-full-analysis
-
For comprehensive site audits (non-urgent)
-
deco-performance-audit
-
For deep performance analysis
-
deco-typescript-fixes
-
For systematic type error fixes