error-detective

Find and analyze errors across logs and code.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "error-detective" with this command: npx skills add htlin222/dotfiles/htlin222-dotfiles-error-detective

Error Detection

When to use

  • Investigating production errors

  • Analyzing log patterns

  • Finding error root causes

  • Correlating errors across systems

Log analysis

Find errors

Recent errors

grep -iE "error|exception|fatal" /var/log/app.log | tail -100

Errors with context

grep -B 5 -A 10 "ERROR" /var/log/app.log

Count by error type

grep -oE "Error: [^:]*" app.log | sort | uniq -c | sort -rn

Errors in time range

awk '/2024-01-15 14:/ && /ERROR/' app.log
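When the results need post-processing, the same filters can be sketched in Python. This is a minimal sketch, assuming error lines contain the level `ERROR` and an `Error: <type>` fragment as in the grep examples; the function names and sample lines are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Assumption: error lines contain "Error: <type>", as in the grep -oE example.
ERROR_TYPE = re.compile(r"Error: ([^:]*)")


def errors_in_window(lines: Iterable[str], window: str) -> list[str]:
    """Keep ERROR lines that mention `window`, e.g. '2024-01-15 14:'."""
    return [line for line in lines if window in line and "ERROR" in line]


def count_error_types(lines: Iterable[str]) -> Counter:
    """Count occurrences of each 'Error: <type>' fragment."""
    counts = Counter()
    for line in lines:
        for match in ERROR_TYPE.finditer(line):
            counts[match.group(1).strip()] += 1
    return counts


# Hypothetical sample lines for illustration only.
sample = [
    "2024-01-15 14:02:11 ERROR Error: Timeout: upstream took 30s",
    "2024-01-15 14:05:40 ERROR Error: Timeout: upstream took 31s",
    "2024-01-15 15:00:00 ERROR Error: ConnectionRefused: db:5432",
    "2024-01-15 14:06:00 INFO request ok",
]

in_window = errors_in_window(sample, "2024-01-15 14:")
by_type = count_error_types(sample)
```

`by_type.most_common()` gives the same ranking as the `sort | uniq -c | sort -rn` pipeline.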

Pattern detection

Find repeated errors

grep "ERROR" app.log | cut -d']' -f2 | sort | uniq -c | sort -rn | head -20

Correlate request IDs

grep "req-12345" *.log | sort -t: -k2
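Request correlation can also be sketched in Python, which keeps the source name next to each line. This sketch assumes every line starts with a sortable `YYYY-MM-DD HH:MM:SS` timestamp; `correlate_request`, the file names, and the sample lines are hypothetical.

```python
from typing import Iterable, Mapping


def correlate_request(
    sources: Mapping[str, Iterable[str]], request_id: str
) -> list[tuple[str, str, str]]:
    """Return (timestamp, source, line) for each line mentioning request_id, oldest first."""
    hits = []
    for name, lines in sources.items():
        for line in lines:
            if request_id in line:
                text = line.rstrip("\n")
                # Assumption: the first two space-separated fields are date and time.
                timestamp = " ".join(text.split(" ", 2)[:2])
                hits.append((timestamp, name, text))
    # ISO-style timestamps sort correctly as plain strings.
    return sorted(hits)


# Hypothetical log contents for illustration only.
logs = {
    "api.log": [
        "2024-01-15 14:00:02 INFO req-12345 received",
        "2024-01-15 14:00:09 ERROR req-12345 upstream failed",
    ],
    "db.log": [
        "2024-01-15 14:00:05 WARN req-12345 slow query",
        "2024-01-15 14:00:06 INFO req-99999 ok",
    ],
}

timeline = correlate_request(logs, "req-12345")
```

With real files, each mapping value could be an open file handle instead of a list.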

Find error spikes

grep "ERROR" app.log | cut -d' ' -f1-2 | uniq -c | sort -rn
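A spike is easier to see when errors are bucketed per minute and compared against a baseline. The sketch below assumes lines start with `YYYY-MM-DD HH:MM:SS`; the median baseline and the `factor` threshold are illustrative choices, not part of the original skill.

```python
import statistics
from collections import Counter
from typing import Iterable


def errors_per_minute(lines: Iterable[str]) -> Counter:
    """Count ERROR lines per minute bucket ('YYYY-MM-DD HH:MM')."""
    buckets = Counter()
    for line in lines:
        if "ERROR" in line:
            # Assumption: the first 16 characters are 'YYYY-MM-DD HH:MM'.
            buckets[line[:16]] += 1
    return buckets


def find_spikes(buckets: Counter, factor: float = 3.0) -> list[tuple[str, int]]:
    """Return buckets whose count exceeds `factor` times the median bucket count."""
    if not buckets:
        return []
    baseline = statistics.median(buckets.values())
    return sorted((minute, n) for minute, n in buckets.items() if n > factor * baseline)


# Hypothetical sample: one error in each of two minutes, then eight in the third.
sample = (
    ["2024-01-15 14:00:10 ERROR a"]
    + ["2024-01-15 14:01:20 ERROR b"]
    + [f"2024-01-15 14:02:{s:02d} ERROR c" for s in range(8)]
)

buckets = errors_per_minute(sample)
spikes = find_spikes(buckets)
```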

Stack trace analysis

Parse stack traces

import re

def parse_stack_trace(log_content: str) -> list[dict]:
    """Extract exception type, message, and frames from Java-style stack traces."""
    # An "<Name>Error" or "<Name>Exception" header followed by indented "at ..." frames.
    pattern = r'(?P<exception>\w+Error|\w+Exception): (?P<message>.*?)\n(?P<trace>(?:\s+at .+\n)+)'

    traces = []
    for match in re.finditer(pattern, log_content):
        traces.append({
            'type': match.group('exception'),
            'message': match.group('message'),
            'trace': match.group('trace').strip().split('\n')
        })
    return traces
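Once traces are parsed, grouping them by exception type and top frame shows which failure dominates. This sketch takes dictionaries in the shape returned by `parse_stack_trace`; `summarize_traces` and the sample traces are hypothetical.

```python
from collections import Counter


def summarize_traces(traces: list[dict]) -> Counter:
    """Count traces by (exception type, top stack frame)."""
    summary = Counter()
    for trace in traces:
        frames = trace.get("trace") or []
        top_frame = frames[0].strip() if frames else "<no frames>"
        summary[(trace["type"], top_frame)] += 1
    return summary


# Hypothetical parsed traces, in the shape parse_stack_trace returns.
parsed = [
    {
        "type": "NullPointerException",
        "message": "user is null",
        "trace": [
            "at com.example.UserService.load(UserService.java:42)",
            "    at com.example.Api.handle(Api.java:17)",
        ],
    },
    {
        "type": "NullPointerException",
        "message": "user is null",
        "trace": ["at com.example.UserService.load(UserService.java:42)"],
    },
    {
        "type": "TimeoutError",
        "message": "upstream took 30s",
        "trace": ["at com.example.Client.call(Client.java:88)"],
    },
]

summary = summarize_traces(parsed)
```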

Common patterns

| Pattern | Indicates | Action |
|---|---|---|
| NullPointer | Missing null check | Add validation |
| Timeout | Slow dependency | Add timeout, retry |
| Connection refused | Service down | Check health, retry |
| OOM | Memory leak | Profile, increase limits |
| Rate limit | Too many requests | Add backoff, queue |
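The table above can be turned into a small lookup for triaging messages automatically. This is a sketch only: the regular expressions are illustrative assumptions about how each pattern appears in a log message, and `classify` is a hypothetical helper.

```python
import re
from typing import Optional

# (regex, indicates, action) rows mirroring the table; regexes are assumptions.
PATTERNS = [
    (re.compile(r"NullPointer", re.I), "Missing null check", "Add validation"),
    (re.compile(r"timed? ?out", re.I), "Slow dependency", "Add timeout, retry"),
    (re.compile(r"connection refused", re.I), "Service down", "Check health, retry"),
    (re.compile(r"OutOfMemory|\bOOM\b", re.I), "Memory leak", "Profile, increase limits"),
    (re.compile(r"rate limit", re.I), "Too many requests", "Add backoff, queue"),
]


def classify(message: str) -> Optional[tuple[str, str]]:
    """Return (indicates, action) for the first matching pattern, or None."""
    for regex, indicates, action in PATTERNS:
        if regex.search(message):
            return indicates, action
    return None
```

For example, `classify("Read timed out after 30s")` matches the Timeout row, while a message that matches no row returns `None` and needs manual triage.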

Investigation checklist

  • Capture - Get full error message and stack trace

  • Timestamp - When did it start?

  • Frequency - How often? Increasing?

  • Scope - All users or specific?

  • Changes - Recent deployments?

  • Dependencies - External services affected?
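The Timestamp, Frequency, and Scope items in the checklist can be answered from structured error events. The sketch below assumes each event carries a timestamp and a user ID and that `total_users` is greater than zero; `ErrorEvent`, `summarize_incident`, and the sample data are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ErrorEvent:
    timestamp: datetime
    user_id: str
    message: str


def summarize_incident(events: list[ErrorEvent], total_users: int) -> dict:
    """Report first occurrence, hourly frequency, and share of users affected."""
    if not events:
        return {"first_seen": None, "per_hour": {}, "affected_users": 0, "scope": 0.0}
    ordered = sorted(events, key=lambda e: e.timestamp)
    per_hour = Counter(e.timestamp.strftime("%Y-%m-%d %H:00") for e in ordered)
    users = {e.user_id for e in ordered}
    return {
        "first_seen": ordered[0].timestamp,      # Timestamp: when did it start?
        "per_hour": dict(per_hour),              # Frequency: how often, increasing?
        "affected_users": len(users),            # Scope: all users or specific?
        "scope": len(users) / total_users,
    }


# Hypothetical events for illustration only.
events = [
    ErrorEvent(datetime(2024, 1, 15, 14, 5), "u1", "Timeout"),
    ErrorEvent(datetime(2024, 1, 15, 13, 55), "u2", "Timeout"),
    ErrorEvent(datetime(2024, 1, 15, 14, 20), "u1", "Timeout"),
]

summary = summarize_incident(events, total_users=10)
```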

Correlation queries

-- Errors by endpoint
SELECT endpoint, count(*) AS errors
FROM logs
WHERE level = 'ERROR'
  AND time > NOW() - INTERVAL '1 hour'
GROUP BY endpoint
ORDER BY errors DESC;

-- Error rate over time
SELECT date_trunc('minute', time) AS minute,
       count(*) FILTER (WHERE level = 'ERROR') AS errors,
       count(*) AS total
FROM logs
WHERE time > NOW() - INTERVAL '1 hour'
GROUP BY minute
ORDER BY minute;
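To try these aggregations without a database server, they can be sketched against an in-memory SQLite table from Python. This is an adaptation, not the original queries: SQLite has no `date_trunc` or `INTERVAL`, so the sketch uses `strftime` for minute buckets and `SUM(CASE ...)` for the error count, and it omits the one-hour window because the sample rows are fixed. The table layout and rows are hypothetical.

```python
import sqlite3

# Hypothetical rows: (time, level, endpoint).
rows = [
    ("2024-01-15 14:00:05", "ERROR", "/checkout"),
    ("2024-01-15 14:00:30", "INFO", "/checkout"),
    ("2024-01-15 14:00:45", "ERROR", "/checkout"),
    ("2024-01-15 14:01:10", "ERROR", "/login"),
    ("2024-01-15 14:01:20", "INFO", "/login"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (time TEXT, level TEXT, endpoint TEXT)")
conn.executemany("INSERT INTO logs VALUES (?, ?, ?)", rows)

# Errors by endpoint, most affected first.
by_endpoint = conn.execute(
    """
    SELECT endpoint, COUNT(*) AS errors
    FROM logs
    WHERE level = 'ERROR'
    GROUP BY endpoint
    ORDER BY errors DESC
    """
).fetchall()

# Errors and total requests per minute.
rate_per_minute = conn.execute(
    """
    SELECT strftime('%Y-%m-%d %H:%M', time) AS minute,
           SUM(CASE WHEN level = 'ERROR' THEN 1 ELSE 0 END) AS errors,
           COUNT(*) AS total
    FROM logs
    GROUP BY minute
    ORDER BY minute
    """
).fetchall()

conn.close()
```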

Examples

Input: "Find why API is returning 500 errors"
Action: Search logs for 500 status, find stack traces, identify root cause

Input: "Analyze error patterns from last hour"
Action: Aggregate errors by type, find spikes, correlate with events

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
