root-cause-analysis

Analyze telemetry data for root cause analysis using Kopai CLI. Use when debugging errors, investigating latency issues, tracing request flows across services, or correlating logs with traces. Also use when users report production issues like "why is my API slow", "getting 500 errors", "service is down", "requests are timing out", or any symptom that needs telemetry-based investigation — even if they don't mention traces or observability explicitly.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "root-cause-analysis" with this command: npx skills add kopai-app/kopai-mono/kopai-app-kopai-mono-root-cause-analysis

Root Cause Analysis with Kopai

Guide for debugging production issues using telemetry data (traces, logs, metrics) via Kopai CLI.

Prerequisites

Ensure access to Kopai app backend. Make sure the services are set up to send their OpenTelemetry data to Kopai. See otel-instrumentation skill for setup.

RCA Workflow Summary

  1. Find error traces
  2. Get full trace context
  3. Correlate logs with trace
  4. Check related metrics
  5. Identify root cause

Rules

1. Workflow (CRITICAL)

  • workflow-find-errors - Find Error Traces
  • workflow-get-context - Get Full Trace Context
  • workflow-correlate-logs - Correlate Logs with Trace
  • workflow-check-metrics - Check Related Metrics
  • workflow-identify-cause - Identify Root Cause & Present Findings

2. Patterns (HIGH)

  • pattern-http-errors - HTTP Error Debugging
  • pattern-slow-requests - Slow Request Analysis
  • pattern-distributed - Distributed Failure Tracing
  • pattern-log-driven - Log-Driven Investigation

Read rules/<rule-name>.md for details.

Tips

  1. Always use --json for programmatic analysis
  2. Pipe to jq for filtering/aggregation
  3. Start with errors, then trace backwards
  4. Check span Duration to find bottlenecks
  5. Correlate TraceId across traces, logs, metrics
  6. Use --severity-min 17 instead of --severity-text ERROR to catch all error-level logs regardless of text casing. Fall back to --body "error" for errors logged at INFO or with no severity.

References

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

otel-instrumentation

No summary provided by upstream source.

Repository SourceNeeds Review
Coding

Debug Methodology

Systematic debugging and problem-solving methodology. Activate when encountering unexpected errors, service failures, regression bugs, deployment issues, or...

Registry SourceRecently Updated
5210Profile unavailable
Coding

root-cause-analysis

No summary provided by upstream source.

Repository SourceNeeds Review
Coding

root-cause-analysis

No summary provided by upstream source.

Repository SourceNeeds Review