postmortem-generator

Generate blameless incident postmortems from timeline data, alerts, and chat logs. Produce structured reports with root cause analysis, contributing factors, action items, and follow-up tracking — following Google SRE and Etsy blameless postmortem formats.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "postmortem-generator" with this command: npx skills add charlie-morrison/postmortem-generator

Postmortem Generator

Generate blameless postmortems that help prevent repeat incidents. Compile a timeline from alerts, chat logs, and metrics into a structured report with root cause analysis, contributing factors, and tracked action items, following the Google SRE and Etsy blameless formats.

Use when: "write postmortem", "incident review", "blameless review", "what happened during the outage", "incident report", "post-incident review", or after any SEV1/SEV2 incident.

Commands

1. generate — Create Postmortem from Incident Data

Step 1: Gather Timeline Data

# PagerDuty incident timeline
curl -s "https://api.pagerduty.com/incidents/$INCIDENT_ID/log_entries" \
  -H "Authorization: Token token=$PD_TOKEN" | python3 -c "
import json, sys
entries = json.load(sys.stdin)['log_entries']
for e in entries:
    ts = e['created_at'][:19]
    entry_type = e['type']
    summary = e.get('summary', e.get('channel', {}).get('summary', ''))
    print(f'{ts} [{entry_type}] {summary}')
"

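# Alerts for the incident (Prometheus/Alertmanager)
# Assumes alerts carry an incident_id label. Alertmanager only returns alerts it
# still holds, so query soon after the incident or use your long-term alert store.
curl -s "http://alertmanager:9093/api/v2/alerts?filter=incident_id=$INCIDENT_ID" | python3 -c "
import json, sys
alerts = json.load(sys.stdin)
for a in sorted(alerts, key=lambda x: x['startsAt']):
    # In the v2 API, status is an object; its state field holds the alert state
    print(f'{a[\"startsAt\"][:19]} ALERT: {a[\"labels\"][\"alertname\"]} ({a[\"status\"][\"state\"]})')
"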

# Git commits around incident time (commit time is only a proxy for deploy time)
git log --since="$INCIDENT_START" --until="$INCIDENT_END" \
  --date=iso-strict --format='%cd %h %s' 2>/dev/null
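
The three sources above report timestamps in different formats and time zones, so they need to be normalized and merged before they can fill the Timeline table. The following is a minimal Python sketch, assuming each source has already been reduced to (timestamp, source, text) tuples. The helper names `parse_ts`, `merge_timeline`, and `to_markdown`, and the sample events, are hypothetical and not part of the skill.

```python
from datetime import datetime, timezone

def parse_ts(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp and normalize it to UTC."""
    # A trailing 'Z' is rewritten so older Python versions can parse it.
    dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    # Assumption: timestamps without an offset are already in UTC.
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)

def merge_timeline(*feeds):
    """Merge (timestamp, source, text) tuples from several feeds, sorted by time."""
    events = [(parse_ts(ts), src, text) for feed in feeds for ts, src, text in feed]
    return sorted(events, key=lambda e: e[0])

def to_markdown(events):
    """Render merged events as rows for the postmortem Timeline table."""
    rows = ["| Time | Event |", "|------|-------|"]
    for ts, src, text in events:
        rows.append(f"| {ts:%H:%M} | {src}: {text} |")
    return "\n".join(rows)

# Hypothetical sample data, one list per source
pagerduty = [("2026-04-28T14:33:05Z", "PagerDuty", "On-call acknowledged")]
alerts = [("2026-04-28T14:31:00Z", "Alert", "API error rate > 5%")]
deploys = [("2026-04-28T16:23:00+02:00", "Deploy", "commit abc123 to production")]

timeline = merge_timeline(pagerduty, alerts, deploys)
print(to_markdown(timeline))
```

Normalizing to UTC before sorting matters here because the deploy entry carries a +02:00 offset and would otherwise sort after the alert it preceded.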

Step 2: Analyze Root Cause

Use the "5 Whys" technique:

  1. Why did the service go down? → Database connection pool exhausted
  2. Why was the pool exhausted? → Slow queries holding connections
  3. Why were queries slow? → Missing index on new column
  4. Why was the index missing? → Migration didn't include it
  5. Why wasn't this caught? → No query performance tests in CI

Identify:

  • Root cause: The deepest "why" that's actionable
  • Contributing factors: Things that made it worse (no alerting, manual process, missing runbook)
  • Mitigating factors: Things that helped (quick detection, good rollback process)
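
One way to make "the deepest actionable why" concrete is to record the chain as data and walk it from the bottom. This is an illustrative sketch only: the `Why` class, the `root_cause` helper, and the `actionable` flags assigned to each answer are assumptions made for the example, and judging what is actionable remains a team decision.

```python
from dataclasses import dataclass

@dataclass
class Why:
    question: str
    answer: str
    actionable: bool  # assumption: the team can change this directly

def root_cause(chain):
    """Return the deepest actionable link in the chain, or None if there is none."""
    for why in reversed(chain):
        if why.actionable:
            return why
    return None

# The 5 Whys example above; the actionable flags are illustrative
chain = [
    Why("Why did the service go down?", "Database connection pool exhausted", False),
    Why("Why was the pool exhausted?", "Slow queries holding connections", False),
    Why("Why were queries slow?", "Missing index on new column", True),
    Why("Why was the index missing?", "Migration didn't include it", True),
    Why("Why wasn't this caught?", "No query performance tests in CI", True),
]

cause = root_cause(chain)
print(f"Root cause: {cause.answer}")
```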

Step 3: Generate Postmortem Document

# Incident Postmortem: [Title]

**Date:** [YYYY-MM-DD]
**Duration:** [Xh Ym]
**Severity:** SEV-[1/2/3]
**Author:** [Name]
**Status:** Draft / Reviewed / Complete

## Summary
[2-3 sentences: what happened, impact, how resolved]

## Impact
- **Users affected:** [number or percentage]
- **Revenue impact:** [estimated if applicable]
- **Duration:** [from detection to resolution]
- **Services affected:** [list]

## Timeline (all times UTC)
| Time | Event |
|------|-------|
| 14:23 | Deploy of commit abc123 to production |
| 14:31 | Alert: API error rate > 5% |
| 14:33 | On-call acknowledged, began investigation |
| 14:41 | Identified slow database queries |
| 14:45 | Decision: rollback deploy |
| 14:48 | Rollback complete |
| 14:52 | Error rate returned to baseline |
| 14:55 | Confirmed: all systems nominal |

## Root Cause
[Clear explanation of what broke and why, without blame]

## Contributing Factors
- [Factor 1: e.g., no query performance testing in CI]
- [Factor 2: e.g., alert threshold was too high, delayed detection by 8 min]
- [Factor 3: e.g., runbook for DB issues was outdated]

## What Went Well
- Quick detection (8 min from deploy to alert)
- Rollback was smooth (3 min)
- Good communication in incident channel

## What Went Wrong
- There was no pre-deploy performance check, which would have caught the missing index
- Alert threshold of 5% was too high — impact started at 1%
- Took 10 min to identify root cause (no slow query dashboard)

## Action Items
| Priority | Action | Owner | Due | Status |
|----------|--------|-------|-----|--------|
| P1 | Add migration linter to CI (check for missing indexes) | @alice | 2026-05-05 | TODO |
| P1 | Lower error rate alert threshold to 1% | @bob | 2026-05-01 | TODO |
| P2 | Add slow query dashboard to Grafana | @carol | 2026-05-10 | TODO |
| P2 | Update DB incident runbook | @dave | 2026-05-07 | TODO |
| P3 | Add query performance tests to staging deploy | @alice | 2026-05-20 | TODO |

## Lessons Learned
[What did we learn that applies beyond this specific incident?]
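
The durations quoted in the template (8 min to detect, 3 min to roll back, 10 min to identify the cause) can be derived from the timeline rather than typed by hand. Below is a small Python sketch using the example timeline above. It assumes all events fall on the same day, and the key names and helper functions are hypothetical.

```python
from datetime import datetime

def minutes_between(start: str, end: str) -> int:
    """Minutes between two HH:MM times; assumes both fall on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)

def format_duration(minutes: int) -> str:
    """Format minutes in the template's 'Xh Ym' style."""
    hours, mins = divmod(minutes, 60)
    return f"{hours}h {mins}m"

# Times taken from the example timeline; key names are illustrative
timeline = {
    "deploy": "14:23",
    "alert": "14:31",
    "acknowledged": "14:33",
    "cause_identified": "14:41",
    "rollback_decided": "14:45",
    "rollback_complete": "14:48",
    "baseline": "14:52",
}

metrics = {
    "time_to_detect": minutes_between(timeline["deploy"], timeline["alert"]),
    "time_to_acknowledge": minutes_between(timeline["alert"], timeline["acknowledged"]),
    "time_to_identify": minutes_between(timeline["alert"], timeline["cause_identified"]),
    "rollback_time": minutes_between(timeline["rollback_decided"], timeline["rollback_complete"]),
    "detection_to_resolution": minutes_between(timeline["alert"], timeline["baseline"]),
}

for name, value in metrics.items():
    print(f"{name}: {format_duration(value)}")
```

Computing these values from the timeline keeps the Impact and What Went Well sections consistent with the table if an entry is later corrected.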

2. review — Facilitate Blameless Review

Generate review meeting agenda:

  1. Timeline walkthrough (facts only, no blame)
  2. What surprised us?
  3. Where did our assumptions fail?
  4. What would have prevented this?
  5. Action item assignment and prioritization

3. track — Follow Up on Action Items

Check status of postmortem action items:

  • Which action items from recent postmortems are still open?
  • Are we repeating the same root causes? (cluster analysis)
  • Average time to close action items by priority
  • Incidents that could have been prevented by completed action items
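
The first three checks can be sketched in Python as follows. This assumes action items have been exported to simple records with open and close dates, and that each postmortem has been tagged with root-cause labels. The `ActionItem` fields, the tags, and the sample dates are hypothetical, and plain tag counting stands in for a real cluster analysis.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import date
from statistics import mean
from typing import Optional

@dataclass
class ActionItem:
    postmortem: str
    priority: str
    action: str
    opened: date
    closed: Optional[date] = None  # None means the item is still open

def open_items(items):
    """Action items that have not been closed yet."""
    return [i for i in items if i.closed is None]

def avg_days_to_close(items):
    """Average days from open to close, grouped by priority (closed items only)."""
    by_priority = defaultdict(list)
    for i in items:
        if i.closed is not None:
            by_priority[i.priority].append((i.closed - i.opened).days)
    return {p: mean(days) for p, days in by_priority.items()}

def repeated_root_causes(tags_by_postmortem, min_count=2):
    """Root-cause tags appearing in at least min_count postmortems."""
    counts = Counter(t for tags in tags_by_postmortem.values() for t in set(tags))
    return {tag: n for tag, n in counts.items() if n >= min_count}

# Hypothetical sample data
items = [
    ActionItem("PM-1", "P1", "Add migration linter to CI", date(2026, 4, 28), date(2026, 5, 2)),
    ActionItem("PM-1", "P1", "Lower error rate alert threshold", date(2026, 4, 28), date(2026, 5, 4)),
    ActionItem("PM-1", "P2", "Add slow query dashboard", date(2026, 4, 28)),
]
tags = {"PM-1": ["missing-index", "alert-threshold"], "PM-2": ["missing-index"]}

print("Open:", [i.action for i in open_items(items)])
print("Avg days to close:", avg_days_to_close(items))
print("Repeated root causes:", repeated_root_causes(tags))
```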
