check-production

Audit production health. Output findings as structured report.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.


Install skill "check-production" with this command: npx skills add phrazzld/claude-config/phrazzld-claude-config-check-production

/check-production


What This Does

  • Query Sentry for unresolved issues

  • Check Vercel logs for recent errors

  • Test health endpoints

  • Check GitHub Actions for CI/CD failures

  • Output prioritized findings (P0-P3)

This is a primitive. It only investigates and reports. Use /log-production-issues to create GitHub issues or /triage to fix.

Process

  1. Sentry Check

```shell
# Run triage script if available
~/.claude/skills/triage/scripts/check_sentry.sh 2>/dev/null || echo "Sentry check unavailable"
```

Or spawn Sentry MCP query if configured.

  2. Vercel Logs Check

```shell
# Check for recent errors
~/.claude/skills/triage/scripts/check_vercel_logs.sh 2>/dev/null || vercel logs --output json 2>/dev/null | head -50
```

  3. Health Endpoints

```shell
# Test health endpoint
~/.claude/skills/triage/scripts/check_health_endpoints.sh 2>/dev/null || curl -sf "$(grep NEXT_PUBLIC_APP_URL .env.local 2>/dev/null | cut -d= -f2)/api/health" | jq .
```

  4. GitHub CI/CD Check

```shell
# Check for failed workflow runs on the default branch
gh run list --branch main --status failure --limit 5 2>/dev/null ||
gh run list --branch master --status failure --limit 5 2>/dev/null

# Get details on the most recent failure
gh run list --status failure --limit 1 --json databaseId,name,conclusion,createdAt,headBranch 2>/dev/null

# Check for stale/stuck workflows
gh run list --status in_progress --json databaseId,name,createdAt 2>/dev/null
```

What to look for:

  • Failed runs on main/master branch (broken CI)

  • Failed runs on feature branches blocking PRs

  • Stuck/in-progress runs that should have completed

  • Patterns in failure types (tests, lint, build, deploy)
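To surface those failure patterns quickly, the workflow names returned by `gh run list` can be tallied with standard tools. This is a sketch; the `printf` line is a hypothetical stand-in for real `gh` output:

```shell
# In a real repo the names would come from:
#   gh run list --status failure --limit 20 --json name --jq '.[].name'
# printf stands in for that output here (hypothetical workflow names).
failures=$(printf '%s\n' Build Build Lint | sort | uniq -c | sort -rn)
echo "$failures"
```

A workflow that tops this list repeatedly is usually one systemic problem, not several independent ones.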

  5. Quick Application Checks

```shell
# Check for error handling gaps (empty catch blocks)
grep -rE 'catch\s*\([^)]*\)\s*\{\s*\}' --include="*.ts" --include="*.tsx" src/ app/ 2>/dev/null | head -5
```

Empty catch blocks = silent failures.
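The pattern can be sanity-checked against a throwaway file before running it over the repo. A sketch; `risky` and `report` are made-up identifiers:

```shell
# Two catch blocks: one empty (silent failure), one that reports.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
try { risky(); } catch (e) {}
try { risky(); } catch (e) { report(e); }
EOF
# Count only the empty catch blocks
empty=$(grep -cE 'catch\s*\([^)]*\)\s*\{\s*\}' "$tmp")
echo "$empty empty catch block(s)"
rm -f "$tmp"
```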

Output Format

Production Health Check

P0: Critical (Active Production Issues)

  • [SENTRY-123] PaymentIntent failed - 23 users affected (Score: 147)
    Location: api/checkout.ts:45, first seen 2h ago

P1: High (Degraded Performance / Broken CI)

  • Health endpoint slow: /api/health responding in 2.3s (should be <500ms)
  • Vercel logs show 5xx errors in last hour (count: 12)
  • [CI] Main branch failing: "Build" workflow (run #1234), failed step "Type check": Type 'string' is not assignable to type 'number'

P2: Medium (Warnings)

  • 3 empty catch blocks found (silent failures)
  • Health endpoint missing database connectivity check
  • [CI] 3 feature branch workflows failing (blocking PRs)

P3: Low (Improvements)

  • Consider adding Sentry performance monitoring
  • Health endpoint could include more service checks

Summary

  • P0: 1 | P1: 3 | P2: 3 | P3: 2
  • Recommendation: Fix P0 immediately, then fix main branch CI
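The summary counts can be tallied mechanically from the findings list, assuming each finding line starts with its priority tag. The `findings` text below is a hypothetical stand-in:

```shell
# Count findings per priority bucket for the summary line.
findings='P0: PaymentIntent failures
P1: health endpoint slow
P1: main branch CI failing
P2: empty catch blocks'

summary=""
for p in P0 P1 P2 P3; do
  n=$(printf '%s\n' "$findings" | grep -c "^$p:" || true)
  summary="$summary$p: $n | "
done
echo "${summary% | }"
```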

Priority Mapping

| Signal | Priority |
| --- | --- |
| Active errors affecting users | P0 |
| 5xx errors, slow responses | P1 |
| Main branch CI/CD failing | P1 |
| Feature branch CI blocking PRs | P2 |
| Silent failures, missing checks | P2 |
| Missing monitoring, improvements | P3 |
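When scripting the report, the mapping above can be encoded as a small helper. A sketch; the signal keys are made-up identifiers:

```shell
# Map a signal category to its priority bucket, mirroring the mapping above.
priority_for() {
  case "$1" in
    active-user-errors)                echo "P0" ;;
    5xx-errors|slow-response|main-ci)  echo "P1" ;;
    branch-ci|silent-failure)          echo "P2" ;;
    *)                                 echo "P3" ;;
  esac
}

priority_for main-ci
priority_for silent-failure
```

Unknown signals default to P3 so that new signal types never get silently promoted.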

Health Endpoint Anti-Pattern

Health checks that lie are worse than no health check. Example:

```typescript
// ❌ BAD: Reports "ok" without checking
return { status: "ok", services: { database: "ok" } };

// ✅ GOOD: Honest liveness probe (no fake service status)
return { status: "ok", timestamp: new Date().toISOString() };

// ✅ BETTER: Real readiness probe
const dbStatus = await checkDatabase() ? "ok" : "error";
return { status: dbStatus === "ok" ? "ok" : "degraded", services: { database: dbStatus } };
```

If you can't verify a service, don't report on it. False "ok" status masks outages.

Analytics Note

This skill checks production health (errors, logs, endpoints), not product analytics.

For analytics auditing, see /check-observability. Note:

  • PostHog is REQUIRED for product analytics (has MCP server)

  • Vercel Analytics is NOT acceptable (no CLI/API/MCP - unusable for our workflow)

If you need to investigate user behavior or funnels during incident response, query PostHog via MCP.

  6. E2E Smoke Check

If Playwright is configured in the project:

```shell
# Run smoke tests against production
PLAYWRIGHT_BASE_URL="$PROD_URL" npx playwright test e2e/smoke.spec.ts --reporter=list 2>&1 | head -30
```

Critical paths to verify:

  • Landing page loads (anonymous)

  • Dashboard loads (authenticated) — the #1 incident class

  • Subscribe page renders

  • Session page loads

  • No error boundaries triggered on any route
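When Playwright is not configured, a curl loop gives a rough fallback over the same critical paths. A sketch; the path list and the `PROD_URL` placeholder are assumptions, adjust them to your routes:

```shell
# Hit each critical path and report its HTTP status.
check_path() {
  # curl may fail outright (connection refused); keep its status output anyway
  code=$(curl -so /dev/null --max-time 5 -w '%{http_code}' "$1" || true)
  if [ "$code" = "200" ]; then echo "OK   $1"; else echo "FAIL $1 ($code)"; fi
}

PROD_URL="${PROD_URL:-http://127.0.0.1:9}"   # placeholder; point at the real deployment
for path in / /dashboard /subscribe; do
  check_path "$PROD_URL$path"
done
```

This only catches hard failures (non-200s), not error boundaries rendered inside a 200 response, so it complements rather than replaces the smoke tests.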

  7. Post-Deploy Health Check

```shell
# Verify health endpoint
curl -sf "$PROD_URL/api/health" -w "\nHTTP %{http_code} in %{time_total}s\n" | head -5

# Verify no error boundary on dashboard (check for error text in HTML)
curl -sf "$PROD_URL/dashboard" 2>/dev/null | grep -c "Something went wrong" && echo "ERROR BOUNDARY DETECTED" || echo "Dashboard OK"
```
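Response time is worth flagging alongside status: curl's `time_total` can be compared against the 500ms budget with awk. A sketch; `check_latency` is a made-up helper name:

```shell
# Flag the health endpoint as slow when time_total exceeds 0.5s.
check_latency() {
  if awk -v t="$1" 'BEGIN { exit !(t + 0 > 0.5) }'; then
    echo "SLOW: ${1}s (budget 500ms)"
  else
    echo "OK: ${1}s"
  fi
}

# In production: check_latency "$(curl -so /dev/null -w '%{time_total}' "$PROD_URL/api/health")"
check_latency 2.3
check_latency 0.12
```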

Related

  • /log-production-issues - Create GitHub issues from findings

  • /triage - Fix production issues

  • /observability - Set up monitoring infrastructure

  • /flywheel-qa - Agentic QA for preview deployments
