infrastructure-documenter

Expert guide for documenting infrastructure including architecture diagrams, runbooks, system documentation, and operational procedures. Use when creating technical documentation for systems and deployments.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "infrastructure-documenter" with this command: npx skills add jmsktm/claude-settings/jmsktm-claude-settings-infrastructure-documenter

Infrastructure Documenter Skill

Overview

This skill helps you create clear, maintainable infrastructure documentation. It covers architecture diagrams, runbooks, system documentation, operational procedures, and documentation-as-code practices.

Documentation Philosophy

Principles

  1. Living documentation: Keep it in sync with reality
  2. Audience-aware: Different docs for different readers
  3. Actionable: Every doc should help someone do something
  4. Version-controlled: Documentation changes tracked with code

Document Types

| Type | Audience | Purpose |
|------|----------|---------|
| Architecture | Engineers | Understand system design |
| Runbooks | Ops/SRE | Handle incidents |
| API Docs | Developers | Integrate with system |
| Onboarding | New hires | Get up to speed |
| Decision Records | Future you | Understand why |

Architecture Documentation

System Architecture Overview

# System Architecture

## Overview

[Project Name] is a [type] application that [purpose].

## High-Level Architecture

┌─────────────────────────────────────────────────────────────┐
│                            Users                            │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                         Vercel Edge                         │
│       ┌─────────────────┐         ┌─────────────────┐       │
│       │   Next.js App   │         │ Edge Functions  │       │
│       └─────────────────┘         └─────────────────┘       │
└─────────────────────────────────────────────────────────────┘
                               │
               ┌───────────────┼───────────────┐
               ▼               ▼               ▼
┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐
│    Supabase     │   │      Redis      │   │     Stripe      │
│  - PostgreSQL   │   │  - Session      │   │  - Payments     │
│  - Auth         │   │  - Cache        │   │  - Webhooks     │
│  - Realtime     │   │                 │   │                 │
│  - Storage      │   │                 │   │                 │
└─────────────────┘   └─────────────────┘   └─────────────────┘


## Components

### Frontend (Next.js App)
- **Location**: Vercel Edge Network
- **Framework**: Next.js 14 (App Router)
- **Styling**: Tailwind CSS + shadcn/ui
- **State**: Zustand + React Query

### Backend Services
| Service | Provider | Purpose |
|---------|----------|---------|
| Database | Supabase | PostgreSQL with RLS |
| Auth | Supabase Auth | User authentication |
| Storage | Supabase Storage | File uploads |
| Cache | Upstash Redis | Session & API cache |
| Payments | Stripe | Subscriptions |
| Email | Resend | Transactional emails |

### Data Flow

1. User request → Vercel Edge
2. SSR/API Route processes request
3. Database queries via Supabase client
4. Response cached at edge (when applicable)
5. Response returned to user

## Security

### Authentication Flow
1. User signs in via Supabase Auth
2. JWT token issued and stored in cookie
3. Server validates token on each request
4. RLS policies enforce data access

### Data Protection
- All data encrypted at rest (AES-256)
- TLS 1.3 for data in transit
- Secrets stored in Vercel environment
- PII fields encrypted in database

Mermaid Diagrams

## Request Flow

```mermaid
sequenceDiagram
    participant U as User
    participant V as Vercel
    participant N as Next.js
    participant S as Supabase
    participant R as Redis

    U->>V: HTTPS Request
    V->>N: Route to App

    alt Cached Response
        N->>R: Check Cache
        R-->>N: Cache Hit
        N-->>U: Return Cached
    else Cache Miss
        N->>S: Query Database
        S-->>N: Data
        N->>R: Store in Cache
        N-->>U: Return Response
    end
```
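
The cached and uncached branches in the request flow diagram follow the cache-aside pattern. The sketch below shows that logic in TypeScript under some assumptions: the `Cache` interface, `MemoryCache`, and `getWithCache` are hypothetical names, the in-memory map only stands in for Redis, and the calls are synchronous for brevity, whereas real Redis and Supabase clients return promises.

```typescript
// Hypothetical cache contract; a real Redis client would be asynchronous.
interface Cache<V> {
  get(key: string): V | undefined;
  set(key: string, value: V): void;
}

// In-memory stand-in for Redis with a fixed time-to-live per entry.
class MemoryCache<V> implements Cache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();

  constructor(private ttlMs: number, private now: () => number = Date.now) {}

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: treat as a miss
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Cache-aside read: return the cached value on a hit; on a miss,
// load from the source of truth and store the result before returning.
function getWithCache<V>(cache: Cache<V>, key: string, load: () => V): V {
  const cached = cache.get(key);
  if (cached !== undefined) return cached;
  const fresh = load();
  cache.set(key, fresh);
  return fresh;
}

// Usage: the loader stands in for a database query.
let dbCalls = 0;
const cache = new MemoryCache<string>(60_000);
const loadUser = () => {
  dbCalls += 1;
  return "user-record";
};
getWithCache(cache, "user:1", loadUser); // miss: calls the loader
getWithCache(cache, "user:1", loadUser); // hit: served from the cache
```

Injecting the clock through the constructor keeps expiry behaviour testable without waiting for real time to pass.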

## Database Schema

```mermaid
erDiagram
    users ||--o{ projects : owns
    users {
        uuid id PK
        text email
        text name
        timestamp created_at
    }
    projects ||--o{ tasks : contains
    projects {
        uuid id PK
        uuid user_id FK
        text name
        text status
    }
    tasks {
        uuid id PK
        uuid project_id FK
        text title
        boolean completed
    }
```

## Runbooks

### Runbook Template

# Runbook: [Service Name] - [Issue Type]

## Overview
Brief description of the issue and when this runbook applies.

## Severity
- **P1 (Critical)**: Complete outage
- **P2 (High)**: Degraded service
- **P3 (Medium)**: Minor impact
- **P4 (Low)**: No user impact

## Detection
How this issue is typically detected:
- [ ] Alert from [monitoring system]
- [ ] User report
- [ ] Automated check failure

## Impact Assessment
- **Users affected**: All / Segment / None
- **Data at risk**: Yes / No
- **Revenue impact**: High / Medium / Low / None

## Prerequisites
- [ ] Access to [system/dashboard]
- [ ] Credentials for [service]
- [ ] Contact info for [team/person]

## Resolution Steps

### Step 1: Verify the Issue
```bash
# Check service status
curl -I https://api.example.com/health

# Check logs
vercel logs --follow
```

### Step 2: Identify Root Cause

Common causes:

- Database connection pool exhausted
- Memory limit reached
- External service down
- Bad deployment

### Step 3: Apply Fix

If Database Issue:

```sql
-- Check connection count
SELECT count(*) FROM pg_stat_activity;

-- Kill connections that have been idle for more than an hour
-- (excludes the current session so the statement does not terminate itself)
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle'
  AND state_change < now() - interval '1 hour'
  AND pid <> pg_backend_pid();
```

If Bad Deployment:

```bash
# Rollback to previous deployment
vercel rollback
```

### Step 4: Verify Fix

```bash
# Check service health
curl https://api.example.com/health

# Monitor error rates for 15 minutes
```

## Escalation

If unable to resolve within 30 minutes:

1. Page on-call engineer: [contact]
2. Notify stakeholders in #incidents
3. Update status page
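
The runbook relies on two time thresholds: watching error rates for 15 minutes after a fix, and escalating when an incident is still open after 30 minutes. A minimal sketch of both checks follows. The `Sample` shape and the `errorRate` and `shouldEscalate` helpers are hypothetical, and counting only HTTP 5xx responses as errors is an assumption.

```typescript
// One observed response; the shape is hypothetical.
interface Sample {
  at: number;     // epoch milliseconds
  status: number; // HTTP status code
}

const MINUTE_MS = 60_000;

// Fraction of 5xx responses among samples in the trailing window.
// Returns 0 when the window holds no samples.
function errorRate(samples: Sample[], now: number, windowMs = 15 * MINUTE_MS): number {
  const recent = samples.filter((s) => s.at > now - windowMs && s.at <= now);
  if (recent.length === 0) return 0;
  const errors = recent.filter((s) => s.status >= 500).length;
  return errors / recent.length;
}

// True once the incident has been open for the escalation limit or longer.
function shouldEscalate(startedAt: number, now: number, limitMs = 30 * MINUTE_MS): boolean {
  return now - startedAt >= limitMs;
}
```

Both limits are parameters with defaults, so a team with different thresholds can pass its own values without editing the helpers.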

## Post-Incident

- Create incident report
- Schedule post-mortem (P1/P2 only)
- Update this runbook if needed

## Related Links


### Database Runbooks

# Runbook: Database Performance Issues

## Symptoms
- Slow API responses (>1s)
- Timeout errors in logs
- High database CPU in dashboard

## Quick Checks

### 1. Check Active Connections
```sql
SELECT
  state,
  count(*),
  max(now() - query_start) as max_duration
FROM pg_stat_activity
GROUP BY state;
```

### 2. Find Long-Running Queries

```sql
SELECT
  pid,
  now() - query_start AS duration,
  query
FROM pg_stat_activity
WHERE state = 'active'
  AND now() - query_start > interval '30 seconds'
ORDER BY duration DESC;
```

### 3. Check Table Sizes

```sql
-- quote_ident keeps mixed-case or otherwise unusual table names resolvable
SELECT
  schemaname,
  tablename,
  pg_size_pretty(pg_total_relation_size(quote_ident(schemaname) || '.' || quote_ident(tablename))) AS size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(quote_ident(schemaname) || '.' || quote_ident(tablename)) DESC
LIMIT 10;
```

### 4. Check Missing Indexes

```sql
-- idx_scan is NULL for tables with no indexes; COALESCE keeps those rows in the result
SELECT
  relname,
  seq_scan,
  COALESCE(idx_scan, 0) AS idx_scan,
  seq_scan - COALESCE(idx_scan, 0) AS difference
FROM pg_stat_user_tables
WHERE seq_scan > COALESCE(idx_scan, 0)
ORDER BY difference DESC;
```

## Resolution

### Kill Problematic Queries

```sql
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE pid = [PID_FROM_ABOVE];
```

### Add Missing Index

```sql
CREATE INDEX CONCURRENTLY idx_table_column
ON table_name (column_name);
```

## Decision Records (ADRs)

### ADR Template

```markdown
# ADR-001: Choose Supabase for Database

## Status
Accepted

## Context
We need a database solution for [Project Name] that supports:
- PostgreSQL compatibility
- Real-time subscriptions
- Built-in authentication
- Easy local development
- Generous free tier

## Decision
We will use Supabase as our primary database and auth provider.

## Alternatives Considered

### PlanetScale
**Pros:**
- Excellent scaling
- Branching for schema changes
- MySQL compatible

**Cons:**
- No built-in auth
- No real-time subscriptions
- Additional services needed

### Firebase
**Pros:**
- Real-time built-in
- Mature platform
- Good mobile SDKs

**Cons:**
- NoSQL (not ideal for our use case)
- Vendor lock-in concerns
- Complex security rules

## Consequences

### Positive
- Single provider for DB + Auth + Storage
- Great developer experience
- Row Level Security for data protection
- Local development with supabase CLI

### Negative
- PostgreSQL-specific features tie us to provider
- Supabase still maturing (some rough edges)
- Limited to their managed offering

### Risks
- Supabase scaling limitations at high traffic
- Migration cost if we need to move

## References
- [Supabase Documentation](https://supabase.com/docs)
- [Comparison: Supabase vs Firebase](https://...)
```
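
ADRs stay easier to order and link when each file name carries its sequence number. The sketch below derives the next file name from the existing ones. `slugify`, `nextAdrFilename`, and the `NNN-slug.md` pattern are assumptions modelled on the `ADR-001` numbering in the template, not a fixed standard.

```typescript
// Turn a title into a lowercase, hyphen-separated slug.
function slugify(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Left-pad a sequence number with zeros to at least three digits.
function padNumber(n: number): string {
  let s = String(n);
  while (s.length < 3) s = "0" + s;
  return s;
}

// Next file name in the sequence, given existing names such as "001-database.md".
// Names without a numeric prefix (for example "README.md") are ignored.
function nextAdrFilename(existing: string[], title: string): string {
  const numbers = existing
    .map((name) => /^(\d+)-/.exec(name))
    .filter((m): m is RegExpExecArray => m !== null)
    .map((m) => parseInt(m[1], 10));
  const next = numbers.length === 0 ? 1 : Math.max(...numbers) + 1;
  return `${padNumber(next)}-${slugify(title)}.md`;
}
```

Taking the highest existing number rather than the file count means a deleted or superseded ADR never causes a number to be reused.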

API Documentation

Endpoint Documentation

# API Reference

## Base URL

Production: https://api.example.com/v1
Staging: https://staging-api.example.com/v1


## Authentication

All API requests require authentication via Bearer token.

```bash
curl -H "Authorization: Bearer YOUR_TOKEN" \
  https://api.example.com/v1/users
```

## Endpoints

### Users

#### Get Current User

```http
GET /users/me
```

Response:

```json
{
  "id": "usr_123",
  "email": "user@example.com",
  "name": "John Doe",
  "created_at": "2024-01-01T00:00:00Z"
}
```

#### Update User

```http
PATCH /users/me
```

Request Body:

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| name | string | No | Display name |
| avatar_url | string | No | Profile image URL |

Example:

```bash
curl -X PATCH \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "Jane Doe"}' \
  https://api.example.com/v1/users/me
```

## Error Responses

| Status | Code | Description |
|--------|------|-------------|
| 400 | BAD_REQUEST | Invalid request body |
| 401 | UNAUTHORIZED | Missing or invalid token |
| 403 | FORBIDDEN | Insufficient permissions |
| 404 | NOT_FOUND | Resource not found |
| 429 | RATE_LIMITED | Too many requests |
| 500 | INTERNAL_ERROR | Server error |

Error Response Format:

```json
{
  "error": {
    "code": "NOT_FOUND",
    "message": "User not found"
  }
}
```
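
The error table and the response format map naturally onto a small typed helper, which keeps handlers from returning a code with the wrong status. This is a sketch only: `ErrorCode`, `STATUS_BY_CODE`, `ErrorBody`, and `errorResponse` are hypothetical names, while the codes, statuses, and JSON shape come from the table and example above.

```typescript
type ErrorCode =
  | "BAD_REQUEST"
  | "UNAUTHORIZED"
  | "FORBIDDEN"
  | "NOT_FOUND"
  | "RATE_LIMITED"
  | "INTERNAL_ERROR";

// HTTP status for each documented error code.
const STATUS_BY_CODE: Record<ErrorCode, number> = {
  BAD_REQUEST: 400,
  UNAUTHORIZED: 401,
  FORBIDDEN: 403,
  NOT_FOUND: 404,
  RATE_LIMITED: 429,
  INTERNAL_ERROR: 500,
};

interface ErrorBody {
  error: { code: ErrorCode; message: string };
}

// Build the status and JSON body for a documented error.
function errorResponse(code: ErrorCode, message: string): { status: number; body: ErrorBody } {
  return { status: STATUS_BY_CODE[code], body: { error: { code, message } } };
}

const notFound = errorResponse("NOT_FOUND", "User not found");
```

Because `STATUS_BY_CODE` is typed as `Record<ErrorCode, number>`, adding a new code to the union without a status fails type checking, which helps keep the table and the implementation aligned.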

## Environment Documentation

### Environment Matrix

# Environments

## Overview

| Environment | URL | Purpose | Deploy |
|-------------|-----|---------|--------|
| Production | https://myapp.com | Live users | Manual (main) |
| Staging | https://staging.myapp.com | Pre-release testing | Auto (main) |
| Preview | https://pr-*.vercel.app | PR review | Auto (PR) |
| Development | http://localhost:3000 | Local dev | Manual |

## Configuration

### Production
```env
NODE_ENV=production
DATABASE_URL=[Supabase Production]
NEXT_PUBLIC_APP_URL=https://myapp.com
```

### Staging
```env
NODE_ENV=production
DATABASE_URL=[Supabase Staging Branch]
NEXT_PUBLIC_APP_URL=https://staging.myapp.com
```

### Development
```env
NODE_ENV=development
DATABASE_URL=[Local Supabase]
NEXT_PUBLIC_APP_URL=http://localhost:3000
```
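
Since every environment defines the same three variables, a startup check can catch a missing or malformed value before it causes a confusing runtime failure. The sketch below assumes the variable names shown above. `REQUIRED_VARS`, `validateEnv`, and the specific rules it applies are hypothetical.

```typescript
const REQUIRED_VARS = ["NODE_ENV", "DATABASE_URL", "NEXT_PUBLIC_APP_URL"];
const ALLOWED_NODE_ENVS = ["production", "development", "test"];

// Return a list of problems; an empty list means the configuration looks usable.
function validateEnv(env: Record<string, string | undefined>): string[] {
  const problems: string[] = [];
  for (const name of REQUIRED_VARS) {
    if (!env[name]) problems.push(`${name} is missing`);
  }
  const nodeEnv = env["NODE_ENV"];
  if (nodeEnv && ALLOWED_NODE_ENVS.indexOf(nodeEnv) === -1) {
    problems.push(`NODE_ENV has unexpected value "${nodeEnv}"`);
  }
  const appUrl = env["NEXT_PUBLIC_APP_URL"];
  if (appUrl && !/^https?:\/\//.test(appUrl)) {
    problems.push("NEXT_PUBLIC_APP_URL must start with http:// or https://");
  }
  return problems;
}
```

In a Node process this could be called with `process.env` at startup, failing fast when the returned list is not empty.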

## Access

### Production

- Vercel: Admin only
- Database: Read-only for devs, write for admin
- Logs: All engineers

### Staging

- Vercel: All engineers
- Database: All engineers
- Logs: All engineers

## Secrets Rotation

| Secret | Rotation | Last Rotated |
|--------|----------|--------------|
| Database password | 90 days | 2024-01-15 |
| API keys | 90 days | 2024-01-15 |
| JWT secret | Never | Initial setup |
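
A rotation table is only useful if someone notices when an entry falls overdue. The sketch below flags secrets whose interval has elapsed. `SecretRecord` and `overdueSecrets` are hypothetical names, and modelling a "Never" policy as `null` (never reported as overdue) is an assumption.

```typescript
interface SecretRecord {
  name: string;
  rotationDays: number | null; // null models a "Never" rotation policy
  lastRotated: string | null;  // ISO date such as "2024-01-15", or null if unknown
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Names of secrets whose rotation interval has elapsed as of `today`.
function overdueSecrets(secrets: SecretRecord[], today: Date): string[] {
  return secrets
    .filter((s) => {
      if (s.rotationDays === null || s.lastRotated === null) return false; // no schedule to enforce
      const dueAt = Date.parse(s.lastRotated) + s.rotationDays * DAY_MS;
      return today.getTime() >= dueAt;
    })
    .map((s) => s.name);
}

// Records mirroring the table above.
const secrets: SecretRecord[] = [
  { name: "Database password", rotationDays: 90, lastRotated: "2024-01-15" },
  { name: "API keys", rotationDays: 90, lastRotated: "2024-01-15" },
  { name: "JWT secret", rotationDays: null, lastRotated: null },
];
```

Run on a schedule, a check like this can turn the table from a passive record into an alert source.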

## Documentation-as-Code

### Documentation Structure

docs/
├── README.md                 # Documentation index
├── architecture/
│   ├── overview.md           # System architecture
│   ├── data-flow.md          # Data flow diagrams
│   └── decisions/            # ADRs
│       ├── 001-database.md
│       └── 002-hosting.md
├── runbooks/
│   ├── README.md             # Runbook index
│   ├── database.md           # Database issues
│   ├── deployment.md         # Deployment issues
│   └── outage.md             # Service outage
├── api/
│   └── reference.md          # API documentation
└── onboarding/
    ├── setup.md              # Local setup
    └── contributing.md       # How to contribute
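
A layout like this tends to drift unless something checks it. As a rough sketch, the function below compares a list of existing paths against the files the tree expects. `REQUIRED_DOCS` and `missingDocs` are hypothetical names, and a real script would gather the paths from the filesystem or from `git ls-files` rather than receive them as an array.

```typescript
// Files the layout above expects, relative to the repository root.
const REQUIRED_DOCS = [
  "docs/README.md",
  "docs/architecture/overview.md",
  "docs/architecture/data-flow.md",
  "docs/runbooks/README.md",
  "docs/runbooks/database.md",
  "docs/runbooks/deployment.md",
  "docs/runbooks/outage.md",
  "docs/api/reference.md",
  "docs/onboarding/setup.md",
  "docs/onboarding/contributing.md",
];

// Required paths that are absent from the given file list.
function missingDocs(existingPaths: string[]): string[] {
  const present = new Set(existingPaths);
  return REQUIRED_DOCS.filter((path) => !present.has(path));
}
```

Individual ADR files are left out of the required list on purpose, since their number grows over time.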


### Auto-Generated Documentation

```yaml
# .github/workflows/docs.yml
name: Generate Docs

on:
  push:
    branches: [main]
    paths:
      - 'src/**'
      - 'docs/**'

jobs:
  generate-docs:
    runs-on: ubuntu-latest
    permissions:
      contents: write  # lets the deploy step push to the gh-pages branch
    steps:
      - uses: actions/checkout@v4

      - name: Generate API docs from OpenAPI
        run: |
          npx @redocly/cli build-docs openapi.yaml \
            --output docs/api/index.html

      - name: Generate TypeDoc
        run: npx typedoc --out docs/api/typescript

      - name: Deploy to GitHub Pages
        uses: peaceiris/actions-gh-pages@v3
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_dir: ./docs
```

Documentation Checklist

Architecture Docs

  • System overview diagram
  • Component descriptions
  • Data flow documentation
  • Security architecture
  • Technology decisions (ADRs)

Operational Docs

  • Runbooks for common issues
  • Deployment procedures
  • Monitoring and alerting
  • Incident response plan
  • On-call procedures

Developer Docs

  • Local setup guide
  • API reference
  • Contributing guidelines
  • Code conventions
  • Testing guide

Maintenance

  • Documentation review schedule
  • Ownership assigned
  • Change process defined
  • Versioning strategy

When to Use This Skill

Invoke this skill when:

  • Creating architecture documentation
  • Writing runbooks for operations
  • Documenting decision rationale (ADRs)
  • Setting up documentation structure
  • Creating onboarding materials
  • Building automated documentation
  • Planning incident response procedures
