# Database Architect Skill
## Step 1: Understand Data Requirements

Gather requirements:

- **Entities:** What data needs to be stored?
- **Relationships:** How do entities relate (1:1, 1:N, N:M)?
- **Access Patterns:** How will data be queried?
- **Volume:** Expected data size and growth rate
- **Consistency:** ACID requirements vs. eventual consistency
## Step 2: Design Schema

**For Relational Databases:**

- **Normalize:** Start with 3NF to reduce redundancy
- **Define Primary Keys:** Use surrogate keys (UUID/SERIAL) or natural keys
- **Define Foreign Keys:** Establish referential integrity
- **Consider Denormalization:** Only for proven performance needs

**For NoSQL Databases:**

- **Model for Queries:** Design documents/collections around access patterns
- **Embed vs. Reference:** Embed for 1:1 and 1:few relationships; reference for 1:many
- **Shard Key Selection:** Choose keys that distribute data evenly
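A quick way to sanity-check a candidate shard key is to hash a sample of its values and inspect the resulting distribution. The following is an illustrative sketch only (Python, using MD5 as a stand-in for whatever shard function your database actually applies):

```python
import hashlib
import uuid
from collections import Counter

def shard_for(key: str, shards: int = 4) -> int:
    """Map a key to a shard using a stable hash (MD5 here for
    determinism; real systems use the database's own shard function)."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % shards

# High-cardinality keys (e.g., tenant UUIDs) should spread near-evenly;
# a skewed Counter here means the candidate key will create hot shards.
counts = Counter(shard_for(str(uuid.uuid4())) for _ in range(10_000))
```

A low-cardinality key (e.g., a `country` column with a few dominant values) run through the same check would show most rows landing on one or two shards, which is exactly the imbalance to avoid.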
## Step 3: Plan Indexes

Index strategy based on query patterns:

```sql
-- Example: users table with common queries
CREATE INDEX idx_users_email ON users(email);                 -- Exact match
CREATE INDEX idx_users_name ON users(last_name, first_name);  -- Range/sort
CREATE INDEX idx_users_created ON users(created_at DESC);     -- Ordering
```
**Index Guidelines:**

- Index columns used in WHERE, JOIN, and ORDER BY clauses
- Consider composite indexes for multi-column queries
- Avoid over-indexing (every index slows writes)
- Use covering indexes for read-heavy queries
## Step 4: Plan Migrations

Create versioned migrations:

```
migrations/
  001_create_users.sql
  002_add_email_index.sql
  003_create_orders.sql
```
**Migration Best Practices:**

- Always include both up and down migrations
- Test migrations on production-like data
- Plan for zero-downtime migrations
- Back up before running migrations
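The numbered-file convention above can be applied mechanically by any migration tool. As a minimal sketch of the idea, not a production implementation, here is a runner that applies pending `*.sql` files in version order and records them in a hypothetical `schema_migrations` tracking table (SQLite stands in for PostgreSQL so the example is self-contained):

```python
import sqlite3
from pathlib import Path

def apply_migrations(conn, migrations_dir):
    """Apply *.sql files in lexicographic (version) order, skipping any
    already recorded in the schema_migrations tracking table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations (version TEXT PRIMARY KEY)"
    )
    applied = {row[0] for row in conn.execute("SELECT version FROM schema_migrations")}
    for path in sorted(Path(migrations_dir).glob("*.sql")):
        if path.stem in applied:
            continue  # already applied; shipped migrations must stay immutable
        conn.executescript(path.read_text())
        conn.execute(
            "INSERT INTO schema_migrations (version) VALUES (?)", (path.stem,)
        )
    conn.commit()
```

Because applied versions are tracked, rerunning the function is idempotent; a real tool (Flyway, sqitch, Alembic, etc.) adds the down/rollback half and checksum verification on top of this loop.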
## Step 5: Optimize Queries

Analyze and improve slow queries:

- **Use EXPLAIN ANALYZE:** Understand execution plans
- **Identify Table Scans:** Replace sequential scans with index scans
- **Optimize JOINs:** Ensure indexes exist on join columns
- **Batch Operations:** Use bulk inserts/updates
- **Connection Pooling:** Reduce connection overhead
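To make the batching point concrete, here is a small sketch using Python's built-in SQLite driver as a stand-in; with PostgreSQL the same pattern applies via `psycopg`'s `executemany` or, better still for large loads, `COPY`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")

rows = [(i, f"user{i}@example.com") for i in range(1000)]

# One batched call inside one transaction, instead of 1000 individual
# INSERT statements each paying parse and round-trip overhead.
conn.executemany("INSERT INTO users (id, email) VALUES (?, ?)", rows)
conn.commit()
```

The win comes from amortizing per-statement overhead and committing once; looping over single-row inserts with autocommit is routinely orders of magnitude slower on real databases.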
## Step 6: PostgreSQL 17 Features (2024–2026)

Leverage PostgreSQL 17 capabilities where applicable:

**Performance improvements:**

- New VACUUM memory management: up to 20x lower memory footprint, and vacuum runs faster on busy systems
- Streaming I/O interface accelerates sequential scans on large datasets
- BRIN indexes support parallel builds
- B-tree indexes handle IN-clause queries more efficiently
- Optimized planning of Common Table Expressions (CTEs)
**SQL/JSON enhancements (PG 17):**

- `JSON_TABLE()` converts JSON data into a relational table representation
- JSON constructors and identity functions: `JSON()`, `JSON_SCALAR()`, `JSON_ARRAY()`, `JSON_OBJECT()`
- Use jsonpath expressions for expressive path-based queries over JSONB columns
**Incremental backups:**

- `pg_basebackup` supports incremental backups; use `pg_combinebackup` to reconstruct a full backup from a chain of incrementals
**Logical replication improvements:**

- Failover control for logical replication slots
- `pg_createsubscriber` creates logical replicas from physical standbys
- `pg_upgrade` now preserves logical replication slots across major version upgrades
**Security:**

- New MAINTAIN privilege grants targeted maintenance rights (VACUUM, ANALYZE, etc.) without full superuser access
- `sslnegotiation=direct` client option performs a direct TLS handshake, avoiding a negotiation round trip
**COPY improvements:**

- `COPY ... ON_ERROR ignore` continues an import past row-level errors instead of aborting
## Step 7: pgvector for AI Embeddings

Store and query vector embeddings alongside relational data to avoid running a separate vector database:

```sql
-- Install extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Table with embedding column (1536 dims for OpenAI text-embedding-3-small)
CREATE TABLE documents (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    content TEXT NOT NULL,
    embedding vector(1536),
    created_at TIMESTAMPTZ DEFAULT NOW()
);

-- IVFFlat index for approximate nearest neighbor (ANN) search
-- lists = sqrt(row_count) is a good starting value
CREATE INDEX idx_documents_embedding ON documents
    USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);

-- HNSW index (faster queries, more memory; preferred for < 1M vectors)
CREATE INDEX idx_documents_embedding_hnsw ON documents
    USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 64);

-- Similarity search (cosine distance)
SELECT id, content, 1 - (embedding <=> $1::vector) AS similarity
FROM documents
ORDER BY embedding <=> $1::vector
LIMIT 10;
```
**When to use pgvector vs. a dedicated vector DB:**

- Up to ~10M vectors: pgvector is sufficient (sub-50ms queries with an HNSW index)
- Above ~10M vectors, or when specialized ANN algorithms are required: consider Pinecone, Weaviate, or Qdrant
- pgvector advantage: the same backups, replication, and connection pooling as the rest of PostgreSQL
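For intuition about what the similarity query computes: pgvector's `<=>` operator returns cosine distance, and `1 - distance` is the similarity the query reports. A plain-Python sketch of that math (no pgvector required):

```python
import math

def cosine_distance(a, b):
    """Cosine distance as pgvector's <=> operator defines it:
    1 - (a . b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def cosine_similarity(a, b):
    # Matches the "1 - (embedding <=> $1::vector)" expression in the query.
    return 1.0 - cosine_distance(a, b)
```

Identical directions give similarity 1.0 and orthogonal vectors give 0.0, which is why `ORDER BY embedding <=> $1::vector` (ascending distance) returns the most similar documents first.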
## Step 8: Table Partitioning Strategies

Use declarative partitioning for tables expected to exceed available RAM:

```sql
-- Range partitioning by date (common for time-series / logs)
CREATE TABLE events (
    id BIGSERIAL,
    created_at TIMESTAMPTZ NOT NULL,
    event_type TEXT NOT NULL,
    payload JSONB
) PARTITION BY RANGE (created_at);

-- Monthly partitions
CREATE TABLE events_2025_01 PARTITION OF events
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
CREATE TABLE events_2025_02 PARTITION OF events
    FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');

-- Hash partitioning for even distribution (e.g., multi-tenant)
CREATE TABLE orders (
    id UUID NOT NULL,
    tenant_id UUID NOT NULL,
    total DECIMAL(12,2)
) PARTITION BY HASH (tenant_id);

CREATE TABLE orders_p0 PARTITION OF orders FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE orders_p1 PARTITION OF orders FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE orders_p2 PARTITION OF orders FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE orders_p3 PARTITION OF orders FOR VALUES WITH (MODULUS 4, REMAINDER 3);
```
**Partition pruning:** PostgreSQL automatically skips irrelevant partitions when the partition key appears in the WHERE clause, so always include the partition key in queries.

**Indexes on partitioned tables:** Indexes created on the parent table are automatically created on all child partitions.
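Creating next month's partition ahead of time is usually automated (via cron, or an extension such as pg_partman). As a sketch of just the DDL-generation step, assuming the range-partitioned `events` table shown earlier:

```python
from datetime import date

def monthly_partition_ddl(table: str, start: date) -> str:
    """Generate CREATE TABLE ... PARTITION OF DDL for the month containing `start`."""
    first = start.replace(day=1)
    # First day of the following month (range upper bounds are exclusive).
    if first.month == 12:
        nxt = first.replace(year=first.year + 1, month=1)
    else:
        nxt = first.replace(month=first.month + 1)
    name = f"{table}_{first:%Y_%m}"
    return (
        f"CREATE TABLE {name} PARTITION OF {table} "
        f"FOR VALUES FROM ('{first}') TO ('{nxt}');"
    )
```

A scheduled job would run this for the upcoming month and execute the statement, so inserts never land on a missing partition; in production, prefer pg_partman over hand-rolled scripts.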
## Step 9: JSONB Patterns at Scale

```sql
-- Generated columns promote hot JSONB fields to indexed native columns
CREATE TABLE customers (
    id BIGSERIAL PRIMARY KEY,
    data JSONB NOT NULL,
    -- Promote frequently filtered fields to B-tree indexed generated columns
    country TEXT GENERATED ALWAYS AS (data->>'country') STORED,
    signup_date DATE GENERATED ALWAYS AS ((data->>'signup_date')::DATE) STORED
);
CREATE INDEX idx_customers_country ON customers (country);
CREATE INDEX idx_customers_signup ON customers (signup_date);

-- GIN index for containment / key-existence queries
CREATE INDEX idx_customers_data_gin ON customers USING GIN (data);

-- Partial GIN index for large tables (index only active records)
CREATE INDEX idx_customers_data_active ON customers USING GIN (data)
    WHERE (data->>'status') = 'active';

-- jsonpath query example (PG 17)
SELECT * FROM customers WHERE data @? '$.tags[*] ? (@ == "premium")';
```
## Step 10: Connection Pooling

Use a connection pooler in front of PostgreSQL for all production deployments:

**PgBouncer (lightweight, battle-tested):**

```ini
; pgbouncer.ini
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
pool_mode = transaction   ; transaction pooling for stateless apps
max_client_conn = 1000
default_pool_size = 25
min_pool_size = 5
```
**Supavisor (cloud-native, multi-tenant, Elixir-based):**

- Designed for serverless / edge functions with thousands of short-lived connections
- Supports both session and transaction pooling modes
- Used by default in Supabase deployments; also available self-hosted
**Pooling modes:**

| Mode | Use Case | Notes |
|------|----------|-------|
| Session | Long-running connections, LISTEN/NOTIFY | 1 client = 1 server connection |
| Transaction | Stateless APIs (recommended default) | Most efficient; breaks SET and prepared statements |
| Statement | Rarely needed | Each statement can use a different server connection |
</execution_process>
<best_practices>
- **Normalize First:** Optimize later based on data
- **Index Thoughtfully:** Base indexes on actual query patterns
- **Use Migrations:** Never modify schema directly
- **Monitor Performance:** Use database profiling tools
- **Plan for Scale:** Consider partitioning for large tables
- **Upgrade to PostgreSQL 17:** Benefit from the new VACUUM memory management, SQL/JSON functions, and incremental backup support
- **Use pgvector for AI:** Store embeddings in PostgreSQL with HNSW or IVFFlat indexes before reaching for a dedicated vector database
- **JSONB at Scale:** Promote hot JSONB fields to generated columns with B-tree indexes; use GIN for containment queries
- **Always Pool Connections:** Use PgBouncer (transaction mode) or Supavisor for all production PostgreSQL deployments
- **Partition Large Tables:** Apply range (time-series), list (category), or hash (multi-tenant) partitioning for tables projected to exceed server RAM
</best_practices>
Design a schema for an e-commerce orders system
**Example Response:**

```sql
-- Users table
CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email VARCHAR(255) UNIQUE NOT NULL,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Orders table
CREATE TABLE orders (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID REFERENCES users(id),
    status VARCHAR(20) DEFAULT 'pending',
    total_amount DECIMAL(10,2),
    created_at TIMESTAMP DEFAULT NOW()
);

-- Order items (1:N relationship)
CREATE TABLE order_items (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    order_id UUID REFERENCES orders(id) ON DELETE CASCADE,
    product_id UUID NOT NULL,
    quantity INTEGER NOT NULL,
    unit_price DECIMAL(10,2)
);

-- Indexes for common queries
CREATE INDEX idx_orders_user ON orders(user_id);
CREATE INDEX idx_orders_status ON orders(status);
CREATE INDEX idx_order_items_order ON order_items(order_id);
```
</usage_example>
## Rules

- Always justify denormalization with performance data
- Include a rollback strategy for all migrations
- Document relationships and constraints
## Related Workflow

This skill has a corresponding workflow for complex multi-agent scenarios:

- **Workflow:** `.claude/workflows/database-architect-skill-workflow.md`
- **When to use the workflow:** For comprehensive database design including requirements analysis, schema design, query optimization, migration planning, and testing (multi-phase, multi-agent)
- **When to use the skill directly:** For quick schema reviews or single-agent database tasks
## Iron Laws

- NEVER make schema changes without versioned migrations that include both UP and DOWN scripts; manual DDL in production is not recoverable.
- ALWAYS normalize to at least 3NF before considering denormalization; never optimize prematurely without measured performance evidence.
- ALWAYS plan indexes based on actual query patterns from EXPLAIN ANALYZE; never add indexes speculatively before profiling real workloads.
- NEVER deploy a migration without first running it against production-like data; schema issues surface under realistic volume, not on empty tables.
- ALWAYS use connection pooling (Supavisor or PgBouncer) in production; direct connections from serverless functions exhaust the database connection limit under load.
## Anti-Patterns

| Anti-Pattern | Why It Fails | Correct Approach |
|---|---|---|
| Manual DDL directly on production | No rollback path; breaks migration history | Always use versioned migrations with DOWN scripts |
| Premature denormalization | Adds complexity before profiling; often no measurable gain | Normalize first; denormalize only after EXPLAIN ANALYZE reveals a bottleneck |
| Indexing every column | Slows writes; wastes storage; misleads the query planner | Index only columns that appear in WHERE, JOIN, and ORDER BY clauses |
| Adding a NOT NULL column without a default | Locks the entire table during migration on large datasets | Add a nullable column, backfill in batches, then add the NOT NULL constraint |
| Direct connections from serverless functions | Connection limit exhausted under load spikes | Use PgBouncer or Supavisor in transaction-pooling mode |
## Memory Protocol (MANDATORY)

Before starting:

```sh
cat .claude/context/memory/learnings.md
```

After completing:

- New pattern -> `.claude/context/memory/learnings.md`
- Issue found -> `.claude/context/memory/issues.md`
- Decision made -> `.claude/context/memory/decisions.md`

ASSUME INTERRUPTION: Your context may reset. If it's not in memory, it didn't happen.