Database Schema Design
When to use this skill
Lists specific situations where this skill should be triggered:
-
New Project: Database schema design for a new application
-
Schema Refactoring: Redesigning an existing schema for performance or scalability
-
Relationship Definition: Implementing 1:1, 1:N, N:M relationships between tables
-
Migration: Safely applying schema changes
-
Performance Issues: Index and schema optimization to resolve slow queries
Input Format
The required and optional input information to collect from the user:
Required Information
-
Database Type: PostgreSQL, MySQL, MongoDB, SQLite, etc.
-
Domain Description: What data will be stored (e.g., e-commerce, blog, social media)
-
Key Entities: Core data objects (e.g., User, Product, Order)
Optional Information
-
Expected Data Volume: Small (<10K rows), Medium (10K-1M), Large (>1M) (default: Medium)
-
Read/Write Ratio: Read-heavy, Write-heavy, Balanced (default: Balanced)
-
Transaction Requirements: Whether ACID is required (default: true)
-
Sharding/Partitioning: Whether large data distribution is needed (default: false)
Input Example
Design a database for an e-commerce platform:
- DB: PostgreSQL
- Entities: User, Product, Order, Review
- Relationships:
- A User can have multiple Orders
- An Order contains multiple Products (N:M)
- A Review is linked to a User and a Product
- Expected data: 100,000 users, 10,000 products
- Read-heavy (frequent product lookups)
Instructions
Specifies the step-by-step task sequence to follow precisely.
Step 1: Define Entities and Attributes
Identify core data objects and their attributes.
Tasks:
-
Extract nouns from business requirements → entities
-
List each entity's attributes (columns)
-
Determine data types (VARCHAR, INTEGER, TIMESTAMP, JSON, etc.)
-
Designate Primary Keys (UUID vs Auto-increment ID)
Example (E-commerce):
Users
- id: UUID PRIMARY KEY
- email: VARCHAR(255) UNIQUE NOT NULL
- username: VARCHAR(50) UNIQUE NOT NULL
- password_hash: VARCHAR(255) NOT NULL
- created_at: TIMESTAMP DEFAULT NOW()
- updated_at: TIMESTAMP DEFAULT NOW()
Products
- id: UUID PRIMARY KEY
- name: VARCHAR(255) NOT NULL
- description: TEXT
- price: DECIMAL(10, 2) NOT NULL
- stock: INTEGER DEFAULT 0
- category_id: UUID REFERENCES Categories(id)
- created_at: TIMESTAMP DEFAULT NOW()
Orders
- id: UUID PRIMARY KEY
- user_id: UUID REFERENCES Users(id)
- total_amount: DECIMAL(10, 2) NOT NULL
- status: VARCHAR(20) DEFAULT 'pending'
- created_at: TIMESTAMP DEFAULT NOW()
OrderItems (Junction table)
- id: UUID PRIMARY KEY
- order_id: UUID REFERENCES Orders(id) ON DELETE CASCADE
- product_id: UUID REFERENCES Products(id)
- quantity: INTEGER NOT NULL
- price: DECIMAL(10, 2) NOT NULL
Step 2: Design Relationships and Normalization
Define relationships between tables and apply normalization.
Tasks:
-
1:1 relationship: Foreign Key + UNIQUE constraint
-
1:N relationship: Foreign Key
-
N:M relationship: Create junction table
-
Determine normalization level (1NF ~ 3NF)
Decision Criteria:
-
OLTP systems → normalize to 3NF (data integrity)
-
OLAP/analytics systems → denormalization allowed (query performance)
-
Read-heavy → minimize JOINs with partial denormalization
-
Write-heavy → full normalization to eliminate redundancy
Example (ERD Mermaid):
erDiagram Users ||--o{ Orders : places Orders ||--|{ OrderItems : contains Products ||--o{ OrderItems : "ordered in" Categories ||--o{ Products : categorizes Users ||--o{ Reviews : writes Products ||--o{ Reviews : "reviewed by"
Users {
uuid id PK
string email UK
string username UK
string password_hash
timestamp created_at
}
Products {
uuid id PK
string name
decimal price
int stock
uuid category_id FK
}
Orders {
uuid id PK
uuid user_id FK
decimal total_amount
string status
timestamp created_at
}
OrderItems {
uuid id PK
uuid order_id FK
uuid product_id FK
int quantity
decimal price
}
Step 3: Establish Indexing Strategy
Design indexes for query performance.
Tasks:
-
Primary Keys automatically create indexes
-
Columns frequently used in WHERE clauses → add indexes
-
Foreign Keys used in JOINs → indexes
-
Consider composite indexes (WHERE col1 = ? AND col2 = ?)
-
UNIQUE indexes (email, username, etc.)
Checklist:
-
Indexes on frequently queried columns
-
Indexes on Foreign Key columns
-
Composite index order optimized (high selectivity columns first)
-
Avoid excessive indexes (degrades INSERT/UPDATE performance)
Example (PostgreSQL):
-- Primary Keys (auto-indexed) CREATE TABLE users ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), email VARCHAR(255) UNIQUE NOT NULL, -- UNIQUE = auto-indexed username VARCHAR(50) UNIQUE NOT NULL, password_hash VARCHAR(255) NOT NULL, created_at TIMESTAMP DEFAULT NOW(), updated_at TIMESTAMP DEFAULT NOW() );
-- Foreign Keys + explicit indexes CREATE TABLE orders ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE, total_amount DECIMAL(10, 2) NOT NULL, status VARCHAR(20) DEFAULT 'pending', created_at TIMESTAMP DEFAULT NOW() );
CREATE INDEX idx_orders_user_id ON orders(user_id); CREATE INDEX idx_orders_status ON orders(status); CREATE INDEX idx_orders_created_at ON orders(created_at);
-- Composite index (status and created_at frequently queried together) CREATE INDEX idx_orders_status_created ON orders(status, created_at DESC);
-- Products table CREATE TABLE products ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), name VARCHAR(255) NOT NULL, description TEXT, price DECIMAL(10, 2) NOT NULL CHECK (price >= 0), stock INTEGER DEFAULT 0 CHECK (stock >= 0), category_id UUID REFERENCES categories(id), created_at TIMESTAMP DEFAULT NOW() );
CREATE INDEX idx_products_category ON products(category_id); CREATE INDEX idx_products_price ON products(price); -- price range search CREATE INDEX idx_products_name ON products(name); -- product name search
-- Full-text search (PostgreSQL) CREATE INDEX idx_products_name_fts ON products USING GIN(to_tsvector('english', name)); CREATE INDEX idx_products_description_fts ON products USING GIN(to_tsvector('english', description));
Step 4: Set Up Constraints and Triggers
Add constraints to ensure data integrity.
Tasks:
-
NOT NULL: required columns
-
UNIQUE: columns that must be unique
-
CHECK: value range constraints (e.g., price >= 0)
-
Foreign Key + CASCADE option
-
Set default values
Example:
CREATE TABLE products ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), name VARCHAR(255) NOT NULL, price DECIMAL(10, 2) NOT NULL CHECK (price >= 0), stock INTEGER DEFAULT 0 CHECK (stock >= 0), discount_percent INTEGER CHECK (discount_percent >= 0 AND discount_percent <= 100), category_id UUID REFERENCES categories(id) ON DELETE SET NULL, created_at TIMESTAMP DEFAULT NOW(), updated_at TIMESTAMP DEFAULT NOW() );
-- Trigger: auto-update updated_at CREATE OR REPLACE FUNCTION update_updated_at_column() RETURNS TRIGGER AS $$ BEGIN NEW.updated_at = NOW(); RETURN NEW; END; $$ LANGUAGE plpgsql;
CREATE TRIGGER update_products_updated_at BEFORE UPDATE ON products FOR EACH ROW EXECUTE FUNCTION update_updated_at_column();
Step 5: Write Migration Scripts
Write migrations that safely apply schema changes.
Tasks:
-
UP migration: apply changes
-
DOWN migration: rollback
-
Wrap in transactions
-
Prevent data loss (use ALTER TABLE carefully)
Example (SQL migration):
-- migrations/001_create_initial_schema.up.sql BEGIN;
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
CREATE TABLE users ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), email VARCHAR(255) UNIQUE NOT NULL, username VARCHAR(50) UNIQUE NOT NULL, password_hash VARCHAR(255) NOT NULL, created_at TIMESTAMP DEFAULT NOW(), updated_at TIMESTAMP DEFAULT NOW() );
CREATE TABLE categories ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), name VARCHAR(100) UNIQUE NOT NULL, parent_id UUID REFERENCES categories(id) );
CREATE TABLE products ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), name VARCHAR(255) NOT NULL, description TEXT, price DECIMAL(10, 2) NOT NULL CHECK (price >= 0), stock INTEGER DEFAULT 0 CHECK (stock >= 0), category_id UUID REFERENCES categories(id), created_at TIMESTAMP DEFAULT NOW(), updated_at TIMESTAMP DEFAULT NOW() );
CREATE INDEX idx_products_category ON products(category_id); CREATE INDEX idx_products_price ON products(price);
COMMIT;
-- migrations/001_create_initial_schema.down.sql BEGIN;
DROP TABLE IF EXISTS products CASCADE; DROP TABLE IF EXISTS categories CASCADE; DROP TABLE IF EXISTS users CASCADE;
COMMIT;
Output format
Defines the exact format that deliverables should follow.
Basic Structure
project/ ├── database/ │ ├── schema.sql # full schema │ ├── migrations/ │ │ ├── 001_create_users.up.sql │ │ ├── 001_create_users.down.sql │ │ ├── 002_create_products.up.sql │ │ └── 002_create_products.down.sql │ ├── seeds/ │ │ └── sample_data.sql # test data │ └── docs/ │ ├── ERD.md # Mermaid ERD diagram │ └── SCHEMA.md # schema documentation └── README.md
ERD Diagram (Mermaid Format)
Database Schema
Entity Relationship Diagram
```mermaid erDiagram Users ||--o{ Orders : places Orders ||--|{ OrderItems : contains Products ||--o{ OrderItems : "ordered in"
Users {
uuid id PK
string email UK
string username UK
}
Products {
uuid id PK
string name
decimal price
}
```
Table Descriptions
users
- Purpose: Store user account information
- Indexes: email, username
- Estimated rows: 100,000
products
- Purpose: Product catalog
- Indexes: category_id, price, name
- Estimated rows: 10,000
Constraints
Specifies mandatory rules and prohibited actions.
Mandatory Rules (MUST)
Primary Key Required: Define a Primary Key on every table
-
Unique record identification
-
Ensures referential integrity
Explicit Foreign Keys: Tables with relationships must define Foreign Keys
-
Specify ON DELETE CASCADE/SET NULL options
-
Prevent orphan records
Use NOT NULL Appropriately: Required columns must be NOT NULL
-
Clearly specify nullable vs. non-nullable
-
Providing defaults is recommended
Prohibited Actions (MUST NOT)
Avoid EAV Pattern Abuse: Use the Entity-Attribute-Value pattern only in special cases
-
Query complexity increases dramatically
-
Performance degradation
Excessive Denormalization: Be careful when denormalizing for performance
-
Data consistency issues
-
Risk of update anomalies
No Plaintext Storage of Sensitive Data: Never store passwords, card numbers, etc. in plaintext
-
Hashing/encryption is mandatory
-
Legal liability issues
Security Rules
-
Principle of Least Privilege: Grant only the necessary permissions to application DB accounts
-
SQL Injection Prevention: Use Prepared Statements / Parameterized Queries
-
Encrypt Sensitive Columns: Consider encrypting personally identifiable information at rest
Examples
Demonstrates how to apply the skill through real-world use cases.
Example 1: Blog Platform Schema
Situation: Database design for a Medium-style blog platform
User Request:
Design a PostgreSQL schema for a blog platform:
- Users can write multiple posts
- Posts can have multiple tags (N:M)
- Users can like and bookmark posts
- Comment feature (with nested replies)
Final Result:
-- Users CREATE TABLE users ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), email VARCHAR(255) UNIQUE NOT NULL, username VARCHAR(50) UNIQUE NOT NULL, bio TEXT, avatar_url VARCHAR(500), created_at TIMESTAMP DEFAULT NOW() );
-- Posts CREATE TABLE posts ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), author_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE, title VARCHAR(255) NOT NULL, slug VARCHAR(255) UNIQUE NOT NULL, content TEXT NOT NULL, published_at TIMESTAMP, created_at TIMESTAMP DEFAULT NOW(), updated_at TIMESTAMP DEFAULT NOW() );
CREATE INDEX idx_posts_author ON posts(author_id); CREATE INDEX idx_posts_published ON posts(published_at); CREATE INDEX idx_posts_slug ON posts(slug);
-- Tags CREATE TABLE tags ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), name VARCHAR(50) UNIQUE NOT NULL, slug VARCHAR(50) UNIQUE NOT NULL );
-- Post-Tag relationship (N:M) CREATE TABLE post_tags ( post_id UUID REFERENCES posts(id) ON DELETE CASCADE, tag_id UUID REFERENCES tags(id) ON DELETE CASCADE, PRIMARY KEY (post_id, tag_id) );
CREATE INDEX idx_post_tags_post ON post_tags(post_id); CREATE INDEX idx_post_tags_tag ON post_tags(tag_id);
-- Likes CREATE TABLE post_likes ( user_id UUID REFERENCES users(id) ON DELETE CASCADE, post_id UUID REFERENCES posts(id) ON DELETE CASCADE, created_at TIMESTAMP DEFAULT NOW(), PRIMARY KEY (user_id, post_id) );
-- Bookmarks CREATE TABLE post_bookmarks ( user_id UUID REFERENCES users(id) ON DELETE CASCADE, post_id UUID REFERENCES posts(id) ON DELETE CASCADE, created_at TIMESTAMP DEFAULT NOW(), PRIMARY KEY (user_id, post_id) );
-- Comments (self-referencing for nested comments) CREATE TABLE comments ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), post_id UUID NOT NULL REFERENCES posts(id) ON DELETE CASCADE, author_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE, parent_comment_id UUID REFERENCES comments(id) ON DELETE CASCADE, content TEXT NOT NULL, created_at TIMESTAMP DEFAULT NOW(), updated_at TIMESTAMP DEFAULT NOW() );
CREATE INDEX idx_comments_post ON comments(post_id); CREATE INDEX idx_comments_author ON comments(author_id); CREATE INDEX idx_comments_parent ON comments(parent_comment_id);
Example 2: MongoDB Schema (NoSQL)
Situation: MongoDB schema for a real-time chat app
User Request:
Design a MongoDB schema for a real-time chat app. Reads are very frequent, and message history needs to be retrieved quickly.
Final Result:
// users collection { _id: ObjectId, username: String, // indexed, unique email: String, // indexed, unique avatar_url: String, status: String, // 'online', 'offline', 'away' last_seen: Date, created_at: Date }
// conversations collection (denormalized - read-optimized) { _id: ObjectId, participants: [ // indexed { user_id: ObjectId, username: String, avatar_url: String } ], last_message: { // denormalized for fast recent-message retrieval content: String, sender_id: ObjectId, sent_at: Date }, unread_counts: { // unread message count per participant "user_id_1": 5, "user_id_2": 0 }, created_at: Date, updated_at: Date }
// messages collection { _id: ObjectId, conversation_id: ObjectId, // indexed sender_id: ObjectId, content: String, attachments: [ { type: String, // 'image', 'file', 'video' url: String, filename: String } ], read_by: [ObjectId], // array of user IDs who have read the message sent_at: Date, // indexed edited_at: Date }
// Indexes db.users.createIndex({ username: 1 }, { unique: true }); db.users.createIndex({ email: 1 }, { unique: true });
db.conversations.createIndex({ "participants.user_id": 1 }); db.conversations.createIndex({ updated_at: -1 });
db.messages.createIndex({ conversation_id: 1, sent_at: -1 }); db.messages.createIndex({ sender_id: 1 });
Design Highlights:
-
Denormalization for read optimization (embedding last_message)
-
Indexes on frequently accessed fields
-
Using array fields (participants, read_by)
Best practices
Quality Improvement
Naming Convention Consistency: Use snake_case for table/column names
-
users, post_tags, created_at
-
Be consistent with plurals/singulars (tables plural, columns singular, etc.)
Consider Soft Delete: Use logical deletion instead of physical deletion for important data
-
deleted_at TIMESTAMP (NULL = active, NOT NULL = deleted)
-
Allows recovery of accidentally deleted data
-
Audit trail
Timestamps Required: Include created_at and updated_at in most tables
-
Data tracking and debugging
-
Time-series analysis
Efficiency Improvements
-
Partial Indexes: Minimize index size with conditional indexes CREATE INDEX idx_posts_published ON posts(published_at) WHERE published_at IS NOT NULL;
-
Materialized Views: Cache complex aggregate queries as Materialized Views
-
Partitioning: Partition large tables by date/range
Common Issues
Issue 1: N+1 Query Problem
Symptom: Multiple DB calls when a single query would suffice
Cause: Individual lookups in a loop without JOINs
Solution:
-- ❌ Bad example: N+1 queries SELECT * FROM posts; -- 1 time -- for each post SELECT * FROM users WHERE id = ?; -- N times
-- ✅ Good example: 1 query SELECT posts.*, users.username, users.avatar_url FROM posts JOIN users ON posts.author_id = users.id;
Issue 2: Slow JOINs Due to Unindexed Foreign Keys
Symptom: JOIN queries are very slow
Cause: Missing index on Foreign Key column
Solution:
CREATE INDEX idx_orders_user_id ON orders(user_id); CREATE INDEX idx_order_items_order_id ON order_items(order_id); CREATE INDEX idx_order_items_product_id ON order_items(product_id);
Issue 3: UUID vs Auto-increment Performance
Symptom: Insert performance degradation when using UUID Primary Keys
Cause: UUIDs are random, causing index fragmentation
Solution:
-
PostgreSQL: Use uuid_generate_v7() (time-ordered UUID)
-
MySQL: Use UUID_TO_BIN(UUID(), 1)
-
Or consider using Auto-increment BIGINT
References
Official Documentation
-
PostgreSQL Documentation
-
MySQL Documentation
-
MongoDB Schema Design Best Practices
Tools
-
dbdiagram.io - ERD diagram creation
-
PgModeler - PostgreSQL modeling tool
-
Prisma - ORM + migrations
Learning Resources
-
Database Design Course (freecodecamp)
-
Use The Index, Luke - SQL indexing guide
Metadata
Version
-
Current Version: 1.0.0
-
Last Updated: 2025-01-01
-
Compatible Platforms: Claude, ChatGPT, Gemini
Related Skills
-
api-design: Schema design alongside API design
-
performance-optimization: Query performance optimization
Tags
#database #schema #PostgreSQL #MySQL #MongoDB #SQL #NoSQL #migration #ERD