# Database Management Patterns

A comprehensive skill for mastering database management across SQL (PostgreSQL) and NoSQL (MongoDB) systems. This skill covers schema design, indexing strategies, transaction management, replication, sharding, and performance optimization for production-grade applications.
## When to Use This Skill

Use this skill when:

- Designing database schemas for new applications or refactoring existing ones
- Choosing between SQL and NoSQL databases for your use case
- Optimizing query performance with proper indexing strategies
- Implementing data consistency with transactions and ACID guarantees
- Scaling databases horizontally with sharding and replication
- Managing high-traffic applications requiring distributed databases
- Ensuring data integrity with constraints, triggers, and validation
- Troubleshooting performance issues using explain plans and query analysis
- Building fault-tolerant systems with replication and failover strategies
- Working with complex data relationships (relational) or flexible schemas (document)
## Core Concepts

### Database Paradigms Comparison

#### Relational Databases (PostgreSQL)

**Strengths:**

- **ACID Transactions**: Strong consistency guarantees
- **Complex Queries**: JOIN operations, subqueries, CTEs
- **Data Integrity**: Foreign keys, constraints, triggers
- **Normalized Data**: Reduced redundancy, consistent updates
- **Mature Ecosystem**: Rich tooling, extensions, community

**Best For:**

- Financial systems requiring strict consistency
- Complex relationships and data integrity requirements
- Applications with structured, well-defined schemas
- Systems requiring complex analytical queries
- Multi-step transactions across multiple tables
#### Document Databases (MongoDB)

**Strengths:**

- **Flexible Schema**: Easy schema evolution, polymorphic data
- **Horizontal Scalability**: Built-in sharding support
- **JSON-Native**: Natural fit for modern application development
- **Embedded Documents**: Denormalized data for performance
- **Aggregation Framework**: Powerful data processing pipeline

**Best For:**

- Rapidly evolving applications with changing requirements
- Content management systems with varied data structures
- Real-time analytics and event logging
- Mobile and web applications with JSON APIs
- Hierarchical or nested data structures
### ACID Properties

- **Atomicity**: All operations in a transaction succeed or fail together (see the sketch below)
- **Consistency**: Transactions bring the database from one valid state to another
- **Isolation**: Concurrent transactions don't interfere with each other
- **Durability**: Committed transactions survive system failures
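As a minimal illustration of atomicity (the table, constraint, and amounts here are hypothetical), consider a transfer where one statement can fail:

```sql
-- A minimal sketch of atomicity, assuming an accounts table with a
-- CHECK (balance >= 0) constraint; table and values are illustrative.
BEGIN;

UPDATE accounts SET balance = balance - 500 WHERE id = 1;
UPDATE accounts SET balance = balance + 500 WHERE id = 2;

-- If either UPDATE fails (e.g., the CHECK rejects a negative balance),
-- the transaction aborts and neither balance changes; only a successful
-- COMMIT makes both changes visible and durable.
COMMIT;
```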
### CAP Theorem

In distributed systems, choose two of three:

- **Consistency**: All nodes see the same data
- **Availability**: System remains operational
- **Partition Tolerance**: System continues despite network failures

In its replicated configurations, PostgreSQL emphasizes CP (Consistency + Partition Tolerance). MongoDB can be configured for CP or AP, depending on write and read concerns.
## PostgreSQL Patterns

### Schema Design Fundamentals

#### Normalization Levels

**First Normal Form (1NF)**

- Atomic values (no arrays or lists in columns)
- Each row is unique (a primary key exists)
- No repeating groups

**Second Normal Form (2NF)**

- Meets 1NF requirements
- All non-key attributes depend on the entire primary key

**Third Normal Form (3NF)**

- Meets 2NF requirements
- No transitive dependencies (non-key attributes depend only on the primary key); see the sketch below
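A small sketch of a 3NF decomposition (the tables are hypothetical): an orders table that stores the customer's city has a transitive dependency (order → customer → city), which 3NF removes by splitting the tables:

```sql
-- BAD: customer_city depends on customer_id, not on the order itself
CREATE TABLE orders_denorm (
    id SERIAL PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    customer_city VARCHAR(100),   -- transitive dependency
    total NUMERIC(10, 2) NOT NULL
);

-- GOOD: each non-key attribute depends only on its own table's key
CREATE TABLE customers (
    id SERIAL PRIMARY KEY,
    city VARCHAR(100)
);

CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    total NUMERIC(10, 2) NOT NULL
);
```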
**When to Denormalize:**

- Read-heavy workloads where joins are expensive
- Frequently accessed aggregate data
- Historical snapshots that shouldn't change
- Performance-critical queries (see the counter-cache sketch below)
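One common deliberate denormalization is a counter cache: store an aggregate on the parent row and maintain it with a trigger. A minimal sketch, assuming the users/posts tables used throughout this section:

```sql
-- Denormalized aggregate: post_count lives on users
ALTER TABLE users ADD COLUMN post_count INTEGER NOT NULL DEFAULT 0;

CREATE OR REPLACE FUNCTION maintain_post_count()
RETURNS TRIGGER AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE users SET post_count = post_count + 1 WHERE id = NEW.user_id;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE users SET post_count = post_count - 1 WHERE id = OLD.user_id;
    END IF;
    RETURN NULL;  -- AFTER trigger: the return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER posts_count_trigger
AFTER INSERT OR DELETE ON posts
FOR EACH ROW EXECUTE FUNCTION maintain_post_count();
```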
#### Table Design Patterns

**Primary Keys:**

```sql
-- Serial auto-increment (traditional)
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- UUID for distributed systems
CREATE TABLE accounts (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name VARCHAR(255) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Composite primary key
CREATE TABLE order_items (
    order_id INTEGER NOT NULL,
    product_id INTEGER NOT NULL,
    quantity INTEGER NOT NULL,
    price NUMERIC(10, 2) NOT NULL,
    PRIMARY KEY (order_id, product_id),
    FOREIGN KEY (order_id) REFERENCES orders(id),
    FOREIGN KEY (product_id) REFERENCES products(id)
);
```
**Foreign Key Constraints:**

```sql
-- Cascade delete: remove child records when the parent is deleted
CREATE TABLE posts (
    id SERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL,
    title VARCHAR(255) NOT NULL,
    content TEXT,
    FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
);

-- Set null: preserve child records, nullify the reference
CREATE TABLE comments (
    id SERIAL PRIMARY KEY,
    post_id INTEGER,
    user_id INTEGER,
    content TEXT NOT NULL,
    FOREIGN KEY (post_id) REFERENCES posts(id) ON DELETE SET NULL,
    FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE SET NULL
);

-- Restrict: prevent deletion if child records exist
CREATE TABLE categories (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) UNIQUE NOT NULL
);

CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    category_id INTEGER NOT NULL,
    name VARCHAR(255) NOT NULL,
    FOREIGN KEY (category_id) REFERENCES categories(id) ON DELETE RESTRICT
);
```
#### Advanced Constraints

**Check Constraints:**

```sql
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    price NUMERIC(10, 2) NOT NULL CHECK (price > 0),
    discount_percent INTEGER CHECK (discount_percent BETWEEN 0 AND 100),
    stock_quantity INTEGER NOT NULL CHECK (stock_quantity >= 0)
);

-- Table-level check constraint
CREATE TABLE date_ranges (
    id SERIAL PRIMARY KEY,
    start_date DATE NOT NULL,
    end_date DATE NOT NULL,
    CHECK (end_date > start_date)
);
```

**Unique Constraints:**

```sql
-- Single column unique
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    username VARCHAR(50) UNIQUE NOT NULL
);

-- Composite unique constraint
CREATE TABLE user_permissions (
    user_id INTEGER NOT NULL,
    permission_id INTEGER NOT NULL,
    granted_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    UNIQUE (user_id, permission_id),
    FOREIGN KEY (user_id) REFERENCES users(id),
    FOREIGN KEY (permission_id) REFERENCES permissions(id)
);

-- Partial unique index (unique only where the condition is met)
CREATE UNIQUE INDEX unique_active_email ON users (email) WHERE active = true;
```
#### Triggers and Functions

**Audit Trail Pattern:**

```sql
-- Audit table
CREATE TABLE audit_log (
    id SERIAL PRIMARY KEY,
    table_name VARCHAR(255) NOT NULL,
    record_id INTEGER NOT NULL,
    action VARCHAR(10) NOT NULL,
    old_data JSONB,
    new_data JSONB,
    changed_by VARCHAR(255),
    changed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Trigger function (to_jsonb produces jsonb directly for the JSONB columns)
CREATE OR REPLACE FUNCTION audit_trigger_function()
RETURNS TRIGGER AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO audit_log (table_name, record_id, action, new_data, changed_by)
        VALUES (TG_TABLE_NAME, NEW.id, 'INSERT', to_jsonb(NEW), current_user);
        RETURN NEW;
    ELSIF TG_OP = 'UPDATE' THEN
        INSERT INTO audit_log (table_name, record_id, action, old_data, new_data, changed_by)
        VALUES (TG_TABLE_NAME, NEW.id, 'UPDATE', to_jsonb(OLD), to_jsonb(NEW), current_user);
        RETURN NEW;
    ELSIF TG_OP = 'DELETE' THEN
        INSERT INTO audit_log (table_name, record_id, action, old_data, changed_by)
        VALUES (TG_TABLE_NAME, OLD.id, 'DELETE', to_jsonb(OLD), current_user);
        RETURN OLD;
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

-- Attach the trigger to a table
CREATE TRIGGER users_audit_trigger
AFTER INSERT OR UPDATE OR DELETE ON users
FOR EACH ROW EXECUTE FUNCTION audit_trigger_function();
```

**Timestamp Update Pattern:**

```sql
CREATE OR REPLACE FUNCTION update_modified_timestamp()
RETURNS TRIGGER AS $$
BEGIN
    NEW.updated_at = CURRENT_TIMESTAMP;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TABLE posts (
    id SERIAL PRIMARY KEY,
    title VARCHAR(255) NOT NULL,
    content TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TRIGGER posts_update_timestamp
BEFORE UPDATE ON posts
FOR EACH ROW EXECUTE FUNCTION update_modified_timestamp();
```
#### Views and Materialized Views

**Standard Views:**

```sql
-- Virtual table: computed on each query
CREATE VIEW active_users_with_posts AS
SELECT
    u.id,
    u.username,
    u.email,
    COUNT(p.id) AS post_count,
    MAX(p.created_at) AS last_post_date
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
WHERE u.active = true
GROUP BY u.id, u.username, u.email;

-- Use the view like a table
SELECT * FROM active_users_with_posts WHERE post_count > 10;
```

**Materialized Views:**

```sql
-- Physical table: stores computed results
CREATE MATERIALIZED VIEW user_statistics AS
SELECT
    u.id,
    u.username,
    COUNT(DISTINCT p.id) AS total_posts,
    COUNT(DISTINCT c.id) AS total_comments,
    AVG(p.views) AS avg_post_views,
    MAX(p.created_at) AS last_activity
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
LEFT JOIN comments c ON u.id = c.user_id
GROUP BY u.id, u.username;

-- Create an index on the materialized view
CREATE INDEX idx_user_stats_posts ON user_statistics(total_posts);

-- Refresh the materialized view (update data)
REFRESH MATERIALIZED VIEW user_statistics;

-- Concurrent refresh (allows reads during refresh;
-- requires a unique index on the materialized view)
CREATE UNIQUE INDEX idx_user_stats_id ON user_statistics(id);
REFRESH MATERIALIZED VIEW CONCURRENTLY user_statistics;
```
## MongoDB Patterns

### Document Modeling Strategies

#### Embedding vs Referencing

**Embedding Pattern (Denormalization):**

```javascript
// One-to-few: embed when the relationship is contained
// Example: blog post with comments
{
  _id: ObjectId("..."),
  title: "Database Design Patterns",
  author: "John Doe",
  content: "...",
  published_at: ISODate("2025-01-15"),
  comments: [
    {
      _id: ObjectId("..."),
      author: "Jane Smith",
      text: "Great article!",
      created_at: ISODate("2025-01-16")
    },
    {
      _id: ObjectId("..."),
      author: "Bob Johnson",
      text: "Very helpful, thanks!",
      created_at: ISODate("2025-01-17")
    }
  ],
  tags: ["database", "design", "patterns"],
  stats: { views: 1523, likes: 89, shares: 23 }
}

// Benefits:
// - Single query to retrieve a post with its comments
// - Better read performance
// - Atomic updates to the entire document
//
// Drawbacks:
// - Document size limit (16MB in MongoDB)
// - Difficult to query comments independently
// - May duplicate data if comments need to appear elsewhere
```
**Referencing Pattern (Normalization):**

```javascript
// One-to-many or many-to-many: reference when the relationship is unbounded
// Example: user with many posts

// Users collection
{
  _id: ObjectId("507f1f77bcf86cd799439011"),
  username: "john_doe",
  email: "john@example.com",
  profile: {
    bio: "Software engineer",
    avatar_url: "https://...",
    location: "San Francisco"
  },
  created_at: ISODate("2024-01-01")
}

// Posts collection (references the user)
{
  _id: ObjectId("507f191e810c19729de860ea"),
  user_id: ObjectId("507f1f77bcf86cd799439011"),
  title: "My First Post",
  content: "...",
  published_at: ISODate("2025-01-15"),
  comment_ids: [ObjectId("..."), ObjectId("...")]
}

// Benefits:
// - No duplication of user data
// - Flexible: users can have unlimited posts
// - Easy to update user information once
//
// Drawbacks:
// - Requires multiple queries or $lookup
// - Slower read performance for joined data
```
**Hybrid Approach (Selective Denormalization):**

```javascript
// Store frequently accessed fields from the referenced document
{
  _id: ObjectId("..."),
  title: "Database Patterns",
  content: "...",
  author: {
    // Embedded: frequently accessed, rarely changes
    id: ObjectId("507f1f77bcf86cd799439011"),
    username: "john_doe",
    avatar_url: "https://..."
  },
  // Reference: full user data available if needed
  author_id: ObjectId("507f1f77bcf86cd799439011"),
  published_at: ISODate("2025-01-15")
}

// Benefits:
// - Fast reads with frequently used data embedded
// - Full user data still available when needed
// - Balance between performance and flexibility
//
// Tradeoffs:
// - Embedded data must be updated when the user changes username/avatar
// - Slightly larger documents
```
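Keeping the embedded copy in sync is typically a fan-out update; a minimal sketch, following the field names in the example above:

```javascript
// When a user renames themselves, update the canonical document and
// fan the change out to every post that embeds the author snippet.
db.users.updateOne(
  { _id: ObjectId("507f1f77bcf86cd799439011") },
  { $set: { username: "john_d" } }
)

db.posts.updateMany(
  { "author.id": ObjectId("507f1f77bcf86cd799439011") },
  { $set: { "author.username": "john_d" } }
)
```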
#### Schema Design Patterns

**Bucket Pattern (Time-Series Data):**

```javascript
// Instead of one document per measurement:
// BAD: millions of tiny documents
{
  sensor_id: "sensor_001",
  timestamp: ISODate("2025-01-15T10:00:00Z"),
  temperature: 72.5,
  humidity: 45
}

// GOOD: bucket documents with arrays of measurements
{
  sensor_id: "sensor_001",
  date: ISODate("2025-01-15"),
  hour: 10,
  measurements: [
    { minute: 0, temperature: 72.5, humidity: 45 },
    { minute: 1, temperature: 72.6, humidity: 45 },
    { minute: 2, temperature: 72.4, humidity: 46 }
    // ... up to 60 measurements per hour
  ],
  summary: {
    count: 60,
    avg_temperature: 72.5,
    min_temperature: 71.8,
    max_temperature: 73.2
  }
}

// Benefits:
// - Reduced document count (60x fewer documents)
// - Better index efficiency
// - Pre-computed summaries
// - Easier to query by time ranges
```
**Computed Pattern (Pre-Aggregated Data):**

```javascript
// Store computed values to avoid expensive aggregations
{
  _id: ObjectId("..."),
  product_id: "PROD-123",
  month: "2025-01",
  total_sales: 15420.50,
  units_sold: 234,
  unique_customers: 187,
  avg_order_value: 65.90,
  top_customers: [
    { customer_id: "CUST-456", revenue: 890.50 },
    { customer_id: "CUST-789", revenue: 675.25 }
  ],
  computed_at: ISODate("2025-02-01T00:00:00Z")
}

// Update pattern: a scheduled job or trigger refreshes the computed values
```
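One way to implement that refresh is an aggregation that writes into the summary collection with `$merge` (MongoDB 4.2+). A sketch, assuming an orders collection with `product_id`, `total`, and `created_at` fields and a `product_stats` target collection (names are hypothetical):

```javascript
// Recompute last month's per-product totals and upsert them into
// product_stats; run this from a scheduled job (cron, Atlas trigger, etc.).
db.orders.aggregate([
  { $match: { created_at: { $gte: ISODate("2025-01-01"), $lt: ISODate("2025-02-01") } } },
  { $group: {
      _id: "$product_id",
      total_sales: { $sum: "$total" },
      units_sold: { $sum: 1 }
  } },
  { $set: { month: "2025-01", computed_at: "$$NOW" } },
  { $merge: {
      into: "product_stats",
      on: "_id",
      whenMatched: "replace",
      whenNotMatched: "insert"
  } }
])
```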
**Polymorphic Pattern (Varied Schemas):**

```javascript
// Handle different product types in a single collection
{
  _id: ObjectId("..."),
  type: "book",
  name: "Database Design",
  price: 49.99,
  // Book-specific fields
  isbn: "978-0-123456-78-9",
  author: "John Smith",
  pages: 456,
  publisher: "Tech Books Inc"
}

{
  _id: ObjectId("..."),
  type: "electronics",
  name: "Wireless Mouse",
  price: 29.99,
  // Electronics-specific fields
  brand: "TechBrand",
  warranty_months: 24,
  specifications: {
    battery_life: "6 months",
    connectivity: "Bluetooth 5.0"
  }
}

// Query by type
db.products.find({ type: "book", author: "John Smith" })
db.products.find({ type: "electronics", "specifications.connectivity": /Bluetooth/ })
```
### Aggregation Framework

**Basic Aggregation Pipeline:**

```javascript
// Group by author and count posts
db.posts.aggregate([
  { $match: { published: true } },          // Filter stage
  { $group: {
      _id: "$author_id",
      total_posts: { $sum: 1 },
      total_views: { $sum: "$views" },
      avg_views: { $avg: "$views" },
      latest_post: { $max: "$published_at" }
  } },
  { $sort: { total_posts: -1 } },           // Sort by post count
  { $limit: 10 }                            // Top 10 authors
])
```

**Advanced Pipeline with Lookup (Join):**

```javascript
// Join posts with user data
db.posts.aggregate([
  { $match: { published_at: { $gte: ISODate("2025-01-01") } } },
  { $lookup: {
      from: "users",
      localField: "author_id",
      foreignField: "_id",
      as: "author"
  } },
  { $unwind: "$author" },                   // Flatten the author array
  { $project: {
      title: 1,
      content: 1,
      views: 1,
      "author.username": 1,
      "author.email": 1,
      days_since_publish: {
        $divide: [
          { $subtract: [new Date(), "$published_at"] },
          1000 * 60 * 60 * 24
        ]
      }
  } },
  { $sort: { views: -1 } }
])
```

**Aggregation with Grouping and Reshaping:**

```javascript
// Complex aggregation: sales analysis
db.orders.aggregate([
  { $match: {
      status: "completed",
      created_at: { $gte: ISODate("2025-01-01"), $lt: ISODate("2025-02-01") }
  } },
  { $unwind: "$items" },                    // Flatten order items
  { $group: {
      _id: {
        product_id: "$items.product_id",
        customer_region: "$customer.region"
      },
      total_quantity: { $sum: "$items.quantity" },
      total_revenue: { $sum: "$items.total_price" },
      order_count: { $sum: 1 },
      avg_order_value: { $avg: "$items.total_price" }
  } },
  { $group: {
      _id: "$_id.product_id",
      regions: {
        $push: {
          region: "$_id.customer_region",
          quantity: "$total_quantity",
          revenue: "$total_revenue"
        }
      },
      total_quantity: { $sum: "$total_quantity" },
      total_revenue: { $sum: "$total_revenue" }
  } },
  { $sort: { total_revenue: -1 } }
])
```
## Indexing Strategies

### PostgreSQL Indexes

**B-tree Indexes (Default):**

```sql
-- Single column index
CREATE INDEX idx_users_email ON users(email);

-- Composite index (column order matters!)
CREATE INDEX idx_posts_author_published ON posts(author_id, published_at);

-- Queries that can use the index:
-- SELECT * FROM posts WHERE author_id = 123 ORDER BY published_at;
-- SELECT * FROM posts WHERE author_id = 123 AND published_at > '2025-01-01';

-- Query that CANNOT fully use the index (skips the leading column):
-- SELECT * FROM posts WHERE published_at > '2025-01-01';
```

**Partial Indexes:**

```sql
-- Index only active users
CREATE INDEX idx_active_users ON users(username) WHERE active = true;

-- Index only recent orders
CREATE INDEX idx_recent_orders ON orders(created_at, status)
WHERE created_at > '2024-01-01';

-- Benefits: smaller index size, faster queries on the filtered data
```

**Expression Indexes:**

```sql
-- Index on lowercase email for case-insensitive search
CREATE INDEX idx_users_email_lower ON users(LOWER(email));

-- Query that uses this index:
SELECT * FROM users WHERE LOWER(email) = 'user@example.com';

-- Index on a JSONB field extraction
CREATE INDEX idx_metadata_tags ON products((metadata->>'category'));
```

**Full-Text Search Indexes:**

```sql
-- Add a tsvector column for full-text search
ALTER TABLE articles ADD COLUMN tsv_content tsvector;

-- Populate the tsvector column
UPDATE articles SET tsv_content = to_tsvector('english', title || ' ' || content);

-- Create a GIN index for full-text search
CREATE INDEX idx_articles_tsv ON articles USING GIN(tsv_content);

-- Full-text search query
SELECT title, ts_rank(tsv_content, query) AS rank
FROM articles, to_tsquery('english', 'database & design') query
WHERE tsv_content @@ query
ORDER BY rank DESC;

-- Trigger to auto-update the tsvector
CREATE TRIGGER articles_tsv_update
BEFORE INSERT OR UPDATE ON articles
FOR EACH ROW EXECUTE FUNCTION
tsvector_update_trigger(tsv_content, 'pg_catalog.english', title, content);
```

**JSONB Indexes:**

```sql
-- GIN index for JSONB containment queries
CREATE INDEX idx_products_metadata ON products USING GIN(metadata);

-- Queries that use this index:
SELECT * FROM products WHERE metadata @> '{"color": "blue"}';
SELECT * FROM products WHERE metadata ? 'size';

-- Index on a specific JSONB path
CREATE INDEX idx_products_category ON products((metadata->>'category'));
```

**Index Monitoring:**

```sql
-- Find unused indexes
SELECT
    schemaname,
    tablename,
    indexname,
    idx_scan,
    idx_tup_read,
    idx_tup_fetch,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;

-- Check index usage
SELECT
    relname AS table_name,
    indexrelname AS index_name,
    idx_scan AS times_used,
    idx_tup_read AS tuples_read,
    idx_tup_fetch AS tuples_fetched
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;
```
### MongoDB Indexes

**Single Field Indexes:**

```javascript
// Create an index on a single field (1 = ascending, -1 = descending)
db.users.createIndex({ email: 1 })

// Unique index
db.users.createIndex({ username: 1 }, { unique: true })

// Sparse index (only indexes documents that have the field)
db.users.createIndex({ phone_number: 1 }, { sparse: true })
```

**Compound Indexes:**

```javascript
// Index on multiple fields (order matters!)
db.posts.createIndex({ author_id: 1, published_at: -1 })

// Efficient queries:
// - { author_id: "123" }
// - { author_id: "123", published_at: { $gte: ... } }
// - { author_id: "123" } with a sort on published_at
//
// Inefficient:
// - { published_at: { $gte: ... } } alone (doesn't use the index efficiently)

// ESR Rule: Equality, Sort, Range
// Best compound index order:
// 1. Equality filters first
// 2. Sort fields second
// 3. Range filters last
db.orders.createIndex({
  status: 1,        // Equality
  created_at: -1,   // Sort
  total_amount: 1   // Range
})
```

**Multikey Indexes (Array Fields):**

```javascript
// Index on an array field
db.posts.createIndex({ tags: 1 })

// Document with an array
{
  _id: ObjectId("..."),
  title: "Database Design",
  tags: ["database", "mongodb", "schema"]
}

// Queries that use the multikey index
db.posts.find({ tags: "mongodb" })
db.posts.find({ tags: { $in: ["database", "nosql"] } })

// Compound multikey index (at most one array field)
db.posts.createIndex({ tags: 1, published_at: -1 })  // Valid
// db.posts.createIndex({ tags: 1, categories: 1 })  // Invalid if both are arrays
```

**Text Indexes:**

```javascript
// Create a text index for full-text search
db.articles.createIndex({ title: "text", content: "text" })

// Text search query
db.articles.find({ $text: { $search: "database design patterns" } })

// Search with a relevance score
db.articles.find(
  { $text: { $search: "database design" } },
  { score: { $meta: "textScore" } }
).sort({ score: { $meta: "textScore" } })

// Weighted text index (prioritize title over content)
db.articles.createIndex(
  { title: "text", content: "text" },
  { weights: { title: 10, content: 5 } }
)
```

**Geospatial Indexes:**

```javascript
// 2dsphere index for geographic queries
db.locations.createIndex({ coordinates: "2dsphere" })

// Document format
{
  name: "Coffee Shop",
  coordinates: {
    type: "Point",
    coordinates: [-122.4194, 37.7749]  // [longitude, latitude]
  }
}

// Find locations near a point
db.locations.find({
  coordinates: {
    $near: {
      $geometry: { type: "Point", coordinates: [-122.4194, 37.7749] },
      $maxDistance: 1000  // meters
    }
  }
})
```
**Index Properties:**

```javascript
// TTL index (auto-delete documents after a time)
db.sessions.createIndex(
  { created_at: 1 },
  { expireAfterSeconds: 3600 }  // 1 hour
)

// Partial index (index a subset of documents)
db.orders.createIndex(
  { status: 1, created_at: -1 },
  { partialFilterExpression: { status: { $eq: "pending" } } }
)

// Case-insensitive index
db.users.createIndex(
  { email: 1 },
  { collation: { locale: "en", strength: 2 } }
)

// Note: the { background: true } option is deprecated since MongoDB 4.2;
// index builds now use an optimized process that only briefly locks the
// collection, so no flag is needed.
db.large_collection.createIndex({ field: 1 })
```
**Index Analysis:**

```javascript
// Explain query execution
db.posts.find({ author_id: "123" }).explain("executionStats")

// Check index usage
db.posts.aggregate([{ $indexStats: {} }])

// List all indexes on a collection
db.posts.getIndexes()

// Drop an unused index
db.posts.dropIndex("index_name")
```
## Transactions

### PostgreSQL Transaction Management

**Basic Transactions:**

```sql
-- Explicit transaction
BEGIN;

UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;

COMMIT;  -- or ROLLBACK; to cancel the changes
```

**Savepoints (Partial Rollback):**

```sql
BEGIN;

UPDATE inventory SET quantity = quantity - 10 WHERE product_id = 'PROD-123';

SAVEPOINT before_audit;

INSERT INTO audit_log (action, details) VALUES ('inventory_update', '...');
-- Oops, error in the audit log entry

ROLLBACK TO SAVEPOINT before_audit;
-- Inventory update preserved, audit insert rolled back

-- Fix and retry
INSERT INTO audit_log (action, details) VALUES ('inventory_update', 'correct details');

COMMIT;
```

**Isolation Levels:**

```sql
-- Read Uncommitted (not supported in PostgreSQL; behaves as Read Committed)
-- Read Committed (default): sees only committed data
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- Repeatable Read: sees a snapshot taken at the transaction's first query
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT * FROM accounts WHERE id = 1;  -- Returns balance 1000
-- Another transaction updates the balance to 1500 and commits
SELECT * FROM accounts WHERE id = 1;  -- Still returns 1000 (repeatable read)
COMMIT;

-- Serializable: strictest isolation, prevents all anomalies
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- If concurrent transactions would violate serializability, one aborts
COMMIT;
```

**Advisory Locks (Application-Level Locking):**

```sql
-- Exclusive lock on an arbitrary number
SELECT pg_advisory_lock(12345);
-- ... perform the critical operation ...
SELECT pg_advisory_unlock(12345);

-- Try lock (non-blocking)
SELECT pg_try_advisory_lock(12345);  -- Returns true if acquired, false otherwise

-- Session-level advisory locks are auto-released on disconnect
SELECT pg_advisory_lock(user_id);
```

**Row-Level Locking:**

```sql
-- SELECT FOR UPDATE: lock rows for update
BEGIN;

SELECT * FROM products WHERE id = 123 FOR UPDATE;  -- Locks this row

UPDATE products SET quantity = quantity - 1 WHERE id = 123;

COMMIT;

-- SELECT FOR SHARE: shared lock (allows other reads, blocks writes)
SELECT * FROM products WHERE id = 123 FOR SHARE;

-- SKIP LOCKED: skip locked rows instead of waiting
SELECT * FROM queue
WHERE processed = false
ORDER BY priority
LIMIT 10
FOR UPDATE SKIP LOCKED;
```
### MongoDB Transactions

**Multi-Document Transactions:**

```javascript
// Transactions require a replica set or sharded cluster
const session = db.getMongo().startSession()
session.startTransaction()

try {
  // Collections obtained via session.getDatabase() run inside the transaction
  const accountsCol = session.getDatabase("mydb").accounts

  // Debit account
  accountsCol.updateOne(
    { _id: "account1" },
    { $inc: { balance: -100 } }
  )

  // Credit account
  accountsCol.updateOne(
    { _id: "account2" },
    { $inc: { balance: 100 } }
  )

  // Commit the transaction
  session.commitTransaction()
} catch (error) {
  // Abort on error
  session.abortTransaction()
  throw error
} finally {
  session.endSession()
}
```
**Read and Write Concerns:**

```javascript
// Write concern: acknowledgment level
db.orders.insertOne(
  { customer_id: "123", items: [...] },
  {
    writeConcern: {
      w: "majority",    // Wait for a majority of the replica set
      j: true,          // Wait for the journal write
      wtimeout: 5000    // Time out after 5 seconds
    }
  }
)

// Read concern: data consistency level
db.orders.find({ status: "pending" }).readConcern("majority")
// Only returns data acknowledged by a majority

// Read preference: which replica to read from
db.orders.find({ ... }).readPref("secondary")  // Read from a secondary
```

**Atomic Operations (Single Document):**

```javascript
// Single-document updates are atomic by default
db.counters.updateOne(
  { _id: "page_views" },
  {
    $inc: { count: 1 },
    $set: { last_updated: new Date() }
  }
)

// Atomic array operations
db.posts.updateOne(
  { _id: ObjectId("...") },
  {
    $push: {
      comments: {
        $each: [{ author: "John", text: "Great!" }],
        $position: 0  // Insert at the beginning
      }
    }
  }
)

// Find and modify (atomic read-modify-write)
db.queue.findOneAndUpdate(
  { status: "pending" },
  { $set: { status: "processing", processor_id: "worker-1" } },
  {
    sort: { priority: -1 },
    returnDocument: "after"  // Return the updated document
  }
)
```
## Replication

### PostgreSQL Replication

**Streaming Replication (Primary-Standby):**

```sql
-- Primary server configuration (postgresql.conf):
-- wal_level = replica
-- max_wal_senders = 10
-- wal_keep_size = '1GB'
-- hot_standby = on

-- Create a replication user
CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'secure_password';

-- pg_hba.conf on the primary:
-- host replication replicator standby_ip/32 md5

-- Standby server: create an empty standby.signal file (PostgreSQL 12+
-- replaced recovery.conf), then set in postgresql.auto.conf:
-- primary_conninfo = 'host=primary_ip port=5432 user=replicator password=...'
-- restore_command = 'cp /var/lib/postgresql/archive/%f %p'
```

**Logical Replication (Selective Replication):**

```sql
-- On the publisher (source)
CREATE PUBLICATION my_publication FOR TABLE users, posts;
-- or FOR ALL TABLES;

-- On the subscriber (destination)
CREATE SUBSCRIPTION my_subscription
CONNECTION 'host=publisher_ip dbname=mydb user=replicator password=...'
PUBLICATION my_publication;

-- Monitor replication
SELECT * FROM pg_stat_replication;
SELECT * FROM pg_replication_slots;
```

**Failover and Promotion:**

```sql
-- Promote the standby to primary (shell command, not SQL):
-- pg_ctl promote -D /var/lib/postgresql/data

-- Check replication lag
SELECT
    client_addr,
    state,
    sent_lsn,
    write_lsn,
    flush_lsn,
    replay_lsn,
    sync_state,
    pg_wal_lsn_diff(sent_lsn, replay_lsn) AS lag_bytes
FROM pg_stat_replication;
```
### MongoDB Replication

**Replica Set Configuration:**

```javascript
// Initialize the replica set
rs.initiate({
  _id: "myReplicaSet",
  members: [
    { _id: 0, host: "mongodb1.example.com:27017", priority: 2 },
    { _id: 1, host: "mongodb2.example.com:27017", priority: 1 },
    { _id: 2, host: "mongodb3.example.com:27017", priority: 1 }
  ]
})

// Add a member to an existing replica set
rs.add("mongodb4.example.com:27017")

// Remove a member
rs.remove("mongodb4.example.com:27017")

// Check replica set status
rs.status()

// Check replication lag
rs.printSecondaryReplicationInfo()
```

**Replica Set Roles:**

```javascript
// Priority 0 member (cannot become primary)
rs.add({
  host: "analytics.example.com:27017",
  priority: 0,
  hidden: true  // Hidden from application drivers
})

// Arbiter (votes only, holds no data)
rs.addArb("arbiter.example.com:27017")

// Delayed member (disaster recovery)
rs.add({
  host: "delayed.example.com:27017",
  priority: 0,
  hidden: true,
  secondaryDelaySecs: 3600  // 1 hour behind (named slaveDelay before MongoDB 5.0)
})
```

**Read Preference Configuration:**

```javascript
// Application connection with a read preference
const client = new MongoClient(uri, {
  readPreference: "secondaryPreferred",  // Try a secondary, fall back to primary
  readConcernLevel: "majority"
})

// Read preference modes:
// - primary (default): read from the primary only
// - primaryPreferred: primary if available, else a secondary
// - secondary: read from secondaries only
// - secondaryPreferred: a secondary if available, else the primary
// - nearest: read from the nearest member (lowest latency)
```
## Sharding

### MongoDB Sharding Architecture

**Shard Key Selection:**

```javascript
// Good shard key characteristics:
// 1. High cardinality (many distinct values)
// 2. Even distribution
// 3. Query isolation (queries target specific shards)

// Example: user-based application
sh.shardCollection("mydb.users", { user_id: "hashed" })

// Hashed shard key: even distribution, random data location
sh.shardCollection("mydb.events", { event_id: "hashed" })

// Range-based shard key: ordered data, good for range queries
sh.shardCollection("mydb.logs", { timestamp: 1, server_id: 1 })

// Compound shard key
sh.shardCollection("mydb.orders", {
  customer_region: 1,  // Coarse grouping
  order_date: 1        // Fine grouping
})
```

**Sharding Setup:**

```javascript
// 1. Start config servers (as a replica set):
//    mongod --configsvr --replSet configRS --port 27019

// 2. Initialize the config server replica set
rs.initiate({
  _id: "configRS",
  configsvr: true,
  members: [
    { _id: 0, host: "cfg1.example.com:27019" },
    { _id: 1, host: "cfg2.example.com:27019" },
    { _id: 2, host: "cfg3.example.com:27019" }
  ]
})

// 3. Start shard servers (each is a replica set):
//    mongod --shardsvr --replSet shard1RS --port 27018

// 4. Start mongos (query router):
//    mongos --configdb configRS/cfg1.example.com:27019,cfg2.example.com:27019

// 5. Add shards to the cluster
sh.addShard("shard1RS/shard1-a.example.com:27018")
sh.addShard("shard2RS/shard2-a.example.com:27018")

// 6. Enable sharding on the database
sh.enableSharding("mydb")

// 7. Shard collections
sh.shardCollection("mydb.users", { user_id: "hashed" })
```

**Query Targeting:**

```javascript
// Targeted query (includes the shard key)
db.users.find({ user_id: "12345" })  // Routes to a single shard

// Scatter-gather query (no shard key)
db.users.find({ email: "user@example.com" })  // Queries all shards, merges results

// Check query targeting
db.users.find({ user_id: "12345" }).explain()
// Look for "SINGLE_SHARD" vs "ALL_SHARDS"
```

**Zone Sharding (Geographic Distribution):**

```javascript
// Define zones for geographic sharding
sh.addShardToZone("shard1", "US")
sh.addShardToZone("shard2", "EU")

// Define zone ranges
sh.updateZoneKeyRange(
  "mydb.users",
  { region: "US", user_id: MinKey },
  { region: "US", user_id: MaxKey },
  "US"
)

sh.updateZoneKeyRange(
  "mydb.users",
  { region: "EU", user_id: MinKey },
  { region: "EU", user_id: MaxKey },
  "EU"
)

// Shard the collection with a zone-aware key
sh.shardCollection("mydb.users", { region: 1, user_id: 1 })
```
### PostgreSQL Horizontal Partitioning

**Declarative Partitioning:**

```sql
-- Range partitioning
CREATE TABLE logs (
    id BIGSERIAL,
    log_time TIMESTAMP NOT NULL,
    message TEXT,
    level VARCHAR(10)
) PARTITION BY RANGE (log_time);

-- Create partitions
CREATE TABLE logs_2025_01 PARTITION OF logs
FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');

CREATE TABLE logs_2025_02 PARTITION OF logs
FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');

-- List partitioning
CREATE TABLE customers (
    id SERIAL,
    name VARCHAR(255),
    region VARCHAR(50)
) PARTITION BY LIST (region);

CREATE TABLE customers_us PARTITION OF customers
FOR VALUES IN ('US', 'CA', 'MX');

CREATE TABLE customers_eu PARTITION OF customers
FOR VALUES IN ('UK', 'DE', 'FR', 'IT');

-- Hash partitioning
CREATE TABLE events (
    id BIGSERIAL,
    event_type VARCHAR(50),
    data JSONB
) PARTITION BY HASH (id);

CREATE TABLE events_0 PARTITION OF events
FOR VALUES WITH (MODULUS 4, REMAINDER 0);

CREATE TABLE events_1 PARTITION OF events
FOR VALUES WITH (MODULUS 4, REMAINDER 1);
-- ... events_2 and events_3
```

**Partition Pruning (Query Optimization):**

```sql
-- Queries automatically touch only the relevant partitions
SELECT * FROM logs
WHERE log_time BETWEEN '2025-01-15' AND '2025-01-20';
-- Only scans the logs_2025_01 partition

-- Check the query plan
EXPLAIN SELECT * FROM logs WHERE log_time > '2025-01-01';
-- Shows which partitions are scanned
```
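Two maintenance details worth wiring in from the start, shown here as a short sketch: a DEFAULT partition catches rows that match no defined range, and an index created on the parent table cascades to every partition (PostgreSQL 11+):

```sql
-- Catch-all for rows outside every defined range
CREATE TABLE logs_default PARTITION OF logs DEFAULT;

-- Index the parent; each existing and future partition gets its own copy
CREATE INDEX idx_logs_log_time ON logs (log_time);

-- Detach an old partition before archiving or dropping it
ALTER TABLE logs DETACH PARTITION logs_2025_01;
```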
## Performance Tuning

### Query Optimization Techniques

**PostgreSQL Query Analysis:**

```sql
-- Basic explain
EXPLAIN SELECT * FROM users WHERE email = 'test@example.com';

-- Analyze with actual execution statistics
EXPLAIN ANALYZE
SELECT u.username, COUNT(p.id) AS post_count
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
WHERE u.active = true
GROUP BY u.id, u.username
ORDER BY post_count DESC
LIMIT 10;

-- Identify slow queries (requires the pg_stat_statements extension)
SELECT
    query,
    calls,
    total_exec_time,
    mean_exec_time,
    max_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

-- Update query planner statistics
ANALYZE users;

-- Reclaim space and update statistics
VACUUM ANALYZE posts;
```

**Common Query Patterns:**

```sql
-- Avoid SELECT *; retrieve only needed columns
-- BAD
SELECT * FROM users WHERE id = 123;

-- GOOD
SELECT id, username, email FROM users WHERE id = 123;

-- Use EXISTS instead of IN for large subqueries
-- BAD
SELECT * FROM posts
WHERE author_id IN (SELECT id FROM users WHERE active = true);

-- GOOD
SELECT * FROM posts p
WHERE EXISTS (
    SELECT 1 FROM users u
    WHERE u.id = p.author_id AND u.active = true
);

-- Use JOINs instead of multiple queries
-- BAD (N+1 query problem):
-- SELECT * FROM posts;
-- then for each post: SELECT * FROM users WHERE id = post.author_id;

-- GOOD
SELECT p.*, u.username, u.email
FROM posts p
JOIN users u ON p.author_id = u.id;

-- Window functions instead of self-joins
-- Calculate a running total
SELECT
    order_date,
    amount,
    SUM(amount) OVER (ORDER BY order_date) AS running_total
FROM orders;

-- Rank within groups
SELECT
    category,
    product_name,
    sales,
    RANK() OVER (PARTITION BY category ORDER BY sales DESC) AS rank_in_category
FROM products;
```
**MongoDB Query Optimization:**

```javascript
// Use projection to limit returned fields
// BAD
db.users.find({ active: true })

// GOOD
db.users.find(
  { active: true },
  { username: 1, email: 1, _id: 0 }
)

// Use covered queries (the index covers all queried fields)
db.users.createIndex({ username: 1, email: 1 })
db.users.find(
  { username: "john_doe" },
  { username: 1, email: 1, _id: 0 }
)
// Entire query served from the index

// Avoid negation operators
// BAD (cannot use an index efficiently)
db.products.find({ status: { $ne: "discontinued" } })

// GOOD
db.products.find({ status: { $in: ["active", "pending", "sold"] } })

// Use $lookup sparingly (expensive operation);
// consider embedding the data instead where appropriate

// Aggregation optimization: filter early
// BAD
db.orders.aggregate([
  { $lookup: { ... } },                  // Expensive join
  { $match: { status: "completed" } }    // Filter after the join
])

// GOOD
db.orders.aggregate([
  { $match: { status: "completed" } },   // Filter first
  { $lookup: { ... } }                   // Join fewer documents
])
```
### Connection Pooling

**PostgreSQL Connection Pooling:**

```javascript
// Using node-postgres (pg) with a pool
const { Pool } = require('pg')

const pool = new Pool({
  host: 'localhost',
  port: 5432,
  database: 'mydb',
  user: 'dbuser',
  password: 'secret',
  max: 20,                          // Maximum pool size
  idleTimeoutMillis: 30000,
  connectionTimeoutMillis: 2000
})

// Execute a query
const result = await pool.query('SELECT * FROM users WHERE id = $1', [123])

// Use PgBouncer for server-side pooling (pgbouncer.ini):
// [databases]
// mydb = host=localhost port=5432 dbname=mydb
//
// [pgbouncer]
// pool_mode = transaction
// max_client_conn = 1000
// default_pool_size = 25
```

**MongoDB Connection Pooling:**

```javascript
// MongoClient automatically manages a connection pool
const { MongoClient } = require('mongodb')

const client = new MongoClient(uri, {
  maxPoolSize: 50,            // Max connections
  minPoolSize: 10,            // Min connections
  maxIdleTimeMS: 30000,       // Close idle connections
  waitQueueTimeoutMS: 5000    // Wait for an available connection
})

await client.connect()
const db = client.db('mydb')
// Connections are automatically returned to the pool after use
```
## Best Practices

### PostgreSQL Best Practices

#### Schema Design

- Normalize for data integrity, denormalize for performance
- Use appropriate data types (avoid TEXT for short strings)
- Define NOT NULL constraints where appropriate
- Use SERIAL or UUID for primary keys consistently

#### Indexing

- Index foreign keys for JOIN performance
- Create indexes on frequently filtered/sorted columns
- Use partial indexes for selective queries
- Monitor and remove unused indexes
- Keep composite index column count reasonable (typically ≤ 3-4)

#### Query Performance

- Use EXPLAIN ANALYZE to understand query plans
- Avoid SELECT * in application code
- Use prepared statements to prevent SQL injection
- Limit result sets with LIMIT
- Use connection pooling

#### Maintenance

- Run VACUUM regularly (or enable autovacuum)
- Update statistics with ANALYZE
- Monitor the slow query log
- Set appropriate autovacuum thresholds (see the sketch below)
- Take regular backups with pg_dump or WAL archiving
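A sketch of per-table autovacuum tuning for a hot table; the thresholds here are illustrative, not recommendations:

```sql
-- Vacuum the posts table more aggressively than the global default:
-- trigger autovacuum once ~2% of rows are dead instead of 20%.
ALTER TABLE posts SET (
    autovacuum_vacuum_scale_factor = 0.02,
    autovacuum_analyze_scale_factor = 0.01
);

-- Inspect dead-tuple counts to verify the settings are working
SELECT relname, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
```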
#### Security

- Use SSL/TLS for connections
- Implement row-level security for multi-tenant apps (see the sketch below)
- Grant minimum necessary privileges
- Use parameterized queries
- Apply security updates regularly
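A minimal row-level security sketch for a multi-tenant table, assuming the application sets an `app.tenant_id` session variable; the table and variable names are hypothetical:

```sql
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    tenant_id INTEGER NOT NULL,
    body TEXT
);

ALTER TABLE documents ENABLE ROW LEVEL SECURITY;

-- Each session only sees rows belonging to its own tenant
CREATE POLICY tenant_isolation ON documents
USING (tenant_id = current_setting('app.tenant_id')::INTEGER);

-- The application sets the tenant for the connection before querying
SET app.tenant_id = '42';
SELECT * FROM documents;  -- Returns only tenant 42's rows
```

Note that table owners bypass RLS by default, so application queries should run as a non-owner role.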
### MongoDB Best Practices

#### Schema Design

- Embed related data that is accessed together
- Reference data that is large or rarely accessed
- Use the polymorphic pattern for varied schemas
- Limit document size to reasonable bounds (< 1-2 MB typically)
- Design for your query patterns

#### Indexing

- Index fields used in queries and sorts
- Use compound indexes with the ESR rule (Equality, Sort, Range)
- Create text indexes for full-text search
- Monitor index usage with $indexStats
- Avoid too many indexes (they hurt write performance)

#### Query Performance

- Use projection to limit returned fields
- Create covered queries when possible
- Filter early in aggregation pipelines
- Avoid $lookup when embedding is appropriate
- Use explain() to verify index usage

#### Scalability

- Choose an appropriate shard key (high cardinality, even distribution)
- Use replica sets for high availability
- Configure appropriate read/write concerns
- Monitor chunk distribution in sharded clusters
- Use zones for geographic distribution

#### Operations

- Enable authentication and authorization (see the sketch below)
- Use TLS for client connections
- Take regular backups (mongodump or filesystem snapshots)
- Monitor with MongoDB Atlas, Ops Manager, or custom tools
- Keep your MongoDB version updated
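A minimal sketch of least-privilege user creation once authorization is enabled (`security.authorization: enabled` in mongod.conf); the user and role choices are illustrative:

```javascript
// Run as an administrator; create an application user scoped to one database
const admin = db.getSiblingDB("admin")
admin.createUser({
  user: "app_user",
  pwd: passwordPrompt(),                   // mongosh prompts instead of hard-coding
  roles: [{ role: "readWrite", db: "mydb" }]  // Scoped to a single database
})
```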
## Data Modeling Decision Framework

**Choose PostgreSQL when:**

- Strong ACID guarantees are required (financial transactions)
- Complex relationships and data integrity requirements dominate
- The data structure is well-defined and stable
- You need advanced SQL features (window functions, CTEs, stored procedures)
- Compliance requirements demand strict consistency

**Choose MongoDB when:**

- Schema flexibility is needed (rapid development, evolving requirements)
- Horizontal scalability is the priority (sharding required)
- Data is document-oriented (JSON/BSON native format)
- Data structures are hierarchical or nested
- High write throughput with eventual consistency is acceptable

**Hybrid Approach:**

- Use both databases for different parts of the application
- PostgreSQL for transactional data (orders, payments)
- MongoDB for catalogs, logs, and user sessions
- Synchronize critical data between the systems
## Common Patterns and Anti-Patterns

### PostgreSQL Anti-Patterns

**❌ Storing JSON when relational fits better**

```sql
-- BAD: using JSONB for structured, queryable data
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    data JSONB  -- { name, email, address: { street, city, state } }
);

-- GOOD: proper normalization
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255),
    email VARCHAR(255)
);

CREATE TABLE addresses (
    id SERIAL PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),
    street VARCHAR(255),
    city VARCHAR(100),
    state VARCHAR(50)
);
```

**❌ Over-indexing**

```sql
-- BAD: an index on every column "just in case"
CREATE INDEX idx1 ON users(username);
CREATE INDEX idx2 ON users(email);
CREATE INDEX idx3 ON users(created_at);
CREATE INDEX idx4 ON users(updated_at);
CREATE INDEX idx5 ON users(active);
-- Result: slow writes, large database size

-- GOOD: indexes based on actual query patterns
CREATE INDEX idx_users_email ON users(email);  -- Login queries
CREATE INDEX idx_active_users_created ON users(created_at)
WHERE active = true;  -- Partial index
```

**❌ N+1 Query Problem**

```sql
-- BAD: multiple queries in a loop
SELECT * FROM posts;  -- Returns 100 posts
-- then for each post:
-- SELECT * FROM users WHERE id = ?;  -- 100 additional queries!

-- GOOD: a single query with a JOIN
SELECT p.*, u.username, u.email
FROM posts p
JOIN users u ON p.author_id = u.id;
```
### MongoDB Anti-Patterns

**❌ Massive arrays in documents**

```javascript
// BAD: unbounded array growth
{
  _id: ObjectId("..."),
  username: "popular_user",
  followers: [
    ObjectId("follower1"),
    ObjectId("follower2")
    // ... 100,000+ follower IDs
    // Document exceeds the 16MB limit!
  ]
}

// GOOD: a separate collection with references
// users collection
{ _id: ObjectId("..."), username: "popular_user" }

// followers collection
{
  _id: ObjectId("..."),
  user_id: ObjectId("..."),
  follower_id: ObjectId("...")
}
db.followers.createIndex({ user_id: 1, follower_id: 1 })
```

**❌ Poor shard key selection**

```javascript
// BAD: monotonically increasing shard key
sh.shardCollection("mydb.events", { _id: 1 })
// All writes go to the same shard (highest _id range)

// BAD: low-cardinality shard key
sh.shardCollection("mydb.users", { country: 1 })
// Most users in a few countries = uneven distribution

// GOOD: hashed _id or a compound key
sh.shardCollection("mydb.events", { _id: "hashed" })          // Even distribution
sh.shardCollection("mydb.users", { country: 1, user_id: 1 })  // Compound
```

**❌ Ignoring indexes on embedded documents**

```javascript
// Document structure
{
  username: "john_doe",
  profile: {
    email: "john@example.com",
    age: 30,
    city: "San Francisco"
  }
}

// Query on an embedded field
db.users.find({ "profile.email": "john@example.com" })

// MISSING: an index on the embedded field
db.users.createIndex({ "profile.email": 1 })
```
## Troubleshooting Guide

### PostgreSQL Issues

**Slow Queries:**

```sql
-- Enable slow query logging (postgresql.conf):
-- log_min_duration_statement = 1000  # Log queries > 1 second

-- Find slow queries
SELECT
    query,
    calls,
    total_exec_time / calls AS avg_time_ms,
    rows / calls AS avg_rows
FROM pg_stat_statements
WHERE calls > 100
ORDER BY total_exec_time DESC
LIMIT 20;

-- Analyze a specific slow query
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT ... FROM ... WHERE ...;
```

**High CPU Usage:**

```sql
-- Check running queries
SELECT
    pid,
    now() - query_start AS duration,
    state,
    query
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY duration DESC;

-- Terminate a long-running query
SELECT pg_terminate_backend(pid);
```

**Lock Contention:**

```sql
-- View ungranted locks
SELECT locktype, relation::regclass, mode, granted, pid
FROM pg_locks
WHERE NOT granted;

-- Find blocking queries
SELECT
    blocked_locks.pid AS blocked_pid,
    blocking_locks.pid AS blocking_pid,
    blocked_activity.query AS blocked_query,
    blocking_activity.query AS blocking_query
FROM pg_locks blocked_locks
JOIN pg_stat_activity blocked_activity
    ON blocked_activity.pid = blocked_locks.pid
JOIN pg_locks blocking_locks
    ON blocking_locks.locktype = blocked_locks.locktype
    AND blocking_locks.pid != blocked_locks.pid
JOIN pg_stat_activity blocking_activity
    ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted
  AND blocking_locks.granted;
```
### MongoDB Issues

**Slow Queries:**

```javascript
// Enable profiling (log queries > 100ms)
db.setProfilingLevel(1, { slowms: 100 })

// View slow queries
db.system.profile.find().sort({ ts: -1 }).limit(10)

// Analyze query performance
db.collection.find({ ... }).explain("executionStats")
// Check: totalDocsExamined vs nReturned (should be close)
// Check: executionTimeMillis
// Check: indexName (should show index usage)
```

**Replication Lag:**

```javascript
// Check lag on a secondary
rs.printSecondaryReplicationInfo()

// Check oplog size
db.getReplicationInfo()

// Increase the oplog size if needed (size is in MB: 16384 MB = 16GB)
db.adminCommand({ replSetResizeOplog: 1, size: 16384 })
```

**Sharding Issues:**

```javascript
// Check chunk distribution
sh.status()

// Check balancer status
sh.getBalancerState()
sh.isBalancerRunning()

// Re-enable balancing for a specific collection
sh.enableBalancing("mydb.mycollection")

// Check for jumbo chunks (the chunks collection lives in the config database)
db.getSiblingDB("config").chunks.find({ jumbo: true })
```
## Resources

### PostgreSQL Resources

- Official Documentation: https://www.postgresql.org/docs/
- PostgreSQL Wiki: https://wiki.postgresql.org/
- Performance Tuning: https://wiki.postgresql.org/wiki/Performance_Optimization
- Explain Visualizer: https://explain.dalibo.com/
- pg_stat_statements extension: essential for query analysis

### MongoDB Resources

- Official Documentation: https://docs.mongodb.com/
- MongoDB University: free courses and certification
- Aggregation Framework: https://docs.mongodb.com/manual/aggregation/
- Sharding Guide: https://docs.mongodb.com/manual/sharding/
- Schema Design Patterns: https://www.mongodb.com/blog/post/building-with-patterns-a-summary

### Books

- PostgreSQL: "PostgreSQL: Up and Running" by Regina Obe & Leo Hsu
- MongoDB: "MongoDB: The Definitive Guide" by Shannon Bradshaw, Eoin Brazil, and Kristina Chodorow
---

**Skill Version:** 1.0.0
**Last Updated:** January 2025
**Skill Category:** Database Management, Data Architecture, Performance Optimization
**Technologies:** PostgreSQL 16+, MongoDB 7+