# Database Management Patterns

A comprehensive skill for mastering database management across SQL (PostgreSQL) and NoSQL (MongoDB) systems. This skill covers schema design, indexing strategies, transaction management, replication, sharding, and performance optimization for production-grade applications.
## When to Use This Skill

Use this skill when:

- Designing database schemas for new applications or refactoring existing ones
- Choosing between SQL and NoSQL databases for your use case
- Optimizing query performance with proper indexing strategies
- Implementing data consistency with transactions and ACID guarantees
- Scaling databases horizontally with sharding and replication
- Managing high-traffic applications requiring distributed databases
- Ensuring data integrity with constraints, triggers, and validation
- Troubleshooting performance issues using explain plans and query analysis
- Building fault-tolerant systems with replication and failover strategies
- Working with complex data relationships (relational) or flexible schemas (document)
## Core Concepts

### Database Paradigms Comparison

#### Relational Databases (PostgreSQL)

**Strengths:**

- **ACID Transactions**: Strong consistency guarantees
- **Complex Queries**: JOIN operations, subqueries, CTEs
- **Data Integrity**: Foreign keys, constraints, triggers
- **Normalized Data**: Reduced redundancy, consistent updates
- **Mature Ecosystem**: Rich tooling, extensions, community

**Best For:**

- Financial systems requiring strict consistency
- Complex relationships and data integrity requirements
- Applications with structured, well-defined schemas
- Systems requiring complex analytical queries
- Multi-step transactions across multiple tables
#### Document Databases (MongoDB)

**Strengths:**

- **Flexible Schema**: Easy schema evolution, polymorphic data
- **Horizontal Scalability**: Built-in sharding support
- **JSON-Native**: Natural fit for modern application development
- **Embedded Documents**: Denormalized data for performance
- **Aggregation Framework**: Powerful data processing pipeline

**Best For:**

- Rapidly evolving applications with changing requirements
- Content management systems with varied data structures
- Real-time analytics and event logging
- Mobile and web applications with JSON APIs
- Hierarchical or nested data structures
### ACID Properties

- **Atomicity**: All operations in a transaction succeed or fail together (see the sketch below)
- **Consistency**: Transactions bring the database from one valid state to another
- **Isolation**: Concurrent transactions don't interfere with each other
- **Durability**: Committed transactions survive system failures
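As a minimal illustration of atomicity (the table, constraint, and amounts here are hypothetical), consider a transfer where one statement can fail:

```sql
-- A minimal sketch of atomicity, assuming an accounts table with a
-- CHECK (balance >= 0) constraint; table and values are illustrative.
BEGIN;

UPDATE accounts SET balance = balance - 500 WHERE id = 1;
UPDATE accounts SET balance = balance + 500 WHERE id = 2;

-- If either UPDATE fails (e.g., the CHECK rejects a negative balance),
-- the transaction aborts and neither balance changes; only a successful
-- COMMIT makes both changes visible and durable.
COMMIT;
```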
### CAP Theorem

In distributed systems, choose two of three:

- **Consistency**: All nodes see the same data
- **Availability**: System remains operational
- **Partition Tolerance**: System continues despite network failures

In its replicated configurations, PostgreSQL emphasizes CP (Consistency + Partition Tolerance). MongoDB can be configured for CP or AP, depending on write and read concerns.
## PostgreSQL Patterns

### Schema Design Fundamentals

#### Normalization Levels

**First Normal Form (1NF)**

- Atomic values (no arrays or lists in columns)
- Each row is unique (a primary key exists)
- No repeating groups

**Second Normal Form (2NF)**

- Meets 1NF requirements
- All non-key attributes depend on the entire primary key

**Third Normal Form (3NF)**

- Meets 2NF requirements
- No transitive dependencies (non-key attributes depend only on the primary key); see the sketch below
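A small sketch of a 3NF decomposition (the tables are hypothetical): an orders table that stores the customer's city has a transitive dependency (order → customer → city), which 3NF removes by splitting the tables:

```sql
-- BAD: customer_city depends on customer_id, not on the order itself
CREATE TABLE orders_denorm (
    id SERIAL PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    customer_city VARCHAR(100),   -- transitive dependency
    total NUMERIC(10, 2) NOT NULL
);

-- GOOD: each non-key attribute depends only on its own table's key
CREATE TABLE customers (
    id SERIAL PRIMARY KEY,
    city VARCHAR(100)
);

CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    total NUMERIC(10, 2) NOT NULL
);
```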
**When to Denormalize:**

- Read-heavy workloads where joins are expensive
- Frequently accessed aggregate data
- Historical snapshots that shouldn't change
- Performance-critical queries (see the counter-cache sketch below)
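One common deliberate denormalization is a counter cache: store an aggregate on the parent row and maintain it with a trigger. A minimal sketch, assuming the users/posts tables used throughout this section:

```sql
-- Denormalized aggregate: post_count lives on users
ALTER TABLE users ADD COLUMN post_count INTEGER NOT NULL DEFAULT 0;

CREATE OR REPLACE FUNCTION maintain_post_count()
RETURNS TRIGGER AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE users SET post_count = post_count + 1 WHERE id = NEW.user_id;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE users SET post_count = post_count - 1 WHERE id = OLD.user_id;
    END IF;
    RETURN NULL;  -- AFTER trigger: the return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER posts_count_trigger
AFTER INSERT OR DELETE ON posts
FOR EACH ROW EXECUTE FUNCTION maintain_post_count();
```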
#### Table Design Patterns

**Primary Keys:**

```sql
-- Serial auto-increment (traditional)
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- UUID for distributed systems
CREATE TABLE accounts (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name VARCHAR(255) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Composite primary key
CREATE TABLE order_items (
    order_id INTEGER NOT NULL,
    product_id INTEGER NOT NULL,
    quantity INTEGER NOT NULL,
    price NUMERIC(10, 2) NOT NULL,
    PRIMARY KEY (order_id, product_id),
    FOREIGN KEY (order_id) REFERENCES orders(id),
    FOREIGN KEY (product_id) REFERENCES products(id)
);
```
**Foreign Key Constraints:**

```sql
-- Cascade delete: remove child records when the parent is deleted
CREATE TABLE posts (
    id SERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL,
    title VARCHAR(255) NOT NULL,
    content TEXT,
    FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
);

-- Set null: preserve child records, nullify the reference
CREATE TABLE comments (
    id SERIAL PRIMARY KEY,
    post_id INTEGER,
    user_id INTEGER,
    content TEXT NOT NULL,
    FOREIGN KEY (post_id) REFERENCES posts(id) ON DELETE SET NULL,
    FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE SET NULL
);

-- Restrict: prevent deletion if child records exist
CREATE TABLE categories (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) UNIQUE NOT NULL
);

CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    category_id INTEGER NOT NULL,
    name VARCHAR(255) NOT NULL,
    FOREIGN KEY (category_id) REFERENCES categories(id) ON DELETE RESTRICT
);
```
#### Advanced Constraints

**Check Constraints:**

```sql
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    price NUMERIC(10, 2) NOT NULL CHECK (price > 0),
    discount_percent INTEGER CHECK (discount_percent BETWEEN 0 AND 100),
    stock_quantity INTEGER NOT NULL CHECK (stock_quantity >= 0)
);

-- Table-level check constraint
CREATE TABLE date_ranges (
    id SERIAL PRIMARY KEY,
    start_date DATE NOT NULL,
    end_date DATE NOT NULL,
    CHECK (end_date > start_date)
);
```

**Unique Constraints:**

```sql
-- Single column unique
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    username VARCHAR(50) UNIQUE NOT NULL
);

-- Composite unique constraint
CREATE TABLE user_permissions (
    user_id INTEGER NOT NULL,
    permission_id INTEGER NOT NULL,
    granted_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    UNIQUE (user_id, permission_id),
    FOREIGN KEY (user_id) REFERENCES users(id),
    FOREIGN KEY (permission_id) REFERENCES permissions(id)
);

-- Partial unique index (unique only where the condition is met)
CREATE UNIQUE INDEX unique_active_email ON users (email) WHERE active = true;
```
#### Triggers and Functions

**Audit Trail Pattern:**

```sql
-- Audit table
CREATE TABLE audit_log (
    id SERIAL PRIMARY KEY,
    table_name VARCHAR(255) NOT NULL,
    record_id INTEGER NOT NULL,
    action VARCHAR(10) NOT NULL,
    old_data JSONB,
    new_data JSONB,
    changed_by VARCHAR(255),
    changed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Trigger function (to_jsonb produces jsonb directly for the JSONB columns)
CREATE OR REPLACE FUNCTION audit_trigger_function()
RETURNS TRIGGER AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO audit_log (table_name, record_id, action, new_data, changed_by)
        VALUES (TG_TABLE_NAME, NEW.id, 'INSERT', to_jsonb(NEW), current_user);
        RETURN NEW;
    ELSIF TG_OP = 'UPDATE' THEN
        INSERT INTO audit_log (table_name, record_id, action, old_data, new_data, changed_by)
        VALUES (TG_TABLE_NAME, NEW.id, 'UPDATE', to_jsonb(OLD), to_jsonb(NEW), current_user);
        RETURN NEW;
    ELSIF TG_OP = 'DELETE' THEN
        INSERT INTO audit_log (table_name, record_id, action, old_data, changed_by)
        VALUES (TG_TABLE_NAME, OLD.id, 'DELETE', to_jsonb(OLD), current_user);
        RETURN OLD;
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

-- Attach the trigger to a table
CREATE TRIGGER users_audit_trigger
AFTER INSERT OR UPDATE OR DELETE ON users
FOR EACH ROW EXECUTE FUNCTION audit_trigger_function();
```

**Timestamp Update Pattern:**

```sql
CREATE OR REPLACE FUNCTION update_modified_timestamp()
RETURNS TRIGGER AS $$
BEGIN
    NEW.updated_at = CURRENT_TIMESTAMP;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TABLE posts (
    id SERIAL PRIMARY KEY,
    title VARCHAR(255) NOT NULL,
    content TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TRIGGER posts_update_timestamp
BEFORE UPDATE ON posts
FOR EACH ROW EXECUTE FUNCTION update_modified_timestamp();
```
#### Views and Materialized Views

**Standard Views:**

```sql
-- Virtual table: computed on each query
CREATE VIEW active_users_with_posts AS
SELECT
    u.id,
    u.username,
    u.email,
    COUNT(p.id) AS post_count,
    MAX(p.created_at) AS last_post_date
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
WHERE u.active = true
GROUP BY u.id, u.username, u.email;

-- Use the view like a table
SELECT * FROM active_users_with_posts WHERE post_count > 10;
```

**Materialized Views:**

```sql
-- Physical table: stores computed results
CREATE MATERIALIZED VIEW user_statistics AS
SELECT
    u.id,
    u.username,
    COUNT(DISTINCT p.id) AS total_posts,
    COUNT(DISTINCT c.id) AS total_comments,
    AVG(p.views) AS avg_post_views,
    MAX(p.created_at) AS last_activity
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
LEFT JOIN comments c ON u.id = c.user_id
GROUP BY u.id, u.username;

-- Create an index on the materialized view
CREATE INDEX idx_user_stats_posts ON user_statistics(total_posts);

-- Refresh the materialized view (update data)
REFRESH MATERIALIZED VIEW user_statistics;

-- Concurrent refresh (allows reads during refresh;
-- requires a unique index on the materialized view)
CREATE UNIQUE INDEX idx_user_stats_id ON user_statistics(id);
REFRESH MATERIALIZED VIEW CONCURRENTLY user_statistics;
```
## MongoDB Patterns

### Document Modeling Strategies

#### Embedding vs Referencing

**Embedding Pattern (Denormalization):**

```javascript
// One-to-few: embed when the relationship is contained
// Example: blog post with comments
{
  _id: ObjectId("..."),
  title: "Database Design Patterns",
  author: "John Doe",
  content: "...",
  published_at: ISODate("2025-01-15"),
  comments: [
    {
      _id: ObjectId("..."),
      author: "Jane Smith",
      text: "Great article!",
      created_at: ISODate("2025-01-16")
    },
    {
      _id: ObjectId("..."),
      author: "Bob Johnson",
      text: "Very helpful, thanks!",
      created_at: ISODate("2025-01-17")
    }
  ],
  tags: ["database", "design", "patterns"],
  stats: { views: 1523, likes: 89, shares: 23 }
}

// Benefits:
// - Single query to retrieve a post with its comments
// - Better read performance
// - Atomic updates to the entire document
//
// Drawbacks:
// - Document size limit (16MB in MongoDB)
// - Difficult to query comments independently
// - May duplicate data if comments need to appear elsewhere
```
**Referencing Pattern (Normalization):**

```javascript
// One-to-many or many-to-many: reference when the relationship is unbounded
// Example: user with many posts

// Users collection
{
  _id: ObjectId("507f1f77bcf86cd799439011"),
  username: "john_doe",
  email: "john@example.com",
  profile: {
    bio: "Software engineer",
    avatar_url: "https://...",
    location: "San Francisco"
  },
  created_at: ISODate("2024-01-01")
}

// Posts collection (references the user)
{
  _id: ObjectId("507f191e810c19729de860ea"),
  user_id: ObjectId("507f1f77bcf86cd799439011"),
  title: "My First Post",
  content: "...",
  published_at: ISODate("2025-01-15"),
  comment_ids: [ObjectId("..."), ObjectId("...")]
}

// Benefits:
// - No duplication of user data
// - Flexible: users can have unlimited posts
// - Easy to update user information once
//
// Drawbacks:
// - Requires multiple queries or $lookup
// - Slower read performance for joined data
```
**Hybrid Approach (Selective Denormalization):**

```javascript
// Store frequently accessed fields from the referenced document
{
  _id: ObjectId("..."),
  title: "Database Patterns",
  content: "...",
  author: {
    // Embedded: frequently accessed, rarely changes
    id: ObjectId("507f1f77bcf86cd799439011"),
    username: "john_doe",
    avatar_url: "https://..."
  },
  // Reference: full user data available if needed
  author_id: ObjectId("507f1f77bcf86cd799439011"),
  published_at: ISODate("2025-01-15")
}

// Benefits:
// - Fast reads with frequently used data embedded
// - Full user data still available when needed
// - Balance between performance and flexibility
//
// Tradeoffs:
// - Embedded data must be updated when the user changes username/avatar
// - Slightly larger documents
```
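Keeping the embedded copy in sync is typically a fan-out update; a minimal sketch, following the field names in the example above:

```javascript
// When a user renames themselves, update the canonical document and
// fan the change out to every post that embeds the author snippet.
db.users.updateOne(
  { _id: ObjectId("507f1f77bcf86cd799439011") },
  { $set: { username: "john_d" } }
)

db.posts.updateMany(
  { "author.id": ObjectId("507f1f77bcf86cd799439011") },
  { $set: { "author.username": "john_d" } }
)
```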
#### Schema Design Patterns

**Bucket Pattern (Time-Series Data):**

```javascript
// Instead of one document per measurement:
// BAD: millions of tiny documents
{
  sensor_id: "sensor_001",
  timestamp: ISODate("2025-01-15T10:00:00Z"),
  temperature: 72.5,
  humidity: 45
}

// GOOD: bucket documents with arrays of measurements
{
  sensor_id: "sensor_001",
  date: ISODate("2025-01-15"),
  hour: 10,
  measurements: [
    { minute: 0, temperature: 72.5, humidity: 45 },
    { minute: 1, temperature: 72.6, humidity: 45 },
    { minute: 2, temperature: 72.4, humidity: 46 }
    // ... up to 60 measurements per hour
  ],
  summary: {
    count: 60,
    avg_temperature: 72.5,
    min_temperature: 71.8,
    max_temperature: 73.2
  }
}

// Benefits:
// - Reduced document count (60x fewer documents)
// - Better index efficiency
// - Pre-computed summaries
// - Easier to query by time ranges
```
**Computed Pattern (Pre-Aggregated Data):**

```javascript
// Store computed values to avoid expensive aggregations
{
  _id: ObjectId("..."),
  product_id: "PROD-123",
  month: "2025-01",
  total_sales: 15420.50,
  units_sold: 234,
  unique_customers: 187,
  avg_order_value: 65.90,
  top_customers: [
    { customer_id: "CUST-456", revenue: 890.50 },
    { customer_id: "CUST-789", revenue: 675.25 }
  ],
  computed_at: ISODate("2025-02-01T00:00:00Z")
}

// Update pattern: a scheduled job or trigger refreshes the computed values
```
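One way to implement that refresh is an aggregation that writes into the summary collection with `$merge` (MongoDB 4.2+). A sketch, assuming an orders collection with `product_id`, `total`, and `created_at` fields and a `product_stats` target collection (names are hypothetical):

```javascript
// Recompute last month's per-product totals and upsert them into
// product_stats; run this from a scheduled job (cron, Atlas trigger, etc.).
db.orders.aggregate([
  { $match: { created_at: { $gte: ISODate("2025-01-01"), $lt: ISODate("2025-02-01") } } },
  { $group: {
      _id: "$product_id",
      total_sales: { $sum: "$total" },
      units_sold: { $sum: 1 }
  } },
  { $set: { month: "2025-01", computed_at: "$$NOW" } },
  { $merge: {
      into: "product_stats",
      on: "_id",
      whenMatched: "replace",
      whenNotMatched: "insert"
  } }
])
```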
**Polymorphic Pattern (Varied Schemas):**

```javascript
// Handle different product types in a single collection
{
  _id: ObjectId("..."),
  type: "book",
  name: "Database Design",
  price: 49.99,
  // Book-specific fields
  isbn: "978-0-123456-78-9",
  author: "John Smith",
  pages: 456,
  publisher: "Tech Books Inc"
}

{
  _id: ObjectId("..."),
  type: "electronics",
  name: "Wireless Mouse",
  price: 29.99,
  // Electronics-specific fields
  brand: "TechBrand",
  warranty_months: 24,
  specifications: {
    battery_life: "6 months",
    connectivity: "Bluetooth 5.0"
  }
}

// Query by type
db.products.find({ type: "book", author: "John Smith" })
db.products.find({ type: "electronics", "specifications.connectivity": /Bluetooth/ })
```
### Aggregation Framework

**Basic Aggregation Pipeline:**

```javascript
// Group by author and count posts
db.posts.aggregate([
  { $match: { published: true } },          // Filter stage
  { $group: {
      _id: "$author_id",
      total_posts: { $sum: 1 },
      total_views: { $sum: "$views" },
      avg_views: { $avg: "$views" },
      latest_post: { $max: "$published_at" }
  } },
  { $sort: { total_posts: -1 } },           // Sort by post count
  { $limit: 10 }                            // Top 10 authors
])
```

**Advanced Pipeline with Lookup (Join):**

```javascript
// Join posts with user data
db.posts.aggregate([
  { $match: { published_at: { $gte: ISODate("2025-01-01") } } },
  { $lookup: {
      from: "users",
      localField: "author_id",
      foreignField: "_id",
      as: "author"
  } },
  { $unwind: "$author" },                   // Flatten the author array
  { $project: {
      title: 1,
      content: 1,
      views: 1,
      "author.username": 1,
      "author.email": 1,
      days_since_publish: {
        $divide: [
          { $subtract: [new Date(), "$published_at"] },
          1000 * 60 * 60 * 24
        ]
      }
  } },
  { $sort: { views: -1 } }
])
```

**Aggregation with Grouping and Reshaping:**

```javascript
// Complex aggregation: sales analysis
db.orders.aggregate([
  { $match: {
      status: "completed",
      created_at: { $gte: ISODate("2025-01-01"), $lt: ISODate("2025-02-01") }
  } },
  { $unwind: "$items" },                    // Flatten order items
  { $group: {
      _id: {
        product_id: "$items.product_id",
        customer_region: "$customer.region"
      },
      total_quantity: { $sum: "$items.quantity" },
      total_revenue: { $sum: "$items.total_price" },
      order_count: { $sum: 1 },
      avg_order_value: { $avg: "$items.total_price" }
  } },
  { $group: {
      _id: "$_id.product_id",
      regions: {
        $push: {
          region: "$_id.customer_region",
          quantity: "$total_quantity",
          revenue: "$total_revenue"
        }
      },
      total_quantity: { $sum: "$total_quantity" },
      total_revenue: { $sum: "$total_revenue" }
  } },
  { $sort: { total_revenue: -1 } }
])
```
## Indexing Strategies

### PostgreSQL Indexes

**B-tree Indexes (Default):**

```sql
-- Single column index
CREATE INDEX idx_users_email ON users(email);

-- Composite index (column order matters!)
CREATE INDEX idx_posts_author_published ON posts(author_id, published_at);

-- Queries that can use the index:
-- SELECT * FROM posts WHERE author_id = 123 ORDER BY published_at;
-- SELECT * FROM posts WHERE author_id = 123 AND published_at > '2025-01-01';

-- Query that CANNOT fully use the index (skips the leading column):
-- SELECT * FROM posts WHERE published_at > '2025-01-01';
```

**Partial Indexes:**

```sql
-- Index only active users
CREATE INDEX idx_active_users ON users(username) WHERE active = true;

-- Index only recent orders
CREATE INDEX idx_recent_orders ON orders(created_at, status)
WHERE created_at > '2024-01-01';

-- Benefits: smaller index size, faster queries on the filtered data
```

**Expression Indexes:**

```sql
-- Index on lowercase email for case-insensitive search
CREATE INDEX idx_users_email_lower ON users(LOWER(email));

-- Query that uses this index:
SELECT * FROM users WHERE LOWER(email) = 'user@example.com';

-- Index on a JSONB field extraction
CREATE INDEX idx_metadata_tags ON products((metadata->>'category'));
```

**Full-Text Search Indexes:**

```sql
-- Add a tsvector column for full-text search
ALTER TABLE articles ADD COLUMN tsv_content tsvector;

-- Populate the tsvector column
UPDATE articles SET tsv_content = to_tsvector('english', title || ' ' || content);

-- Create a GIN index for full-text search
CREATE INDEX idx_articles_tsv ON articles USING GIN(tsv_content);

-- Full-text search query
SELECT title, ts_rank(tsv_content, query) AS rank
FROM articles, to_tsquery('english', 'database & design') query
WHERE tsv_content @@ query
ORDER BY rank DESC;

-- Trigger to auto-update the tsvector
CREATE TRIGGER articles_tsv_update
BEFORE INSERT OR UPDATE ON articles
FOR EACH ROW EXECUTE FUNCTION
tsvector_update_trigger(tsv_content, 'pg_catalog.english', title, content);
```

**JSONB Indexes:**

```sql
-- GIN index for JSONB containment queries
CREATE INDEX idx_products_metadata ON products USING GIN(metadata);

-- Queries that use this index:
SELECT * FROM products WHERE metadata @> '{"color": "blue"}';
SELECT * FROM products WHERE metadata ? 'size';

-- Index on a specific JSONB path
CREATE INDEX idx_products_category ON products((metadata->>'category'));
```

**Index Monitoring:**

```sql
-- Find unused indexes
SELECT
    schemaname,
    tablename,
    indexname,
    idx_scan,
    idx_tup_read,
    idx_tup_fetch,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;

-- Check index usage
SELECT
    relname AS table_name,
    indexrelname AS index_name,
    idx_scan AS times_used,
    idx_tup_read AS tuples_read,
    idx_tup_fetch AS tuples_fetched
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;
```
### MongoDB Indexes

**Single Field Indexes:**

```javascript
// Create an index on a single field (1 = ascending, -1 = descending)
db.users.createIndex({ email: 1 })

// Unique index
db.users.createIndex({ username: 1 }, { unique: true })

// Sparse index (only indexes documents that have the field)
db.users.createIndex({ phone_number: 1 }, { sparse: true })
```

**Compound Indexes:**

```javascript
// Index on multiple fields (order matters!)
db.posts.createIndex({ author_id: 1, published_at: -1 })

// Efficient queries:
// - { author_id: "123" }
// - { author_id: "123", published_at: { $gte: ... } }
// - { author_id: "123" } with a sort on published_at
//
// Inefficient:
// - { published_at: { $gte: ... } } alone (doesn't use the index efficiently)

// ESR Rule: Equality, Sort, Range
// Best compound index order:
// 1. Equality filters first
// 2. Sort fields second
// 3. Range filters last
db.orders.createIndex({
  status: 1,        // Equality
  created_at: -1,   // Sort
  total_amount: 1   // Range
})
```

**Multikey Indexes (Array Fields):**

```javascript
// Index on an array field
db.posts.createIndex({ tags: 1 })

// Document with an array
{
  _id: ObjectId("..."),
  title: "Database Design",
  tags: ["database", "mongodb", "schema"]
}

// Queries that use the multikey index
db.posts.find({ tags: "mongodb" })
db.posts.find({ tags: { $in: ["database", "nosql"] } })

// Compound multikey index (at most one array field)
db.posts.createIndex({ tags: 1, published_at: -1 })  // Valid
// db.posts.createIndex({ tags: 1, categories: 1 })  // Invalid if both are arrays
```

**Text Indexes:**

```javascript
// Create a text index for full-text search
db.articles.createIndex({ title: "text", content: "text" })

// Text search query
db.articles.find({ $text: { $search: "database design patterns" } })

// Search with a relevance score
db.articles.find(
  { $text: { $search: "database design" } },
  { score: { $meta: "textScore" } }
).sort({ score: { $meta: "textScore" } })

// Weighted text index (prioritize title over content)
db.articles.createIndex(
  { title: "text", content: "text" },
  { weights: { title: 10, content: 5 } }
)
```

**Geospatial Indexes:**

```javascript
// 2dsphere index for geographic queries
db.locations.createIndex({ coordinates: "2dsphere" })

// Document format
{
  name: "Coffee Shop",
  coordinates: {
    type: "Point",
    coordinates: [-122.4194, 37.7749]  // [longitude, latitude]
  }
}

// Find locations near a point
db.locations.find({
  coordinates: {
    $near: {
      $geometry: { type: "Point", coordinates: [-122.4194, 37.7749] },
      $maxDistance: 1000  // meters
    }
  }
})
```
**Index Properties:**

```javascript
// TTL index (auto-delete documents after a time)
db.sessions.createIndex(
  { created_at: 1 },
  { expireAfterSeconds: 3600 }  // 1 hour
)

// Partial index (index a subset of documents)
db.orders.createIndex(
  { status: 1, created_at: -1 },
  { partialFilterExpression: { status: { $eq: "pending" } } }
)

// Case-insensitive index
db.users.createIndex(
  { email: 1 },
  { collation: { locale: "en", strength: 2 } }
)

// Note: the { background: true } option is deprecated since MongoDB 4.2;
// index builds now use an optimized process that only briefly locks the
// collection, so no flag is needed.
db.large_collection.createIndex({ field: 1 })
```
**Index Analysis:**

```javascript
// Explain query execution
db.posts.find({ author_id: "123" }).explain("executionStats")

// Check index usage
db.posts.aggregate([{ $indexStats: {} }])

// List all indexes on a collection
db.posts.getIndexes()

// Drop an unused index
db.posts.dropIndex("index_name")
```
## Transactions

### PostgreSQL Transaction Management

**Basic Transactions:**

```sql
-- Explicit transaction
BEGIN;

UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;

COMMIT;  -- or ROLLBACK; to cancel the changes
```

**Savepoints (Partial Rollback):**

```sql
BEGIN;

UPDATE inventory SET quantity = quantity - 10 WHERE product_id = 'PROD-123';

SAVEPOINT before_audit;

INSERT INTO audit_log (action, details) VALUES ('inventory_update', '...');
-- Oops, error in the audit log entry

ROLLBACK TO SAVEPOINT before_audit;
-- Inventory update preserved, audit insert rolled back

-- Fix and retry
INSERT INTO audit_log (action, details) VALUES ('inventory_update', 'correct details');

COMMIT;
```

**Isolation Levels:**

```sql
-- Read Uncommitted (not supported in PostgreSQL; behaves as Read Committed)
-- Read Committed (default): sees only committed data
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- Repeatable Read: sees a snapshot taken at the transaction's first query
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT * FROM accounts WHERE id = 1;  -- Returns balance 1000
-- Another transaction updates the balance to 1500 and commits
SELECT * FROM accounts WHERE id = 1;  -- Still returns 1000 (repeatable read)
COMMIT;

-- Serializable: strictest isolation, prevents all anomalies
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- If concurrent transactions would violate serializability, one aborts
COMMIT;
```

**Advisory Locks (Application-Level Locking):**

```sql
-- Exclusive lock on an arbitrary number
SELECT pg_advisory_lock(12345);
-- ... perform the critical operation ...
SELECT pg_advisory_unlock(12345);

-- Try lock (non-blocking)
SELECT pg_try_advisory_lock(12345);  -- Returns true if acquired, false otherwise

-- Session-level advisory locks are auto-released on disconnect
SELECT pg_advisory_lock(user_id);
```

**Row-Level Locking:**

```sql
-- SELECT FOR UPDATE: lock rows for update
BEGIN;

SELECT * FROM products WHERE id = 123 FOR UPDATE;  -- Locks this row

UPDATE products SET quantity = quantity - 1 WHERE id = 123;

COMMIT;

-- SELECT FOR SHARE: shared lock (allows other reads, blocks writes)
SELECT * FROM products WHERE id = 123 FOR SHARE;

-- SKIP LOCKED: skip locked rows instead of waiting
SELECT * FROM queue
WHERE processed = false
ORDER BY priority
LIMIT 10
FOR UPDATE SKIP LOCKED;
```
### MongoDB Transactions

**Multi-Document Transactions:**

```javascript
// Transactions require a replica set or sharded cluster
const session = db.getMongo().startSession()
session.startTransaction()

try {
  // Collections obtained via session.getDatabase() run inside the transaction
  const accountsCol = session.getDatabase("mydb").accounts

  // Debit account
  accountsCol.updateOne(
    { _id: "account1" },
    { $inc: { balance: -100 } }
  )

  // Credit account
  accountsCol.updateOne(
    { _id: "account2" },
    { $inc: { balance: 100 } }
  )

  // Commit the transaction
  session.commitTransaction()
} catch (error) {
  // Abort on error
  session.abortTransaction()
  throw error
} finally {
  session.endSession()
}
```
**Read and Write Concerns:**

```javascript
// Write concern: acknowledgment level
db.orders.insertOne(
  { customer_id: "123", items: [...] },
  {
    writeConcern: {
      w: "majority",    // Wait for a majority of the replica set
      j: true,          // Wait for the journal write
      wtimeout: 5000    // Time out after 5 seconds
    }
  }
)

// Read concern: data consistency level
db.orders.find({ status: "pending" }).readConcern("majority")
// Only returns data acknowledged by a majority

// Read preference: which replica to read from
db.orders.find({ ... }).readPref("secondary")  // Read from a secondary
```

**Atomic Operations (Single Document):**

```javascript
// Single-document updates are atomic by default
db.counters.updateOne(
  { _id: "page_views" },
  {
    $inc: { count: 1 },
    $set: { last_updated: new Date() }
  }
)

// Atomic array operations
db.posts.updateOne(
  { _id: ObjectId("...") },
  {
    $push: {
      comments: {
        $each: [{ author: "John", text: "Great!" }],
        $position: 0  // Insert at the beginning
      }
    }
  }
)

// Find and modify (atomic read-modify-write)
db.queue.findOneAndUpdate(
  { status: "pending" },
  { $set: { status: "processing", processor_id: "worker-1" } },
  {
    sort: { priority: -1 },
    returnDocument: "after"  // Return the updated document
  }
)
```
## Replication

### PostgreSQL Replication

**Streaming Replication (Primary-Standby):**

```sql
-- Primary server configuration (postgresql.conf):
-- wal_level = replica
-- max_wal_senders = 10
-- wal_keep_size = '1GB'
-- hot_standby = on

-- Create a replication user
CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'secure_password';

-- pg_hba.conf on the primary:
-- host replication replicator standby_ip/32 md5

-- Standby server: create an empty standby.signal file (PostgreSQL 12+
-- replaced recovery.conf), then set in postgresql.auto.conf:
-- primary_conninfo = 'host=primary_ip port=5432 user=replicator password=...'
-- restore_command = 'cp /var/lib/postgresql/archive/%f %p'
```

**Logical Replication (Selective Replication):**

```sql
-- On the publisher (source)
CREATE PUBLICATION my_publication FOR TABLE users, posts;
-- or FOR ALL TABLES;

-- On the subscriber (destination)
CREATE SUBSCRIPTION my_subscription
CONNECTION 'host=publisher_ip dbname=mydb user=replicator password=...'
PUBLICATION my_publication;

-- Monitor replication
SELECT * FROM pg_stat_replication;
SELECT * FROM pg_replication_slots;
```

**Failover and Promotion:**

```sql
-- Promote the standby to primary (shell command, not SQL):
-- pg_ctl promote -D /var/lib/postgresql/data

-- Check replication lag
SELECT
    client_addr,
    state,
    sent_lsn,
    write_lsn,
    flush_lsn,
    replay_lsn,
    sync_state,
    pg_wal_lsn_diff(sent_lsn, replay_lsn) AS lag_bytes
FROM pg_stat_replication;
```
### MongoDB Replication

**Replica Set Configuration:**

```javascript
// Initialize the replica set
rs.initiate({
  _id: "myReplicaSet",
  members: [
    { _id: 0, host: "mongodb1.example.com:27017", priority: 2 },
    { _id: 1, host: "mongodb2.example.com:27017", priority: 1 },
    { _id: 2, host: "mongodb3.example.com:27017", priority: 1 }
  ]
})

// Add a member to an existing replica set
rs.add("mongodb4.example.com:27017")

// Remove a member
rs.remove("mongodb4.example.com:27017")

// Check replica set status
rs.status()

// Check replication lag
rs.printSecondaryReplicationInfo()
```

**Replica Set Roles:**

```javascript
// Priority 0 member (cannot become primary)
rs.add({
  host: "analytics.example.com:27017",
  priority: 0,
  hidden: true  // Hidden from application drivers
})

// Arbiter (votes only, holds no data)
rs.addArb("arbiter.example.com:27017")

// Delayed member (disaster recovery)
rs.add({
  host: "delayed.example.com:27017",
  priority: 0,
  hidden: true,
  secondaryDelaySecs: 3600  // 1 hour behind (named slaveDelay before MongoDB 5.0)
})
```

**Read Preference Configuration:**

```javascript
// Application connection with a read preference
const client = new MongoClient(uri, {
  readPreference: "secondaryPreferred",  // Try a secondary, fall back to primary
  readConcernLevel: "majority"
})

// Read preference modes:
// - primary (default): read from the primary only
// - primaryPreferred: primary if available, else a secondary
// - secondary: read from secondaries only
// - secondaryPreferred: a secondary if available, else the primary
// - nearest: read from the nearest member (lowest latency)
```
## Sharding

### MongoDB Sharding Architecture

**Shard Key Selection:**

```javascript
// Good shard key characteristics:
// 1. High cardinality (many distinct values)
// 2. Even distribution
// 3. Query isolation (queries target specific shards)

// Example: user-based application
sh.shardCollection("mydb.users", { user_id: "hashed" })

// Hashed shard key: even distribution, random data location
sh.shardCollection("mydb.events", { event_id: "hashed" })

// Range-based shard key: ordered data, good for range queries
sh.shardCollection("mydb.logs", { timestamp: 1, server_id: 1 })

// Compound shard key
sh.shardCollection("mydb.orders", {
  customer_region: 1,  // Coarse grouping
  order_date: 1        // Fine grouping
})
```

**Sharding Setup:**

```javascript
// 1. Start config servers (as a replica set):
//    mongod --configsvr --replSet configRS --port 27019

// 2. Initialize the config server replica set
rs.initiate({
  _id: "configRS",
  configsvr: true,
  members: [
    { _id: 0, host: "cfg1.example.com:27019" },
    { _id: 1, host: "cfg2.example.com:27019" },
    { _id: 2, host: "cfg3.example.com:27019" }
  ]
})

// 3. Start shard servers (each is a replica set):
//    mongod --shardsvr --replSet shard1RS --port 27018

// 4. Start mongos (query router):
//    mongos --configdb configRS/cfg1.example.com:27019,cfg2.example.com:27019

// 5. Add shards to the cluster
sh.addShard("shard1RS/shard1-a.example.com:27018")
sh.addShard("shard2RS/shard2-a.example.com:27018")

// 6. Enable sharding on the database
sh.enableSharding("mydb")

// 7. Shard collections
sh.shardCollection("mydb.users", { user_id: "hashed" })
```

**Query Targeting:**

```javascript
// Targeted query (includes the shard key)
db.users.find({ user_id: "12345" })  // Routes to a single shard

// Scatter-gather query (no shard key)
db.users.find({ email: "user@example.com" })  // Queries all shards, merges results

// Check query targeting
db.users.find({ user_id: "12345" }).explain()
// Look for "SINGLE_SHARD" vs "ALL_SHARDS"
```

**Zone Sharding (Geographic Distribution):**

```javascript
// Define zones for geographic sharding
sh.addShardToZone("shard1", "US")
sh.addShardToZone("shard2", "EU")

// Define zone ranges
sh.updateZoneKeyRange(
  "mydb.users",
  { region: "US", user_id: MinKey },
  { region: "US", user_id: MaxKey },
  "US"
)

sh.updateZoneKeyRange(
  "mydb.users",
  { region: "EU", user_id: MinKey },
  { region: "EU", user_id: MaxKey },
  "EU"
)

// Shard the collection with a zone-aware key
sh.shardCollection("mydb.users", { region: 1, user_id: 1 })
```
### PostgreSQL Horizontal Partitioning

**Declarative Partitioning:**

```sql
-- Range partitioning
CREATE TABLE logs (
    id BIGSERIAL,
    log_time TIMESTAMP NOT NULL,
    message TEXT,
    level VARCHAR(10)
) PARTITION BY RANGE (log_time);

-- Create partitions
CREATE TABLE logs_2025_01 PARTITION OF logs
FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');

CREATE TABLE logs_2025_02 PARTITION OF logs
FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');

-- List partitioning
CREATE TABLE customers (
    id SERIAL,
    name VARCHAR(255),
    region VARCHAR(50)
) PARTITION BY LIST (region);

CREATE TABLE customers_us PARTITION OF customers
FOR VALUES IN ('US', 'CA', 'MX');

CREATE TABLE customers_eu PARTITION OF customers
FOR VALUES IN ('UK', 'DE', 'FR', 'IT');

-- Hash partitioning
CREATE TABLE events (
    id BIGSERIAL,
    event_type VARCHAR(50),
    data JSONB
) PARTITION BY HASH (id);

CREATE TABLE events_0 PARTITION OF events
FOR VALUES WITH (MODULUS 4, REMAINDER 0);

CREATE TABLE events_1 PARTITION OF events
FOR VALUES WITH (MODULUS 4, REMAINDER 1);
-- ... events_2 and events_3
```

**Partition Pruning (Query Optimization):**

```sql
-- Queries automatically touch only the relevant partitions
SELECT * FROM logs
WHERE log_time BETWEEN '2025-01-15' AND '2025-01-20';
-- Only scans the logs_2025_01 partition

-- Check the query plan
EXPLAIN SELECT * FROM logs WHERE log_time > '2025-01-01';
-- Shows which partitions are scanned
```
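Two maintenance details worth wiring in from the start, shown here as a short sketch: a DEFAULT partition catches rows that match no defined range, and an index created on the parent table cascades to every partition (PostgreSQL 11+):

```sql
-- Catch-all for rows outside every defined range
CREATE TABLE logs_default PARTITION OF logs DEFAULT;

-- Index the parent; each existing and future partition gets its own copy
CREATE INDEX idx_logs_log_time ON logs (log_time);

-- Detach an old partition before archiving or dropping it
ALTER TABLE logs DETACH PARTITION logs_2025_01;
```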
## Performance Tuning

### Query Optimization Techniques

**PostgreSQL Query Analysis:**

```sql
-- Basic explain
EXPLAIN SELECT * FROM users WHERE email = 'test@example.com';

-- Analyze with actual execution statistics
EXPLAIN ANALYZE
SELECT u.username, COUNT(p.id) AS post_count
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
WHERE u.active = true
GROUP BY u.id, u.username
ORDER BY post_count DESC
LIMIT 10;

-- Identify slow queries (requires the pg_stat_statements extension)
SELECT
    query,
    calls,
    total_exec_time,
    mean_exec_time,
    max_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

-- Update query planner statistics
ANALYZE users;

-- Reclaim space and update statistics
VACUUM ANALYZE posts;
```

**Common Query Patterns:**

```sql
-- Avoid SELECT *; retrieve only needed columns
-- BAD
SELECT * FROM users WHERE id = 123;

-- GOOD
SELECT id, username, email FROM users WHERE id = 123;

-- Use EXISTS instead of IN for large subqueries
-- BAD
SELECT * FROM posts
WHERE author_id IN (SELECT id FROM users WHERE active = true);

-- GOOD
SELECT * FROM posts p
WHERE EXISTS (
    SELECT 1 FROM users u
    WHERE u.id = p.author_id AND u.active = true
);

-- Use JOINs instead of multiple queries
-- BAD (N+1 query problem):
-- SELECT * FROM posts;
-- then for each post: SELECT * FROM users WHERE id = post.author_id;

-- GOOD
SELECT p.*, u.username, u.email
FROM posts p
JOIN users u ON p.author_id = u.id;

-- Window functions instead of self-joins
-- Calculate a running total
SELECT
    order_date,
    amount,
    SUM(amount) OVER (ORDER BY order_date) AS running_total
FROM orders;

-- Rank within groups
SELECT
    category,
    product_name,
    sales,
    RANK() OVER (PARTITION BY category ORDER BY sales DESC) AS rank_in_category
FROM products;
```
**MongoDB Query Optimization:**

```javascript
// Use projection to limit returned fields
// BAD
db.users.find({ active: true })

// GOOD
db.users.find(
  { active: true },
  { username: 1, email: 1, _id: 0 }
)

// Use covered queries (the index covers all queried fields)
db.users.createIndex({ username: 1, email: 1 })
db.users.find(
  { username: "john_doe" },
  { username: 1, email: 1, _id: 0 }
)
// Entire query served from the index

// Avoid negation operators
// BAD (cannot use an index efficiently)
db.products.find({ status: { $ne: "discontinued" } })

// GOOD
db.products.find({ status: { $in: ["active", "pending", "sold"] } })

// Use $lookup sparingly (expensive operation);
// consider embedding the data instead where appropriate

// Aggregation optimization: filter early
// BAD
db.orders.aggregate([
  { $lookup: { ... } },                  // Expensive join
  { $match: { status: "completed" } }    // Filter after the join
])

// GOOD
db.orders.aggregate([
  { $match: { status: "completed" } },   // Filter first
  { $lookup: { ... } }                   // Join fewer documents
])
```
### Connection Pooling

**PostgreSQL Connection Pooling:**

```javascript
// Using node-postgres (pg) with a pool
const { Pool } = require('pg')

const pool = new Pool({
  host: 'localhost',
  port: 5432,
  database: 'mydb',
  user: 'dbuser',
  password: 'secret',
  max: 20,                          // Maximum pool size
  idleTimeoutMillis: 30000,
  connectionTimeoutMillis: 2000
})

// Execute a query
const result = await pool.query('SELECT * FROM users WHERE id = $1', [123])

// Use PgBouncer for server-side pooling (pgbouncer.ini):
// [databases]
// mydb = host=localhost port=5432 dbname=mydb
//
// [pgbouncer]
// pool_mode = transaction
// max_client_conn = 1000
// default_pool_size = 25
```

**MongoDB Connection Pooling:**

```javascript
// MongoClient automatically manages a connection pool
const { MongoClient } = require('mongodb')

const client = new MongoClient(uri, {
  maxPoolSize: 50,            // Max connections
  minPoolSize: 10,            // Min connections
  maxIdleTimeMS: 30000,       // Close idle connections
  waitQueueTimeoutMS: 5000    // Wait for an available connection
})

await client.connect()
const db = client.db('mydb')
// Connections are automatically returned to the pool after use
```
## Best Practices

### PostgreSQL Best Practices

#### Schema Design

- Normalize for data integrity, denormalize for performance
- Use appropriate data types (avoid TEXT for short strings)
- Define NOT NULL constraints where appropriate
- Use SERIAL or UUID for primary keys consistently

#### Indexing

- Index foreign keys for JOIN performance
- Create indexes on frequently filtered/sorted columns
- Use partial indexes for selective queries
- Monitor and remove unused indexes
- Keep composite index column count reasonable (typically ≤ 3-4)

#### Query Performance

- Use EXPLAIN ANALYZE to understand query plans
- Avoid SELECT * in application code
- Use prepared statements to prevent SQL injection
- Limit result sets with LIMIT
- Use connection pooling

#### Maintenance

- Run VACUUM regularly (or enable autovacuum)
- Update statistics with ANALYZE
- Monitor the slow query log
- Set appropriate autovacuum thresholds (see the sketch below)
- Take regular backups with pg_dump or WAL archiving
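A sketch of per-table autovacuum tuning for a hot table; the thresholds here are illustrative, not recommendations:

```sql
-- Vacuum the posts table more aggressively than the global default:
-- trigger autovacuum once ~2% of rows are dead instead of 20%.
ALTER TABLE posts SET (
    autovacuum_vacuum_scale_factor = 0.02,
    autovacuum_analyze_scale_factor = 0.01
);

-- Inspect dead-tuple counts to verify the settings are working
SELECT relname, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
```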
#### Security

- Use SSL/TLS for connections
- Implement row-level security for multi-tenant apps (see the sketch below)
- Grant minimum necessary privileges
- Use parameterized queries
- Apply security updates regularly
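A minimal row-level security sketch for a multi-tenant table, assuming the application sets an `app.tenant_id` session variable; the table and variable names are hypothetical:

```sql
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    tenant_id INTEGER NOT NULL,
    body TEXT
);

ALTER TABLE documents ENABLE ROW LEVEL SECURITY;

-- Each session only sees rows belonging to its own tenant
CREATE POLICY tenant_isolation ON documents
USING (tenant_id = current_setting('app.tenant_id')::INTEGER);

-- The application sets the tenant for the connection before querying
SET app.tenant_id = '42';
SELECT * FROM documents;  -- Returns only tenant 42's rows
```

Note that table owners bypass RLS by default, so application queries should run as a non-owner role.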
### MongoDB Best Practices

#### Schema Design

- Embed related data that is accessed together
- Reference data that is large or rarely accessed
- Use the polymorphic pattern for varied schemas
- Limit document size to reasonable bounds (< 1-2 MB typically)
- Design for your query patterns

#### Indexing

- Index fields used in queries and sorts
- Use compound indexes with the ESR rule (Equality, Sort, Range)
- Create text indexes for full-text search
- Monitor index usage with $indexStats
- Avoid too many indexes (they hurt write performance)

#### Query Performance

- Use projection to limit returned fields
- Create covered queries when possible
- Filter early in aggregation pipelines
- Avoid $lookup when embedding is appropriate
- Use explain() to verify index usage

#### Scalability

- Choose an appropriate shard key (high cardinality, even distribution)
- Use replica sets for high availability
- Configure appropriate read/write concerns
- Monitor chunk distribution in sharded clusters
- Use zones for geographic distribution

#### Operations

- Enable authentication and authorization (see the sketch below)
- Use TLS for client connections
- Take regular backups (mongodump or filesystem snapshots)
- Monitor with MongoDB Atlas, Ops Manager, or custom tools
- Keep your MongoDB version updated
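A minimal sketch of least-privilege user creation once authorization is enabled (`security.authorization: enabled` in mongod.conf); the user and role choices are illustrative:

```javascript
// Run as an administrator; create an application user scoped to one database
const admin = db.getSiblingDB("admin")
admin.createUser({
  user: "app_user",
  pwd: passwordPrompt(),                   // mongosh prompts instead of hard-coding
  roles: [{ role: "readWrite", db: "mydb" }]  // Scoped to a single database
})
```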
## Data Modeling Decision Framework

**Choose PostgreSQL when:**

- Strong ACID guarantees are required (financial transactions)
- Complex relationships and data integrity requirements dominate
- The data structure is well-defined and stable
- You need advanced SQL features (window functions, CTEs, stored procedures)
- Compliance requirements demand strict consistency

**Choose MongoDB when:**

- Schema flexibility is needed (rapid development, evolving requirements)
- Horizontal scalability is the priority (sharding required)
- Data is document-oriented (JSON/BSON native format)
- Data structures are hierarchical or nested
- High write throughput with eventual consistency is acceptable

**Hybrid Approach:**

- Use both databases for different parts of the application
- PostgreSQL for transactional data (orders, payments)
- MongoDB for catalogs, logs, and user sessions
- Synchronize critical data between the systems
## Common Patterns and Anti-Patterns

### PostgreSQL Anti-Patterns

**❌ Storing JSON when relational fits better**

```sql
-- BAD: using JSONB for structured, queryable data
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    data JSONB  -- { name, email, address: { street, city, state } }
);

-- GOOD: proper normalization
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255),
    email VARCHAR(255)
);

CREATE TABLE addresses (
    id SERIAL PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),
    street VARCHAR(255),
    city VARCHAR(100),
    state VARCHAR(50)
);
```

**❌ Over-indexing**

```sql
-- BAD: an index on every column "just in case"
CREATE INDEX idx1 ON users(username);
CREATE INDEX idx2 ON users(email);
CREATE INDEX idx3 ON users(created_at);
CREATE INDEX idx4 ON users(updated_at);
CREATE INDEX idx5 ON users(active);
-- Result: slow writes, large database size

-- GOOD: indexes based on actual query patterns
CREATE INDEX idx_users_email ON users(email);  -- Login queries
CREATE INDEX idx_active_users_created ON users(created_at)
WHERE active = true;  -- Partial index
```

**❌ N+1 Query Problem**

```sql
-- BAD: multiple queries in a loop
SELECT * FROM posts;  -- Returns 100 posts
-- then for each post:
-- SELECT * FROM users WHERE id = ?;  -- 100 additional queries!

-- GOOD: a single query with a JOIN
SELECT p.*, u.username, u.email
FROM posts p
JOIN users u ON p.author_id = u.id;
```
### MongoDB Anti-Patterns

**❌ Massive arrays in documents**

```javascript
// BAD: unbounded array growth
{
  _id: ObjectId("..."),
  username: "popular_user",
  followers: [
    ObjectId("follower1"),
    ObjectId("follower2")
    // ... 100,000+ follower IDs
    // Document exceeds the 16MB limit!
  ]
}

// GOOD: a separate collection with references
// users collection
{ _id: ObjectId("..."), username: "popular_user" }

// followers collection
{
  _id: ObjectId("..."),
  user_id: ObjectId("..."),
  follower_id: ObjectId("...")
}
db.followers.createIndex({ user_id: 1, follower_id: 1 })
```

**❌ Poor shard key selection**

```javascript
// BAD: monotonically increasing shard key
sh.shardCollection("mydb.events", { _id: 1 })
// All writes go to the same shard (highest _id range)

// BAD: low-cardinality shard key
sh.shardCollection("mydb.users", { country: 1 })
// Most users in a few countries = uneven distribution

// GOOD: hashed _id or a compound key
sh.shardCollection("mydb.events", { _id: "hashed" })          // Even distribution
sh.shardCollection("mydb.users", { country: 1, user_id: 1 })  // Compound
```

**❌ Ignoring indexes on embedded documents**

```javascript
// Document structure
{
  username: "john_doe",
  profile: {
    email: "john@example.com",
    age: 30,
    city: "San Francisco"
  }
}

// Query on an embedded field
db.users.find({ "profile.email": "john@example.com" })

// MISSING: an index on the embedded field
db.users.createIndex({ "profile.email": 1 })
```
## Troubleshooting Guide

### PostgreSQL Issues

**Slow Queries:**

```sql
-- Enable slow query logging (postgresql.conf):
-- log_min_duration_statement = 1000  # Log queries > 1 second

-- Find slow queries
SELECT
    query,
    calls,
    total_exec_time / calls AS avg_time_ms,
    rows / calls AS avg_rows
FROM pg_stat_statements
WHERE calls > 100
ORDER BY total_exec_time DESC
LIMIT 20;

-- Analyze a specific slow query
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT ... FROM ... WHERE ...;
```

**High CPU Usage:**

```sql
-- Check running queries
SELECT
    pid,
    now() - query_start AS duration,
    state,
    query
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY duration DESC;

-- Terminate a long-running query
SELECT pg_terminate_backend(pid);
```

**Lock Contention:**

```sql
-- View ungranted locks
SELECT locktype, relation::regclass, mode, granted, pid
FROM pg_locks
WHERE NOT granted;

-- Find blocking queries
SELECT
    blocked_locks.pid AS blocked_pid,
    blocking_locks.pid AS blocking_pid,
    blocked_activity.query AS blocked_query,
    blocking_activity.query AS blocking_query
FROM pg_locks blocked_locks
JOIN pg_stat_activity blocked_activity
    ON blocked_activity.pid = blocked_locks.pid
JOIN pg_locks blocking_locks
    ON blocking_locks.locktype = blocked_locks.locktype
    AND blocking_locks.pid != blocked_locks.pid
JOIN pg_stat_activity blocking_activity
    ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted
  AND blocking_locks.granted;
```
### MongoDB Issues

**Slow Queries:**

```javascript
// Enable profiling (log queries > 100ms)
db.setProfilingLevel(1, { slowms: 100 })

// View slow queries
db.system.profile.find().sort({ ts: -1 }).limit(10)

// Analyze query performance
db.collection.find({ ... }).explain("executionStats")
// Check: totalDocsExamined vs nReturned (should be close)
// Check: executionTimeMillis
// Check: indexName (should show index usage)
```

**Replication Lag:**

```javascript
// Check lag on a secondary
rs.printSecondaryReplicationInfo()

// Check oplog size
db.getReplicationInfo()

// Increase the oplog size if needed (size is in MB: 16384 MB = 16GB)
db.adminCommand({ replSetResizeOplog: 1, size: 16384 })
```

**Sharding Issues:**

```javascript
// Check chunk distribution
sh.status()

// Check balancer status
sh.getBalancerState()
sh.isBalancerRunning()

// Re-enable balancing for a specific collection
sh.enableBalancing("mydb.mycollection")

// Check for jumbo chunks (the chunks collection lives in the config database)
db.getSiblingDB("config").chunks.find({ jumbo: true })
```
## Resources

### PostgreSQL Resources

- Official Documentation: https://www.postgresql.org/docs/
- PostgreSQL Wiki: https://wiki.postgresql.org/
- Performance Tuning: https://wiki.postgresql.org/wiki/Performance_Optimization
- Explain Visualizer: https://explain.dalibo.com/
- pg_stat_statements extension: essential for query analysis

### MongoDB Resources

- Official Documentation: https://docs.mongodb.com/
- MongoDB University: free courses and certification
- Aggregation Framework: https://docs.mongodb.com/manual/aggregation/
- Sharding Guide: https://docs.mongodb.com/manual/sharding/
- Schema Design Patterns: https://www.mongodb.com/blog/post/building-with-patterns-a-summary

### Books

- PostgreSQL: "PostgreSQL: Up and Running" by Regina Obe & Leo Hsu
- MongoDB: "MongoDB: The Definitive Guide" by Shannon Bradshaw, Eoin Brazil, and Kristina Chodorow
---

**Skill Version:** 1.0.0
**Last Updated:** January 2025
**Skill Category:** Database Management, Data Architecture, Performance Optimization
**Technologies:** PostgreSQL 16+, MongoDB 7+