database-sharding
Comprehensive database sharding patterns for horizontal scaling with hash, range, and directory-based strategies.
Quick Start (10 Minutes)
Step 1: Choose sharding strategy from templates:
Hash-based (even distribution)
cat templates/hash-router.ts
Range-based (time-series data)
cat templates/range-router.ts
Directory-based (multi-tenancy)
cat templates/directory-router.ts
Step 2: Select shard key criteria:
-
✅ High cardinality (millions of unique values)
-
✅ Even distribution (no single value > 5%)
-
✅ Immutable (never changes)
-
✅ Query alignment (in 80%+ of WHERE clauses)
Step 3: Implement router:
import { HashRouter } from './hash-router';
const router = new HashRouter([ { id: 'shard_0', connection: { host: 'db0.example.com' } }, { id: 'shard_1', connection: { host: 'db1.example.com' } }, { id: 'shard_2', connection: { host: 'db2.example.com' } }, { id: 'shard_3', connection: { host: 'db3.example.com' } }, ]);
// Query single shard const user = await router.query('user_123', 'SELECT * FROM users WHERE id = $1', ['user_123']);
Critical Rules
✓ Always Do
Rule Reason
Include shard key in queries Avoid scanning all shards (100x slower)
Monitor shard distribution Detect hotspots before they cause outages
Plan for rebalancing upfront Cannot easily add shards later
Choose immutable shard key Changing key = data migration nightmare
Test distribution with production data Synthetic data hides real hotspots
Denormalize for data locality Keep related data on same shard
✗ Never Do
Anti-Pattern Why It's Bad
Sequential ID with range sharding Latest shard gets all writes (hotspot)
Timestamp as shard key Recent shard overwhelmed
Cross-shard transactions without 2PC Data corruption, inconsistency
Simple modulo without consistent hashing Cannot add shards without full re-shard
Nullable shard key Special NULL handling creates hotspots
No shard routing layer Hardcoded shards = cannot rebalance
Top 7 Critical Errors
Error 1: Wrong Shard Key Choice (Hotspots)
Symptom: One shard receives 80%+ of traffic Fix:
// ❌ Bad: Low cardinality (status field) shard_key = order.status; // 90% are 'pending' → shard_0 overloaded
// ✅ Good: High cardinality (user_id) shard_key = order.user_id; // Millions of users, even distribution
Error 2: Missing Shard Key in Queries
Symptom: Queries scan ALL shards (extremely slow) Fix:
// ❌ Bad: No shard key SELECT * FROM orders WHERE status = 'shipped'; // Scans all 100 shards!
// ✅ Good: Include shard key SELECT * FROM orders WHERE user_id = ? AND status = 'shipped'; // Targets 1 shard
Error 3: Sequential IDs with Range Sharding
Symptom: Latest shard gets all writes Fix:
// ❌ Bad: Range sharding with auto-increment // Shard 0: 1-1M, Shard 1: 1M-2M, Shard 2: 2M+ → All new writes to Shard 2!
// ✅ Good: Hash-based sharding const shardId = hash(id) % shardCount; // Even distribution
Error 4: No Rebalancing Strategy
Symptom: Stuck with initial shard count, cannot scale Fix:
// ❌ Bad: Simple modulo const shardId = hash(key) % shardCount; // Adding 5th shard breaks ALL keys
// ✅ Good: Consistent hashing const ring = new ConsistentHashRing(shards); const shardId = ring.getNode(key); // Only ~25% of keys move when adding shard
Error 5: Cross-Shard Transactions
Symptom: Data inconsistency, partial writes Fix:
// ❌ Bad: Cross-shard transaction (will corrupt) BEGIN; UPDATE shard_1.accounts SET balance = balance - 100 WHERE id = 'A'; UPDATE shard_2.accounts SET balance = balance + 100 WHERE id = 'B'; COMMIT; // If shard_2 fails, shard_1 already committed!
// ✅ Good: Two-Phase Commit or Saga pattern const txn = new TwoPhaseCommitTransaction(); txn.addOperation(shard_1, 'UPDATE accounts SET balance = balance - 100 WHERE id = ?', ['A']); txn.addOperation(shard_2, 'UPDATE accounts SET balance = balance + 100 WHERE id = ?', ['B']); await txn.execute(); // Atomic across shards
Error 6: Mutable Shard Key
Symptom: Records move shards, causing duplicates Fix:
// ❌ Bad: Shard by country (user relocates) shard_key = user.country; // User moves US → CA, now in different shard!
// ✅ Good: Shard by immutable user_id shard_key = user.id; // Never changes
Error 7: No Monitoring
Symptom: Silent hotspots, sudden performance degradation Fix:
// ✅ Required metrics
- Per-shard record counts (should be within 20%)
- Query distribution (no shard > 40% of queries)
- Storage per shard (alert at 80%)
- Latency p99 per shard
Load references/error-catalog.md for all 10 errors with detailed fixes.
Sharding Strategies
Strategy Best For Pros Cons
Hash User data, even load critical No hotspots, predictable Range queries scatter
Range Time-series, logs, append-only Range queries efficient, archival Recent shard hotspot
Directory Multi-tenancy, complex routing Flexible, easy rebalancing Lookup overhead, SPOF
Load references/sharding-strategies.md for detailed comparisons with production examples (Instagram, Discord, Salesforce).
Shard Key Selection Criteria
Criterion Importance Check Method
High cardinality Critical COUNT(DISTINCT shard_key)
shard_count × 100
Even distribution Critical No value > 5% of total
Immutable Critical Value never changes
Query alignment High 80%+ queries include it
Data locality Medium Related records together
Decision Tree:
-
User-focused app → user_id
-
Multi-tenant SaaS → tenant_id
-
Time-series/logs → timestamp (range sharding)
-
Product catalog → product_id
Load references/shard-key-selection.md for comprehensive decision trees and testing strategies.
Configuration Summary
Hash-Based Router
import { HashRouter } from './templates/hash-router';
const router = new HashRouter([ { id: 'shard_0', connection: { /* PostgreSQL config / } }, { id: 'shard_1', connection: { / PostgreSQL config */ } }, ]);
// Automatically routes to correct shard const user = await router.query('user_123', 'SELECT * FROM users WHERE id = $1', ['user_123']);
Range-Based Router
import { RangeRouter } from './templates/range-router';
const router = new RangeRouter(shardConfigs, [ { start: Date.parse('2024-01-01'), end: Date.parse('2024-04-01'), shardId: 'shard_q1' }, { start: Date.parse('2024-04-01'), end: Date.parse('2024-07-01'), shardId: 'shard_q2' }, { start: Date.parse('2024-07-01'), end: Infinity, shardId: 'shard_q3' }, ]);
// Range queries target specific shards const janEvents = await router.queryRange( Date.parse('2024-01-01'), Date.parse('2024-02-01'), 'SELECT * FROM events WHERE created_at BETWEEN $1 AND $2' );
Directory-Based Router
import { DirectoryRouter } from './templates/directory-router';
const router = new DirectoryRouter(directoryDBConfig, shardConfigs);
// Assign tenant to specific shard await router.assignShard('tenant_acme', 'shard_enterprise');
// Route automatically const users = await router.query('tenant_acme', 'SELECT * FROM users');
When to Load References
Choosing Strategy
Load references/sharding-strategies.md when:
-
Deciding between hash, range, directory
-
Need production examples (Instagram, Discord)
-
Planning hybrid approaches
Selecting Shard Key
Load references/shard-key-selection.md when:
-
Choosing shard key for new project
-
Evaluating existing shard key
-
Testing distribution with production data
Implementation
Load references/implementation-patterns.md when:
-
Building shard router from scratch
-
Implementing consistent hashing
-
Need transaction handling (2PC, Saga)
-
Setting up monitoring/metrics
Cross-Shard Operations
Load references/cross-shard-queries.md when:
-
Need to aggregate across shards (COUNT, SUM, AVG)
-
Implementing cross-shard joins
-
Building pagination across shards
-
Optimizing scatter-gather patterns
Rebalancing
Load references/rebalancing-guide.md when:
-
Adding new shards
-
Migrating data between shards
-
Planning zero-downtime migrations
-
Balancing uneven load
Error Prevention
Load references/error-catalog.md when:
-
Troubleshooting performance issues
-
Reviewing shard architecture
-
All 10 documented errors with fixes
Complete Setup Checklist
Before Sharding:
-
Tested shard key distribution with production data
-
Shard key in 80%+ of queries
-
Monitoring infrastructure ready
-
Rebalancing strategy planned
Router Implementation:
-
Shard routing layer (not hardcoded shards)
-
Connection pooling per shard
-
Error handling and retries
-
Metrics collection (queries/shard, latency)
Shard Configuration:
-
4-8 shards initially (room to grow)
-
Consistent hashing or virtual shards
-
Replicas per shard (HA)
-
Backup strategy per shard
Application Changes:
-
All queries include shard key
-
Cross-shard joins eliminated (denormalized)
-
Transaction boundaries respected
-
Connection pooling configured
Production Example
Before (Single database overwhelmed):
// Single PostgreSQL instance const db = new Pool({ host: 'db.example.com' });
// All 10M users on one server const users = await db.query('SELECT * FROM users WHERE status = $1', ['active']); // Query time: 5000ms (slow!) // DB CPU: 95% // Disk: 500GB, growing
After (Sharded across 8 servers):
// Hash-based sharding with 8 shards const router = new HashRouter([ { id: 'shard_0', connection: { host: 'db0.example.com' } }, { id: 'shard_1', connection: { host: 'db1.example.com' } }, // ... 6 more shards ]);
// Query single user (targets 1 shard) const user = await router.query('user_123', 'SELECT * FROM users WHERE id = $1', ['user_123']); // Query time: 10ms (500x faster!)
// Query all shards (scatter-gather) const allActive = await router.queryAll('SELECT * FROM users WHERE status = $1', ['active']); // Query time: 800ms (parallelized across 8 shards, 6x faster than single)
// Result: Each shard handles ~1.25M users // DB CPU per shard: 20% // Disk per shard: 65GB // Can scale to 16 shards easily (consistent hashing)
Known Issues Prevention
All 10 documented errors prevented:
-
✅ Wrong shard key (hotspots) → Test distribution first
-
✅ Missing shard key in queries → Code review, linting
-
✅ Cross-shard transactions → Use 2PC or Saga pattern
-
✅ Sequential ID hotspots → Use hash-based sharding
-
✅ No rebalancing strategy → Consistent hashing from day 1
-
✅ Timestamp sharding hotspots → Hybrid hash+range approach
-
✅ Mutable shard key → Choose immutable keys (user_id)
-
✅ No routing layer → Abstract with router from start
-
✅ No monitoring → Track per-shard metrics
-
✅ Weak hash function → Use MD5, MurmurHash3, xxHash
See: references/error-catalog.md for detailed fixes
Resources
Templates:
-
templates/hash-router.ts
-
Hash-based sharding
-
templates/range-router.ts
-
Range-based sharding
-
templates/directory-router.ts
-
Directory-based sharding
-
templates/cross-shard-aggregation.ts
-
Aggregation patterns
References:
-
references/sharding-strategies.md
-
Strategy comparison
-
references/shard-key-selection.md
-
Key selection guide
-
references/implementation-patterns.md
-
Router implementations
-
references/cross-shard-queries.md
-
Query patterns
-
references/rebalancing-guide.md
-
Migration strategies
-
references/error-catalog.md
-
All 10 errors documented
Production Examples:
-
Instagram: Range sharding for media
-
Discord: Hash sharding for messages
-
Salesforce: Directory sharding for orgs
Production-tested | 10 errors prevented | MIT License