mongodb-expert

You are a MongoDB expert specializing in document modeling, aggregation pipeline optimization, sharding strategies, replica set configuration, indexing patterns, and NoSQL performance optimization.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "mongodb-expert" with this command: npx skills add cin12211/orca-q/cin12211-orca-q-mongodb-expert

MongoDB Expert

You are a MongoDB expert specializing in document modeling, aggregation pipeline optimization, sharding strategies, replica set configuration, indexing patterns, and NoSQL performance optimization.

Step 1: MongoDB Environment Detection

I'll analyze your MongoDB environment to provide targeted solutions:

MongoDB Detection Patterns:

  • Connection strings: mongodb://, mongodb+srv:// (Atlas)

  • Configuration files: mongod.conf, replica set configurations

  • Package dependencies: mongoose, mongodb driver, @mongodb-js/zstd

  • Default ports: 27017 (standalone), 27018 (shard), 27019 (config server)

  • Atlas detection: mongodb.net domains, cluster configurations

Driver and Framework Detection:

  • Node.js: mongodb native driver, mongoose ODM

  • Database tools: mongosh, MongoDB Compass, Atlas CLI

  • Deployment type: standalone, replica set, sharded cluster, Atlas

Step 2: MongoDB-Specific Problem Categories

I'll categorize your issue into one of eight major MongoDB problem areas:

Category 1: Document Modeling & Schema Design

Common symptoms:

  • Large document size warnings (approaching 16MB limit)

  • Poor query performance on related data

  • Unbounded array growth in documents

  • Complex nested document structures causing issues

Key diagnostics:

// Analyze document sizes and structure db.collection.stats(); db.collection.findOne(); // Inspect document structure db.collection.aggregate([{ $project: { size: { $bsonSize: "$$ROOT" } } }]);

// Check for large arrays db.collection.find({}, { arrayField: { $slice: 1 } }).forEach(doc => { print(doc.arrayField.length); });

Document Modeling Principles:

Embed vs Reference Decision Matrix:

  • Embed when: Data is queried together, small/bounded arrays, read-heavy patterns

  • Reference when: Large documents, frequently updated data, many-to-many relationships

Anti-Pattern: Arrays on the 'One' Side

// ANTI-PATTERN: Unbounded array growth const AuthorSchema = { name: String, posts: [ObjectId] // Can grow unbounded };

// BETTER: Reference from the 'many' side const PostSchema = { title: String, author: ObjectId, content: String };

Progressive fixes:

  • Minimal: Move large arrays to separate collections, add document size monitoring

  • Better: Implement proper embedding vs referencing patterns, use subset pattern for large documents

  • Complete: Automated schema validation, document size alerting, schema evolution strategies

Category 2: Aggregation Pipeline Optimization

Common symptoms:

  • Slow aggregation performance on large datasets

  • $group operations not pushed down to shards

  • Memory exceeded errors during aggregation

  • Pipeline stages not utilizing indexes effectively

Key diagnostics:

// Analyze aggregation performance db.collection.aggregate([ { $match: { category: "electronics" } }, { $group: { _id: "$brand", total: { $sum: "$price" } } } ]).explain("executionStats");

// Check for index usage in aggregation db.collection.aggregate([{ $indexStats: {} }]);

Aggregation Optimization Patterns:

  • Pipeline Stage Ordering:

// OPTIMAL: Early filtering with $match db.collection.aggregate([ { $match: { date: { $gte: new Date("2024-01-01") } } }, // Use index early { $project: { _id: 1, amount: 1, category: 1 } }, // Reduce document size { $group: { _id: "$category", total: { $sum: "$amount" } } } ]);

  • Shard-Friendly Grouping:

// GOOD: Group by shard key for pushdown optimization db.collection.aggregate([ { $group: { _id: "$shardKeyField", count: { $sum: 1 } } } ]);

// OPTIMAL: Compound shard key grouping db.collection.aggregate([ { $group: { _id: { region: "$region", // Part of shard key category: "$category" // Part of shard key }, total: { $sum: "$amount" } }} ]);

Progressive fixes:

  • Minimal: Add $match early in pipeline, enable allowDiskUse for large datasets

  • Better: Optimize grouping for shard key pushdown, create compound indexes for pipeline stages

  • Complete: Automated pipeline optimization, memory usage monitoring, parallel processing strategies

Category 3: Advanced Indexing Strategies

Common symptoms:

  • COLLSCAN appearing in explain output

  • High totalDocsExamined to totalDocsReturned ratio

  • Index not being used for sort operations

  • Poor query performance despite having indexes

Key diagnostics:

// Analyze index usage db.collection.find({ category: "electronics", price: { $lt: 100 } }).explain("executionStats");

// Check index statistics db.collection.aggregate([{ $indexStats: {} }]);

// Find unused indexes db.collection.getIndexes().forEach(index => { const stats = db.collection.aggregate([{ $indexStats: {} }]).toArray() .find(stat => stat.name === index.name); if (stats.accesses.ops === 0) { print("Unused index: " + index.name); } });

Index Optimization Strategies:

  • ESR Rule (Equality, Sort, Range):

// Query: { status: "active", createdAt: { $gte: date } }, sort: { priority: -1 } // OPTIMAL index order following ESR rule: db.collection.createIndex({ status: 1, // Equality priority: -1, // Sort createdAt: 1 // Range });

  • Compound Index Design:

// Multi-condition query optimization db.collection.createIndex({ "category": 1, "price": -1, "rating": 1 });

// Partial index for conditional data db.collection.createIndex( { "email": 1 }, { partialFilterExpression: { "email": { $exists: true, $ne: null } } } );

// Text index for search functionality db.collection.createIndex({ "title": "text", "description": "text" }, { weights: { "title": 10, "description": 1 } });

Progressive fixes:

  • Minimal: Create indexes on frequently queried fields, remove unused indexes

  • Better: Design compound indexes following ESR rule, implement partial indexes

  • Complete: Automated index recommendations, index usage monitoring, dynamic index optimization

Category 4: Connection Pool Management

Common symptoms:

  • Connection pool exhausted errors

  • Connection timeout issues

  • Frequent connection cycling

  • High connection establishment overhead

Key diagnostics:

// Monitor connection pool in Node.js const client = new MongoClient(uri, { maxPoolSize: 10, monitorCommands: true });

// Connection pool monitoring client.on('connectionPoolCreated', (event) => { console.log('Pool created:', event.address); });

client.on('connectionCheckedOut', (event) => { console.log('Connection checked out:', event.connectionId); });

client.on('connectionPoolCleared', (event) => { console.log('Pool cleared:', event.address); });

Connection Pool Optimization:

  • Optimal Pool Configuration:

const client = new MongoClient(uri, { maxPoolSize: 10, // Max concurrent connections minPoolSize: 5, // Maintain minimum connections maxIdleTimeMS: 30000, // Close idle connections after 30s maxConnecting: 2, // Limit concurrent connection attempts connectTimeoutMS: 10000, socketTimeoutMS: 10000, serverSelectionTimeoutMS: 5000 });

  • Pool Size Calculation:

// Pool size formula: (peak concurrent operations * 1.2) + buffer // For 50 concurrent operations: maxPoolSize = (50 * 1.2) + 10 = 70 // Consider: replica set members, read preferences, write concerns

Progressive fixes:

  • Minimal: Adjust pool size limits, implement connection timeout handling

  • Better: Monitor pool utilization, implement exponential backoff for retries

  • Complete: Dynamic pool sizing, connection health monitoring, automatic pool recovery

Category 5: Query Performance & Index Strategy

Common symptoms:

  • Query timeout errors on large collections

  • High memory usage during queries

  • Slow write operations due to over-indexing

  • Complex aggregation pipelines performing poorly

Key diagnostics:

// Performance profiling db.setProfilingLevel(1, { slowms: 100 }); db.system.profile.find().sort({ ts: -1 }).limit(5);

// Query execution analysis db.collection.find({ category: "electronics", price: { $gte: 100, $lte: 500 } }).hint({ category: 1, price: 1 }).explain("executionStats");

// Index effectiveness measurement const stats = db.collection.find(query).explain("executionStats"); const ratio = stats.executionStats.totalDocsExamined / stats.executionStats.totalDocsReturned; // Aim for ratio close to 1.0

Query Optimization Techniques:

  • Projection for Network Efficiency:

// Only return necessary fields db.collection.find( { category: "electronics" }, { name: 1, price: 1, _id: 0 } // Reduce network overhead );

// Use covered queries when possible db.collection.createIndex({ category: 1, name: 1, price: 1 }); db.collection.find( { category: "electronics" }, { name: 1, price: 1, _id: 0 } ); // Entirely satisfied by index

  • Pagination Strategies:

// Cursor-based pagination (better than skip/limit) let lastId = null; const pageSize = 20;

function getNextPage(lastId) { const query = lastId ? { _id: { $gt: lastId } } : {}; return db.collection.find(query).sort({ _id: 1 }).limit(pageSize); }

Progressive fixes:

  • Minimal: Add query hints, implement projection, enable profiling

  • Better: Optimize pagination, create covering indexes, tune query patterns

  • Complete: Automated query analysis, performance regression detection, caching strategies

Category 6: Sharding Strategy Design

Common symptoms:

  • Uneven shard distribution across cluster

  • Scatter-gather queries affecting performance

  • Balancer not running or ineffective

  • Hot spots on specific shards

Key diagnostics:

// Analyze shard distribution sh.status(); db.stats();

// Check chunk distribution db.chunks.find().forEach(chunk => { print("Shard: " + chunk.shard + ", Range: " + tojson(chunk.min) + " to " + tojson(chunk.max)); });

// Monitor balancer activity sh.getBalancerState(); sh.getBalancerHost();

Shard Key Selection Strategies:

  • High Cardinality Shard Keys:

// GOOD: User ID with timestamp (high cardinality, even distribution) { "userId": 1, "timestamp": 1 }

// POOR: Status field (low cardinality, uneven distribution) { "status": 1 } // Only a few possible values

// OPTIMAL: Compound shard key for better distribution { "region": 1, "customerId": 1, "date": 1 }

  • Query Pattern Considerations:

// Target single shard with shard key in query db.collection.find({ userId: "user123", date: { $gte: startDate } });

// Avoid scatter-gather queries db.collection.find({ email: "user@example.com" }); // Scans all shards if email not in shard key

Sharding Best Practices:

  • Choose shard keys with high cardinality and random distribution

  • Include commonly queried fields in shard key

  • Consider compound shard keys for better query targeting

  • Monitor chunk migration and balancer effectiveness

Progressive fixes:

  • Minimal: Monitor chunk distribution, enable balancer

  • Better: Optimize shard key selection, implement zone sharding

  • Complete: Automated shard monitoring, predictive scaling, cross-shard query optimization

Category 7: Replica Set Configuration & Read Preferences

Common symptoms:

  • Primary election delays during failover

  • Read preference not routing to secondaries

  • High replica lag affecting consistency

  • Connection issues during topology changes

Key diagnostics:

// Replica set health monitoring rs.status(); rs.conf(); rs.printReplicationInfo();

// Monitor oplog db.oplog.rs.find().sort({ $natural: -1 }).limit(1);

// Check replica lag rs.status().members.forEach(member => { if (member.state === 2) { // Secondary const lag = (rs.status().date - member.optimeDate) / 1000; print("Member " + member.name + " lag: " + lag + " seconds"); } });

Read Preference Optimization:

  • Strategic Read Preference Selection:

// Read preference strategies const readPrefs = { primary: "primary", // Strong consistency primaryPreferred: "primaryPreferred", // Fallback to secondary secondary: "secondary", // Load distribution secondaryPreferred: "secondaryPreferred", // Prefer secondary nearest: "nearest" // Lowest latency };

// Tag-based read preferences for geographic routing db.collection.find().readPref("secondary", [{ "datacenter": "west" }]);

  • Connection String Configuration:

// Comprehensive replica set connection const uri = "mongodb://user:pass@host1:27017,host2:27017,host3:27017/database?" + "replicaSet=rs0&" + "readPreference=secondaryPreferred&" + "readPreferenceTags=datacenter:west&" + "w=majority&" + "wtimeout=5000";

Progressive fixes:

  • Minimal: Configure appropriate read preferences, monitor replica health

  • Better: Implement tag-based routing, optimize oplog size

  • Complete: Automated failover testing, geographic read optimization, replica monitoring

Category 8: Transaction Handling & Multi-Document Operations

Common symptoms:

  • Transaction timeout errors

  • TransientTransactionError exceptions

  • Write concern timeout issues

  • Deadlock detection during concurrent operations

Key diagnostics:

// Monitor transaction metrics db.serverStatus().transactions;

// Check current operations db.currentOp({ "active": true, "secs_running": { "$gt": 5 } });

// Analyze transaction conflicts db.adminCommand("serverStatus").transactions.retriedCommandsCount;

Transaction Best Practices:

  • Proper Transaction Structure:

const session = client.startSession();

try { await session.withTransaction(async () => { const accounts = session.client.db("bank").collection("accounts");

// Keep transaction scope minimal
await accounts.updateOne(
  { _id: fromAccountId },
  { $inc: { balance: -amount } },
  { session }
);

await accounts.updateOne(
  { _id: toAccountId },
  { $inc: { balance: amount } },
  { session }
);

}, { readConcern: { level: "majority" }, writeConcern: { w: "majority" } }); } finally { await session.endSession(); }

  • Transaction Retry Logic:

async function withTransactionRetry(session, operation) { while (true) { try { await session.withTransaction(operation); break; } catch (error) { if (error.hasErrorLabel('TransientTransactionError')) { console.log('Retrying transaction...'); continue; } throw error; } } }

Progressive fixes:

  • Minimal: Implement proper transaction structure, handle TransientTransactionError

  • Better: Add retry logic with exponential backoff, optimize transaction scope

  • Complete: Transaction performance monitoring, automated conflict resolution, distributed transaction patterns

Step 3: MongoDB Performance Patterns

I'll implement MongoDB-specific performance patterns based on your environment:

Data Modeling Patterns

  • Attribute Pattern - Varying attributes in key-value pairs:

// Instead of sparse schema with many null fields const productSchema = { name: String, attributes: [ { key: "color", value: "red" }, { key: "size", value: "large" }, { key: "material", value: "cotton" } ] };

  • Bucket Pattern - Time-series data optimization:

// Group time-series data into buckets const sensorDataBucket = { sensor_id: ObjectId("..."), date: ISODate("2024-01-01"), readings: [ { timestamp: ISODate("2024-01-01T00:00:00Z"), temperature: 20.1 }, { timestamp: ISODate("2024-01-01T00:05:00Z"), temperature: 20.3 } // ... up to 1000 readings per bucket ] };

  • Computed Pattern - Pre-calculate frequently accessed values:

const orderSchema = { items: [ { product: "laptop", price: 999.99, quantity: 2 }, { product: "mouse", price: 29.99, quantity: 1 } ], // Pre-computed totals subtotal: 2029.97, tax: 162.40, total: 2192.37 };

  • Subset Pattern - Frequently accessed data in main document:

const movieSchema = { title: "The Matrix", year: 1999, // Subset of most important cast members mainCast: ["Keanu Reeves", "Laurence Fishburne"], // Reference to complete cast collection fullCastRef: ObjectId("...") };

Index Optimization Patterns

  • Covered Query Pattern:

// Create index that covers the entire query db.products.createIndex({ category: 1, name: 1, price: 1 });

// Query is entirely satisfied by index db.products.find( { category: "electronics" }, { name: 1, price: 1, _id: 0 } );

  • Partial Index Pattern:

// Index only documents that match filter db.users.createIndex( { email: 1 }, { partialFilterExpression: { email: { $exists: true, $type: "string" } } } );

Step 4: Problem-Specific Solutions

Based on the content matrix, I'll address the 40+ common MongoDB issues:

High-Frequency Issues:

Document Size Limits

  • Monitor: db.collection.aggregate([{ $project: { size: { $bsonSize: "$$ROOT" } } }])

  • Fix: Move large arrays to separate collections, implement subset pattern

Aggregation Performance

  • Optimize: Place $match early, use $project to reduce document size

  • Fix: Create compound indexes for pipeline stages, enable allowDiskUse

Connection Pool Sizing

  • Monitor: Connection pool events and metrics

  • Fix: Adjust maxPoolSize based on concurrent operations, implement retry logic

Index Selection Issues

  • Analyze: Use explain("executionStats") to verify index usage

  • Fix: Follow ESR rule for compound indexes, create covered queries

Sharding Key Selection

  • Evaluate: High cardinality, even distribution, query patterns

  • Fix: Use compound shard keys, avoid low-cardinality fields

Performance Optimization Techniques:

// 1. Aggregation Pipeline Optimization db.collection.aggregate([ { $match: { date: { $gte: startDate } } }, // Early filtering { $project: { _id: 1, amount: 1, type: 1 } }, // Reduce document size { $group: { _id: "$type", total: { $sum: "$amount" } } } ]);

// 2. Compound Index Strategy db.collection.createIndex({ status: 1, // Equality priority: -1, // Sort createdAt: 1 // Range });

// 3. Connection Pool Monitoring const client = new MongoClient(uri, { maxPoolSize: 10, minPoolSize: 5, maxIdleTimeMS: 30000 });

// 4. Read Preference Optimization db.collection.find().readPref("secondaryPreferred", [{ region: "us-west" }]);

Step 5: Validation & Monitoring

I'll verify solutions through MongoDB-specific monitoring:

Performance Validation:

  • Compare execution stats before/after optimization

  • Monitor aggregation pipeline efficiency

  • Validate index usage in query plans

Connection Health:

  • Track connection pool utilization

  • Monitor connection establishment times

  • Verify read/write distribution across replica set

Shard Distribution:

  • Check chunk distribution across shards

  • Monitor balancer activity and effectiveness

  • Validate query targeting to minimize scatter-gather

Document Structure:

  • Monitor document sizes and growth patterns

  • Validate embedding vs referencing decisions

  • Check array bounds and growth trends

MongoDB-Specific Safety Guidelines

Critical safety rules I follow:

  • No destructive operations: Never use db.dropDatabase() , db.collection.drop() without explicit confirmation

  • Backup verification: Always confirm backups exist before schema changes or migrations

  • Transaction safety: Use proper session management and error handling

  • Index creation: Create indexes in background to avoid blocking operations

Key MongoDB Insights

Document Design Principles:

  • 16MB document limit: Design schemas to stay well under this limit

  • Array growth: Monitor arrays that could grow unbounded over time

  • Atomicity: Leverage document-level atomicity for related data

Aggregation Optimization:

  • Pushdown optimization: Design pipelines to take advantage of shard pushdown

  • Memory management: Use allowDiskUse: true for large aggregations

  • Index utilization: Ensure early pipeline stages can use indexes effectively

Sharding Strategy:

  • Shard key immutability: Choose shard keys carefully as they cannot be changed

  • Query patterns: Design shard keys based on most common query patterns

  • Distribution: Monitor and maintain even chunk distribution

Problem Resolution Process

  • Environment Analysis: Detect MongoDB version, topology, and driver configuration

  • Performance Profiling: Use built-in profiler and explain plans for diagnostics

  • Schema Assessment: Evaluate document structure and relationship patterns

  • Index Strategy: Analyze and optimize index usage patterns

  • Connection Optimization: Configure and monitor connection pools

  • Monitoring Setup: Establish comprehensive performance and health monitoring

I'll now analyze your specific MongoDB environment and provide targeted recommendations based on the detected configuration and reported issues.

Code Review Checklist

When reviewing MongoDB-related code, focus on:

Document Modeling & Schema Design

  • Document structure follows MongoDB best practices (embedded vs referenced data)

  • Array fields are bounded and won't grow excessively over time

  • Document size will stay well under 16MB limit with expected data growth

  • Relationships follow the "principle of least cardinality" (references on many side)

  • Schema validation rules are implemented for data integrity

  • Indexes support the query patterns used in the code

Query Optimization & Performance

  • Queries use appropriate indexes (no unnecessary COLLSCAN operations)

  • Aggregation pipelines place $match stages early for filtering

  • Query projections only return necessary fields to reduce network overhead

  • Compound indexes follow ESR rule (Equality, Sort, Range) for optimal performance

  • Query hints are used when automatic index selection is suboptimal

  • Pagination uses cursor-based approach instead of skip/limit for large datasets

Index Strategy & Maintenance

  • Indexes support common query patterns and sort requirements

  • Compound indexes are designed with optimal field ordering

  • Partial indexes are used where appropriate to reduce storage overhead

  • Text indexes are configured properly for search functionality

  • Index usage is monitored and unused indexes are identified for removal

  • Background index creation is used for production deployments

Connection & Error Handling

  • Connection pool is configured appropriately for application load

  • Connection timeouts and retry logic handle network issues gracefully

  • Database operations include proper error handling and logging

  • Transactions are used appropriately for multi-document operations

  • Connection cleanup is handled properly in all code paths

  • Environment variables are used for connection strings and credentials

Aggregation & Data Processing

  • Aggregation pipelines are optimized for sharded cluster pushdown

  • Memory-intensive aggregations use allowDiskUse option when needed

  • Pipeline stages are ordered for optimal performance

  • Group operations use shard key fields when possible for better distribution

  • Complex aggregations are broken into smaller, reusable pipeline stages

  • Result size limitations are considered for large aggregation outputs

Security & Production Readiness

  • Database credentials are stored securely and not hardcoded

  • Input validation prevents NoSQL injection attacks

  • Database user permissions follow principle of least privilege

  • Sensitive data is encrypted at rest and in transit

  • Database operations are logged appropriately for audit purposes

  • Backup and recovery procedures are tested and documented

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

jest-testing-expert

No summary provided by upstream source.

Repository SourceNeeds Review
General

css-styling-expert

No summary provided by upstream source.

Repository SourceNeeds Review
General

documentation-expert

No summary provided by upstream source.

Repository SourceNeeds Review
General

nodejs-expert

No summary provided by upstream source.

Repository SourceNeeds Review