# Database Query Optimizer

Analyzes database queries, interprets EXPLAIN plans, suggests indexes, and detects common performance issues such as N+1 queries.
## When to Use

- "Optimize my database query"
- "Analyze EXPLAIN plan"
- "Why is my query slow?"
- "Suggest indexes"
- "Fix N+1 queries"
- "Improve database performance"
## Instructions

### PostgreSQL Query Analysis
Run EXPLAIN:

```sql
EXPLAIN ANALYZE
SELECT u.name, COUNT(p.id) AS post_count
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
WHERE u.created_at > '2024-01-01'
GROUP BY u.id, u.name
ORDER BY post_count DESC
LIMIT 10;
```
Interpret EXPLAIN output:

```
QUERY PLAN
Limit  (cost=1234.56..1234.58 rows=10 width=40) (actual time=45.123..45.125 rows=10 loops=1)
  ->  Sort  (cost=1234.56..1345.67 rows=44444 width=40) (actual time=45.122..45.123 rows=10 loops=1)
        Sort Key: (count(p.id)) DESC
        Sort Method: top-N heapsort  Memory: 25kB
        ->  HashAggregate  (cost=1000.00..1200.00 rows=44444 width=40) (actual time=40.456..42.789 rows=45000 loops=1)
              Group Key: u.id
              ->  Hash Left Join  (cost=100.00..900.00 rows=50000 width=32) (actual time=1.234..35.678 rows=100000 loops=1)
                    Hash Cond: (p.user_id = u.id)
                    ->  Seq Scan on posts p  (cost=0.00..500.00 rows=50000 width=4) (actual time=0.010..10.234 rows=50000 loops=1)
                    ->  Hash  (cost=75.00..75.00 rows=2000 width=32) (actual time=1.200..1.200 rows=2000 loops=1)
                          Buckets: 2048  Batches: 1  Memory Usage: 125kB
                          ->  Seq Scan on users u  (cost=0.00..75.00 rows=2000 width=32) (actual time=0.005..0.678 rows=2000 loops=1)
                                Filter: (created_at > '2024-01-01'::date)
                                Rows Removed by Filter: 500
Planning Time: 0.234 ms
Execution Time: 45.234 ms
```
Key metrics to analyze:

- `cost`: estimated cost (first number = startup, second = total)
- `rows`: estimated number of rows returned
- `width`: average row size in bytes
- `actual time`: real execution time (ms)
- `loops`: number of times the node was executed
Red flags:

- Sequential scans on large tables
- High cost values
- Row estimates far from actual counts (statistics may be stale; run ANALYZE)
- High loop counts (a node re-executed many times)
- Slow execution time
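These red flags can also be checked mechanically. The sketch below is a minimal illustration (not part of the original skill): it runs `EXPLAIN (ANALYZE, FORMAT JSON)` through psycopg2 and walks the plan tree; the DSN and the 10x estimate-drift threshold are placeholder assumptions.

```python
# Minimal sketch: scan a PostgreSQL JSON plan for the red flags above.
import json
import psycopg2

ESTIMATE_RATIO = 10  # flag when estimated vs. actual rows differ by 10x or more

def find_red_flags(node, flags):
    """Recursively walk a plan node from EXPLAIN (ANALYZE, FORMAT JSON)."""
    if node.get("Node Type") == "Seq Scan":
        flags.append(f"Seq Scan on {node.get('Relation Name')}")
    est, actual = node.get("Plan Rows", 0), node.get("Actual Rows", 0)
    if est and actual and max(est, actual) / min(est, actual) >= ESTIMATE_RATIO:
        flags.append(f"{node['Node Type']}: estimated {est} rows, actual {actual}")
    if node.get("Actual Loops", 1) > 1:
        flags.append(f"{node['Node Type']} executed {node['Actual Loops']} times")
    for child in node.get("Plans", []):
        find_red_flags(child, flags)

def explain_red_flags(conn, query):
    with conn.cursor() as cur:
        # Note: ANALYZE actually executes the query.
        cur.execute(f"EXPLAIN (ANALYZE, FORMAT JSON) {query}")
        result = cur.fetchone()[0]
        if isinstance(result, str):  # some drivers return the JSON as text
            result = json.loads(result)
        plan = result[0]["Plan"]
    flags = []
    find_red_flags(plan, flags)
    return flags

# Usage (DSN is a placeholder):
# conn = psycopg2.connect("dbname=mydb user=postgres")
# for flag in explain_red_flags(conn, "SELECT * FROM users"):
#     print(flag)
```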
### Optimization Strategies
Add Index:
-- Create index on filtered column CREATE INDEX idx_users_created_at ON users(created_at);
-- Create index on join column CREATE INDEX idx_posts_user_id ON posts(user_id);
-- Composite index for specific query pattern CREATE INDEX idx_users_created_name ON users(created_at, name);
-- Partial index for common filter CREATE INDEX idx_users_recent ON users(created_at) WHERE created_at > '2024-01-01';
-- Covering index (includes all needed columns) CREATE INDEX idx_users_covering ON users(id, name, created_at);
Rewrite Query:

```sql
-- ❌ BAD: correlated subquery in SELECT (runs once per row)
SELECT u.name,
       (SELECT COUNT(*) FROM posts WHERE user_id = u.id) AS post_count
FROM users u;

-- ✅ GOOD: use a JOIN
SELECT u.name, COUNT(p.id) AS post_count
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
GROUP BY u.id, u.name;

-- ❌ BAD: OR across different columns
SELECT * FROM users WHERE email = 'test@example.com' OR username = 'test';

-- ✅ GOOD: UNION (each branch can use its own index)
SELECT * FROM users WHERE email = 'test@example.com'
UNION
SELECT * FROM users WHERE username = 'test';

-- ❌ BAD: function on an indexed column
SELECT * FROM users WHERE LOWER(email) = 'test@example.com';

-- ✅ GOOD: create a functional index, or avoid the function
CREATE INDEX idx_users_email_lower ON users(LOWER(email));
-- Or simply: SELECT * FROM users WHERE email = 'test@example.com';
```
### N+1 Query Detection
Problem (Python/SQLAlchemy example):

```python
# ❌ N+1 query problem
users = User.query.all()              # 1 query
for user in users:
    posts = user.posts                # N queries (one per user)
    print(f"{user.name}: {len(posts)} posts")
# Total: 1 + N queries
```
Solution:

```python
# ✅ Eager loading
from sqlalchemy.orm import joinedload

users = User.query.options(joinedload(User.posts)).all()  # 1 query
for user in users:
    posts = user.posts                # no additional query
    print(f"{user.name}: {len(posts)} posts")
# Total: 1 query
```
Node.js/Sequelize:

```javascript
// ❌ N+1 problem
const users = await User.findAll();
for (const user of users) {
  const posts = await user.getPosts(); // N queries
}
```

```javascript
// ✅ Solution: include associations
const users = await User.findAll({
  include: [{ model: Post }] // 1 query with a JOIN
});
```
Rails/ActiveRecord:

```ruby
# ❌ N+1 problem
users = User.all
users.each do |user|
  puts user.posts.count # N queries
end

# ✅ Solution: includes
users = User.includes(:posts)
users.each do |user|
  puts user.posts.size # uses the preloaded records; no additional queries
end
```
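N+1 problems are easiest to catch by counting queries. The following is an illustrative SQLAlchemy sketch (not part of the original skill) that counts statements issued inside a block using the engine's `before_cursor_execute` event; a loop that fires one query per row shows up immediately. The `engine`, `session`, and `User` names are assumed to come from your application.

```python
# Minimal sketch: count SQL statements issued within a block (SQLAlchemy).
# A count near 1 + N over a small result set indicates an N+1 pattern.
from contextlib import contextmanager
from sqlalchemy import event

@contextmanager
def count_queries(engine):
    counter = {"n": 0}

    def before_cursor_execute(conn, cursor, statement, parameters,
                              context, executemany):
        counter["n"] += 1

    event.listen(engine, "before_cursor_execute", before_cursor_execute)
    try:
        yield counter
    finally:
        event.remove(engine, "before_cursor_execute", before_cursor_execute)

# Usage (engine/session/User come from your app):
# with count_queries(engine) as q:
#     users = session.query(User).all()
#     for user in users:
#         _ = user.posts              # lazy loads fire here
# print(f"{q['n']} queries issued")   # 1 + N suggests an N+1 problem
```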
### Index Suggestions
Automated analysis:

```sql
-- PostgreSQL: index candidates
-- (high-cardinality columns poorly correlated with physical order)
SELECT schemaname, tablename, attname, n_distinct, correlation
FROM pg_stats
WHERE schemaname = 'public'
  AND n_distinct > 100
  AND correlation < 0.5
ORDER BY n_distinct DESC;

-- Tables with heavy sequential scans
SELECT schemaname, relname, seq_scan, seq_tup_read, idx_scan, idx_tup_fetch
FROM pg_stat_user_tables
WHERE seq_scan > 0
  AND seq_tup_read / seq_scan > 10000
ORDER BY seq_tup_read DESC;

-- Unused indexes
SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexrelname NOT LIKE 'pg_toast%'
ORDER BY pg_relation_size(indexrelid) DESC;
```
MySQL:

```sql
-- Unused indexes
SELECT * FROM sys.schema_unused_indexes;

-- Duplicate (redundant) indexes
SELECT * FROM sys.schema_redundant_indexes;

-- Statements doing full table scans (index candidates)
SELECT * FROM sys.statements_with_full_table_scans LIMIT 10;
```
### Query Optimization Checklist
Python Script:

```python
#!/usr/bin/env python3
import re

import psycopg2


class QueryOptimizer:
    def __init__(self, conn):
        self.conn = conn

    def analyze_query(self, query):
        """Analyze a query and provide optimization suggestions."""
        suggestions = []

        # Check for SELECT *
        if re.search(r'SELECT\s+\*', query, re.IGNORECASE):
            suggestions.append("❌ Avoid SELECT *. Specify only needed columns.")

        # Check for missing WHERE clause
        if re.search(r'FROM\s+\w+', query, re.IGNORECASE) and \
           not re.search(r'WHERE', query, re.IGNORECASE):
            suggestions.append("⚠️ No WHERE clause. Consider adding filters.")

        # Check for OR in WHERE
        if re.search(r'WHERE.*\sOR\s', query, re.IGNORECASE):
            suggestions.append("⚠️ OR conditions may prevent index usage. Consider UNION.")

        # Check for functions on indexed columns
        if re.search(r'WHERE\s+\w+\([^\)]+\)\s*=', query, re.IGNORECASE):
            suggestions.append("❌ Functions on columns prevent index usage.")

        # Check for LIKE with leading wildcard
        if re.search(r'LIKE\s+[\'"]%', query, re.IGNORECASE):
            suggestions.append("❌ LIKE with leading % cannot use an index.")

        # Run EXPLAIN ANALYZE (note: this executes the query)
        cursor = self.conn.cursor()
        try:
            cursor.execute(f"EXPLAIN ANALYZE {query}")
            plan = cursor.fetchall()
            plan_str = str(plan)

            # Check for sequential scans
            if 'Seq Scan' in plan_str:
                suggestions.append("❌ Sequential scan detected. Consider adding an index.")

            # Check total cost (the second number in cost=startup..total)
            cost_match = re.search(r'cost=\d+\.\d+\.\.(\d+\.\d+)', plan_str)
            if cost_match:
                cost = float(cost_match.group(1))
                if cost > 10000:
                    suggestions.append(f"⚠️ High query cost: {cost:.2f}")

            return {
                'suggestions': suggestions,
                'explain_plan': plan,
            }
        finally:
            cursor.close()

    def suggest_indexes(self, query):
        """Suggest indexes based on the query pattern (naive heuristic:
        table names are not inferred, so table_name is a placeholder)."""
        indexes = []

        # Find WHERE conditions (columns may be table-qualified)
        where_matches = re.findall(r'WHERE\s+(?:\w+\.)?(\w+)\s*[=<>]', query, re.IGNORECASE)
        for col in where_matches:
            indexes.append(f"CREATE INDEX idx_{col} ON table_name({col});")

        # Find JOIN conditions
        join_matches = re.findall(r'ON\s+\w+\.(\w+)\s*=\s*\w+\.(\w+)', query, re.IGNORECASE)
        for col1, col2 in join_matches:
            indexes.append(f"CREATE INDEX idx_{col1} ON table_name({col1});")
            indexes.append(f"CREATE INDEX idx_{col2} ON table_name({col2});")

        # Find ORDER BY columns
        order_matches = re.findall(r'ORDER\s+BY\s+(?:\w+\.)?(\w+)', query, re.IGNORECASE)
        for col in order_matches:
            indexes.append(f"CREATE INDEX idx_{col} ON table_name({col});")

        return list(set(indexes))


# Usage
conn = psycopg2.connect("dbname=mydb user=postgres")
optimizer = QueryOptimizer(conn)

query = """
    SELECT u.name, u.email, COUNT(p.id)
    FROM users u
    LEFT JOIN posts p ON u.id = p.user_id
    WHERE u.created_at > '2024-01-01'
    GROUP BY u.id
    ORDER BY COUNT(p.id) DESC
    LIMIT 10
"""

result = optimizer.analyze_query(query)
for suggestion in result['suggestions']:
    print(suggestion)

print("\nSuggested indexes:")
for index in optimizer.suggest_indexes(query):
    print(index)
```
### MongoDB Optimization
Analyze Query:

```javascript
db.users.find({
  created_at: { $gt: ISODate("2024-01-01") },
  status: "active"
}).sort({ created_at: -1 }).explain("executionStats")
```
Check for issues:

```javascript
// Check execution stats
const stats = db.users.find({ status: "active" }).explain("executionStats");

// Red flags:
// - totalDocsExamined >> nReturned (scanning many more docs than returned)
// - COLLSCAN stage (no index used)
// - High executionTimeMillis

// Create an index matching the filter and sort
db.users.createIndex({ status: 1, created_at: -1 });

// Compound index for a specific query
db.users.createIndex({ status: 1, created_at: -1, name: 1 });
```
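For scripted checks, something like the following pymongo sketch can flag those red flags automatically. This is an illustrative addition, not part of the original skill; the connection URI, database, collection, filter, and 10x threshold are all placeholder assumptions.

```python
# Minimal sketch: flag MongoDB queries that scan far more documents than
# they return, or that use no index at all (COLLSCAN).
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder URI
db = client.mydb

explain = db.command(
    "explain",
    {"find": "users", "filter": {"status": "active"}},
    verbosity="executionStats",
)
stats = explain["executionStats"]

if stats["totalDocsExamined"] > 10 * max(stats["nReturned"], 1):
    print("⚠️ Examining far more docs than returned; check indexes.")
if "COLLSCAN" in str(explain["queryPlanner"]["winningPlan"]):
    print("❌ Collection scan: no index used.")
```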
### ORM Query Optimization
Django:

```python
# ❌ N+1 problem
users = User.objects.all()
for user in users:
    print(user.profile.bio)  # N queries

# ✅ select_related (for ForeignKey/OneToOne)
users = User.objects.select_related('profile').all()

# ✅ prefetch_related (for ManyToMany/reverse ForeignKey)
users = User.objects.prefetch_related('posts').all()

# ❌ Loading all records
users = User.objects.all()  # loads everything into memory

# ✅ Use an iterator for large datasets
for user in User.objects.iterator(chunk_size=1000):
    process(user)

# ❌ Multiple queries
active_users = User.objects.filter(is_active=True).count()
inactive_users = User.objects.filter(is_active=False).count()

# ✅ Single aggregation
from django.db.models import Count, Q

stats = User.objects.aggregate(
    active=Count('id', filter=Q(is_active=True)),
    inactive=Count('id', filter=Q(is_active=False)),
)
```
TypeORM:

```typescript
// ❌ N+1 problem
const users = await userRepository.find();
for (const user of users) {
  const posts = await postRepository.find({ where: { userId: user.id } });
}

// ✅ Use relations
const usersWithRelations = await userRepository.find({
  relations: ['posts', 'profile'],
});

// ✅ Query builder for complex queries
const filtered = await userRepository
  .createQueryBuilder('user')
  .leftJoinAndSelect('user.posts', 'post')
  .where('user.created_at > :date', { date: '2024-01-01' })
  .andWhere('post.status = :status', { status: 'published' })
  .getMany();

// ✅ Use select to limit columns
const slim = await userRepository
  .createQueryBuilder('user')
  .select(['user.id', 'user.name', 'user.email'])
  .getMany();
```
### Performance Monitoring
PostgreSQL:

```sql
-- Top slow queries (requires the pg_stat_statements extension;
-- on PostgreSQL 13+ the columns are total_exec_time, mean_exec_time, max_exec_time)
SELECT query, calls, total_time, mean_time, max_time
FROM pg_stat_statements
ORDER BY mean_time DESC
LIMIT 10;

-- Largest tables, with external (index + TOAST) size
SELECT schemaname, tablename,
       pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size,
       pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)
                      - pg_relation_size(schemaname||'.'||tablename)) AS external_size
FROM pg_tables
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC
LIMIT 10;
```
MySQL:

```sql
-- Slow queries (requires slow_query_log=ON and log_output='TABLE')
SELECT * FROM mysql.slow_log ORDER BY query_time DESC LIMIT 10;

-- Table statistics
SELECT TABLE_NAME, TABLE_ROWS, DATA_LENGTH, INDEX_LENGTH, DATA_FREE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'your_database'
ORDER BY DATA_LENGTH DESC;
```
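To fold the PostgreSQL query above into regular monitoring, a small script can pull the top offenders periodically. The sketch below is illustrative (the DSN is a placeholder); it reads pg_stat_statements via psycopg2 and accounts for the column rename in PostgreSQL 13:

```python
# Minimal sketch: report the slowest queries from pg_stat_statements.
# Assumes the extension is installed; the DSN is a placeholder.
import psycopg2

def top_slow_queries(dsn, limit=10):
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        # Column names changed in PostgreSQL 13 (mean_time -> mean_exec_time).
        cur.execute("SHOW server_version_num")
        pg13_or_later = int(cur.fetchone()[0]) >= 130000
        mean_col = "mean_exec_time" if pg13_or_later else "mean_time"
        total_col = "total_exec_time" if pg13_or_later else "total_time"
        cur.execute(
            f"SELECT query, calls, {total_col}, {mean_col} "
            f"FROM pg_stat_statements ORDER BY {mean_col} DESC LIMIT %s",
            (limit,),
        )
        return cur.fetchall()

for query, calls, total_ms, mean_ms in top_slow_queries("dbname=mydb user=postgres"):
    print(f"{mean_ms:8.2f} ms avg  {calls:6d} calls  {query[:80]}")
```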
## Best Practices

DO:

- Add indexes on foreign keys
- Use EXPLAIN regularly
- Monitor the slow query log
- Use connection pooling (see the sketch after these lists)
- Implement pagination
- Cache frequent queries
- Use appropriate data types
- Run VACUUM/ANALYZE regularly

DON'T:

- Use SELECT *
- Over-index (it slows writes)
- Use LIKE with a leading %
- Apply functions to indexed columns
- Ignore N+1 queries
- Load entire tables into memory
- Skip query analysis
- Use OR excessively
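As a concrete reference for the pooling and pagination items above, here is a minimal sketch (illustrative, not from the original skill) using psycopg2's built-in connection pool together with keyset pagination; the DSN, table, and column names are placeholders:

```python
# Minimal sketch: connection pooling plus keyset (cursor-based) pagination.
from psycopg2.pool import SimpleConnectionPool

pool = SimpleConnectionPool(minconn=1, maxconn=10,
                            dsn="dbname=mydb user=postgres")

def fetch_users_page(last_id=0, page_size=100):
    """Keyset pagination: WHERE id > last_id avoids OFFSET's linear scan."""
    conn = pool.getconn()
    try:
        with conn.cursor() as cur:
            cur.execute(
                "SELECT id, name, email FROM users "
                "WHERE id > %s ORDER BY id LIMIT %s",
                (last_id, page_size),
            )
            return cur.fetchall()
    finally:
        pool.putconn(conn)

# Usage: walk the table one page at a time
last_id = 0
while True:
    rows = fetch_users_page(last_id)
    if not rows:
        break
    last_id = rows[-1][0]
```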
## Checklist

- [ ] Slow queries identified
- [ ] EXPLAIN plans analyzed
- [ ] Indexes added where needed
- [ ] N+1 queries fixed
- [ ] Query rewrites implemented
- [ ] Monitoring set up
- [ ] Connection pool configured
- [ ] Caching implemented