
Database Query Optimizer

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.


Install skill "database-query-optimizer" with this command: npx skills add dexploarer/claudius-skills/dexploarer-claudius-skills-database-query-optimizer


Analyzes database queries, interprets EXPLAIN plans, suggests indexes, and detects common performance issues like N+1 queries.

When to Use

  • "Optimize my database query"

  • "Analyze EXPLAIN plan"

  • "Why is my query slow?"

  • "Suggest indexes"

  • "Fix N+1 queries"

  • "Improve database performance"

Instructions

  1. PostgreSQL Query Analysis

Run EXPLAIN:

EXPLAIN ANALYZE
SELECT u.name, COUNT(p.id) AS post_count
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
WHERE u.created_at > '2024-01-01'
GROUP BY u.id, u.name
ORDER BY post_count DESC
LIMIT 10;

Interpret EXPLAIN output:

QUERY PLAN

Limit (cost=1234.56..1234.58 rows=10 width=40) (actual time=45.123..45.125 rows=10 loops=1)
  -> Sort (cost=1234.56..1345.67 rows=44444 width=40) (actual time=45.122..45.123 rows=10 loops=1)
       Sort Key: (count(p.id)) DESC
       Sort Method: top-N heapsort  Memory: 25kB
       -> HashAggregate (cost=1000.00..1200.00 rows=44444 width=40) (actual time=40.456..42.789 rows=45000 loops=1)
            Group Key: u.id
            -> Hash Left Join (cost=100.00..900.00 rows=50000 width=32) (actual time=1.234..35.678 rows=100000 loops=1)
                 Hash Cond: (p.user_id = u.id)
                 -> Seq Scan on posts p (cost=0.00..500.00 rows=50000 width=4) (actual time=0.010..10.234 rows=50000 loops=1)
                 -> Hash (cost=75.00..75.00 rows=2000 width=32) (actual time=1.200..1.200 rows=2000 loops=1)
                      Buckets: 2048  Batches: 1  Memory Usage: 125kB
                      -> Seq Scan on users u (cost=0.00..75.00 rows=2000 width=32) (actual time=0.005..0.678 rows=2000 loops=1)
                           Filter: (created_at > '2024-01-01'::date)
                           Rows Removed by Filter: 500
Planning Time: 0.234 ms
Execution Time: 45.234 ms

Key metrics to analyze:

  • cost: Estimated cost (first number = startup, second = total)

  • rows: Estimated rows returned

  • width: Average row size in bytes

  • actual time: Real execution time (ms)

  • loops: Number of times node executed

Red flags:

  • Sequential Scan on large tables

  • High cost values

  • Rows estimate far from actual

  • Multiple loops

  • Slow execution time
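The metrics and red flags above can be extracted mechanically. Below is a minimal sketch (not part of the upstream skill) that parses the `(cost=...)` and `(actual time=...)` groups from one plan line and flags a node whose row estimate is far from reality; the regex, threshold, and sample line are illustrative assumptions.

```python
import re

# Parse "(cost=A..B rows=N width=W) (actual time=T0..T1 rows=R loops=L)"
# from one EXPLAIN ANALYZE plan line. Format assumed from PostgreSQL output.
NODE_RE = re.compile(
    r'cost=(?P<startup>\d+\.\d+)\.\.(?P<total>\d+\.\d+) '
    r'rows=(?P<est_rows>\d+) width=(?P<width>\d+)\) '
    r'\(actual time=(?P<t0>\d+\.\d+)\.\.(?P<t1>\d+\.\d+) '
    r'rows=(?P<rows>\d+) loops=(?P<loops>\d+)'
)

def check_node(line, ratio=10):
    """Flag nodes whose row estimate is off by more than `ratio`x."""
    m = NODE_RE.search(line)
    if not m:
        return None
    est, actual = int(m['est_rows']), int(m['rows'])
    off = est > actual * ratio or actual > est * ratio
    return {'total_cost': float(m['total']), 'est_rows': est,
            'actual_rows': actual, 'estimate_off': off}

# Example line taken from the plan shown above
node = check_node("Seq Scan on posts p (cost=0.00..500.00 rows=50000 width=4) "
                  "(actual time=0.010..10.234 rows=50000 loops=1)")
```

A large estimate/actual gap usually means stale statistics (`ANALYZE` the table) or a planner blind spot such as correlated columns.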

  2. Optimization Strategies

Add Index:

-- Create index on filtered column
CREATE INDEX idx_users_created_at ON users(created_at);

-- Create index on join column
CREATE INDEX idx_posts_user_id ON posts(user_id);

-- Composite index for specific query pattern
CREATE INDEX idx_users_created_name ON users(created_at, name);

-- Partial index for common filter
CREATE INDEX idx_users_recent ON users(created_at)
WHERE created_at > '2024-01-01';

-- Covering index (includes all needed columns)
CREATE INDEX idx_users_covering ON users(id, name, created_at);

Rewrite Query:

-- ❌ BAD: Subquery in SELECT
SELECT u.name,
       (SELECT COUNT(*) FROM posts WHERE user_id = u.id) AS post_count
FROM users u;

-- ✅ GOOD: Use JOIN
SELECT u.name, COUNT(p.id) AS post_count
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
GROUP BY u.id, u.name;

-- ❌ BAD: OR conditions
SELECT * FROM users
WHERE email = 'test@example.com' OR username = 'test';

-- ✅ GOOD: Use UNION (each branch can use its own index)
SELECT * FROM users WHERE email = 'test@example.com'
UNION
SELECT * FROM users WHERE username = 'test';

-- ❌ BAD: Function on indexed column
SELECT * FROM users WHERE LOWER(email) = 'test@example.com';

-- ✅ GOOD: Functional index, or avoid the function
CREATE INDEX idx_users_email_lower ON users(LOWER(email));
-- Or just:
SELECT * FROM users WHERE email = 'test@example.com';
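The function-on-indexed-column effect can be observed without a server. This sketch (not from the upstream skill) uses SQLite's EXPLAIN QUERY PLAN only because it runs anywhere; the table, index, and data are invented for the demo, and the same principle applies to PostgreSQL and MySQL.

```python
import sqlite3

# Show that wrapping an indexed column in a function defeats the index.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("CREATE INDEX idx_users_email ON users(email)")
conn.executemany("INSERT INTO users (email) VALUES (?)",
                 [(f"user{i}@example.com",) for i in range(100)])

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row describes the access path
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

indexed = plan("SELECT * FROM users WHERE email = 'user1@example.com'")
scanned = plan("SELECT * FROM users WHERE LOWER(email) = 'user1@example.com'")
# `indexed` seeks via idx_users_email (SEARCH); `scanned` must visit every row (SCAN)
```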

  3. N+1 Query Detection

Problem:

Python/SQLAlchemy example

❌ N+1 Query Problem

users = User.query.all()            # 1 query
for user in users:
    posts = user.posts              # N queries (one per user)
    print(f"{user.name}: {len(posts)} posts")

Total: 1 + N queries

Solution:

✅ Eager Loading

from sqlalchemy.orm import joinedload

users = User.query.options(joinedload(User.posts)).all()  # 1 query
for user in users:
    posts = user.posts              # no additional query
    print(f"{user.name}: {len(posts)} posts")

Total: 1 query
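The 1 + N vs. 1 query difference can be measured directly. The sketch below (not from the upstream skill) counts issued statements with sqlite3's trace callback; the toy schema stands in for the users/posts models above.

```python
import sqlite3

# Count the statements each access pattern issues, via the trace callback.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users (id, name) VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO posts (user_id) VALUES (1), (1), (2);
""")

queries = []
conn.set_trace_callback(queries.append)

# ❌ N+1 style: one query for users, then one per user for their posts
users = conn.execute("SELECT id, name FROM users").fetchall()
for uid, _ in users:
    conn.execute("SELECT COUNT(*) FROM posts WHERE user_id = ?", (uid,)).fetchone()
n_plus_one = len(queries)   # 1 + N = 4 here

# ✅ Eager style: a single JOIN answers the same question
queries.clear()
conn.execute("""
    SELECT u.name, COUNT(p.id) FROM users u
    LEFT JOIN posts p ON u.id = p.user_id GROUP BY u.id
""").fetchall()
eager = len(queries)        # 1
```

The same counting trick (an ORM's query log or echo mode) is how N+1 problems are usually confirmed in real applications.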

Node.js/Sequelize:

// ❌ N+1 Problem
const users = await User.findAll();
for (const user of users) {
  const posts = await user.getPosts(); // N queries
}

// ✅ Solution: Include associations
const users = await User.findAll({
  include: [{ model: Post }] // 1 query with JOIN
});

Rails/ActiveRecord:

❌ N+1 Problem

users = User.all
users.each do |user|
  puts user.posts.count  # N queries
end

✅ Solution: includes

users = User.includes(:posts)
users.each do |user|
  puts user.posts.size  # no additional queries (count would still issue a COUNT query)
end

  4. Index Suggestions

Automated analysis:

-- PostgreSQL: find high-cardinality, poorly correlated columns
-- that may benefit from an index
SELECT schemaname, tablename, attname, n_distinct, correlation
FROM pg_stats
WHERE schemaname = 'public'
  AND n_distinct > 100
  AND correlation < 0.5
ORDER BY n_distinct DESC;

-- Find tables with heavy sequential scans
SELECT schemaname, relname, seq_scan, seq_tup_read, idx_scan, idx_tup_fetch
FROM pg_stat_user_tables
WHERE seq_scan > 0
  AND seq_tup_read / seq_scan > 10000
ORDER BY seq_tup_read DESC;

-- Unused indexes
SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexrelname NOT LIKE 'pg_toast%'
ORDER BY pg_relation_size(indexrelid) DESC;

MySQL:

-- Unused indexes
SELECT * FROM sys.schema_unused_indexes;

-- Redundant (duplicate) indexes
SELECT * FROM sys.schema_redundant_indexes;

-- Queries doing full table scans
SELECT * FROM sys.statements_with_full_table_scans LIMIT 10;

  5. Query Optimization Checklist

Python Script:

#!/usr/bin/env python3
import re

import psycopg2


class QueryOptimizer:
    def __init__(self, conn):
        self.conn = conn

    def analyze_query(self, query):
        """Analyze query and provide optimization suggestions."""
        suggestions = []

        # Check for SELECT *
        if re.search(r'SELECT\s+\*', query, re.IGNORECASE):
            suggestions.append("❌ Avoid SELECT *. Specify only needed columns.")

        # Check for missing WHERE clause
        if re.search(r'FROM\s+\w+', query, re.IGNORECASE) and \
           not re.search(r'WHERE', query, re.IGNORECASE):
            suggestions.append("⚠️  No WHERE clause. Consider adding filters.")

        # Check for OR in WHERE
        if re.search(r'WHERE.*\sOR\s', query, re.IGNORECASE):
            suggestions.append("⚠️  OR conditions may prevent index usage. Consider UNION.")

        # Check for functions on indexed columns
        if re.search(r'WHERE\s+\w+\([^\)]+\)\s*=', query, re.IGNORECASE):
            suggestions.append("❌ Functions on columns prevent index usage.")

        # Check for LIKE with leading wildcard
        if re.search(r'LIKE\s+[\'"]%', query, re.IGNORECASE):
            suggestions.append("❌ LIKE with leading % cannot use index.")

        # Run EXPLAIN ANALYZE (note: this actually executes the query)
        cursor = self.conn.cursor()
        try:
            cursor.execute(f"EXPLAIN ANALYZE {query}")
            plan = cursor.fetchall()

            # Check for sequential scans
            plan_str = str(plan)
            if 'Seq Scan' in plan_str:
                suggestions.append("❌ Sequential scan detected. Consider adding index.")

            # Check for high total cost (the second number after '..' on the top node)
            cost_match = re.search(r'cost=\d+\.\d+\.\.(\d+\.\d+)', plan_str)
            if cost_match:
                cost = float(cost_match.group(1))
                if cost > 10000:
                    suggestions.append(f"⚠️  High query cost: {cost:.2f}")

            return {
                'suggestions': suggestions,
                'explain_plan': plan
            }
        finally:
            cursor.close()

    def suggest_indexes(self, query):
        """Suggest indexes based on query pattern (table_name is a placeholder)."""
        indexes = []

        # Find WHERE conditions
        where_matches = re.findall(r'WHERE\s+(\w+)\s*[=<>]', query, re.IGNORECASE)
        for col in where_matches:
            indexes.append(f"CREATE INDEX idx_{col} ON table_name({col});")

        # Find JOIN conditions
        join_matches = re.findall(r'ON\s+\w+\.(\w+)\s*=\s*\w+\.(\w+)', query, re.IGNORECASE)
        for col1, col2 in join_matches:
            indexes.append(f"CREATE INDEX idx_{col1} ON table_name({col1});")
            indexes.append(f"CREATE INDEX idx_{col2} ON table_name({col2});")

        # Find ORDER BY
        order_matches = re.findall(r'ORDER BY\s+(\w+)', query, re.IGNORECASE)
        for col in order_matches:
            indexes.append(f"CREATE INDEX idx_{col} ON table_name({col});")

        return list(set(indexes))

Usage

conn = psycopg2.connect("dbname=mydb user=postgres")
optimizer = QueryOptimizer(conn)

query = """
SELECT u.name, u.email, COUNT(p.id)
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
WHERE u.created_at > '2024-01-01'
GROUP BY u.id
ORDER BY COUNT(p.id) DESC
LIMIT 10
"""

result = optimizer.analyze_query(query)
for suggestion in result['suggestions']:
    print(suggestion)

print("\nSuggested indexes:")
for index in optimizer.suggest_indexes(query):
    print(index)

  6. MongoDB Optimization

Analyze Query:

db.users.find({
  created_at: { $gt: ISODate("2024-01-01") },
  status: "active"
}).sort({ created_at: -1 }).explain("executionStats")

Check for issues:

// Check execution stats
const stats = db.users.find({ status: "active" }).explain("executionStats");

// Red flags:
// - totalDocsExamined >> nReturned (scanning many docs)
// - COLLSCAN stage (no index used)
// - High executionTimeMillis

// Create index
db.users.createIndex({ status: 1, created_at: -1 });

// Compound index for specific query
db.users.createIndex({ status: 1, created_at: -1, name: 1 });

  7. ORM Query Optimization

Django:

❌ N+1 Problem

users = User.objects.all()
for user in users:
    print(user.profile.bio)  # N queries

✅ select_related (for ForeignKey/OneToOne)

users = User.objects.select_related('profile').all()

✅ prefetch_related (for ManyToMany/reverse ForeignKey)

users = User.objects.prefetch_related('posts').all()

❌ Loading all records

users = User.objects.all() # Loads everything into memory

✅ Use iterator for large datasets

for user in User.objects.iterator(chunk_size=1000):
    process(user)

❌ Multiple queries

active_users = User.objects.filter(is_active=True).count()
inactive_users = User.objects.filter(is_active=False).count()

✅ Single aggregation

from django.db.models import Count, Q

stats = User.objects.aggregate(
    active=Count('id', filter=Q(is_active=True)),
    inactive=Count('id', filter=Q(is_active=False)),
)

TypeORM:

// ❌ N+1 Problem
const users = await userRepository.find();
for (const user of users) {
  const posts = await postRepository.find({ where: { userId: user.id } });
}

// ✅ Use relations
const users = await userRepository.find({
  relations: ['posts', 'profile']
});

// ✅ Query Builder for complex queries
const users = await userRepository
  .createQueryBuilder('user')
  .leftJoinAndSelect('user.posts', 'post')
  .where('user.created_at > :date', { date: '2024-01-01' })
  .andWhere('post.status = :status', { status: 'published' })
  .getMany();

// ✅ Use select to limit columns
const users = await userRepository
  .createQueryBuilder('user')
  .select(['user.id', 'user.name', 'user.email'])
  .getMany();

  8. Performance Monitoring

PostgreSQL:

-- Top slow queries (requires the pg_stat_statements extension;
-- on PostgreSQL 13+ the columns are total_exec_time, mean_exec_time, max_exec_time)
SELECT query, calls, total_time, mean_time, max_time
FROM pg_stat_statements
ORDER BY mean_time DESC
LIMIT 10;

-- Largest tables, with the size held outside the main heap (indexes, TOAST)
SELECT schemaname, tablename,
       pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS total_size,
       pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)
                      - pg_relation_size(schemaname||'.'||tablename)) AS external_size
FROM pg_tables
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC
LIMIT 10;

MySQL:

-- Slow queries (requires slow_query_log=ON and log_output='TABLE')
SELECT * FROM mysql.slow_log
ORDER BY query_time DESC
LIMIT 10;

-- Table statistics
SELECT TABLE_NAME, TABLE_ROWS, DATA_LENGTH, INDEX_LENGTH, DATA_FREE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'your_database'
ORDER BY DATA_LENGTH DESC;

Best Practices

DO:

  • Add indexes on foreign keys

  • Use EXPLAIN regularly

  • Monitor slow query log

  • Use connection pooling

  • Implement pagination

  • Cache frequent queries

  • Use appropriate data types

  • Regular VACUUM/ANALYZE
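The "Implement pagination" item above is usually best done keyset-style: filter past the last-seen key instead of using OFFSET, which forces the database to scan and discard all skipped rows. A minimal sketch (not from the upstream skill; table and data invented for the demo), using sqlite3 so it runs anywhere:

```python
import sqlite3

# Keyset (seek) pagination: WHERE id > last_seen lets the database seek
# directly via the primary-key index instead of counting past OFFSET rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO events (id, payload) VALUES (?, ?)",
                 [(i, f"e{i}") for i in range(1, 101)])

def page(last_id, size=10):
    """Fetch the next `size` rows after `last_id`."""
    return conn.execute(
        "SELECT id, payload FROM events WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, size)).fetchall()

first = page(0)              # ids 1..10
second = page(first[-1][0])  # ids 11..20
```

The trade-off: keyset pagination needs a stable, indexed sort key and cannot jump to an arbitrary page number, but its cost stays constant as the reader pages deeper.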

DON'T:

  • Use SELECT *

  • Over-index (slows writes)

  • Use LIKE with leading %

  • Use functions on indexed columns

  • Ignore N+1 queries

  • Load entire tables

  • Skip query analysis

  • Use OR excessively

Checklist

  • Slow queries identified

  • EXPLAIN plans analyzed

  • Indexes added where needed

  • N+1 queries fixed

  • Query rewrites implemented

  • Monitoring setup

  • Connection pool configured

  • Caching implemented

