query-optimize

SQL Query Optimization

This skill provides procedural guidance for optimizing SQL queries effectively, with emphasis on proper benchmarking, query plan analysis, and iterative refinement.

Core Optimization Workflow

Establish Performance Baselines

Before making any changes, establish clear performance metrics:

Identify the target: Determine what "optimal" performance looks like. Look for reference implementations, documented benchmarks, or theoretical analysis of query complexity.
Measure the original query: Record execution time with multiple runs to account for caching effects.
Document the goal: Define success as "as fast as possible" not just "faster than the original."

Analyze Query Plans

Always use query plan analysis before and after optimization:

-- SQLite EXPLAIN QUERY PLAN <query>;

-- PostgreSQL EXPLAIN ANALYZE <query>;

-- MySQL EXPLAIN <query>;

Key indicators to look for in query plans:

Full table scans (SCAN TABLE) vs index usage (SEARCH TABLE USING INDEX)
Correlated subqueries executing per-row
Temporary B-tree creation for sorting/grouping
Materialization of intermediate results
Join order and join types

Identify Common Performance Issues

Correlated Subqueries: Subqueries that reference outer query columns execute once per row. Transform to JOINs or CTEs when possible.

Repeated Computations: The same subquery appearing multiple times should be factored out into a CTE or subquery.

Missing Indexes: Check for indexes on columns used in WHERE, JOIN, and ORDER BY clauses.

Inefficient Aggregations: GROUP BY on large result sets before filtering. Consider filtering earlier with WHERE.

Apply Optimization Techniques

Consider multiple approaches rather than stopping at the first working solution:

Approach 1: CTEs (Common Table Expressions)

Pre-compute values used multiple times
Improve readability
Note: Some databases materialize CTEs which may hurt performance

Approach 2: Window Functions

Efficient for ranking, running totals, and row numbering
Avoid when simpler alternatives exist

Approach 3: Subquery Restructuring

Convert correlated subqueries to JOINs
Push predicates into subqueries to reduce intermediate result sizes

Approach 4: Join Optimization

Reorder joins to filter early
Use appropriate join types (INNER vs LEFT)
Consider join hints if available

Database-Specific Considerations

SQLite:

No parallel query execution
CTEs may be materialized (can hurt performance)
Limited optimizer compared to enterprise databases
Consider using covering indexes

PostgreSQL:

Supports parallel execution
Advanced optimizer with cost-based decisions
CTEs are optimization barriers in older versions (< 12)

MySQL:

Derived table materialization can be forced or avoided
Index hints available when optimizer makes poor choices

Verification Strategy

Correctness Verification

Full result comparison: Compare all rows and columns against the original query output
Edge case testing: Test with NULL values, empty results, ties in sorting/ranking
Sample verification is insufficient: If the full result set is too large, use checksums or row counts with spot checks

Performance Verification

Multiple runs: Execute at least 3-5 times and use median time
Cold vs warm cache: Test both scenarios if relevant
Compare against optimal: If a reference solution exists, compare against it, not just the original slow query
Incremental profiling: Measure each CTE or subquery independently to identify bottlenecks

Common Pitfalls

Satisficing vs Optimizing

Problem: Stopping optimization when query is "faster than before" rather than "as fast as possible"
Prevention: Always establish what optimal performance looks like before starting. Compare against known-good implementations when available.

Skipping Query Plan Analysis

Problem: Making changes without understanding why the query is slow
Prevention: Always run EXPLAIN before and after changes. Understand the query plan before proposing solutions.

Premature CTE Usage

Problem: Assuming CTEs always improve performance. Some databases materialize CTEs, adding overhead.
Prevention: Test both CTE and non-CTE versions. Profile each CTE independently.

Over-reliance on Window Functions

Problem: Using ROW_NUMBER() or similar when simpler approaches work
Prevention: Consider if a GROUP BY with MIN/MAX or a simple correlated subquery might be more efficient.

Incomplete Testing

Problem: Verifying only a sample of rows due to timeouts
Prevention: Use checksums, row counts, or hash comparisons for full validation. Test with smaller data subsets first.

Single-Approach Optimization

Problem: Implementing one optimization approach without exploring alternatives
Prevention: Always test at least 2-3 different approaches before selecting the best one.

Iterative Refinement Process

Analyze the original query and its plan
Identify the primary bottleneck
Propose 2-3 alternative approaches
Implement and benchmark each approach
Select the best performing approach
Verify correctness with full result comparison
Document the optimization rationale

Checklist Before Declaring Success

Query plan analyzed and understood
Multiple optimization approaches considered
Performance compared against optimal/reference (not just original)
Full result correctness verified (not just samples)
Edge cases tested (NULLs, ties, empty results)
Performance measured across multiple runs
Database-specific optimizations considered

query-optimize

Safety Notice

Copy this and send it to your AI assistant to learn

Source Transparency

Related Skills

extracting-pdf-text

video-processing

google-workspace

portfolio-optimization