SQL Query Optimization
This skill provides procedural guidance for optimizing SQL queries effectively, with emphasis on proper benchmarking, query plan analysis, and iterative refinement.
Core Optimization Workflow
- Establish Performance Baselines
Before making any changes, establish clear performance metrics:
-
Identify the target: Determine what "optimal" performance looks like. Look for reference implementations, documented benchmarks, or theoretical analysis of query complexity.
-
Measure the original query: Record execution time with multiple runs to account for caching effects.
-
Document the goal: Define success as "as fast as possible" not just "faster than the original."
- Analyze Query Plans
Always use query plan analysis before and after optimization:
-- SQLite EXPLAIN QUERY PLAN <query>;
-- PostgreSQL EXPLAIN ANALYZE <query>;
-- MySQL EXPLAIN <query>;
Key indicators to look for in query plans:
-
Full table scans (SCAN TABLE) vs index usage (SEARCH TABLE USING INDEX)
-
Correlated subqueries executing per-row
-
Temporary B-tree creation for sorting/grouping
-
Materialization of intermediate results
-
Join order and join types
- Identify Common Performance Issues
Correlated Subqueries: Subqueries that reference outer query columns execute once per row. Transform to JOINs or CTEs when possible.
Repeated Computations: The same subquery appearing multiple times should be factored out into a CTE or subquery.
Missing Indexes: Check for indexes on columns used in WHERE, JOIN, and ORDER BY clauses.
Inefficient Aggregations: GROUP BY on large result sets before filtering. Consider filtering earlier with WHERE.
- Apply Optimization Techniques
Consider multiple approaches rather than stopping at the first working solution:
Approach 1: CTEs (Common Table Expressions)
-
Pre-compute values used multiple times
-
Improve readability
-
Note: Some databases materialize CTEs which may hurt performance
Approach 2: Window Functions
-
Efficient for ranking, running totals, and row numbering
-
Avoid when simpler alternatives exist
Approach 3: Subquery Restructuring
-
Convert correlated subqueries to JOINs
-
Push predicates into subqueries to reduce intermediate result sizes
Approach 4: Join Optimization
-
Reorder joins to filter early
-
Use appropriate join types (INNER vs LEFT)
-
Consider join hints if available
- Database-Specific Considerations
SQLite:
-
No parallel query execution
-
CTEs may be materialized (can hurt performance)
-
Limited optimizer compared to enterprise databases
-
Consider using covering indexes
PostgreSQL:
-
Supports parallel execution
-
Advanced optimizer with cost-based decisions
-
CTEs are optimization barriers in older versions (< 12)
MySQL:
-
Derived table materialization can be forced or avoided
-
Index hints available when optimizer makes poor choices
Verification Strategy
Correctness Verification
-
Full result comparison: Compare all rows and columns against the original query output
-
Edge case testing: Test with NULL values, empty results, ties in sorting/ranking
-
Sample verification is insufficient: If the full result set is too large, use checksums or row counts with spot checks
Performance Verification
-
Multiple runs: Execute at least 3-5 times and use median time
-
Cold vs warm cache: Test both scenarios if relevant
-
Compare against optimal: If a reference solution exists, compare against it, not just the original slow query
-
Incremental profiling: Measure each CTE or subquery independently to identify bottlenecks
Common Pitfalls
Satisficing vs Optimizing
-
Problem: Stopping optimization when query is "faster than before" rather than "as fast as possible"
-
Prevention: Always establish what optimal performance looks like before starting. Compare against known-good implementations when available.
Skipping Query Plan Analysis
-
Problem: Making changes without understanding why the query is slow
-
Prevention: Always run EXPLAIN before and after changes. Understand the query plan before proposing solutions.
Premature CTE Usage
-
Problem: Assuming CTEs always improve performance. Some databases materialize CTEs, adding overhead.
-
Prevention: Test both CTE and non-CTE versions. Profile each CTE independently.
Over-reliance on Window Functions
-
Problem: Using ROW_NUMBER() or similar when simpler approaches work
-
Prevention: Consider if a GROUP BY with MIN/MAX or a simple correlated subquery might be more efficient.
Incomplete Testing
-
Problem: Verifying only a sample of rows due to timeouts
-
Prevention: Use checksums, row counts, or hash comparisons for full validation. Test with smaller data subsets first.
Single-Approach Optimization
-
Problem: Implementing one optimization approach without exploring alternatives
-
Prevention: Always test at least 2-3 different approaches before selecting the best one.
Iterative Refinement Process
-
Analyze the original query and its plan
-
Identify the primary bottleneck
-
Propose 2-3 alternative approaches
-
Implement and benchmark each approach
-
Select the best performing approach
-
Verify correctness with full result comparison
-
Document the optimization rationale
Checklist Before Declaring Success
-
Query plan analyzed and understood
-
Multiple optimization approaches considered
-
Performance compared against optimal/reference (not just original)
-
Full result correctness verified (not just samples)
-
Edge cases tested (NULLs, ties, empty results)
-
Performance measured across multiple runs
-
Database-specific optimizations considered