
Optimize Query from Query ID

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy the command below and send it to your AI assistant to install this skill:

Install skill "optimizing-query-by-id" with this command: npx skills add altimateai/data-engineering-skills/altimateai-data-engineering-skills-optimizing-query-by-id


Fetch query → Get profile → Apply best practices → Verify improvement → Return optimized query

Workflow

  1. Fetch Query Details from Query ID

SELECT
    query_id,
    query_text,
    total_elapsed_time / 1000             AS seconds,
    bytes_scanned / 1e9                   AS gb_scanned,
    bytes_spilled_to_local_storage / 1e9  AS gb_spilled_local,
    bytes_spilled_to_remote_storage / 1e9 AS gb_spilled_remote,
    partitions_scanned,
    partitions_total,
    rows_produced
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
WHERE query_id = '<query_id>';

Note the key metrics:

  • seconds : Total execution time

  • gb_scanned : Data read (lower is better)

  • gb_spilled_local / gb_spilled_remote : Spillage indicates memory pressure (remote spillage is the more severe signal)

  • partitions_scanned/total : Partition pruning effectiveness

  2. Get Query Profile Details

-- Get operator-level statistics
SELECT * FROM TABLE(GET_QUERY_OPERATOR_STATS('<query_id>'));

Look for:

  • Operators with high output_rows vs input_rows (explosions)

  • TableScan operators with high bytes

  • Sort/Aggregate operators with spillage
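To surface these operators directly, the operator statistics can be unpacked from the VARIANT column. This is a sketch: the paths (`input_rows`, `output_rows`, `spilling:bytes_spilled_local_storage`) follow the documented shape of `GET_QUERY_OPERATOR_STATS` output, but verify them against your account before relying on it:

```sql
-- Sketch: rank operators by output rows to spot explosions and spillage.
-- The VARIANT paths below are assumptions based on Snowflake's documented
-- output shape for GET_QUERY_OPERATOR_STATS.
SELECT
    operator_id,
    operator_type,
    operator_statistics:input_rows::NUMBER  AS input_rows,
    operator_statistics:output_rows::NUMBER AS output_rows,
    operator_statistics:spilling:bytes_spilled_local_storage::NUMBER AS bytes_spilled_local
FROM TABLE(GET_QUERY_OPERATOR_STATS('<query_id>'))
ORDER BY output_rows DESC NULLS LAST;
```

Operators where `output_rows` far exceeds `input_rows` are the join-explosion candidates; any non-null spillage on Sort or Aggregate operators confirms memory pressure.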

  3. Identify Optimization Opportunities

Based on profile, look for:

Metric                                | Issue                | Fix
partitions_scanned = partitions_total | No pruning           | Add filter on cluster key
gb_spilled > 0                        | Memory pressure      | Simplify query, increase warehouse size
High bytes_scanned                    | Full scan            | Add selective filters, reduce columns
Join explosion                        | Cartesian or bad key | Fix join condition, filter before join
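The pruning check in the first row can be computed directly from the step-1 metrics; this sketch reuses the QUERY_HISTORY columns already shown above:

```sql
-- scan_ratio near 1.0 means almost no partitions were pruned.
SELECT
    partitions_scanned,
    partitions_total,
    partitions_scanned / NULLIF(partitions_total, 0) AS scan_ratio
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
WHERE query_id = '<query_id>';
```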

  4. Apply Optimizations

Rewrite the query:

  • Select only needed columns

  • Filter early (before joins)

  • Use CTEs to avoid repeated scans

  • Ensure filters align with clustering keys

  • Add LIMIT if full result not needed

  5. Get Explain Plan for Optimized Query

EXPLAIN USING JSON <optimized_query>;
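If the JSON output is hard to read, Snowflake also supports a tabular form whose partitionsTotal / partitionsAssigned columns map directly onto the pruning check in step 3:

```sql
-- Tabular plan output: partitionsAssigned vs partitionsTotal shows
-- expected pruning before the optimized query is ever run.
EXPLAIN USING TABULAR <optimized_query>;
```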

  6. Compare Plans

Compare original vs optimized:

  • Fewer partitions scanned?

  • Fewer intermediate rows?

  • Better join order?
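Once the optimized query has actually run, the two executions can be compared side by side by reusing the step-1 query with both query IDs (the placeholder IDs are illustrative):

```sql
-- Compare original vs optimized runs on the step-1 metrics.
SELECT
    query_id,
    total_elapsed_time / 1000 AS seconds,
    bytes_scanned / 1e9       AS gb_scanned,
    partitions_scanned,
    partitions_total
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
WHERE query_id IN ('<original_query_id>', '<optimized_query_id>');
```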

  7. Return Results

Provide:

  • Original query metrics (time, data scanned, spillage)

  • Identified issues

  • The optimized query

  • Summary of changes made

  • Expected improvement

Example Output

Original Query Metrics:

  • Execution time: 45 seconds

  • Data scanned: 12.3 GB

  • Partitions: 500/500 (no pruning)

  • Spillage: 2.1 GB

Issues Found:

  • No partition pruning - filtering on non-cluster column

  • SELECT * scanning unnecessary columns

  • Large table joined without pre-filtering

Optimized Query:

WITH filtered_events AS (
    SELECT event_id, user_id, event_type, created_at
    FROM events
    WHERE created_at >= '2024-01-01'
      AND created_at < '2024-02-01'
      AND event_type = 'purchase'
)
SELECT fe.event_id, fe.created_at, u.name
FROM filtered_events fe
JOIN users u ON fe.user_id = u.id;

Changes:

  • Added date range filter matching cluster key

  • Replaced SELECT * with specific columns

  • Pre-filtered in CTE before join

Expected Improvement:

  • Partitions: 500 → ~15 (97% reduction)

  • Data scanned: 12.3 GB → ~0.4 GB

  • Estimated time: 45s → ~3s

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

  • documenting-dbt-models (General) — no summary provided by upstream source. Repository source; needs review.

  • debugging-dbt-errors (General) — no summary provided by upstream source. Repository source; needs review.

  • migrating-sql-to-dbt (General) — no summary provided by upstream source. Repository source; needs review.