Optimize Query from Query ID
Fetch query → Get profile → Apply best practices → Verify improvement → Return optimized query
Workflow
- Fetch Query Details from Query ID
SELECT query_id, query_text, total_elapsed_time/1000 as seconds, bytes_scanned/1e9 as gb_scanned, bytes_spilled_to_local_storage/1e9 as gb_spilled_local, bytes_spilled_to_remote_storage/1e9 as gb_spilled_remote, partitions_scanned, partitions_total, rows_produced FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY()) WHERE query_id = '<query_id>';
Note the key metrics:
-
seconds : Total execution time
-
gb_scanned : Data read (lower is better)
-
gb_spilled : Spillage indicates memory pressure
-
partitions_scanned/total : Partition pruning effectiveness
- Get Query Profile Details
-- Get operator-level statistics SELECT * FROM TABLE(GET_QUERY_OPERATOR_STATS('<query_id>'));
Look for:
-
Operators with high output_rows vs input_rows (explosions)
-
TableScan operators with high bytes
-
Sort/Aggregate operators with spillage
- Identify Optimization Opportunities
Based on profile, look for:
Metric Issue Fix
partitions_scanned = partitions_total No pruning Add filter on cluster key
gb_spilled > 0 Memory pressure Simplify query, increase warehouse
High bytes_scanned Full scan Add selective filters, reduce columns
Join explosion Cartesian or bad key Fix join condition, filter before join
- Apply Optimizations
Rewrite the query:
-
Select only needed columns
-
Filter early (before joins)
-
Use CTEs to avoid repeated scans
-
Ensure filters align with clustering keys
-
Add LIMIT if full result not needed
- Get Explain Plan for Optimized Query
EXPLAIN USING JSON <optimized_query>;
- Compare Plans
Compare original vs optimized:
-
Fewer partitions scanned?
-
Fewer intermediate rows?
-
Better join order?
- Return Results
Provide:
-
Original query metrics (time, data scanned, spillage)
-
Identified issues
-
The optimized query
-
Summary of changes made
-
Expected improvement
Example Output
Original Query Metrics:
-
Execution time: 45 seconds
-
Data scanned: 12.3 GB
-
Partitions: 500/500 (no pruning)
-
Spillage: 2.1 GB
Issues Found:
-
No partition pruning - filtering on non-cluster column
-
SELECT * scanning unnecessary columns
-
Large table joined without pre-filtering
Optimized Query:
WITH filtered_events AS ( SELECT event_id, user_id, event_type, created_at FROM events WHERE created_at >= '2024-01-01' AND created_at < '2024-02-01' AND event_type = 'purchase' ) SELECT fe.event_id, fe.created_at, u.name FROM filtered_events fe JOIN users u ON fe.user_id = u.id;
Changes:
-
Added date range filter matching cluster key
-
Replaced SELECT * with specific columns
-
Pre-filtered in CTE before join
Expected Improvement:
-
Partitions: 500 → ~15 (97% reduction)
-
Data scanned: 12.3 GB → ~0.4 GB
-
Estimated time: 45s → ~3s