
Optimize Query from Query ID

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy the command below and send it to your AI assistant to install this skill:

Install skill "optimizing-query-by-id" with this command: npx skills add altimateai/data-engineering-skills/altimateai-data-engineering-skills-optimizing-query-by-id


Fetch query → Get profile → Apply best practices → Verify improvement → Return optimized query

Workflow

  1. Fetch Query Details from Query ID

SELECT
    query_id,
    query_text,
    total_elapsed_time / 1000             AS seconds,
    bytes_scanned / 1e9                   AS gb_scanned,
    bytes_spilled_to_local_storage / 1e9  AS gb_spilled_local,
    bytes_spilled_to_remote_storage / 1e9 AS gb_spilled_remote,
    partitions_scanned,
    partitions_total,
    rows_produced
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
WHERE query_id = '<query_id>';

Note the key metrics:

  • seconds : Total execution time

  • gb_scanned : Data read (lower is better)

  • gb_spilled_local / gb_spilled_remote : Spillage indicates memory pressure (remote spillage is the more severe signal)

  • partitions_scanned/total : Partition pruning effectiveness

  2. Get Query Profile Details

-- Get operator-level statistics
SELECT * FROM TABLE(GET_QUERY_OPERATOR_STATS('<query_id>'));

Look for:

  • Operators with high output_rows vs input_rows (explosions)

  • TableScan operators with high bytes

  • Sort/Aggregate operators with spillage
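To surface these operators directly, the operator statistics can be unpacked from the VARIANT column. This is a sketch: the paths (`input_rows`, `output_rows`, `spilling:bytes_spilled_local_storage`) follow the documented shape of `GET_QUERY_OPERATOR_STATS` output, but verify them against your account before relying on it:

```sql
-- Sketch: rank operators by output rows to spot explosions and spillage.
-- The VARIANT paths below are assumptions based on Snowflake's documented
-- output shape for GET_QUERY_OPERATOR_STATS.
SELECT
    operator_id,
    operator_type,
    operator_statistics:input_rows::NUMBER  AS input_rows,
    operator_statistics:output_rows::NUMBER AS output_rows,
    operator_statistics:spilling:bytes_spilled_local_storage::NUMBER AS bytes_spilled_local
FROM TABLE(GET_QUERY_OPERATOR_STATS('<query_id>'))
ORDER BY output_rows DESC NULLS LAST;
```

Operators where `output_rows` far exceeds `input_rows` are the join-explosion candidates; any non-null spillage on Sort or Aggregate operators confirms memory pressure.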

  3. Identify Optimization Opportunities

Based on profile, look for:

Metric                                | Issue                | Fix
partitions_scanned = partitions_total | No pruning           | Add filter on cluster key
gb_spilled > 0                        | Memory pressure      | Simplify query, increase warehouse size
High bytes_scanned                    | Full scan            | Add selective filters, reduce columns
Join explosion                        | Cartesian or bad key | Fix join condition, filter before join
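The pruning check in the first row can be computed directly from the step-1 metrics; this sketch reuses the QUERY_HISTORY columns already shown above:

```sql
-- scan_ratio near 1.0 means almost no partitions were pruned.
SELECT
    partitions_scanned,
    partitions_total,
    partitions_scanned / NULLIF(partitions_total, 0) AS scan_ratio
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
WHERE query_id = '<query_id>';
```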

  4. Apply Optimizations

Rewrite the query:

  • Select only needed columns

  • Filter early (before joins)

  • Use CTEs to avoid repeated scans

  • Ensure filters align with clustering keys

  • Add LIMIT if full result not needed

  5. Get Explain Plan for Optimized Query

EXPLAIN USING JSON <optimized_query>;
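If the JSON output is hard to read, Snowflake also supports a tabular form whose partitionsTotal / partitionsAssigned columns map directly onto the pruning check in step 3:

```sql
-- Tabular plan output: partitionsAssigned vs partitionsTotal shows
-- expected pruning before the optimized query is ever run.
EXPLAIN USING TABULAR <optimized_query>;
```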

  6. Compare Plans

Compare original vs optimized:

  • Fewer partitions scanned?

  • Fewer intermediate rows?

  • Better join order?
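Once the optimized query has actually run, the two executions can be compared side by side by reusing the step-1 query with both query IDs (the placeholder IDs are illustrative):

```sql
-- Compare original vs optimized runs on the step-1 metrics.
SELECT
    query_id,
    total_elapsed_time / 1000 AS seconds,
    bytes_scanned / 1e9       AS gb_scanned,
    partitions_scanned,
    partitions_total
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
WHERE query_id IN ('<original_query_id>', '<optimized_query_id>');
```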

  7. Return Results

Provide:

  • Original query metrics (time, data scanned, spillage)

  • Identified issues

  • The optimized query

  • Summary of changes made

  • Expected improvement

Example Output

Original Query Metrics:

  • Execution time: 45 seconds

  • Data scanned: 12.3 GB

  • Partitions: 500/500 (no pruning)

  • Spillage: 2.1 GB

Issues Found:

  • No partition pruning - filtering on non-cluster column

  • SELECT * scanning unnecessary columns

  • Large table joined without pre-filtering

Optimized Query:

WITH filtered_events AS (
    SELECT event_id, user_id, event_type, created_at
    FROM events
    WHERE created_at >= '2024-01-01'
      AND created_at < '2024-02-01'
      AND event_type = 'purchase'
)
SELECT fe.event_id, fe.created_at, u.name
FROM filtered_events fe
JOIN users u ON fe.user_id = u.id;

Changes:

  • Added date range filter matching cluster key

  • Replaced SELECT * with specific columns

  • Pre-filtered in CTE before join

Expected Improvement:

  • Partitions: 500 → ~15 (97% reduction)

  • Data scanned: 12.3 GB → ~0.4 GB

  • Estimated time: 45s → ~3s

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

  • documenting-dbt-models (General) — no summary provided by upstream source. Repository source; needs review.

  • debugging-dbt-errors (General) — no summary provided by upstream source. Repository source; needs review.

  • migrating-sql-to-dbt (General) — no summary provided by upstream source. Repository source; needs review.