OceanBase SQL Optimization Skill
Expert in SQL query optimization for OceanBase distributed database (MySQL & Oracle modes).
Provides best practices for writing efficient SQL queries, designing effective indexes, analyzing execution plans, and tuning slow queries in OceanBase. This skill covers both MySQL mode and Oracle mode.
Mode-Specific Syntax Guide
While OceanBase uses the same underlying optimizer for both MySQL and Oracle modes, there are important syntax differences:
| Aspect | MySQL Mode | Oracle Mode |
|---|---|---|
| EXPLAIN Syntax | EXPLAIN [INTO table_name] [SET statement_id = string] | EXPLAIN [INTO table_name] [SET STATEMENT_ID = 'string'] |
| System Views | oceanbase.GV$OB_SQL_AUDIT | GV$OB_SQL_AUDIT (same name, different schema access) |
| Data Types | BIGINT, VARCHAR | NUMBER, VARCHAR2 |
| Time Functions | TIME_TO_USEC(), usec_to_time() | TO_DATE(), TO_CHAR() |
| Schema Access | database.table | schema.table |
| Limit Syntax | LIMIT n | FETCH FIRST n ROWS ONLY |
| String Concatenation | CONCAT() | || operator |
Optimization Principles:
- ✅ Same: Execution plan operators, join algorithms, partition pruning logic
- ✅ Same: Index design principles, query rewriting strategies
- ❌ Different: Syntax, system view access, data type names, function names
Examples in this skill are marked with mode indicators: MySQL Mode: or Oracle Mode:
Core Optimization Principles
1. Understand Execution Plans
Always analyze execution plans before optimization:
MySQL Mode:
obclient [SALES_DB]> EXPLAIN SELECT * FROM order_table WHERE order_date >= '2024-01-01';
Oracle Mode:
obclient [SALES_DB]> EXPLAIN SELECT * FROM order_table WHERE order_date >= DATE '2024-01-01';
查询结果如下:
+------------------------------------------------------------------------------------------------+
| Query Plan |
+------------------------------------------------------------------------------------------------+
| =========================================== |
| |ID|OPERATOR |NAME |EST. ROWS|COST | |
| ------------------------------------------- |
| |0 |TABLE SCAN |order_table|1000 |1000 | |
| |1 | TABLE GET |order_table|1000 |1000 | |
+=========================================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([order_table.order_id], [order_table.customer_id], [order_table.order_date]), |
| filter([order_table.order_date >= '2024-01-01']), |
| access([order_table.order_id], [order_table.customer_id], [order_table.order_date]), |
| partitions(p0) |
+------------------------------------------------------------------------------------------------+
Key indicators to check:
- EST. ROWS: Estimated rows processed (lower is better)
- EST.TIME(us): Estimated execution time in microseconds (lower is better)
- OPERATOR: Look for
TABLE SCAN(full table scan, bad) vsTABLE GET(index lookup, good) - Partitions: Check if partition pruning is working (e.g.,
partitions(p1)means only one partition accessed) - is_index_back:
falsemeans no index back,truemeans index back required
2. Index Design Best Practices
Create indexes on frequently queried columns (syntax is the same for both modes):
-- ✅ GOOD: Index on WHERE clause column
obclient [SALES_DB]> CREATE INDEX idx_order_date ON order_table(order_date);
-- ✅ GOOD: Composite index for multi-column queries (column order matters)
obclient [SALES_DB]> CREATE INDEX idx_customer_date ON order_table(customer_id, order_date);
-- ✅ GOOD: Unique index for unique constraints
obclient [SALES_DB]> CREATE UNIQUE INDEX idx_order_number ON order_table(order_number);
-- ✅ GOOD: Index with STORING clause for covering index
obclient [SALES_DB]> CREATE INDEX idx_customer_date ON order_table(customer_id, order_date)
STORING (total_amount, status);
-- ❌ BAD: Index on rarely queried columns
obclient [SALES_DB]> CREATE INDEX idx_notes ON order_table(notes);
OceanBase Index Types:
- UNIQUE Index: Ensures uniqueness, fast lookups
- LOCAL Index: Partition-local index (default for partitioned tables)
- GLOBAL Index: Cross-partition index
- Function Index: Index on expressions (e.g.,
CREATE INDEX idx_expr ON t1((c1 + c2)))
Index selection rules:
- Index columns used in WHERE clauses
- Index columns used in JOIN conditions
- Consider composite indexes for multi-column filters (column order matters - most selective first)
- Use STORING clause to create covering indexes (avoids index back)
- Avoid over-indexing (each index adds write overhead)
- For partitioned tables, prefer LOCAL indexes unless cross-partition queries are common
3. Query Writing Best Practices
Use specific column lists instead of SELECT *:
-- ✅ GOOD: Select only needed columns
obclient [SALES_DB]> SELECT order_id, customer_id, total_amount
FROM order_table
WHERE order_date >= '2024-01-01';
-- ❌ BAD: Select all columns
obclient [SALES_DB]> SELECT * FROM order_table WHERE order_date >= '2024-01-01';
Use appropriate WHERE conditions:
-- ✅ GOOD: Use indexed column in WHERE
obclient [SALES_DB]> SELECT * FROM order_table WHERE order_id = 12345;
-- ✅ GOOD: Use range queries on indexed columns
obclient [SALES_DB]> SELECT * FROM order_table
WHERE order_date BETWEEN '2024-01-01' AND '2024-12-31';
-- ❌ BAD: Function on indexed column prevents index usage
obclient [SALES_DB]> SELECT * FROM order_table WHERE YEAR(order_date) = 2024;
-- ✅ GOOD: Rewrite to use index
obclient [SALES_DB]> SELECT * FROM order_table
WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01';
Partition Pruning Optimization
Partition pruning avoids accessing irrelevant partitions, significantly improving SQL execution efficiency. Always check the partitions field in the execution plan to verify partition pruning is working.
Hash Partitioning
Ensure partition key is in WHERE clause:
-- ✅ GOOD: Partition key in WHERE clause enables partition pruning
obclient [SALES_DB]> CREATE TABLE order_table (
order_id INT,
customer_id INT,
order_date DATE
) PARTITION BY HASH(order_id) PARTITIONS 5;
obclient [SALES_DB]> SELECT * FROM order_table WHERE order_id = 12345;
查询结果如下:
+------------------------------------------------------------------------------------------------+
| Query Plan |
+------------------------------------------------------------------------------------------------+
| ================================================= |
| |ID|OPERATOR |NAME |EST. ROWS|EST.TIME(us)| |
| ------------------------------------------------- |
| |0 |TABLE SCAN|order_table|990 |383 | |
| ================================================= |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([order_table.order_id], [order_table.customer_id], [order_table.order_date]), |
| filter(nil), |
| access([order_table.order_id], [order_table.customer_id], [order_table.order_date]), |
| partitions(p1) -- Only one partition accessed! |
+------------------------------------------------------------------------------------------------+
-- ❌ BAD: Missing partition key prevents pruning (scans all partitions)
obclient [SALES_DB]> SELECT * FROM order_table WHERE customer_id = 1001;
-- Check execution plan: partitions(p0, p1, p2, p3, p4) - all partitions scanned
Range Partitioning
Use range conditions that match partition boundaries:
-- ✅ GOOD: Range query matches partition boundaries
obclient [SALES_DB]> CREATE TABLE order_table (
order_id INT,
customer_id INT,
order_date DATE
) PARTITION BY RANGE(order_date) (
PARTITION p0 VALUES LESS THAN('2024-01-01'),
PARTITION p1 VALUES LESS THAN('2024-02-01'),
PARTITION p2 VALUES LESS THAN('2024-03-01')
);
obclient [SALES_DB]> SELECT * FROM order_table
WHERE order_date >= '2024-01-01' AND order_date < '2024-02-01';
查询结果如下:
+------------------------------------------------------------------------------------------------+
| Query Plan |
+------------------------------------------------------------------------------------------------+
| ================================================= |
| |ID|OPERATOR |NAME |EST. ROWS|EST.TIME(us)| |
| ------------------------------------------------- |
| |0 |TABLE SCAN|order_table|1 |46 | |
| ================================================= |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([order_table.order_id], [order_table.customer_id], [order_table.order_date]), |
| filter([order_table.order_date >= '2024-01-01'], [order_table.order_date < '2024-02-01']),
| access([order_table.order_id], [order_table.customer_id], [order_table.order_date]), |
| partitions(p1) -- Only partition p1 accessed! |
+------------------------------------------------------------------------------------------------+
-- ❌ BAD: Function prevents partition pruning
obclient [SALES_DB]> SELECT * FROM order_table WHERE MONTH(order_date) = 1;
-- Check execution plan: partitions(p0, p1, p2) - all partitions scanned
-- ✅ GOOD: Rewrite to use partition pruning
obclient [SALES_DB]> SELECT * FROM order_table
WHERE order_date >= '2024-01-01' AND order_date < '2024-02-01';
Partition Pruning Rules:
- For Hash/List partitions: Partition key must be in WHERE clause with equality or IN conditions
- For Range partitions: Use range conditions that match partition boundaries
- Avoid functions on partition keys in WHERE clauses
- For expressions as partition keys, the expression must appear as a whole in WHERE clause
Join Optimization
OceanBase supports three join algorithms: Nested Loop Join (NLJ), Hash Join (HJ), and Merge Join (MJ). The optimizer automatically selects the best algorithm based on statistics and cost estimation.
Join Algorithms
1. Nested Loop Join (NLJ)
- Best for: Small outer table, indexed inner table
- Use when: Join condition has index on inner table
- Hint:
/*+ USE_NL(table1, table2) */
-- ✅ GOOD: NLJ with indexed join column
obclient [SALES_DB]> CREATE INDEX idx_customer_id ON order_table(customer_id);
obclient [SALES_DB]> EXPLAIN SELECT /*+USE_NL(c, o)*/ o.order_id, c.customer_name
FROM customer_table c
INNER JOIN order_table o ON c.customer_id = o.customer_id
WHERE c.customer_type = 'VIP';
查询结果如下:
+------------------------------------------------------------------------------------------------+
| Query Plan |
+------------------------------------------------------------------------------------------------+
| =========================================== |
| |ID|OPERATOR |NAME |EST. ROWS|EST.TIME(us)| |
| ------------------------------------------- |
| |0 |NESTED-LOOP JOIN| |990 |37346 | |
| |1 | TABLE SCAN |c |999 |669 | |
| |2 | TABLE GET |o |1 |36 | -- Index lookup! |
| =========================================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([o.order_id], [c.customer_name]), filter(nil), |
| conds(nil), nl_params_([c.customer_id]) |
| 1 - output([c.customer_id], [c.customer_name]), filter([c.customer_type = 'VIP']), |
| access([c.customer_id], [c.customer_name]), partitions(p0) |
| 2 - output([o.order_id]), filter(nil), |
| access([o.order_id]), partitions(p0), |
| range_key([o.customer_id]), range([? = o.customer_id]) -- Index used! |
+------------------------------------------------------------------------------------------------+
2. Hash Join (HJ)
- Best for: Large datasets, equal join conditions
- Use when: Both tables are large and join condition is equality
- Hint:
/*+ USE_HASH(table1, table2) */
-- ✅ GOOD: Hash Join for large table joins
obclient [SALES_DB]> EXPLAIN SELECT /*+USE_HASH(o, c)*/ o.order_id, c.customer_name
FROM order_table o
INNER JOIN customer_table c ON o.customer_id = c.customer_id;
查询结果如下:
+------------------------------------------------------------------------------------------------+
| Query Plan |
+------------------------------------------------------------------------------------------------+
| ======================================= |
| |ID|OPERATOR |NAME|EST. ROWS|EST.TIME(us)| |
| --------------------------------------- |
| |0 |HASH JOIN | |98010000 |66774608 | |
| |1 | TABLE SCAN|o |100000 |68478 | |
| |2 | TABLE SCAN|c |100000 |68478 | |
| ======================================= |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([o.order_id], [c.customer_name]), filter(nil), |
| equal_conds([o.customer_id = c.customer_id]), other_conds(nil) |
+------------------------------------------------------------------------------------------------+
3. Merge Join (MJ)
- Best for: Pre-sorted data, ordered results needed
- Use when: Both inputs are already sorted or can be sorted efficiently
- Hint:
/*+ USE_MERGE(table1, table2) */
-- ✅ GOOD: Merge Join when data is sorted
obclient [SALES_DB]> EXPLAIN SELECT /*+USE_MERGE(o, c)*/ o.order_id, c.customer_name
FROM order_table o
INNER JOIN customer_table c ON o.customer_id = c.customer_id
ORDER BY o.customer_id;
Join Order
Use LEADING hint to control join order:
-- ✅ GOOD: Control join order explicitly
obclient [SALES_DB]> SELECT /*+LEADING(c, o)*/ o.order_id, c.customer_name
FROM customer_table c
INNER JOIN order_table o ON c.customer_id = o.customer_id
WHERE c.customer_type = 'VIP'
AND o.order_date >= '2024-01-01';
-- ❌ BAD: Let optimizer choose suboptimal order
obclient [SALES_DB]> SELECT o.order_id, c.customer_name
FROM order_table o
INNER JOIN customer_table c ON o.customer_id = c.customer_id
WHERE c.customer_type = 'VIP';
Index on Join Columns
Always index foreign key columns:
-- ✅ GOOD: Index on join column enables efficient joins
obclient [SALES_DB]> CREATE INDEX idx_customer_id ON order_table(customer_id);
-- Then join queries can use index (NLJ with TABLE GET)
obclient [SALES_DB]> SELECT o.*, c.customer_name
FROM order_table o
INNER JOIN customer_table c ON o.customer_id = c.customer_id;
Aggregation Optimization
Use Appropriate Aggregation Functions
-- ✅ GOOD: COUNT(*) is optimized
obclient [SALES_DB]> SELECT COUNT(*) FROM order_table WHERE order_date >= '2024-01-01';
-- ✅ GOOD: COUNT(column) when you need non-NULL count
obclient [SALES_DB]> SELECT COUNT(customer_id) FROM order_table;
-- ❌ BAD: COUNT(DISTINCT) on large datasets can be slow
obclient [SALES_DB]> SELECT COUNT(DISTINCT customer_id) FROM order_table;
-- ✅ GOOD: Use GROUP BY with LIMIT (MySQL) or FETCH FIRST (Oracle) for top N
-- MySQL Mode:
obclient [SALES_DB]> SELECT customer_id, SUM(total_amount) AS total
FROM order_table
GROUP BY customer_id
ORDER BY total DESC
LIMIT 10;
-- Oracle Mode:
obclient [SALES_DB]> SELECT customer_id, SUM(total_amount) AS total
FROM order_table
GROUP BY customer_id
ORDER BY total DESC
FETCH FIRST 10 ROWS ONLY;
Subquery Optimization
Convert Correlated Subqueries to JOINs
-- ❌ BAD: Correlated subquery (executes for each row)
obclient [SALES_DB]> SELECT o.order_id, o.total_amount
FROM order_table o
WHERE o.total_amount > (
SELECT AVG(total_amount)
FROM order_table
WHERE customer_id = o.customer_id
);
-- ✅ GOOD: Convert to JOIN
obclient [SALES_DB]> SELECT o.order_id, o.total_amount
FROM order_table o
INNER JOIN (
SELECT customer_id, AVG(total_amount) as avg_amount
FROM order_table
GROUP BY customer_id
) avg_orders ON o.customer_id = avg_orders.customer_id
WHERE o.total_amount > avg_orders.avg_amount;
Use EXISTS instead of IN for large datasets
Syntax is the same for both modes:
-- ✅ GOOD: EXISTS stops at first match
obclient [SALES_DB]> SELECT * FROM order_table o
WHERE EXISTS (
SELECT 1 FROM customer_table c
WHERE c.customer_id = o.customer_id
AND c.customer_type = 'VIP'
);
-- ⚠️ OK: IN works but may be slower for large lists
obclient [SALES_DB]> SELECT * FROM order_table
WHERE customer_id IN (SELECT customer_id FROM customer_table WHERE customer_type = 'VIP');
Slow Query Tuning
Enable SQL Audit
Configure SQL Audit settings:
-- Enable SQL Audit
obclient> ALTER SYSTEM SET enable_sql_audit = true;
-- Set SQL Audit memory percentage (default: 3%, range: [0,80])
obclient> SET GLOBAL ob_sql_audit_percentage = 3;
Identify Slow Queries
Query TOP SQL by CPU usage:
MySQL Mode: Use oceanbase.GV$OB_SQL_AUDIT with TIME_TO_USEC(NOW())
Oracle Mode: Use GV$OB_SQL_AUDIT with SYSDATE - INTERVAL '30' MINUTE
-- Find TOP SQL by CPU usage in last 30 minutes
obclient> SELECT sql_id, COUNT(*) AS executions, SUM(execute_time) AS tot_cpu_time,
AVG(execute_time) AS avg_cpu_time,
SUM(execute_time)/(30*60*1000*1000) AS cpu_cnt, query_sql
FROM oceanbase.GV$OB_SQL_AUDIT -- MySQL: add 'oceanbase.' prefix
WHERE tenant_id = 'mysql001' -- MySQL: string, Oracle: number
AND request_time BETWEEN (TIME_TO_USEC(NOW())-30*60*1000*1000) AND TIME_TO_USEC(NOW()) -- MySQL
-- Oracle: BETWEEN (SYSDATE - INTERVAL '30' MINUTE) AND SYSDATE
AND is_executor_rpc = 0
GROUP BY sql_id
HAVING COUNT(*) > 1
ORDER BY cpu_cnt DESC
LIMIT 10; -- MySQL: LIMIT, Oracle: FETCH FIRST 10 ROWS ONLY
Query slow queries by execution time:
MySQL Mode: Use usec_to_time() function
Oracle Mode: Use TO_CHAR() function
-- Find queries with execution time > 100ms
obclient> SELECT request_id,
usec_to_time(request_time) AS request_time, -- MySQL
-- TO_CHAR(request_time, 'YYYY-MM-DD HH24:MI:SS.FF') AS request_time, -- Oracle
elapsed_time, queue_time, execute_time, flt_trace_id, query_sql
FROM oceanbase.v$ob_sql_audit -- MySQL: add 'oceanbase.' prefix
WHERE elapsed_time > 100000 -- > 100ms in microseconds
ORDER BY elapsed_time DESC
LIMIT 10; -- MySQL: LIMIT, Oracle: FETCH FIRST 10 ROWS ONLY
Analyze Query Performance
Get detailed execution statistics:
MySQL Mode: Use oceanbase.gv$ob_sql_audit
Oracle Mode: Use GV$OB_SQL_AUDIT (no schema prefix)
-- Get detailed statistics for a specific SQL
obclient> SELECT sql_id, query_sql, executions, elapsed_time, execute_time,
queue_time, return_rows, affected_rows, partition_cnt,
table_scan, is_hit_plan, plan_id
FROM oceanbase.gv$ob_sql_audit -- MySQL: add 'oceanbase.' prefix
WHERE sql_id = 'your_sql_id'
ORDER BY elapsed_time DESC;
Get execution plan for slow query:
MySQL Mode: Use oceanbase.GV$OB_PLAN_CACHE_PLAN_STAT
Oracle Mode: Use GV$OB_PLAN_CACHE_PLAN_STAT (no schema prefix)
-- Get execution plan from plan cache
obclient> SELECT tenant_id, svr_ip, svr_port, sql_id, plan_id,
last_active_time, first_load_time, outline_data
FROM oceanbase.GV$OB_PLAN_CACHE_PLAN_STAT -- MySQL: add 'oceanbase.' prefix
WHERE tenant_id = 1002 AND sql_id = 'your_sql_id'
AND svr_ip = 'xxx.xxx.xxx.xxx' AND svr_port = 35046;
-- Get plan details
obclient> SELECT operator, name, rows, cost
FROM oceanbase.GV$OB_PLAN_CACHE_PLAN_EXPLAIN -- MySQL: add 'oceanbase.' prefix
WHERE tenant_id = 1002 AND plan_id = 741
AND svr_ip = 'xxx.xxx.xxx.xxx' AND svr_port = 35046;
Find Currently Running Slow Queries
MySQL Mode:
obclient> SELECT user, tenant, sql_id, concat(time, 's') AS time, info,
svr_ip, svr_port, trace_id
FROM oceanbase.GV$OB_PROCESSLIST
WHERE state = 'ACTIVE' ORDER BY time DESC LIMIT 10;
Oracle Mode:
obclient> SELECT user_name, tenant_name, sql_id, time || 's' AS time, info,
svr_ip, svr_port, trace_id
FROM GV$OB_PROCESSLIST
WHERE state = 'ACTIVE' ORDER BY time DESC FETCH FIRST 10 ROWS ONLY;
Real-World Optimization Case
Case: Type Conversion Preventing Index Usage
Problem: SQL execution time was 2 seconds, CPU usage exceeded 70% of server resources.
Root Cause Analysis:
-
Identify TOP SQL:
obclient> SELECT sql_id, COUNT(*) AS executions, AVG(execute_time) AS avg_cpu_time FROM oceanbase.GV$OB_SQL_AUDIT WHERE tenant_id = 'mysql001' AND request_time BETWEEN (TIME_TO_USEC(NOW())-30*60*1000*1000) AND TIME_TO_USEC(NOW()) GROUP BY sql_id ORDER BY SUM(execute_time) DESC LIMIT 1; -
Analyze Execution Plan:
obclient> EXPLAIN SELECT ... FROM v_tt01 WHERE COL001 IN (20017476);Execution plan showed:
- High cost on
TBL3(IDX_TBL3_COL170)index scan range(MIN,MIN ; MAX,MAX)always true- index not used for matchingfilter([20017476 = cast(cast(TBL3.COL170, VARCHAR2(20 BYTE)), NUMBER)])- type conversion happening
- High cost on
-
Root Cause:
- View
V_TT01hadTO_NUMBERconversion onCOL001 - Table
TBL3.COL170wasVARCHAR2(20)but compared withNUMBER - Type conversion prevented index usage
- View
Optimization Steps:
-
Remove unnecessary TO_NUMBER conversion:
-- Before: View had TO_NUMBER conversion CREATE VIEW V_TT01_OLD AS SELECT To_number("tt01"."col001") AS "COL001", ... -- After: Remove TO_NUMBER (COL001 is already NUMBER) CREATE VIEW V_TT01 AS SELECT "tt01"."col001" AS "COL001", ...Result: SQL RT reduced from 2s to 150ms
-
Unify data types:
-- Before: TBL3.COL170 was VARCHAR2(20) CREATE TABLE TBL3 ( COL170 VARCHAR2(20) NOT NULL, ... ); -- After: Change to NUMBER(20) to match join condition ALTER TABLE TBL3 MODIFY COL170 NUMBER(20) NOT NULL;Result: SQL RT further reduced to 20ms
-
Partition and table group optimization:
-- Add hash partitioning on join keys ALTER TABLE TBL1 PARTITION BY HASH(COL001) PARTITIONS 8; ALTER TABLE TBL2 PARTITION BY HASH(COL004) PARTITIONS 8; ALTER TABLE TBL3 PARTITION BY HASH(COL170) PARTITIONS 8; ALTER TABLE TT01 PARTITION BY HASH(COL001) PARTITIONS 8; -- Create table group to avoid cross-partition joins CREATE TABLEGROUP order_tg PARTITION BY HASH(COL001) PARTITIONS 8; ALTER TABLE TBL1 SET TABLEGROUP order_tg; ALTER TABLE TBL2 SET TABLEGROUP order_tg; ALTER TABLE TBL3 SET TABLEGROUP order_tg; ALTER TABLE TT01 SET TABLEGROUP order_tg;Result: SQL RT further reduced to 4ms
Final Execution Plan: Shows PX PARTITION ITERATOR (partition pruning working) and TABLE SCAN with index INDEX_TBL3_COL170 (index used efficiently)
Key Learnings:
- Always check execution plan for type conversions in filters
- Ensure join column data types match
- Use partition pruning and table groups for distributed queries
- Remove unnecessary type conversions in views
Common Anti-patterns
❌ Avoid These Patterns
1. N+1 Query Problem:
-- ❌ BAD: Multiple queries in loop
-- Instead, use JOIN or batch queries
2. SELECT * in Production:
-- ❌ BAD: SELECT * returns unnecessary data
SELECT * FROM order_table;
-- ✅ GOOD: Select only needed columns
SELECT order_id, customer_id, total_amount FROM order_table;
3. Functions on Indexed Columns:
-- ❌ BAD: Prevents index usage
WHERE UPPER(customer_name) = 'JOHN'
-- ✅ GOOD: Store normalized data or use function-based index
WHERE customer_name = 'JOHN'
4. Implicit Type Conversions:
-- ❌ BAD: String to number conversion
WHERE order_id = '12345'
-- ✅ GOOD: Match data types
WHERE order_id = 12345
Performance Monitoring
Key Metrics to Monitor
-- Check table statistics
obclient [SALES_DB]> ANALYZE TABLE order_table;
-- View table size and row count
obclient [SALES_DB]> SELECT
table_name,
table_rows,
data_length,
index_length
FROM information_schema.tables
WHERE table_schema = 'SALES_DB';
-- Check index usage
obclient [SALES_DB]> SHOW INDEX FROM order_table;
Optimization Checklist
Before deploying SQL queries to production:
- Execution plan reviewed (EXPLAIN)
- Appropriate indexes created
- Partition pruning enabled (if partitioned)
- SELECT columns limited to needed fields
- WHERE conditions use indexed columns
- JOIN conditions indexed
- Subqueries optimized (JOINs preferred)
- Aggregations use efficient functions
- No functions on indexed columns in WHERE
- Data types match in comparisons
- Query tested with production-like data volume
Quick Reference
Execution Plan Operators
| Operator | Meaning | Optimization |
|---|---|---|
| TABLE SCAN / TABLE FULL SCAN | Full table scan | Add index or use partition key |
| TABLE RANGE SCAN | Range scan on table | ✅ Good, but consider index |
| TABLE GET | Direct row access via primary key | ✅ Best |
| HASH JOIN | Hash join algorithm | ✅ Good for large equal joins |
| NESTED-LOOP JOIN | Nested loop join | ✅ Good for small outer table with indexed inner |
| MERGE JOIN | Merge join algorithm | ✅ Good for pre-sorted data |
| SORT | Sorting operation | Add ORDER BY index or use sorted index |
| AGGREGATE | Aggregation | Consider materialized views |
| PX PARTITION ITERATOR | Parallel partition iterator | ✅ Good, partition pruning working |
| EXCHANGE OUT DISTR | Distributed exchange | Indicates distributed execution |
Index Types
- Primary Key: Automatically indexed, unique
- Unique Index: Ensures uniqueness, fast lookups
- Composite Index: Multiple columns, order matters
- Covering Index: Contains all query columns