OceanBase SQL Optimization Skill

Expert in SQL query optimization for OceanBase distributed database (MySQL & Oracle modes).

Provides best practices for writing efficient SQL queries, designing effective indexes, analyzing execution plans, and tuning slow queries in OceanBase. This skill covers both MySQL mode and Oracle mode.

Mode-Specific Syntax Guide

While OceanBase uses the same underlying optimizer for both MySQL and Oracle modes, there are important syntax differences:

Aspect	MySQL Mode	Oracle Mode
EXPLAIN Syntax	`EXPLAIN [INTO table_name] [SET statement_id = string]`	`EXPLAIN [INTO table_name] [SET STATEMENT_ID = 'string']`
System Views	`oceanbase.GV$OB_SQL_AUDIT`	`GV$OB_SQL_AUDIT` (same name, different schema access)
Data Types	`BIGINT`, `VARCHAR`	`NUMBER`, `VARCHAR2`
Time Functions	`TIME_TO_USEC()`, `usec_to_time()`	`TO_DATE()`, `TO_CHAR()`
Schema Access	`database.table`	`schema.table`
Limit Syntax	`LIMIT n`	`FETCH FIRST n ROWS ONLY`
String Concatenation	`CONCAT()`	`\|\|` operator

Optimization Principles:

✅ Same: Execution plan operators, join algorithms, partition pruning logic
✅ Same: Index design principles, query rewriting strategies
❌ Different: Syntax, system view access, data type names, function names

Examples in this skill are marked with mode indicators: MySQL Mode: or Oracle Mode:

Core Optimization Principles

1. Understand Execution Plans

Always analyze execution plans before optimization:

MySQL Mode:

obclient [SALES_DB]> EXPLAIN SELECT * FROM order_table WHERE order_date >= '2024-01-01';

Oracle Mode:

obclient [SALES_DB]> EXPLAIN SELECT * FROM order_table WHERE order_date >= DATE '2024-01-01';

查询结果如下：

+------------------------------------------------------------------------------------------------+
| Query Plan                                                                                     |
+------------------------------------------------------------------------------------------------+
| ===========================================                                                    |
| |ID|OPERATOR              |NAME      |EST. ROWS|COST |                                        |
| -------------------------------------------                                                    |
| |0 |TABLE SCAN            |order_table|1000    |1000 |                                        |
| |1 | TABLE GET            |order_table|1000    |1000 |                                        |
+===========================================                                                      |
| Outputs & filters:                                                                             |
| -------------------------------------                                                          |
|   0 - output([order_table.order_id], [order_table.customer_id], [order_table.order_date]),   |
|       filter([order_table.order_date >= '2024-01-01']),                                        |
|       access([order_table.order_id], [order_table.customer_id], [order_table.order_date]),    |
|       partitions(p0)                                                                            |
+------------------------------------------------------------------------------------------------+

Key indicators to check:

EST. ROWS: Estimated rows processed (lower is better)
EST.TIME(us): Estimated execution time in microseconds (lower is better)
OPERATOR: Look for TABLE SCAN (full table scan, bad) vs TABLE GET (index lookup, good)
Partitions: Check if partition pruning is working (e.g., partitions(p1) means only one partition accessed)
is_index_back: false means no index back, true means index back required

2. Index Design Best Practices

Create indexes on frequently queried columns (syntax is the same for both modes):

-- ✅ GOOD: Index on WHERE clause column
obclient [SALES_DB]> CREATE INDEX idx_order_date ON order_table(order_date);

-- ✅ GOOD: Composite index for multi-column queries (column order matters)
obclient [SALES_DB]> CREATE INDEX idx_customer_date ON order_table(customer_id, order_date);

-- ✅ GOOD: Unique index for unique constraints
obclient [SALES_DB]> CREATE UNIQUE INDEX idx_order_number ON order_table(order_number);

-- ✅ GOOD: Index with STORING clause for covering index
obclient [SALES_DB]> CREATE INDEX idx_customer_date ON order_table(customer_id, order_date) 
                     STORING (total_amount, status);

-- ❌ BAD: Index on rarely queried columns
obclient [SALES_DB]> CREATE INDEX idx_notes ON order_table(notes);

OceanBase Index Types:

UNIQUE Index: Ensures uniqueness, fast lookups
LOCAL Index: Partition-local index (default for partitioned tables)
GLOBAL Index: Cross-partition index
Function Index: Index on expressions (e.g., CREATE INDEX idx_expr ON t1((c1 + c2)))

Index selection rules:

Index columns used in WHERE clauses
Index columns used in JOIN conditions
Consider composite indexes for multi-column filters (column order matters - most selective first)
Use STORING clause to create covering indexes (avoids index back)
Avoid over-indexing (each index adds write overhead)
For partitioned tables, prefer LOCAL indexes unless cross-partition queries are common

3. Query Writing Best Practices

Use specific column lists instead of SELECT *:

-- ✅ GOOD: Select only needed columns
obclient [SALES_DB]> SELECT order_id, customer_id, total_amount 
                     FROM order_table 
                     WHERE order_date >= '2024-01-01';

-- ❌ BAD: Select all columns
obclient [SALES_DB]> SELECT * FROM order_table WHERE order_date >= '2024-01-01';

Use appropriate WHERE conditions:

-- ✅ GOOD: Use indexed column in WHERE
obclient [SALES_DB]> SELECT * FROM order_table WHERE order_id = 12345;

-- ✅ GOOD: Use range queries on indexed columns
obclient [SALES_DB]> SELECT * FROM order_table 
                     WHERE order_date BETWEEN '2024-01-01' AND '2024-12-31';

-- ❌ BAD: Function on indexed column prevents index usage
obclient [SALES_DB]> SELECT * FROM order_table WHERE YEAR(order_date) = 2024;

-- ✅ GOOD: Rewrite to use index
obclient [SALES_DB]> SELECT * FROM order_table 
                     WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01';

Partition Pruning Optimization

Partition pruning avoids accessing irrelevant partitions, significantly improving SQL execution efficiency. Always check the partitions field in the execution plan to verify partition pruning is working.

Hash Partitioning

Ensure partition key is in WHERE clause:

-- ✅ GOOD: Partition key in WHERE clause enables partition pruning
obclient [SALES_DB]> CREATE TABLE order_table (
                     order_id INT,
                     customer_id INT,
                     order_date DATE
                     ) PARTITION BY HASH(order_id) PARTITIONS 5;

obclient [SALES_DB]> SELECT * FROM order_table WHERE order_id = 12345;

查询结果如下：

+------------------------------------------------------------------------------------------------+
| Query Plan                                                                                     |
+------------------------------------------------------------------------------------------------+
| =================================================                                              |
| |ID|OPERATOR  |NAME       |EST. ROWS|EST.TIME(us)|                                            |
| -------------------------------------------------                                              |
| |0 |TABLE SCAN|order_table|990      |383         |                                            |
| =================================================                                              |
| Outputs & filters:                                                                             |
| -------------------------------------                                                          |
|   0 - output([order_table.order_id], [order_table.customer_id], [order_table.order_date]),  |
|       filter(nil),                                                                             |
|       access([order_table.order_id], [order_table.customer_id], [order_table.order_date]),   |
|       partitions(p1)  -- Only one partition accessed!                                         |
+------------------------------------------------------------------------------------------------+

-- ❌ BAD: Missing partition key prevents pruning (scans all partitions)
obclient [SALES_DB]> SELECT * FROM order_table WHERE customer_id = 1001;
-- Check execution plan: partitions(p0, p1, p2, p3, p4) - all partitions scanned

Range Partitioning

Use range conditions that match partition boundaries:

-- ✅ GOOD: Range query matches partition boundaries
obclient [SALES_DB]> CREATE TABLE order_table (
                     order_id INT,
                     customer_id INT,
                     order_date DATE
                     ) PARTITION BY RANGE(order_date) (
                     PARTITION p0 VALUES LESS THAN('2024-01-01'),
                     PARTITION p1 VALUES LESS THAN('2024-02-01'),
                     PARTITION p2 VALUES LESS THAN('2024-03-01')
                     );

obclient [SALES_DB]> SELECT * FROM order_table 
                     WHERE order_date >= '2024-01-01' AND order_date < '2024-02-01';

查询结果如下：

+------------------------------------------------------------------------------------------------+
| Query Plan                                                                                     |
+------------------------------------------------------------------------------------------------+
| =================================================                                              |
| |ID|OPERATOR  |NAME       |EST. ROWS|EST.TIME(us)|                                            |
| -------------------------------------------------                                              |
| |0 |TABLE SCAN|order_table|1        |46          |                                            |
| =================================================                                              |
| Outputs & filters:                                                                             |
| -------------------------------------                                                          |
|   0 - output([order_table.order_id], [order_table.customer_id], [order_table.order_date]),  |
|       filter([order_table.order_date >= '2024-01-01'], [order_table.order_date < '2024-02-01']),
|       access([order_table.order_id], [order_table.customer_id], [order_table.order_date]),   |
|       partitions(p1)  -- Only partition p1 accessed!                                        |
+------------------------------------------------------------------------------------------------+

-- ❌ BAD: Function prevents partition pruning
obclient [SALES_DB]> SELECT * FROM order_table WHERE MONTH(order_date) = 1;
-- Check execution plan: partitions(p0, p1, p2) - all partitions scanned

-- ✅ GOOD: Rewrite to use partition pruning
obclient [SALES_DB]> SELECT * FROM order_table 
                     WHERE order_date >= '2024-01-01' AND order_date < '2024-02-01';

Partition Pruning Rules:

For Hash/List partitions: Partition key must be in WHERE clause with equality or IN conditions
For Range partitions: Use range conditions that match partition boundaries
Avoid functions on partition keys in WHERE clauses
For expressions as partition keys, the expression must appear as a whole in WHERE clause

Join Optimization

OceanBase supports three join algorithms: Nested Loop Join (NLJ), Hash Join (HJ), and Merge Join (MJ). The optimizer automatically selects the best algorithm based on statistics and cost estimation.

Join Algorithms

1. Nested Loop Join (NLJ)

Best for: Small outer table, indexed inner table
Use when: Join condition has index on inner table
Hint: /*+ USE_NL(table1, table2) */

-- ✅ GOOD: NLJ with indexed join column
obclient [SALES_DB]> CREATE INDEX idx_customer_id ON order_table(customer_id);

obclient [SALES_DB]> EXPLAIN SELECT /*+USE_NL(c, o)*/ o.order_id, c.customer_name 
                     FROM customer_table c
                     INNER JOIN order_table o ON c.customer_id = o.customer_id
                     WHERE c.customer_type = 'VIP';

查询结果如下：

+------------------------------------------------------------------------------------------------+
| Query Plan                                                                                     |
+------------------------------------------------------------------------------------------------+
| ===========================================                                                    |
| |ID|OPERATOR        |NAME  |EST. ROWS|EST.TIME(us)|                                           |
| -------------------------------------------                                                    |
| |0 |NESTED-LOOP JOIN|      |990      |37346       |                                           |
| |1 | TABLE SCAN     |c     |999      |669         |                                           |
| |2 | TABLE GET      |o     |1        |36          |  -- Index lookup!                         |
| ===========================================                                                    |
| Outputs & filters:                                                                             |
| -------------------------------------                                                          |
|   0 - output([o.order_id], [c.customer_name]), filter(nil),                                   |
|       conds(nil), nl_params_([c.customer_id])                                                 |
|   1 - output([c.customer_id], [c.customer_name]), filter([c.customer_type = 'VIP']),         |
|       access([c.customer_id], [c.customer_name]), partitions(p0)                             |
|   2 - output([o.order_id]), filter(nil),                                                      |
|       access([o.order_id]), partitions(p0),                                                    |
|       range_key([o.customer_id]), range([? = o.customer_id])  -- Index used!                  |
+------------------------------------------------------------------------------------------------+

2. Hash Join (HJ)

Best for: Large datasets, equal join conditions
Use when: Both tables are large and join condition is equality
Hint: /*+ USE_HASH(table1, table2) */

-- ✅ GOOD: Hash Join for large table joins
obclient [SALES_DB]> EXPLAIN SELECT /*+USE_HASH(o, c)*/ o.order_id, c.customer_name 
                     FROM order_table o
                     INNER JOIN customer_table c ON o.customer_id = c.customer_id;

查询结果如下：

+------------------------------------------------------------------------------------------------+
| Query Plan                                                                                     |
+------------------------------------------------------------------------------------------------+
| =======================================                                                        |
| |ID|OPERATOR   |NAME|EST. ROWS|EST.TIME(us)|                                                   |
| ---------------------------------------                                                        |
| |0 |HASH JOIN  |    |98010000 |66774608    |                                                   |
| |1 | TABLE SCAN|o   |100000   |68478       |                                                   |
| |2 | TABLE SCAN|c   |100000   |68478       |                                                   |
| =======================================                                                        |
| Outputs & filters:                                                                             |
| -------------------------------------                                                          |
|   0 - output([o.order_id], [c.customer_name]), filter(nil),                                    |
|       equal_conds([o.customer_id = c.customer_id]), other_conds(nil)                           |
+------------------------------------------------------------------------------------------------+

3. Merge Join (MJ)

Best for: Pre-sorted data, ordered results needed
Use when: Both inputs are already sorted or can be sorted efficiently
Hint: /*+ USE_MERGE(table1, table2) */

-- ✅ GOOD: Merge Join when data is sorted
obclient [SALES_DB]> EXPLAIN SELECT /*+USE_MERGE(o, c)*/ o.order_id, c.customer_name 
                     FROM order_table o
                     INNER JOIN customer_table c ON o.customer_id = c.customer_id
                     ORDER BY o.customer_id;

Join Order

Use LEADING hint to control join order:

-- ✅ GOOD: Control join order explicitly
obclient [SALES_DB]> SELECT /*+LEADING(c, o)*/ o.order_id, c.customer_name 
                     FROM customer_table c
                     INNER JOIN order_table o ON c.customer_id = o.customer_id
                     WHERE c.customer_type = 'VIP'
                     AND o.order_date >= '2024-01-01';

-- ❌ BAD: Let optimizer choose suboptimal order
obclient [SALES_DB]> SELECT o.order_id, c.customer_name 
                     FROM order_table o
                     INNER JOIN customer_table c ON o.customer_id = c.customer_id
                     WHERE c.customer_type = 'VIP';

Index on Join Columns

Always index foreign key columns:

-- ✅ GOOD: Index on join column enables efficient joins
obclient [SALES_DB]> CREATE INDEX idx_customer_id ON order_table(customer_id);

-- Then join queries can use index (NLJ with TABLE GET)
obclient [SALES_DB]> SELECT o.*, c.customer_name 
                     FROM order_table o
                     INNER JOIN customer_table c ON o.customer_id = c.customer_id;

Aggregation Optimization

Use Appropriate Aggregation Functions

-- ✅ GOOD: COUNT(*) is optimized
obclient [SALES_DB]> SELECT COUNT(*) FROM order_table WHERE order_date >= '2024-01-01';

-- ✅ GOOD: COUNT(column) when you need non-NULL count
obclient [SALES_DB]> SELECT COUNT(customer_id) FROM order_table;

-- ❌ BAD: COUNT(DISTINCT) on large datasets can be slow
obclient [SALES_DB]> SELECT COUNT(DISTINCT customer_id) FROM order_table;

-- ✅ GOOD: Use GROUP BY with LIMIT (MySQL) or FETCH FIRST (Oracle) for top N
-- MySQL Mode:
obclient [SALES_DB]> SELECT customer_id, SUM(total_amount) AS total
                     FROM order_table
                     GROUP BY customer_id
                     ORDER BY total DESC
                     LIMIT 10;

-- Oracle Mode:
obclient [SALES_DB]> SELECT customer_id, SUM(total_amount) AS total
                     FROM order_table
                     GROUP BY customer_id
                     ORDER BY total DESC
                     FETCH FIRST 10 ROWS ONLY;

Subquery Optimization

Convert Correlated Subqueries to JOINs

-- ❌ BAD: Correlated subquery (executes for each row)
obclient [SALES_DB]> SELECT o.order_id, o.total_amount
                     FROM order_table o
                     WHERE o.total_amount > (
                         SELECT AVG(total_amount) 
                         FROM order_table 
                         WHERE customer_id = o.customer_id
                     );

-- ✅ GOOD: Convert to JOIN
obclient [SALES_DB]> SELECT o.order_id, o.total_amount
                     FROM order_table o
                     INNER JOIN (
                         SELECT customer_id, AVG(total_amount) as avg_amount
                         FROM order_table
                         GROUP BY customer_id
                     ) avg_orders ON o.customer_id = avg_orders.customer_id
                     WHERE o.total_amount > avg_orders.avg_amount;

Use EXISTS instead of IN for large datasets

Syntax is the same for both modes:

-- ✅ GOOD: EXISTS stops at first match
obclient [SALES_DB]> SELECT * FROM order_table o
                     WHERE EXISTS (
                         SELECT 1 FROM customer_table c
                         WHERE c.customer_id = o.customer_id
                         AND c.customer_type = 'VIP'
                     );

-- ⚠️ OK: IN works but may be slower for large lists
obclient [SALES_DB]> SELECT * FROM order_table
                     WHERE customer_id IN (SELECT customer_id FROM customer_table WHERE customer_type = 'VIP');

Slow Query Tuning

Enable SQL Audit

Configure SQL Audit settings:

-- Enable SQL Audit
obclient> ALTER SYSTEM SET enable_sql_audit = true;

-- Set SQL Audit memory percentage (default: 3%, range: [0,80])
obclient> SET GLOBAL ob_sql_audit_percentage = 3;

Identify Slow Queries

Query TOP SQL by CPU usage:

MySQL Mode: Use oceanbase.GV$OB_SQL_AUDIT with TIME_TO_USEC(NOW())
Oracle Mode: Use GV$OB_SQL_AUDIT with SYSDATE - INTERVAL '30' MINUTE

-- Find TOP SQL by CPU usage in last 30 minutes
obclient> SELECT sql_id, COUNT(*) AS executions, SUM(execute_time) AS tot_cpu_time,
                AVG(execute_time) AS avg_cpu_time,
                SUM(execute_time)/(30*60*1000*1000) AS cpu_cnt, query_sql
         FROM oceanbase.GV$OB_SQL_AUDIT  -- MySQL: add 'oceanbase.' prefix
         WHERE tenant_id = 'mysql001'  -- MySQL: string, Oracle: number
           AND request_time BETWEEN (TIME_TO_USEC(NOW())-30*60*1000*1000) AND TIME_TO_USEC(NOW())  -- MySQL
           -- Oracle: BETWEEN (SYSDATE - INTERVAL '30' MINUTE) AND SYSDATE
           AND is_executor_rpc = 0
         GROUP BY sql_id
         HAVING COUNT(*) > 1
         ORDER BY cpu_cnt DESC
         LIMIT 10;  -- MySQL: LIMIT, Oracle: FETCH FIRST 10 ROWS ONLY

Query slow queries by execution time:

MySQL Mode: Use usec_to_time() function
Oracle Mode: Use TO_CHAR() function

-- Find queries with execution time > 100ms
obclient> SELECT request_id,
                usec_to_time(request_time) AS request_time,  -- MySQL
                -- TO_CHAR(request_time, 'YYYY-MM-DD HH24:MI:SS.FF') AS request_time,  -- Oracle
                elapsed_time, queue_time, execute_time, flt_trace_id, query_sql
         FROM oceanbase.v$ob_sql_audit  -- MySQL: add 'oceanbase.' prefix
         WHERE elapsed_time > 100000  -- > 100ms in microseconds
         ORDER BY elapsed_time DESC
         LIMIT 10;  -- MySQL: LIMIT, Oracle: FETCH FIRST 10 ROWS ONLY

Analyze Query Performance

Get detailed execution statistics:

MySQL Mode: Use oceanbase.gv$ob_sql_audit
Oracle Mode: Use GV$OB_SQL_AUDIT (no schema prefix)

-- Get detailed statistics for a specific SQL
obclient> SELECT sql_id, query_sql, executions, elapsed_time, execute_time,
                queue_time, return_rows, affected_rows, partition_cnt,
                table_scan, is_hit_plan, plan_id
         FROM oceanbase.gv$ob_sql_audit  -- MySQL: add 'oceanbase.' prefix
         WHERE sql_id = 'your_sql_id'
         ORDER BY elapsed_time DESC;

Get execution plan for slow query:

MySQL Mode: Use oceanbase.GV$OB_PLAN_CACHE_PLAN_STAT
Oracle Mode: Use GV$OB_PLAN_CACHE_PLAN_STAT (no schema prefix)

-- Get execution plan from plan cache
obclient> SELECT tenant_id, svr_ip, svr_port, sql_id, plan_id,
                last_active_time, first_load_time, outline_data
         FROM oceanbase.GV$OB_PLAN_CACHE_PLAN_STAT  -- MySQL: add 'oceanbase.' prefix
         WHERE tenant_id = 1002 AND sql_id = 'your_sql_id'
           AND svr_ip = 'xxx.xxx.xxx.xxx' AND svr_port = 35046;

-- Get plan details
obclient> SELECT operator, name, rows, cost
         FROM oceanbase.GV$OB_PLAN_CACHE_PLAN_EXPLAIN  -- MySQL: add 'oceanbase.' prefix
         WHERE tenant_id = 1002 AND plan_id = 741
           AND svr_ip = 'xxx.xxx.xxx.xxx' AND svr_port = 35046;

Find Currently Running Slow Queries

MySQL Mode:

obclient> SELECT user, tenant, sql_id, concat(time, 's') AS time, info,
                svr_ip, svr_port, trace_id
         FROM oceanbase.GV$OB_PROCESSLIST
         WHERE state = 'ACTIVE' ORDER BY time DESC LIMIT 10;

Oracle Mode:

obclient> SELECT user_name, tenant_name, sql_id, time || 's' AS time, info,
                svr_ip, svr_port, trace_id
         FROM GV$OB_PROCESSLIST
         WHERE state = 'ACTIVE' ORDER BY time DESC FETCH FIRST 10 ROWS ONLY;

Real-World Optimization Case

Case: Type Conversion Preventing Index Usage

Problem: SQL execution time was 2 seconds, CPU usage exceeded 70% of server resources.

Root Cause Analysis:

Identify TOP SQL:

obclient> SELECT sql_id, COUNT(*) AS executions, AVG(execute_time) AS avg_cpu_time
         FROM oceanbase.GV$OB_SQL_AUDIT
         WHERE tenant_id = 'mysql001'
           AND request_time BETWEEN (TIME_TO_USEC(NOW())-30*60*1000*1000) AND TIME_TO_USEC(NOW())
         GROUP BY sql_id
         ORDER BY SUM(execute_time) DESC
         LIMIT 1;

Analyze Execution Plan:
```
obclient> EXPLAIN SELECT ... FROM v_tt01 WHERE COL001 IN (20017476);
```
Execution plan showed:
- High cost on TBL3(IDX_TBL3_COL170) index scan
- range(MIN,MIN ; MAX,MAX)always true - index not used for matching
- filter([20017476 = cast(cast(TBL3.COL170, VARCHAR2(20 BYTE)), NUMBER)]) - type conversion happening
Root Cause:
- View V_TT01 had TO_NUMBER conversion on COL001
- Table TBL3.COL170 was VARCHAR2(20) but compared with NUMBER
- Type conversion prevented index usage

Optimization Steps:

Remove unnecessary TO_NUMBER conversion:

-- Before: View had TO_NUMBER conversion
CREATE VIEW V_TT01_OLD AS
SELECT To_number("tt01"."col001") AS "COL001", ...

-- After: Remove TO_NUMBER (COL001 is already NUMBER)
CREATE VIEW V_TT01 AS
SELECT "tt01"."col001" AS "COL001", ...

Result: SQL RT reduced from 2s to 150ms

Unify data types:

-- Before: TBL3.COL170 was VARCHAR2(20)
CREATE TABLE TBL3 (
  COL170 VARCHAR2(20) NOT NULL,
  ...
);

-- After: Change to NUMBER(20) to match join condition
ALTER TABLE TBL3 MODIFY COL170 NUMBER(20) NOT NULL;

Result: SQL RT further reduced to 20ms

Partition and table group optimization:

-- Add hash partitioning on join keys
ALTER TABLE TBL1 PARTITION BY HASH(COL001) PARTITIONS 8;
ALTER TABLE TBL2 PARTITION BY HASH(COL004) PARTITIONS 8;
ALTER TABLE TBL3 PARTITION BY HASH(COL170) PARTITIONS 8;
ALTER TABLE TT01 PARTITION BY HASH(COL001) PARTITIONS 8;

-- Create table group to avoid cross-partition joins
CREATE TABLEGROUP order_tg PARTITION BY HASH(COL001) PARTITIONS 8;
ALTER TABLE TBL1 SET TABLEGROUP order_tg;
ALTER TABLE TBL2 SET TABLEGROUP order_tg;
ALTER TABLE TBL3 SET TABLEGROUP order_tg;
ALTER TABLE TT01 SET TABLEGROUP order_tg;

Result: SQL RT further reduced to 4ms

Final Execution Plan: Shows PX PARTITION ITERATOR (partition pruning working) and TABLE SCAN with index INDEX_TBL3_COL170 (index used efficiently)

Key Learnings:

Always check execution plan for type conversions in filters
Ensure join column data types match
Use partition pruning and table groups for distributed queries
Remove unnecessary type conversions in views

Common Anti-patterns

❌ Avoid These Patterns

1. N+1 Query Problem:

-- ❌ BAD: Multiple queries in loop
-- Instead, use JOIN or batch queries

2. SELECT * in Production:

-- ❌ BAD: SELECT * returns unnecessary data
SELECT * FROM order_table;

-- ✅ GOOD: Select only needed columns
SELECT order_id, customer_id, total_amount FROM order_table;

3. Functions on Indexed Columns:

-- ❌ BAD: Prevents index usage
WHERE UPPER(customer_name) = 'JOHN'

-- ✅ GOOD: Store normalized data or use function-based index
WHERE customer_name = 'JOHN'

4. Implicit Type Conversions:

-- ❌ BAD: String to number conversion
WHERE order_id = '12345'

-- ✅ GOOD: Match data types
WHERE order_id = 12345

Performance Monitoring

Key Metrics to Monitor

-- Check table statistics
obclient [SALES_DB]> ANALYZE TABLE order_table;

-- View table size and row count
obclient [SALES_DB]> SELECT 
                         table_name,
                         table_rows,
                         data_length,
                         index_length
                     FROM information_schema.tables
                     WHERE table_schema = 'SALES_DB';

-- Check index usage
obclient [SALES_DB]> SHOW INDEX FROM order_table;

Optimization Checklist

Before deploying SQL queries to production:

Quick Reference

Execution Plan Operators

Operator	Meaning	Optimization
TABLE SCAN / TABLE FULL SCAN	Full table scan	Add index or use partition key
TABLE RANGE SCAN	Range scan on table	✅ Good, but consider index
TABLE GET	Direct row access via primary key	✅ Best
HASH JOIN	Hash join algorithm	✅ Good for large equal joins
NESTED-LOOP JOIN	Nested loop join	✅ Good for small outer table with indexed inner
MERGE JOIN	Merge join algorithm	✅ Good for pre-sorted data
SORT	Sorting operation	Add ORDER BY index or use sorted index
AGGREGATE	Aggregation	Consider materialized views
PX PARTITION ITERATOR	Parallel partition iterator	✅ Good, partition pruning working
EXCHANGE OUT DISTR	Distributed exchange	Indicates distributed execution

Index Types

Primary Key: Automatically indexed, unique
Unique Index: Ensures uniqueness, fast lookups
Composite Index: Multiple columns, order matters
Covering Index: Contains all query columns