PostgreSQL Optimization
Overview
Expert in PostgreSQL performance tuning, query optimization, and database administration. Specializes in EXPLAIN analysis, indexing strategies, connection pooling, partitioning, and production-grade PostgreSQL operations.
When to Use
-
Diagnosing slow queries with EXPLAIN ANALYZE
-
Creating optimal indexes for query patterns
-
Designing database schemas for performance
-
Configuring PostgreSQL for production workloads
-
Implementing connection pooling (PgBouncer, Supavisor)
-
Setting up partitioning for large tables
-
Analyzing and reducing lock contention
-
Migrating or upgrading PostgreSQL versions
Capabilities
Query Optimization
-
EXPLAIN / EXPLAIN ANALYZE interpretation
-
Query plan analysis and optimization
-
Identifying sequential scans vs index scans
-
Join optimization and query rewriting
-
CTE vs subquery performance trade-offs
-
Window function optimization
Indexing Strategies
-
B-tree, GIN, GiST, BRIN index selection
-
Partial indexes for filtered queries
-
Expression indexes for computed values
-
Covering indexes (INCLUDE clause)
-
Index-only scans optimization
-
Concurrent index creation
Schema Design
-
Normalization vs denormalization trade-offs
-
JSONB column design and indexing
-
Array columns and operations
-
Enum types vs lookup tables
-
Foreign key cascade strategies
-
Table inheritance and partitioning
Configuration Tuning
-
Memory settings (shared_buffers, work_mem, effective_cache_size)
-
Connection limits and pooling
-
WAL and checkpoint tuning
-
Autovacuum configuration
-
Statistics collection settings
Advanced Features
-
Partitioning (range, list, hash)
-
Materialized views with refresh strategies
-
Full-text search with tsvector/tsquery
-
PostGIS geospatial queries
-
Logical replication setup
-
pg_stat_statements analysis
Dependencies
Works well with:
-
database-modeler
-
Schema design and ERD creation
-
data-pipeline-engineer
-
ETL and data processing
-
site-reliability-engineer
-
Database monitoring and alerting
-
nextjs-app-router-expert
-
Full-stack data fetching
Examples
Reading EXPLAIN ANALYZE Output
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) SELECT u.*, COUNT(o.id) as order_count FROM users u LEFT JOIN orders o ON o.user_id = u.id WHERE u.created_at > '2024-01-01' GROUP BY u.id;
-- Key metrics to look for: -- - "Seq Scan" on large tables → needs index -- - "Rows Removed by Filter" high → filter before join -- - "Sort Method: external merge" → increase work_mem -- - "Buffers: shared hit" vs "shared read" → cache efficiency
Creating Effective Indexes
-- Basic B-tree for equality and range queries CREATE INDEX CONCURRENTLY idx_orders_user_created ON orders (user_id, created_at DESC);
-- Partial index for common filter CREATE INDEX CONCURRENTLY idx_orders_pending ON orders (created_at) WHERE status = 'pending';
-- GIN index for JSONB containment queries CREATE INDEX CONCURRENTLY idx_products_metadata ON products USING GIN (metadata jsonb_path_ops);
-- Covering index to enable index-only scans CREATE INDEX CONCURRENTLY idx_users_email_covering ON users (email) INCLUDE (name, created_at);
-- Expression index for case-insensitive search CREATE INDEX CONCURRENTLY idx_users_email_lower ON users (LOWER(email));
Optimizing N+1 Queries
-- BAD: N+1 pattern (1 + N queries) SELECT * FROM posts WHERE user_id = $1; -- Then for each post: SELECT * FROM comments WHERE post_id = $1;
-- GOOD: Single query with lateral join SELECT p., c.comments FROM posts p LEFT JOIN LATERAL ( SELECT json_agg(c.) as comments FROM comments c WHERE c.post_id = p.id ) c ON true WHERE p.user_id = $1;
-- GOOD: Window function for aggregates SELECT p., COUNT() OVER (PARTITION BY p.user_id) as user_post_count FROM posts p WHERE p.user_id = $1;
Table Partitioning
-- Create partitioned table by date range CREATE TABLE events ( id BIGSERIAL, event_type TEXT NOT NULL, payload JSONB, created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), PRIMARY KEY (id, created_at) ) PARTITION BY RANGE (created_at);
-- Create monthly partitions CREATE TABLE events_2024_01 PARTITION OF events FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE events_2024_02 PARTITION OF events FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
-- Automate partition creation with pg_partman CREATE EXTENSION pg_partman; SELECT partman.create_parent('public.events', 'created_at', 'native', 'monthly');
Connection Pooling Config (PgBouncer)
; pgbouncer.ini
[databases] myapp = host=localhost dbname=myapp
[pgbouncer] listen_addr = 0.0.0.0 listen_port = 6432 auth_type = md5 auth_file = /etc/pgbouncer/userlist.txt
; Pool settings pool_mode = transaction ; Recommended for most apps max_client_conn = 1000 default_pool_size = 20 reserve_pool_size = 5
; Timeouts server_idle_timeout = 600 client_idle_timeout = 0
Performance Configuration
-- Check current settings SHOW shared_buffers; -- ~25% of RAM SHOW effective_cache_size; -- ~75% of RAM SHOW work_mem; -- Per-operation, start small (64MB) SHOW maintenance_work_mem; -- For VACUUM, CREATE INDEX (512MB-1GB)
-- Recommended production settings (for 32GB RAM server) ALTER SYSTEM SET shared_buffers = '8GB'; ALTER SYSTEM SET effective_cache_size = '24GB'; ALTER SYSTEM SET work_mem = '64MB'; ALTER SYSTEM SET maintenance_work_mem = '1GB'; ALTER SYSTEM SET random_page_cost = 1.1; -- For SSD storage ALTER SYSTEM SET effective_io_concurrency = 200; -- For SSD
-- Reload configuration SELECT pg_reload_conf();
Finding Slow Queries
-- Enable pg_stat_statements CREATE EXTENSION pg_stat_statements;
-- Top 10 slowest queries by total time SELECT round(total_exec_time::numeric, 2) as total_ms, calls, round(mean_exec_time::numeric, 2) as avg_ms, round((100 * total_exec_time / sum(total_exec_time) OVER())::numeric, 2) as pct, query FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 10;
-- Queries with most I/O SELECT round(shared_blks_read::numeric, 2) as disk_reads, round(shared_blks_hit::numeric, 2) as cache_hits, round(100.0 * shared_blks_hit / nullif(shared_blks_hit + shared_blks_read, 0), 2) as cache_hit_ratio, query FROM pg_stat_statements ORDER BY shared_blks_read DESC LIMIT 10;
Analyzing Table Bloat
-- Check table bloat SELECT schemaname, tablename, pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) as total_size, pg_size_pretty(pg_relation_size(schemaname || '.' || tablename)) as table_size, n_dead_tup, n_live_tup, round(100.0 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0), 2) as dead_pct FROM pg_stat_user_tables ORDER BY n_dead_tup DESC LIMIT 10;
-- Manual VACUUM for critical tables VACUUM (VERBOSE, ANALYZE) orders;
-- Reclaim space (requires exclusive lock) VACUUM FULL orders; -- Use during maintenance window
Best Practices
-
Always use EXPLAIN ANALYZE - Don't guess, measure actual query performance
-
Create indexes CONCURRENTLY - Avoid blocking writes during index creation
-
Partial indexes for hot paths - Index only the rows you query frequently
-
Use connection pooling - PgBouncer or Supavisor for production
-
Monitor pg_stat_statements - Track query performance over time
-
Regular ANALYZE - Keep statistics current for query planner
-
**Avoid SELECT *** - Only fetch columns you need
-
Batch large updates - Process in chunks to avoid lock contention
-
Use prepared statements - Reduce parsing overhead for repeated queries
Common Pitfalls
-
Missing indexes - Check for sequential scans on large tables
-
Over-indexing - Too many indexes slow down writes
-
work_mem too low - Causes disk-based sorts and hash joins
-
Connection exhaustion - Not using connection pooling
-
Stale statistics - Autovacuum not running frequently enough
-
Bloated tables - Not vacuuming after large deletes/updates
-
N+1 queries - Fetching related data in loops instead of joins
-
SELECT * everywhere - Fetching unnecessary columns