ClickHouse Best Practices
Guidance for ClickHouse covering schema design, query optimization, and data ingestion. Contains 28 atomic rules across 3 categories (schema, query, insert), prioritized by impact. Extended with 14 reference files covering cluster management, backups, monitoring, and integrations.
Official docs: ClickHouse Best Practices
Official docs: ClickHouse Best Practices
⚠️ Security Considerations
Credential Placeholders
Example credentials in documentation (password123 , AKIAIOSFODNN7EXAMPLE ) are placeholders only. Never use these in production. Use proper secret management:
-
Environment variables
-
Secret managers (AWS Secrets Manager, HashiCorp Vault, etc.)
-
Kubernetes secrets (for K8s deployments)
-
ClickHouse named collections with external configuration
Installation & Operations
For installation and operational procedures:
-
Follow official documentation links provided in reference files
-
Prefer package managers (apt , yum , helm ) over direct downloads
-
Use versioned artifacts instead of latest in production
-
Test procedures in non-production environments first
IMPORTANT: How to Apply This Skill
Before answering ClickHouse questions, follow this priority order:
-
Check for applicable rules in the rules/ directory
-
If rules exist: Apply them and cite them in your response using "Per rule-name ..."
-
If no rule exists: Check references/ for deeper topic coverage
-
If neither covers it: Use general ClickHouse knowledge or search documentation
-
Always cite your source: rule name, reference file, or URL
Why rules take priority: ClickHouse has specific behaviors (columnar storage, sparse indexes, merge tree mechanics) where general database intuition can be misleading. The rules encode validated, ClickHouse-specific guidance.
Review Procedures
For Schema Reviews (CREATE TABLE, ALTER TABLE)
Read these rule files in order:
-
rules/schema-pk-plan-before-creation.md — ORDER BY is immutable
-
rules/schema-pk-cardinality-order.md — Column ordering in keys
-
rules/schema-pk-prioritize-filters.md — Filter column inclusion
-
rules/schema-pk-filter-on-orderby.md — Query filter alignment
-
rules/schema-types-native-types.md — Proper type selection
-
rules/schema-types-minimize-bitwidth.md — Numeric type sizing
-
rules/schema-types-lowcardinality.md — LowCardinality usage
-
rules/schema-types-avoid-nullable.md — Nullable vs DEFAULT
-
rules/schema-types-enum.md — Enum for finite value sets
-
rules/schema-partition-low-cardinality.md — Partition count limits
-
rules/schema-partition-lifecycle.md — Partitioning purpose
-
rules/schema-partition-query-tradeoffs.md — Partition pruning trade-offs
-
rules/schema-partition-start-without.md — Start without partitioning
-
rules/schema-json-when-to-use.md — JSON type usage
Check for:
-
PRIMARY KEY / ORDER BY column order (low-to-high cardinality)
-
Data types match actual data ranges
-
LowCardinality applied to appropriate string columns
-
Partition key cardinality bounded (100-1,000 values)
-
ReplacingMergeTree has version column if used
For Query Reviews (SELECT, JOIN, aggregations)
Read these rule files:
-
rules/query-join-choose-algorithm.md — Algorithm selection
-
rules/query-join-use-any.md — ANY vs regular JOIN
-
rules/query-join-filter-before.md — Pre-join filtering
-
rules/query-join-consider-alternatives.md — Dictionaries/denormalization
-
rules/query-join-null-handling.md — join_use_nulls setting
-
rules/query-index-skipping-indices.md — Secondary index usage
-
rules/query-mv-incremental.md — Incremental materialized views
-
rules/query-mv-refreshable.md — Refreshable materialized views
Check for:
-
Filters use ORDER BY prefix columns
-
JOINs filter tables before joining (not after)
-
Correct JOIN algorithm for table sizes
-
Skipping indices for non-ORDER BY filter columns
For Insert Strategy Reviews (data ingestion, updates, deletes)
Read these rule files:
-
rules/insert-batch-size.md — Batch sizing requirements
-
rules/insert-async-small-batches.md — Async insert usage
-
rules/insert-format-native.md — Native format for performance
-
rules/insert-mutation-avoid-update.md — UPDATE alternatives
-
rules/insert-mutation-avoid-delete.md — DELETE alternatives
-
rules/insert-optimize-avoid-final.md — OPTIMIZE TABLE risks
Check for:
-
Batch size 10K-100K rows per INSERT
-
No ALTER TABLE UPDATE for frequent changes
-
ReplacingMergeTree or CollapsingMergeTree for update patterns
-
Async inserts enabled for high-frequency small batches
Output Format
Structure review responses as follows:
Rules Checked
rule-name-1— Compliant / Violation foundrule-name-2— Compliant / Violation found ...
Findings
Violations
rule-name: Description of the issue- Current: [what the code does]
- Required: [what it should do]
- Fix: [specific correction]
Compliant
rule-name: Brief note on why it's correct
Recommendations
[Prioritized list of changes, citing rules]
Rule Categories by Priority
Priority Category Impact Prefix Count
1 Primary Key Selection CRITICAL schema-pk-
4
2 Data Type Selection CRITICAL schema-types-
5
3 JOIN Optimization CRITICAL query-join-
5
4 Insert Batching CRITICAL insert-batch-
1
5 Mutation Avoidance CRITICAL insert-mutation-
2
6 Partitioning Strategy HIGH schema-partition-
4
7 Skipping Indices HIGH query-index-
1
8 Materialized Views HIGH query-mv-
2
9 Async Inserts HIGH insert-async-
2
10 OPTIMIZE Avoidance HIGH insert-optimize-
1
11 JSON Usage MEDIUM schema-json-
1
Quick Reference
Schema Design — Primary Key (CRITICAL)
-
schema-pk-plan-before-creation — Plan ORDER BY before table creation (immutable)
-
schema-pk-cardinality-order — Order columns low-to-high cardinality
-
schema-pk-prioritize-filters — Include frequently filtered columns
-
schema-pk-filter-on-orderby — Query filters must use ORDER BY prefix
Schema Design — Data Types (CRITICAL)
-
schema-types-native-types — Use native types, not String for everything
-
schema-types-minimize-bitwidth — Use smallest numeric type that fits
-
schema-types-lowcardinality — LowCardinality for <10K unique strings
-
schema-types-enum — Enum for finite value sets with validation
-
schema-types-avoid-nullable — Avoid Nullable; use DEFAULT instead
Schema Design — Partitioning (HIGH)
-
schema-partition-low-cardinality — Keep partition count 100-1,000
-
schema-partition-lifecycle — Use partitioning for data lifecycle, not queries
-
schema-partition-query-tradeoffs — Understand partition pruning trade-offs
-
schema-partition-start-without — Consider starting without partitioning
Schema Design — JSON (MEDIUM)
- schema-json-when-to-use — JSON for dynamic schemas; typed columns for known
Query Optimization — JOINs (CRITICAL)
-
query-join-choose-algorithm — Select algorithm based on table sizes
-
query-join-use-any — ANY JOIN when only one match needed
-
query-join-filter-before — Filter tables before joining
-
query-join-consider-alternatives — Dictionaries/denormalization vs JOIN
-
query-join-null-handling — join_use_nulls=0 for default values
Query Optimization — Indices (HIGH)
- query-index-skipping-indices — Skipping indices for non-ORDER BY filters
Query Optimization — Materialized Views (HIGH)
-
query-mv-incremental — Incremental MVs for real-time aggregations
-
query-mv-refreshable — Refreshable MVs for complex joins
Insert Strategy — Batching (CRITICAL)
- insert-batch-size — Batch 10K-100K rows per INSERT
Insert Strategy — Async (HIGH)
-
insert-async-small-batches — Async inserts for high-frequency small batches
-
insert-format-native — Native format for best performance
Insert Strategy — Mutations (CRITICAL)
-
insert-mutation-avoid-update — ReplacingMergeTree instead of ALTER UPDATE
-
insert-mutation-avoid-delete — Lightweight DELETE or DROP PARTITION
Insert Strategy — Optimization (HIGH)
- insert-optimize-avoid-final — Let background merges work
Quick Decision Guides
Which Table Engine?
Need to store data? ├── < 1M rows, dimension → Memory └── ≥ 1M rows → MergeTree family ├── Deduplication? → ReplacingMergeTree(version) ├── Changelog? → CollapsingMergeTree(sign) ├── Pre-aggregation? → AggregatingMergeTree() ├── Replication? → ReplicatedMergeTree(...) └── Default → MergeTree()
See references/table-engines.md for complete reference.
Common Issues & Quick Fixes
Issue Quick Fix
Too many parts OPTIMIZE TABLE table FINAL (see insert-optimize-avoid-final )
Slow query EXPLAIN SELECT ... to check index usage
Mutation stuck Check system.mutations , consider alternatives per insert-mutation-avoid-update
Replication lag Check system.replication_queue , ZooKeeper
OOM on query Increase max_memory_usage , optimize query
See references/debugging.md for detailed troubleshooting.
Deep Reference Files
For topics beyond the 28 rules, see the references/ directory:
Schema & Table Design
-
references/core-concepts.md — Architecture, data model, internals
-
references/schema-design.md — Database engines, migrations, version control
-
references/table-design.md — ORDER BY, partitioning, column selection
-
references/table-engines.md — Complete MergeTree family reference
Query & Performance
-
references/sql-reference.md — Complete SQL dialect, data types
-
references/query-optimization.md — EXPLAIN, JOINs, projections, skip indexes
-
references/advanced-features.md — Materialized views, mutations, TTL, dictionaries
Operations & Cluster
-
references/debugging.md — Query debugging, merges, mutations, replication
-
references/cluster-management.md — Distributed tables, replication, sharding
-
references/backup-restore.md — Backup strategies, disaster recovery
-
references/monitoring.md — Query monitoring, health checks, system queries
Integration & Best Practices
-
references/integrations.md — Kafka, S3, PostgreSQL, MySQL, BI tools
-
references/best-practices.md — Complete checklist and anti-patterns
-
references/external.md — Altinity KB links, official docs
-
references/system-queries.md — Ready-to-use queries for operations
Version: 1.3.0 Rules: Synced with ClickHouse/agent-skills (Apache-2.0) References: Altinity Knowledge Base (200+ articles) + ClickHouse Official Docs