dbt Incremental Model Development

Choose the right strategy. Design the unique_key carefully. Handle edge cases.

When to Use Incremental

| Scenario | Recommendation |
| --- | --- |
| Source data < 10M rows | Use table (simpler, full refresh is fast) |
| Source data > 10M rows | Consider incremental |
| Source data updated in place | Use incremental with merge strategy |
| Append-only source (logs, events) | Use incremental with append strategy |
| Partitioned warehouse data | Use insert_overwrite if supported |

Default to table unless you have a clear performance reason for incremental.

Critical Rules

  • ALWAYS test with --full-refresh first before relying on incremental logic

  • ALWAYS verify unique_key is truly unique in both source and target

  • If merge fails 3+ times, check unique_key for duplicates

  • Run full refresh periodically to prevent data drift

Workflow

  1. Confirm Incremental is Needed

```bash
# Check the source table size
dbt show --inline "select count(*) from {{ source('schema', 'table') }}"
```

If count < 10 million, consider using table instead. Incremental adds complexity.

  2. Understand the Source Data Pattern

Before choosing a strategy, answer:

  • Is data append-only? (new rows added, never updated)

  • Are existing rows updated? (need merge/upsert)

  • Is there a reliable timestamp? (for filtering new data)

  • What's the unique identifier? (for merge matching)

```bash
# Check for a timestamp column
dbt show --inline "
    select
        min(updated_at) as earliest,
        max(updated_at) as latest,
        count(distinct date(updated_at)) as days_of_data
    from {{ source('schema', 'table') }}
"
```

  3. Choose the Right Strategy

| Strategy | Use When | How It Works |
| --- | --- | --- |
| append | Data is append-only, no updates | INSERT only, no deduplication |
| merge | Data can be updated | MERGE/UPSERT by unique_key |
| delete+insert | Data updated in batches | DELETE matching rows, then INSERT |
| insert_overwrite | Partitioned tables (BigQuery, Spark) | Replace entire partitions |

Default: merge is safest for most use cases.

Note: Strategy availability varies by adapter. Check the dbt incremental strategy docs for your specific warehouse.
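
The strategy can even vary by target within one project. A minimal sketch of per-adapter selection, assuming BigQuery/Spark targets should overwrite partitions while everything else merges (the adapter names and key are illustrative, and a real insert_overwrite model also needs a partition_by config):

```sql
-- choose the incremental strategy from the active target's adapter type
{{
    config(
        materialized='incremental',
        incremental_strategy='insert_overwrite' if target.type in ('bigquery', 'spark') else 'merge',
        unique_key='id'
    )
}}
```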

  4. Design the Unique Key

CRITICAL: unique_key must be truly unique in your data.

```bash
# Verify uniqueness BEFORE creating the model
dbt show --inline "
    select {{ unique_key_column }}, count(*)
    from {{ source('schema', 'table') }}
    group by 1
    having count(*) > 1
    limit 10
"
```

If duplicates exist:

  • Add more columns to form a composite key (see the sketch after this list)

  • Add deduplication logic in the model

  • Use delete+insert instead of merge
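
A composite key is passed as a list. A minimal sketch, assuming an order-lines source where only the pair of columns is unique (the column names are hypothetical; list-valued unique_key requires a reasonably recent dbt version):

```sql
-- neither column is unique alone; together they identify one row
{{
    config(
        materialized='incremental',
        incremental_strategy='merge',
        unique_key=['order_id', 'line_number']
    )
}}
```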

  5. Write the Incremental Model

```sql
-- strategy: merge (alternatives: append, delete+insert)
-- unique_key MUST be truly unique; on_schema_change handles new source columns
{{
    config(
        materialized='incremental',
        incremental_strategy='merge',
        unique_key='id',
        on_schema_change='append_new_columns'
    )
}}

select
    id,
    column_a,
    column_b,
    updated_at
from {{ source('schema', 'table') }}

{% if is_incremental() %}
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```

  6. Build with Full Refresh First

ALWAYS verify with full refresh before trusting incremental logic.

```bash
# First run: full refresh to establish a baseline
dbt build --select <model_name> --full-refresh

# Verify the output
dbt show --select <model_name> --limit 10
dbt show --inline "select count(*) from {{ ref('model_name') }}"
```

  7. Test Incremental Logic

```bash
# Run incrementally (no --full-refresh)
dbt build --select <model_name>

# Verify the row count changed appropriately
dbt show --inline "select count(*) from {{ ref('model_name') }}"
```
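
A quick sanity check is to compare the high-water mark in the target against the source. A sketch, reusing the placeholder names from above:

```bash
dbt show --inline "
    select
        (select max(updated_at) from {{ ref('model_name') }}) as target_max,
        (select max(updated_at) from {{ source('schema', 'table') }}) as source_max
"
```

If target_max lags source_max right after a run, the incremental filter is likely dropping rows.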

  8. Handle Schema Changes

Set on_schema_change based on your needs:

| Setting | Behavior |
| --- | --- |
| ignore (default) | New columns in source are ignored |
| append_new_columns | New columns are added to the target |
| sync_all_columns | Target schema matches source exactly |
| fail | Error if the schema changes |

Common Incremental Problems

Problem: Merge Fails with Duplicate Key

Symptom: "Cannot MERGE with duplicate values"

Cause: Multiple rows with same unique_key in source or target.

Fix:

```sql
-- Add deduplication using a CTE (cross-database compatible)
with deduplicated as (

    select
        *,
        row_number() over (partition by id order by updated_at desc) as rn
    from {{ source('schema', 'table') }}
    {% if is_incremental() %}
    where updated_at > (select max(updated_at) from {{ this }})
    {% endif %}

)

-- list columns explicitly if the helper rn column should not land in the target
select * from deduplicated where rn = 1
```

Problem: No Partition Pruning (Full Table Scan)

Symptom: Incremental runs take as long as full refresh.

Cause: Dynamic date filter prevents partition pruning.

Fix:

```sql
{% if is_incremental() %}
-- a static lower bound lets the planner prune partitions;
-- the subquery alone is resolved too late for pruning on some warehouses
where updated_at >= {{ dbt.dateadd('day', -3, dbt.current_timestamp()) }}
  and updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```

Problem: Late-Arriving Data is Missed

Symptom: Some records never appear in incremental model.

Cause: Filtering by max(updated_at) misses late arrivals.

Fix: Use a lookback window with a fixed offset from current date:

```sql
{% if is_incremental() %}
-- look back 3 days to catch late-arriving data
where updated_at >= {{ dbt.dateadd('day', -3, dbt.current_timestamp()) }}
{% endif %}
```

Alternatively, use a variable for the lookback period:

```sql
{% set lookback_days = 3 %}

{% if is_incremental() %}
where updated_at >= {{ dbt.dateadd('day', -lookback_days, dbt.current_timestamp()) }}
{% endif %}
```

Because a lookback reprocesses recent rows, pair it with the merge strategy (or explicit deduplication); with append it will insert duplicates.

Problem: Schema Drift Causes Errors

Symptom: "Column X not found" after source adds column.

Fix: Set on_schema_change='append_new_columns' in config.
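
For instance, a minimal config sketch:

```sql
{{
    config(
        materialized='incremental',
        unique_key='id',
        on_schema_change='append_new_columns'
    )
}}
```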

Problem: Data Drift Over Time

Symptom: Counts diverge between incremental and full refresh.

Fix: Schedule periodic full refresh:

```bash
# Weekly full refresh
dbt build --select <model_name> --full-refresh
```
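
One way to operationalize this is to tag drift-prone models and point a weekly job at the tag. A sketch (the tag name is illustrative):

```bash
# weekly job: rebuild tagged incremental models from scratch
dbt build --select tag:weekly_full_refresh --full-refresh
```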

Incremental Strategy Reference

Append (Simplest)

```sql
{{ config(materialized='incremental', incremental_strategy='append') }}

select * from {{ source('events', 'raw') }}
{% if is_incremental() %}
where event_timestamp > (select max(event_timestamp) from {{ this }})
{% endif %}
```

  • No unique_key needed

  • Fastest performance

  • Only use for append-only data (logs, events, immutable records)

Merge (Default)

```sql
{{
    config(
        materialized='incremental',
        incremental_strategy='merge',
        unique_key='id'
    )
}}

select * from {{ source('crm', 'contacts') }}
{% if is_incremental() %}
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```

  • Requires unique_key

  • Handles updates and inserts

  • Most common strategy

Delete+Insert (Batch Updates)

```sql
{{
    config(
        materialized='incremental',
        incremental_strategy='delete+insert',
        unique_key='id'
    )
}}

select * from {{ source('orders', 'raw') }}
{% if is_incremental() %}
where order_date >= {{ dbt.dateadd('day', -7, dbt.current_timestamp()) }}
{% endif %}
```

  • Deletes all matching rows first

  • Good for reprocessing batches

  • Use when merge has duplicate key issues

Insert Overwrite (Partitioned)

```sql
{{
    config(
        materialized='incremental',
        incremental_strategy='insert_overwrite',
        partition_by={'field': 'event_date', 'data_type': 'date'}
    )
}}

select * from {{ source('events', 'raw') }}
{% if is_incremental() %}
where event_date >= {{ dbt.dateadd('day', -3, dbt.current_timestamp()) }}
{% endif %}
```

  • Replaces entire partitions

  • Best for partitioned tables in BigQuery/Spark

  • No unique_key needed (operates on partitions)
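
On BigQuery, a static-partitions variant replaces a fixed set of partitions and keeps the filter statically prunable. A sketch following the pattern in dbt's BigQuery docs (source and field names are placeholders):

```sql
{% set partitions_to_replace = [
    'current_date',
    'date_sub(current_date, interval 1 day)'
] %}

{{
    config(
        materialized='incremental',
        incremental_strategy='insert_overwrite',
        partition_by={'field': 'event_date', 'data_type': 'date'},
        partitions=partitions_to_replace
    )
}}

select * from {{ source('events', 'raw') }}
{% if is_incremental() %}
-- only read the partitions that are being replaced
where event_date in ({{ partitions_to_replace | join(', ') }})
{% endif %}
```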

Anti-Patterns

  • Using incremental for small tables (< 10M rows)

  • Not testing with full-refresh first

  • Using append strategy when data can be updated

  • Not verifying unique_key uniqueness

  • Relying on exact timestamp match without lookback

  • Never running full refresh (causes data drift)

  • Using merge with non-unique keys

Testing Checklist

  • Model runs with --full-refresh

  • Model runs incrementally (without flag)

  • unique_key verified as truly unique

  • Row counts reasonable after incremental run

  • Late-arriving data handled (lookback window)

  • Schema changes handled (on_schema_change set)

  • Periodic full refresh scheduled
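
Several of these checks can be enforced continuously with dbt's built-in tests. A minimal schema.yml sketch (model and column names are placeholders):

```yaml
version: 2

models:
  - name: model_name
    columns:
      - name: id
        tests:
          - unique
          - not_null
```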
