dbt Migration
Don't convert everything at once. Build and validate layer by layer.
Workflow
- Analyze Legacy SQL
cat <legacy_sql_file>
Identify all tables referenced in the query.
- Check What Already Exists
Search for existing models/sources that reference the table
grep -r "<table_name>" models/ --include=".sql" --include=".yml" find models/ -name "*.sql" | xargs grep -l "<table_name>"
For each table referenced in the legacy SQL:
-
Check if an existing model already references this table
-
Check if a source definition exists
-
If neither exists, ask user: "Table X not found - should I create it as a source?"
Only proceed to intermediate/mart layers after all dependencies exist.
- Create Missing Sources
models/staging/sources.yml
version: 2
sources:
- name: raw_database
schema: raw_schema
tables:
- name: orders description: Raw orders from source system
- name: customers description: Raw customer records
- Build Staging Layer
One staging model per source table. Follow existing project naming conventions.
Build before proceeding:
dbt build --select <staging_model>
- Build Intermediate Layer (if needed)
Extract complex joins/logic into intermediate models.
Build incrementally:
dbt build --select <intermediate_model>
- Build Mart Layer
Final business-facing model with aggregations.
- Validate Migration
Build entire lineage
dbt build --select +<final_model> dbt show --select <final_model>
Migration Checklist
-
All source tables identified and documented
-
Sources.yml created with descriptions
-
Staging models: 1:1 with sources, renamed columns
-
Intermediate models: business logic extracted
-
Mart models: final aggregations
-
Each layer compiles successfully
-
Each layer builds successfully
-
Row counts match original (manual validation)
-
Tests added for key constraints
Common Migration Patterns
-
Nested subqueries → Separate models (staging → intermediate → mart)
-
Temp tables → Ephemeral materialization {{ config(materialized='ephemeral') }}
-
Hardcoded values → Variables {{ var("name") }}
Anti-Patterns
-
Converting entire legacy query to single dbt model
-
Skipping the staging layer
-
Not validating each layer before proceeding
-
Keeping hardcoded values instead of using variables
-
Not documenting business logic during migration