Identification Theory

Comprehensive framework for causal identification in statistical methodology

Use this skill when working on: causal identification, mediation analysis identification, DAG-based reasoning, potential outcomes, identification assumptions, partial identification, sensitivity analysis, or deriving identification formulas.

Core Concepts

What is Identification?

A causal parameter $\psi$ is identified if it can be uniquely determined from the observed data distribution $P(O)$.

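Formally: for any two causal models in the assumed model class, with parameters $\psi_1, \psi_2$ and observed data distributions $P_1(O), P_2(O)$, $\psi$ is identified if $P_1(O) = P_2(O) \Rightarrow \psi_1 = \psi_2$.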

Why Identification Matters

Causal Question → Target Estimand → Identification → Estimation → Inference

Causal Question: "Does A cause Y?"
Target Estimand: E[Y(1)-Y(0)]
Identification: Express in terms of P(O)
Estimation: Statistical methods
Inference: Confidence intervals

Without identification, no amount of data can answer causal questions.
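A small sketch of what non-identification looks like, assuming two hypothetical causal models over a binary treatment and binary potential outcomes (all probabilities below are made up for illustration). Both models imply the same observed distribution $P(A, Y)$, yet their average treatment effects differ, so $P(O)$ alone cannot determine the ATE without further assumptions.

```r
# All combinations of treatment and the two potential outcomes
grid <- expand.grid(a = 0:1, y0 = 0:1, y1 = 0:1)

# Model 1 (hypothetical): A randomized with P(A=1) = 0.5,
# P(Y(0)=1) = 0.3, P(Y(1)=1) = 0.7, all mutually independent
m1 <- transform(grid, prob = 0.5 *
  ifelse(y0 == 1, 0.3, 0.7) * ifelse(y1 == 1, 0.7, 0.3))

# Model 2 (hypothetical): no effect at all, Y(0) = Y(1) = U with P(U=1) = 0.5,
# but U drives treatment: P(A=1 | U=1) = 0.7 and P(A=1 | U=0) = 0.3
m2 <- transform(grid, prob = ifelse(y0 != y1, 0,
  0.5 * ifelse(a == y1, 0.7, 0.3)))

# Observed-data distribution P(A, Y) implied by consistency, Y = Y(A)
observed_dist <- function(m) {
  y <- ifelse(m$a == 1, m$y1, m$y0)
  tapply(m$prob, list(a = m$a, y = y), sum)
}

# Average treatment effect E[Y(1) - Y(0)] under a model
ate <- function(m) sum(m$prob * (m$y1 - m$y0))

observed_dist(m1)   # same table for both models
observed_dist(m2)
c(model1 = ate(m1), model2 = ate(m2))
```

Randomization (Model 1) and pure confounding (Model 2) are observationally equivalent here; exchangeability is the assumption that rules out Model 2.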

Two Frameworks

Potential Outcomes (Rubin/Neyman)

Primitives:

$Y(a)$ = potential outcome under treatment $a$
Only $Y = Y(A)$ is observed (consistency)
Fundamental problem: never observe both $Y(0)$ and $Y(1)$ for same unit

Advantages:

Clear definition of causal effects
Natural for experimental reasoning
Connects to missing data theory

Structural Causal Models (Pearl)

Primitives:

Directed Acyclic Graph (DAG) encoding causal structure
Structural equations: $Y := f_Y(PA_Y, U_Y)$
Interventions via do-operator: $P(Y | do(A=a))$
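The primitives above can be sketched by simulation. This is a minimal illustration with hypothetical structural equations (the function `simulate_scm` and all coefficients are assumptions, not part of any package): an intervention $do(A=a)$ replaces the equation for $A$ with a constant while leaving the other equations untouched.

```r
set.seed(1)
n <- 1e5

# Hypothetical structural equations for the DAG X -> A, X -> Y, A -> Y.
# Passing do_a replaces the structural equation for A with the constant do_a.
simulate_scm <- function(n, do_a = NULL) {
  x <- rnorm(n)                      # X := U_X
  a <- if (is.null(do_a)) {
    rbinom(n, 1, plogis(x))          # A := f_A(X, U_A)
  } else {
    rep(do_a, n)                     # do(A = a)
  }
  y <- 2 * a + x + rnorm(n)          # Y := f_Y(A, X, U_Y)
  data.frame(x = x, a = a, y = y)
}

# Observational contrast E[Y | A=1] - E[Y | A=0], confounded by X
obs <- simulate_scm(n)
naive <- mean(obs$y[obs$a == 1]) - mean(obs$y[obs$a == 0])

# Interventional contrast E[Y | do(A=1)] - E[Y | do(A=0)]
causal <- mean(simulate_scm(n, do_a = 1)$y) - mean(simulate_scm(n, do_a = 0)$y)

c(naive = naive, causal = causal)
```

The interventional contrast recovers the structural coefficient 2, while the observational contrast is inflated because $X$ raises both $A$ and $Y$.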

Advantages:

Visual representation of assumptions
Systematic identification algorithms
Clear separation of statistical and causal assumptions

DAG Framework

Directed Acyclic Graphs (DAGs)

A DAG $\mathcal{G} = (V, E)$ consists of:

Vertices $V$: Random variables
Directed edges $E$: Direct causal relationships
Acyclic: No directed cycles

Key DAG Terminology

| Term | Definition | Notation |
|---|---|---|
| Parents | Direct causes | $PA_Y$ |
| Children | Direct effects | $CH_Y$ |
| Ancestors | All causes | $AN_Y$ |
| Descendants | All effects | $DE_Y$ |
| Collider | Node with two incoming arrows | $A \to C \leftarrow B$ |
| Mediator | Node on causal path | $A \to M \to Y$ |
| Confounder | Common cause | $A \leftarrow C \to Y$ |

# DAG specification and visualization using dagitty
library(dagitty)

# Define mediation DAG
mediation_dag <- dagitty('dag {
  A [exposure]
  M [mediator]
  Y [outcome]
  X [confounder]

  X -> A
  X -> M
  X -> Y
  A -> M
  A -> Y
  M -> Y
}')

# Visualize
plot(mediation_dag)

# Find adjustment sets
adjustmentSets(mediation_dag, exposure = "A", outcome = "Y")

# Check implied conditional independencies
impliedConditionalIndependencies(mediation_dag)

D-Separation

The Core Concept

Two nodes $A$ and $B$ are d-separated by a set $Z$ if every path between them is blocked by $Z$.

Path Blocking Rules

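| Path Type | Structure | Blocked by conditioning on... |
|---|---|---|
| Chain | $A \to M \to B$ | $M$ (blocks) |
| Fork | $A \leftarrow C \to B$ | $C$ (blocks) |
| Collider | $A \to C \leftarrow B$ | NOT $C$: the path is blocked by default, and conditioning on $C$ or any descendant of $C$ opens it |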
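The collider rule is the least intuitive of the three, so here is a minimal simulated sketch (the variable names and the linear data-generating process are assumptions for illustration). Two independent causes become associated once their common effect is conditioned on.

```r
set.seed(1)
n <- 1e4

a <- rnorm(n)
b <- rnorm(n)                 # generated independently of a
c_node <- a + b + rnorm(n)    # collider: a -> c_node <- b

# Marginally, a and b are (nearly) uncorrelated: the collider path is blocked
marginal_cor <- cor(a, b)

# Conditioning on the collider (here, by regression adjustment) opens the path:
# the coefficient of b is pulled away from zero
conditional_coef <- coef(lm(a ~ b + c_node))[["b"]]

c(marginal = marginal_cor, conditional = conditional_coef)
```

In this linear-Gaussian setup the population value of the conditional coefficient is $-0.5$, even though $A$ and $B$ share no cause.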

D-separation Formula

$$A \perp\!\!\!\perp_{\mathcal{G}} B \mid Z \iff \text{every path } A \text{---} B \text{ is blocked by } Z$$

# Check d-separation using dagitty
check_dseparation <- function(dag, x, y, z = NULL) {
  if (is.null(z)) {
    dseparated(dag, x, y)
  } else {
    dseparated(dag, x, y, z)
  }
}

# Find adjustment sets for the total effect of x on y.
# Note: these sets block the non-causal paths only; causal paths from x to y
# stay open, so they are generally not d-separating sets.
find_dsep_sets <- function(dag, x, y) {
  adjustmentSets(dag, exposure = x, outcome = y, effect = "total")
}

# Verify conditional independence implications.
# test_conditional_independence() is a user-supplied helper (not part of dagitty)
# that is assumed to return a list with a p.value element.
verify_ci_implications <- function(dag, data) {
  implied_ci <- impliedConditionalIndependencies(dag)

  results <- lapply(implied_ci, function(ci) {
    # Each implied independence stores its variables in $X, $Y and $Z
    x <- ci$X
    y <- ci$Y
    z <- if (length(ci$Z) > 0) ci$Z else NULL

    # Test with partial correlation or conditional independence test
    test_result <- test_conditional_independence(data, x, y, z)

    statement <- paste(x, "_||_", y)
    if (!is.null(z)) statement <- paste(statement, "|", paste(z, collapse = ", "))
    list(statement = statement, p_value = test_result$p.value)
  })

  do.call(rbind, lapply(results, as.data.frame))
}

Backdoor Criterion

Definition

A set $Z$ satisfies the backdoor criterion relative to $(A, Y)$ if:

No node in $Z$ is a descendant of $A$
$Z$ blocks every path between $A$ and $Y$ that contains an arrow into $A$

Backdoor Adjustment Formula

If $Z$ satisfies the backdoor criterion: $$P(Y | do(A = a)) = \sum_z P(Y | A = a, Z = z) P(Z = z)$$

or equivalently: $$E[Y(a)] = E_Z[E[Y | A = a, Z]]$$
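As a sketch of how the adjustment formula is evaluated, assume a single binary confounder $Z$ and a hypothetical data-generating process (all probabilities below are illustrative). The function `backdoor_prob` is a made-up name that computes $\sum_z P(Y=1 \mid A=a, Z=z) P(Z=z)$ from empirical frequencies.

```r
set.seed(1)
n <- 1e5

# Hypothetical DAG: Z -> A, Z -> Y, A -> Y (Z satisfies the backdoor criterion)
z <- rbinom(n, 1, 0.4)
a <- rbinom(n, 1, ifelse(z == 1, 0.8, 0.2))
y <- rbinom(n, 1, 0.2 + 0.3 * a + 0.4 * z)   # true effect of A on P(Y=1) is 0.3

# Backdoor adjustment: sum over z of P(Y=1 | A=a, Z=z) * P(Z=z)
backdoor_prob <- function(a_val) {
  sum(sapply(0:1, function(z_val) {
    mean(y[a == a_val & z == z_val]) * mean(z == z_val)
  }))
}

adjusted <- backdoor_prob(1) - backdoor_prob(0)
naive <- mean(y[a == 1]) - mean(y[a == 0])

c(adjusted = adjusted, naive = naive)
```

The adjusted contrast weights each stratum by the marginal $P(Z=z)$, whereas the naive contrast implicitly weights by $P(Z=z \mid A=a)$, which is what lets confounding through.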

Front-Door Criterion

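When no measured set satisfies the backdoor criterion, but a mediator $M$ intercepts all directed paths from $A$ to $Y$ and is itself unconfounded (no unblocked backdoor path from $A$ to $M$, and all backdoor paths from $M$ to $Y$ blocked by $A$): $$P(Y | do(A = a)) = \sum_m P(M = m | A = a) \sum_{a'} P(Y | M = m, A = a') P(A = a')$$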

# Check backdoor criterion
check_backdoor <- function(dag, exposure, outcome, adjustment_set) {
  # Minimal valid adjustment sets, using dagitty
  valid_sets <- adjustmentSets(dag, exposure = exposure, outcome = outcome, type = "minimal")

  # Check if the proposed set is valid. Comparing against the minimal sets only
  # would reject valid non-minimal sets, so test the proposed set directly.
  is_valid <- isAdjustmentSet(dag, adjustment_set, exposure = exposure, outcome = outcome)

  list(
    is_valid = is_valid,
    minimal_sets = valid_sets,
    proposed = adjustment_set
  )
}

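# Compute backdoor-adjusted estimate (assumes a binary exposure coded 0/1)
backdoor_adjustment <- function(data, outcome, exposure, adjustment, n_boot = 200) {
  formula_str <- paste(outcome, "~", paste(c(exposure, adjustment), collapse = " + "))

  # Standardization: fit the outcome model, then predict for everyone
  # with the exposure set to 1 and to 0
  standardize <- function(d) {
    model <- lm(as.formula(formula_str), data = d)
    d1 <- d
    d1[[exposure]] <- 1
    d0 <- d
    d0[[exposure]] <- 0
    mean(predict(model, newdata = d1) - predict(model, newdata = d0))
  }

  # Nonparametric bootstrap SE: the variance of the individual predicted
  # differences would ignore the uncertainty in the fitted coefficients
  boot <- replicate(n_boot, standardize(data[sample(nrow(data), replace = TRUE), , drop = FALSE]))

  list(
    ate = standardize(data),
    se = sd(boot)
  )
}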

# Full identification analysis
analyze_identification <- function(dag, exposure, outcome) {
  list(
    adjustment_sets = adjustmentSets(dag, exposure, outcome),
    instrumental_sets = instrumentalVariables(dag, exposure, outcome),
    direct_effects = adjustmentSets(dag, exposure, outcome, effect = "direct"),
    implied_independencies = impliedConditionalIndependencies(dag)
  )
}

Framework Equivalence

For most problems, both frameworks give equivalent results: $$E[Y(a)] = E[Y | do(A=a)]$$

Choose based on context and audience.

Key Identification Assumptions

For Treatment Effects

| Assumption | Formal Statement | Interpretation |
|---|---|---|
| Consistency | $Y = Y(A)$ | Observed outcome equals potential outcome for received treatment |
| Positivity | $P(A=a \mid X=x) > 0$ for all $x$ with $P(X=x) > 0$ | Every covariate stratum has both treated and untreated |
| Exchangeability | $Y(a) \perp\!\!\!\perp A \mid X$ | No unmeasured confounding given $X$ |
| SUTVA | No interference, single version of treatment | Units don't affect each other |
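Of these assumptions, positivity is the one that can be partly checked in data. A minimal sketch for discrete covariates, where `check_positivity` and its `eps` threshold are hypothetical names introduced here: it tabulates the empirical $P(A=a \mid X=x)$ and flags strata in which some treatment level is never (or almost never) observed.

```r
# Empirical positivity check for a discrete covariate (hypothetical helper).
# a: treatment vector, x: covariate strata, eps: smallest acceptable probability
check_positivity <- function(a, x, eps = 0) {
  # Rows are strata of X, columns are treatment levels; each row sums to 1
  cond_probs <- prop.table(table(x = x, a = a), margin = 1)
  list(
    cond_probs = cond_probs,
    violated = any(cond_probs <= eps, na.rm = TRUE)
  )
}

# Stratum x = 2 contains treated units only, so positivity fails empirically
bad <- check_positivity(a = c(0, 1, 1, 1), x = c(1, 1, 2, 2))
bad$cond_probs
bad$violated

# Both treatment levels appear in every stratum
ok <- check_positivity(a = c(0, 1, 0, 1), x = c(1, 1, 2, 2))
ok$violated
```

An empty cell in a finite sample does not prove a structural violation, and with continuous or high-dimensional $X$ the same idea is usually applied to estimated propensity scores instead.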

For Mediation Effects

Additional assumptions required:

| Assumption | Formal Statement | Interpretation |
|---|---|---|
| Cross-world exchangeability | $Y(a,m) \perp\!\!\!\perp M(a^*) \mid X$ | Counterfactual mediator independent of counterfactual outcome |
| No $A$-$M$ interaction (optional) | $Y(a,m) - Y(a',m)$ constant in $m$ | Simplifies identification |
| Compositional | $Y(a) = Y(a, M(a))$ | Potential outcome composition |

Standard Identification Results

Average Treatment Effect (ATE)

Target: $\psi = E[Y(1) - Y(0)]$

Under exchangeability (A1), consistency (A2), positivity (A3):

$$\psi = E\left[E[Y | A=1, X] - E[Y | A=0, X]\right]$$

Proof sketch:
\begin{align}
E[Y(a)] &= E[E[Y(a) | X]] && \text{(iterated expectations)} \\
&= E[E[Y(a) | A=a, X]] && \text{(A1: exchangeability)} \\
&= E[E[Y | A=a, X]] && \text{(A2: consistency)}
\end{align}

Average Treatment Effect on Treated (ATT)

Target: $\psi_{ATT} = E[Y(1) - Y(0) | A=1]$

Under weaker exchangeability $Y(0) \perp\!\!\!\perp A \mid X$:

$$\psi_{ATT} = E\left[E[Y | A=1, X] - E[Y | A=0, X] \mid A=1\right]$$

Natural Direct and Indirect Effects (Mediation)

Target:

NDE: $E[Y(1, M(0)) - Y(0, M(0))]$
NIE: $E[Y(1, M(1)) - Y(1, M(0))]$

Under mediation assumptions (see VanderWeele, 2015):

$$NDE = \int\int \{E[Y|A=1,M=m,X=x] - E[Y|A=0,M=m,X=x]\} \, dP(m|A=0,X=x) \, dP(x)$$

$$NIE = \int\int E[Y|A=1,M=m,X=x] \{dP(m|A=1,X=x) - dP(m|A=0,X=x)\} \, dP(x)$$
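A plug-in sketch of these two formulas, assuming a binary mediator so that the integral over $m$ becomes a two-term sum, and assuming correctly specified working models. The data-generating process, the helper names `mu` and `pm`, and all coefficients are hypothetical; this is not the implementation of any particular package.

```r
set.seed(1)
n <- 2e5

# Hypothetical data with treatment-mediator interaction in the outcome
x <- rbinom(n, 1, 0.5)
a <- rbinom(n, 1, plogis(-0.5 + x))
m <- rbinom(n, 1, plogis(-1 + 1.5 * a + 0.5 * x))
y <- 1 + 2 * a + 3 * m + a * m + x + rnorm(n)
d <- data.frame(x = x, a = a, m = m, y = y)

out_fit <- lm(y ~ a * m + x, data = d)                   # E[Y | A, M, X]
med_fit <- glm(m ~ a + x, family = binomial, data = d)   # P(M = 1 | A, X)

# Model predictions with A (and M) set to fixed values, at each subject's own X
mu <- function(a_val, m_val) predict(out_fit, transform(d, a = a_val, m = m_val))
pm <- function(a_val) predict(med_fit, transform(d, a = a_val), type = "response")

# Inner sum over m in {0, 1}; the outer mean averages over the empirical dP(x)
nde <- mean((mu(1, 1) - mu(0, 1)) * pm(0) + (mu(1, 0) - mu(0, 0)) * (1 - pm(0)))
nie <- mean((mu(1, 1) - mu(1, 0)) * (pm(1) - pm(0)))

c(NDE = nde, NIE = nie, total = nde + nie)
```

Because the mediator distribution in the NDE is held at its $A=0$ value and the outcome regression in the NIE is held at $A=1$, the two pieces add up to the total effect $E[Y(1) - Y(0)]$.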

Controlled Direct Effect (CDE)

Target: $CDE(m) = E[Y(1,m) - Y(0,m)]$

Simpler identification (no cross-world assumption):

$$CDE(m) = E[E[Y|A=1,M=m,X] - E[Y|A=0,M=m,X]]$$

DAG-Based Identification

The Back-Door Criterion

A set $X$ satisfies the back-door criterion relative to $(A, Y)$ if:

No node in $X$ is a descendant of $A$
$X$ blocks every path between $A$ and $Y$ that contains an arrow into $A$

If satisfied: $$P(Y | do(A=a)) = \sum_x P(Y | A=a, X=x) P(X=x)$$

The Front-Door Criterion

When there's an unmeasured confounder $U$ between $A$ and $Y$, but $M$ mediates all of $A$'s effect:

U → A, U → Y, A → M → Y (U unmeasured)

Identification: $$P(Y | do(A=a)) = \sum_m P(M=m | A=a) \sum_{a'} P(Y | M=m, A=a') P(A=a')$$
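A simulated sketch of the front-door formula with binary variables, assuming a hypothetical data-generating process in which $U$ is generated but then treated as unobserved. The function `front_door` is an illustrative name; it uses only the observed $(A, M, Y)$.

```r
set.seed(1)
n <- 2e5

u <- rbinom(n, 1, 0.5)                        # unmeasured confounder of A and Y
a <- rbinom(n, 1, ifelse(u == 1, 0.8, 0.2))
m <- rbinom(n, 1, ifelse(a == 1, 0.9, 0.1))   # M depends on A only
y <- rbinom(n, 1, 0.1 + 0.5 * m + 0.3 * u)    # A affects Y only through M

# Front-door formula: sum_m P(m | a) * sum_a' P(Y=1 | m, a') P(a')
front_door <- function(a_val) {
  total <- 0
  for (m_val in 0:1) {
    p_m_given_a <- mean(m[a == a_val] == m_val)
    inner <- 0
    for (a_prime in 0:1) {
      inner <- inner + mean(y[m == m_val & a == a_prime]) * mean(a == a_prime)
    }
    total <- total + p_m_given_a * inner
  }
  total
}

fd_effect <- front_door(1) - front_door(0)
naive <- mean(y[a == 1]) - mean(y[a == 0])

c(front_door = fd_effect, naive = naive)
```

In this setup the true effect of $A$ on $P(Y=1)$ is $0.5 \times (0.9 - 0.1) = 0.4$; the naive contrast is larger because $U$ raises both $A$ and $Y$.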

Instrumental Variables

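When $Z$ affects $Y$ only through $A$ (exclusion restriction), is independent of the unmeasured confounders $U$, and has a nonzero effect on $A$ (relevance):

Z → A → Y, with unmeasured U → A and U → Y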

Local ATE identification (with monotonicity): $$LATE = \frac{E[Y | Z=1] - E[Y | Z=0]}{E[A | Z=1] - E[A | Z=0]}$$
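The ratio above (often called the Wald estimator) can be sketched on simulated data. The data-generating process is hypothetical: $Z$ is a randomized binary instrument, treatment uptake is monotone in $Z$, and the treatment effect is a constant 2, so the LATE also equals 2.

```r
set.seed(1)
n <- 2e5

u <- rnorm(n)                                  # unmeasured confounder of A and Y
z <- rbinom(n, 1, 0.5)                         # instrument, independent of U
a <- as.integer(1.5 * z + u + rnorm(n) > 0.5)  # monotone in z: no defiers
y <- 2 * a + u + rnorm(n)                      # Z enters Y only through A

# Wald estimator: intention-to-treat effect on Y over the effect of Z on A
itt_y <- mean(y[z == 1]) - mean(y[z == 0])
itt_a <- mean(a[z == 1]) - mean(a[z == 0])
wald <- itt_y / itt_a

# Naive treated-vs-untreated contrast, biased by U
naive <- mean(y[a == 1]) - mean(y[a == 0])

c(wald = wald, first_stage = itt_a, naive = naive)
```

The denominator is the share of compliers; when it is close to zero (a weak instrument) the ratio becomes unstable, which is worth checking before interpreting the estimate.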

Sequential Identification (Multiple Mediators)

Sequential Mediation (A → M1 → M2 → Y)

Identifying the effect along the three-arrow path $A \to M_1 \to M_2 \to Y$ as a product of path coefficients requires:

Standard confounding control for each arrow
No intermediate confounders affected by treatment
Sequential ignorability assumptions

Path-specific effects:

Direct: $A \to Y$
Through $M_1$ only: $A \to M_1 \to Y$
Through $M_2$ only: $A \to M_2 \to Y$
Through both: $A \to M_1 \to M_2 \to Y$

Identification Formula (No Intermediate Confounding)

$$\text{Effect through } M_1 \to M_2 = \int E\left[\frac{\partial^3}{\partial a \partial m_1 \partial m_2} E[Y|A,M_1,M_2,X]\right]$$

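Under linear structural models without interactions, this reduces to a product of coefficients, $\hat{\alpha}_1 \cdot \hat{\beta}_1 \cdot \hat{\gamma}_2$, where $\alpha_1$ is the effect of $A$ on $M_1$, $\beta_1$ the effect of $M_1$ on $M_2$, and $\gamma_2$ the effect of $M_2$ on $Y$.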
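A sketch of the product-of-coefficients calculation, under the assumption that all three structural equations are linear with no interactions and that $\alpha_1$, $\beta_1$, $\gamma_2$ denote the $A \to M_1$, $M_1 \to M_2$, and $M_2 \to Y$ coefficients (this reading of the symbols, the simulated model, and all numeric values are assumptions made for illustration).

```r
set.seed(1)
n <- 1e5

# Hypothetical linear sequential mediation model with one measured confounder X
x  <- rnorm(n)
a  <- rbinom(n, 1, plogis(x))
m1 <- 0.5 * a + 0.3 * x + rnorm(n)                         # alpha_1 = 0.5
m2 <- 0.4 * m1 + 0.2 * a + 0.3 * x + rnorm(n)              # beta_1  = 0.4
y  <- 0.6 * m2 + 0.1 * m1 + 0.3 * a + 0.3 * x + rnorm(n)   # gamma_2 = 0.6

# One regression per arrow on the path, each adjusting for upstream variables
alpha1 <- coef(lm(m1 ~ a + x))[["a"]]
beta1  <- coef(lm(m2 ~ m1 + a + x))[["m1"]]
gamma2 <- coef(lm(y ~ m2 + m1 + a + x))[["m2"]]

# Effect transmitted along A -> M1 -> M2 -> Y; true value is 0.5 * 0.4 * 0.6
path_effect <- alpha1 * beta1 * gamma2
path_effect
```

Each regression conditions on everything upstream of its outcome, which is how the "standard confounding control for each arrow" requirement shows up in practice.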

Partial Identification

When point identification fails, we can still bound the parameter.

Manski Bounds (No Assumptions)

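For a bounded outcome $Y \in [y_{min}, y_{max}]$, the counterfactual mean can be bounded without further assumptions, because only $E[Y(1) \mid A=0]$ is unobserved: $$E[Y(1)] \in [E[Y \cdot A] + y_{min}P(A=0),\; E[Y \cdot A] + y_{max}P(A=0)]$$ where $E[Y \cdot A] = E[Y \mid A=1]P(A=1)$. Bounds on $E[Y(0)]$ follow symmetrically, and the two combine into bounds on the ATE.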
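A minimal sketch of the no-assumption bounds, based on the decomposition $E[Y(1)] = E[Y \mid A=1]P(A=1) + E[Y(1) \mid A=0]P(A=0)$, in which the unobserved term is replaced by the logical limits of the outcome. The function name `manski_bounds` and the toy data are hypothetical.

```r
# No-assumption bounds for E[Y(1)], E[Y(0)] and the ATE (hypothetical helper).
# y: outcome, a: binary treatment (0/1), y_min/y_max: logical range of the outcome
manski_bounds <- function(y, a, y_min, y_max) {
  p1 <- mean(a == 1)
  limits <- c(lower = y_min, upper = y_max)

  # Observed part plus worst and best case for the unobserved part
  ey1 <- mean(y * (a == 1)) + limits * (1 - p1)
  ey0 <- mean(y * (a == 0)) + limits * p1

  # ATE bounds combine opposite extremes of the two means
  ate <- c(lower = ey1[["lower"]] - ey0[["upper"]],
           upper = ey1[["upper"]] - ey0[["lower"]])

  list(ey1 = ey1, ey0 = ey0, ate = ate)
}

# Toy data: binary outcome, two treated and two untreated units
b <- manski_bounds(y = c(1, 0, 1, 1), a = c(1, 1, 0, 0), y_min = 0, y_max = 1)
b$ey1
b$ate
```

The ATE interval always has width $y_{max} - y_{min}$ and therefore always contains zero, which is why additional assumptions are needed to sign the effect.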

Sensitivity Analysis

When exchangeability is uncertain, parameterize violation:

Unmeasured confounding parameter $\Gamma$: $$\frac{1}{\Gamma} \leq \frac{P(A=1|X,U=1)/P(A=0|X,U=1)}{P(A=1|X,U=0)/P(A=0|X,U=0)} \leq \Gamma$$

Compute bounds as function of $\Gamma$ (Rosenbaum bounds).

E-Value

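Minimum strength of association (on the risk ratio scale) that an unmeasured confounder would need to have with both treatment and outcome to fully explain away the observed effect. For an observed risk ratio $RR \geq 1$ (for $RR < 1$, first replace $RR$ with $1/RR$):

$$E\text{-value} = RR + \sqrt{RR \times (RR-1)}$$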
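The formula is simple enough to compute directly. A small sketch, where `e_value` is a hypothetical helper name and protective risk ratios (below 1) are inverted before the formula is applied:

```r
# E-value for an observed risk ratio (hypothetical helper)
e_value <- function(rr) {
  stopifnot(all(rr > 0))
  rr <- ifelse(rr < 1, 1 / rr, rr)   # protective effects: use 1 / RR
  rr + sqrt(rr * (rr - 1))
}

e_value(2)          # 2 + sqrt(2)
e_value(0.5)        # same as e_value(2)
e_value(c(1, 1.5))  # a null RR of 1 gives the minimum E-value of 1
```

Applying the same function to the confidence limit closest to the null gives the E-value for the interval rather than the point estimate.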

Identification Strategies by Design

Randomized Controlled Trials (RCTs)

Treatment assignment random → exchangeability holds by design
Still need SUTVA, consistency
For mediation: randomize $M$ as well, or use sequential ignorability

Observational Studies

| Strategy | Key Assumption | Best For |
|---|---|---|
| Regression adjustment | All confounders measured | Rich covariate data |
| Propensity score | Correct PS model | High-dimensional confounders |
| Instrumental variables | Valid instrument exists | Unmeasured confounding |
| Regression discontinuity | Continuity at threshold | Sharp treatment rules |
| Difference-in-differences | Parallel trends | Panel data |

Natural Experiments

Exploit exogenous variation (policy changes, geographic variation)
Requires careful argument for why variation is "as-if random"

Identification in the MediationVerse

medfit: Foundation

Implements standard mediation identification
VanderWeele regression-based approach
Supports binary/continuous treatments and mediators

probmed: Effect Size

$P_M$ identification requires identified NDE/NIE
Handles case when NDE and NIE have opposite signs

RMediation: Confidence Intervals

Takes identified effects as input
Distribution of product of coefficients (PRODCLIN)
Monte Carlo intervals

medrobust: Sensitivity

When identification assumptions are uncertain
Bounds on effects under confounding
E-values for unmeasured confounding

medsim: Validation

Simulate data where truth is known
Verify identification formulas recover true effects
Test estimator properties

Identification Proof Template

\begin{theorem}[Identification of $\psi$]
Under Assumptions:
\begin{enumerate}[label=A\arabic*.]
  \item (Consistency) $Y = Y(A)$, $M = M(A)$
  \item (Positivity) $P(A=a|X) > \epsilon > 0$ for all $a \in \mathcal{A}$
  \item (Exchangeability) $Y(a) \perp\!\!\!\perp A \mid X$
\end{enumerate}
the causal estimand $\psi = E[g(Y(a))]$ is identified by
\[
  \psi = E_X\left[E[g(Y) \mid A=a, X]\right].
\]
\end{theorem}

\begin{proof}
\begin{align}
E[g(Y(a))] &= E\left[E[g(Y(a)) \mid X]\right] && \text{(law of total expectation)} \\
&= E\left[E[g(Y(a)) \mid A=a, X]\right] && \text{(by A3: exchangeability)} \\
&= E\left[E[g(Y) \mid A=a, X]\right] && \text{(by A1: consistency)}
\end{align}
The RHS depends only on the observed data distribution $P(Y,A,X)$.
\end{proof}

Common Identification Pitfalls

Conditioning on Colliders

A → C ← Y

Conditioning on $C$ opens a path between $A$ and $Y$.

Conditioning on Mediators

A → M → Y

Conditioning on $M$ blocks the indirect effect; it does not control for confounding.

Overcontrol Bias

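Conditioning on descendants of treatment can bias estimates: adjusting for a mediator or one of its descendants removes part of the effect, and adjusting for a collider or one of its descendants can open non-causal paths.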

M-Bias

U1 → X ← U2
↓         ↓
A ——————→ Y

Conditioning on $X$ opens path $A \leftarrow U_1 \rightarrow X \leftarrow U_2 \rightarrow Y$.
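A simulated sketch of M-bias under a hypothetical linear model. To make the bias easy to see, the true effect of $A$ on $Y$ is set to zero here (an assumption of this example, not of the diagram), so any nonzero adjusted coefficient is purely an artifact of conditioning on $X$.

```r
set.seed(1)
n <- 1e5

u1 <- rnorm(n)              # unmeasured cause of X and A
u2 <- rnorm(n)              # unmeasured cause of X and Y
x  <- u1 + u2 + rnorm(n)    # pre-treatment collider
a  <- u1 + rnorm(n)
y  <- u2 + rnorm(n)         # A has no effect on Y in this simulation

# Without adjustment the collider path is closed and the estimate is unbiased
unadjusted <- coef(lm(y ~ a))[["a"]]

# Adjusting for X opens A <- U1 -> X <- U2 -> Y and induces a spurious association
adjusted <- coef(lm(y ~ a + x))[["a"]]

c(unadjusted = unadjusted, adjusted = adjusted)
```

Here $X$ is measured before treatment and is associated with both $A$ and $Y$, so it looks like a confounder by the usual informal checks; only the DAG reveals that it should be left out.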

Table 2 Fallacy

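Interpreting every coefficient in a multivariable model causally. A model built to adjust for confounding of one exposure conditions on variables that may be intermediates for the other covariates, so their coefficients are at best direct effects and may themselves be confounded.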

Verification Questions

When reviewing identification arguments, ask:

Is the target estimand clearly defined?
Are all assumptions explicitly stated?
Is each step in the derivation justified?
Are the assumptions plausible in this context?
What if an assumption is violated?
Is there a DAG that encodes the assumptions?
Are there alternative identification strategies?

Integration with Other Skills

This skill works with:

proof-architect - For writing identification proofs
asymptotic-theory - For inference after identification
methods-paper-writer - For presenting identification in manuscripts
simulation-architect - For validating identification

Key References

Imai

Pearl, J. (2009). Causality: Models, Reasoning, and Inference (2nd ed.)

VanderWeele, T.J. (2015). Explanation in Causal Inference

Hernán, M.A. & Robins, J.M. (2020). Causal Inference: What If

Imbens, G.W. & Rubin, D.B. (2015). Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction

Version: 1.0 Created: 2025-12-08 Domain: Causal Inference, Mediation Analysis
