
Asymptotic Theory

Rigorous framework for statistical inference and efficiency in modern methodology

Use this skill when working on: asymptotic properties of estimators, influence functions, semiparametric efficiency, double robustness, variance estimation, confidence intervals, hypothesis testing, M-estimation, or deriving limiting distributions.

Efficiency Bounds

Semiparametric Efficiency Theory

Cramér-Rao Lower Bound: For any unbiased estimator, $$\text{Var}(\hat{\theta}) \geq \frac{1}{nI(\theta)}$$

where $I(\theta)$ is the Fisher information.

Semiparametric Efficiency Bound: The variance of the efficient influence function: $$V_{eff} = E[\phi^*(\theta_0)^2]$$

where $\phi^*$ is the efficient influence function (EIF).

Influence Function Notation: $IF(O; \theta, P)$ represents the influence of observation $O$ on parameter $\theta$ under distribution $P$: $$IF(O; \theta, P) = \lim_{\epsilon \to 0} \frac{T((1-\epsilon)P + \epsilon \delta_O) - T(P)}{\epsilon}$$

Semiparametric Variance: For regular asymptotically linear (RAL) estimators, $$\sqrt{n}(\hat{\theta} - \theta_0) \xrightarrow{d} N(0, E[IF(O)^2])$$

Estimating Equations: M-estimators solve $\sum_{i=1}^n \psi(O_i; \theta) = 0$, with asymptotic variance: $$V = \left(\frac{\partial}{\partial \theta} E[\psi(O; \theta)]\right)^{-1} E[\psi(O; \theta)\psi(O; \theta)^T] \left(\frac{\partial}{\partial \theta} E[\psi(O; \theta)]\right)^{-T}$$

Efficiency for Mediation Estimands

| Estimand | Efficient Influence Function | Efficiency Bound |
|---|---|---|
| ATE | $\phi_{ATE} = \frac{A}{\pi}(Y-\mu_1) - \frac{1-A}{1-\pi}(Y-\mu_0) + \mu_1 - \mu_0 - \psi$ | $V_{ATE} = E[\phi_{ATE}^2]$ |
| NDE | Complex (VanderWeele & Tchetgen, 2014) | Higher than ATE |
| NIE | Complex (VanderWeele & Tchetgen, 2014) | Higher than ATE |

Compute semiparametric efficiency bound

```r
compute_efficiency_bound <- function(data, estimand = "ATE") {
  n <- nrow(data)

  if (estimand == "ATE") {
    # Estimate nuisance functions
    ps_model <- glm(A ~ X, data = data, family = binomial)
    pi_hat   <- predict(ps_model, type = "response")

    mu1_model <- lm(Y ~ X, data = subset(data, A == 1))
    mu0_model <- lm(Y ~ X, data = subset(data, A == 0))

    mu1_hat <- predict(mu1_model, newdata = data)
    mu0_hat <- predict(mu0_model, newdata = data)

    # Efficient influence function
    psi_hat <- mean(mu1_hat - mu0_hat)
    phi <- with(data, {
      A / pi_hat * (Y - mu1_hat) -
        (1 - A) / (1 - pi_hat) * (Y - mu0_hat) +
        mu1_hat - mu0_hat - psi_hat
    })

    # Efficiency bound = variance of EIF
    list(
      efficiency_bound = var(phi),
      standard_error   = sqrt(var(phi) / n),
      eif_values       = phi
    )
  }
}
```

Empirical Process Theory

Key Concepts

Empirical Process: $\mathbb{G}_n(f) = \sqrt{n}(\mathbb{P}_n - P)f = \frac{1}{\sqrt{n}}\sum_{i=1}^n (f(O_i) - Pf)$

Uniform Convergence: For function class $\mathcal{F}$, $$\sup_{f \in \mathcal{F}} |\mathbb{G}_n(f)| \xrightarrow{d} \sup_{f \in \mathcal{F}} |\mathbb{G}(f)|$$

where $\mathbb{G}$ is a Gaussian process.

Complexity Measures

| Measure | Definition | Use |
|---|---|---|
| VC dimension | Max shattered set size | Classification |
| Covering number | $N(\epsilon, \mathcal{F}, \lVert\cdot\rVert)$ | General classes |
| Bracketing number | $N_{[\,]}(\epsilon, \mathcal{F}, L_2)$ | Entropy bounds |
| Rademacher complexity | $\mathcal{R}_n(\mathcal{F}) = E[\sup_{f \in \mathcal{F}} \frac{1}{n}\sum_i \epsilon_i f(X_i)]$ | Generalization bounds |

Estimate Rademacher complexity via Monte Carlo

```r
estimate_rademacher <- function(f_class, data, n_reps = 1000) {
  n <- nrow(data)

  sup_values <- replicate(n_reps, {
    # Random Rademacher signs
    epsilon <- sample(c(-1, 1), n, replace = TRUE)

    # Supremum of the symmetrized empirical process over the class
    max(sapply(f_class, function(f) {
      abs(mean(epsilon * f(data)))
    }))
  })

  mean(sup_values)
}
```

Donsker Classes

Definition and Importance

A function class $\mathcal{F}$ is Donsker if $\mathbb{G}_n \rightsquigarrow \mathbb{G}$ in $\ell^\infty(\mathcal{F})$, where $\mathbb{G}$ is a tight Gaussian process.

Key Donsker Classes

| Class | Description | Application |
|---|---|---|
| VC classes | Finite VC dimension | Classification functions |
| Smooth functions | Bounded derivatives | Regression estimators |
| Monotone functions | Single crossings | Distribution functions |
| Lipschitz functions | Bounded variation | M-estimators |

Donsker Theorem Applications

For M-estimation: If $\psi(O, \theta)$ belongs to a Donsker class, then $$\sqrt{n}(\hat{\theta} - \theta_0) \xrightarrow{d} N(0, V)$$

where $V = (\partial_\theta E[\psi])^{-1} \text{Var}(\psi) (\partial_\theta E[\psi])^{-T}$

Verify Donsker conditions for empirical process

```r
check_donsker_conditions <- function(psi_class, data) {
  # Estimate bracketing numbers N_[](eps, F, L_2) on a grid;
  # estimate_bracketing_number() must be supplied by the user
  epsilon_grid <- seq(0.01, 1, by = 0.01)
  bracket_numbers <- sapply(epsilon_grid, function(eps) {
    estimate_bracketing_number(psi_class, data, eps)
  })

  # Donsker if the bracketing entropy integral converges
  N_fun <- approxfun(epsilon_grid, bracket_numbers, rule = 2)
  entropy_integral <- integrate(
    function(eps) sqrt(log(N_fun(eps))),
    lower = 0, upper = 1
  )

  list(
    is_donsker       = is.finite(entropy_integral$value),
    entropy_integral = entropy_integral$value,
    bracket_numbers  = data.frame(epsilon = epsilon_grid, N = bracket_numbers)
  )
}
```

Core Concepts

Why Asymptotics?

  • Exact distributions often unavailable for complex estimators

  • Large-sample approximations provide tractable inference

  • Efficiency theory guides optimal estimator construction

  • Robustness properties clarified through asymptotic analysis

Fundamental Sequence

```
Estimator θ̂ₙ → Consistency → Asymptotic Normality      → Efficiency → Inference
                θ̂ₙ →ᵖ θ₀      √n(θ̂ₙ − θ₀) →ᵈ N(0, V)    V = V_eff     CIs, tests
```

Modes of Convergence

Convergence in Probability ($\xrightarrow{p}$)

$X_n \xrightarrow{p} X$ if $\forall \epsilon > 0$: $P(|X_n - X| > \epsilon) \to 0$

Consistency: $\hat{\theta}_n \xrightarrow{p} \theta_0$

Convergence in Distribution ($\xrightarrow{d}$)

$X_n \xrightarrow{d} X$ if $F_{X_n}(x) \to F_X(x)$ at all continuity points

Asymptotic normality: $\sqrt{n}(\hat{\theta}_n - \theta_0) \xrightarrow{d} N(0, V)$

Almost Sure Convergence ($\xrightarrow{a.s.}$)

$X_n \xrightarrow{a.s.} X$ if $P(\lim_{n\to\infty} X_n = X) = 1$

Relationship: $\xrightarrow{a.s.} \Rightarrow \xrightarrow{p} \Rightarrow \xrightarrow{d}$

Stochastic Order Notation

| Notation | Meaning | Example |
|---|---|---|
| $O_p(1)$ | Bounded in probability | $\hat{\theta}_n = O_p(1)$ |
| $o_p(1)$ | Converges to 0 in probability | $\hat{\theta}_n - \theta_0 = o_p(1)$ |
| $O_p(a_n)$ | $X_n/a_n = O_p(1)$ | $\hat{\theta}_n - \theta_0 = O_p(n^{-1/2})$ |
| $o_p(a_n)$ | $X_n/a_n = o_p(1)$ | Remainder terms |

Key Theorems

Laws of Large Numbers

Weak LLN: If $X_1, \ldots, X_n$ iid with $E|X| < \infty$: $$\bar{X}_n \xrightarrow{p} E[X]$$

Strong LLN: If $X_1, \ldots, X_n$ iid with $E|X| < \infty$: $$\bar{X}_n \xrightarrow{a.s.} E[X]$$

Uniform LLN: For $\sup_{\theta \in \Theta}$ convergence, need additional conditions (compactness, envelope).

Central Limit Theorem

Classical CLT: If $X_1, \ldots, X_n$ iid with $E[X] = \mu$, $Var(X) = \sigma^2 < \infty$: $$\sqrt{n}(\bar{X}_n - \mu) \xrightarrow{d} N(0, \sigma^2)$$

Lindeberg-Feller CLT: For triangular arrays $\{X_{ni}\}$ with $\sum_{i=1}^n \text{Var}(X_{ni}) \to \sigma^2$, asymptotic normality holds under the Lindeberg condition: $$\sum_{i=1}^n E[X_{ni}^2 \mathbf{1}(|X_{ni}| > \epsilon)] \to 0 \quad \forall \epsilon > 0$$

Multivariate CLT: $$\sqrt{n}(\bar{X}_n - \mu) \xrightarrow{d} N(0, \Sigma)$$
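As a quick sanity check, a small simulation (a sketch; the sample size and Exponential(1) distribution are arbitrary illustrative choices) shows standardized means of skewed draws approaching the normal limit:

```r
# CLT in action: standardized means of skewed Exponential(1) draws
# (mu = sigma = 1) approach the standard normal
set.seed(1)
n <- 500
reps <- 2000

z <- replicate(reps, sqrt(n) * (mean(rexp(n)) - 1))

# Simulated upper-tail probability should be near pnorm(-1.96) ~ 0.025
tail_prob <- mean(z > 1.96)
```

Increasing `n` tightens the agreement; with heavy skew and small `n`, the normal approximation in the tails degrades first.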

Slutsky's Theorem

If $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{p} c$ (constant):

  • $X_n + Y_n \xrightarrow{d} X + c$

  • $X_n Y_n \xrightarrow{d} cX$

  • $X_n/Y_n \xrightarrow{d} X/c$ (if $c \neq 0$)

Continuous Mapping Theorem

If $X_n \xrightarrow{d} X$ and $g$ continuous: $$g(X_n) \xrightarrow{d} g(X)$$

Delta Method

If $\sqrt{n}(\hat{\theta}_n - \theta_0) \xrightarrow{d} N(0, V)$ and $g$ differentiable at $\theta_0$: $$\sqrt{n}(g(\hat{\theta}_n) - g(\theta_0)) \xrightarrow{d} N(0, g'(\theta_0)^\top V g'(\theta_0))$$

Multivariate: Replace $g'(\theta_0)$ with Jacobian matrix.
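A minimal illustration of the delta method for $g(\theta) = \log\theta$ with $\theta = E[X]$ (the Exponential data-generating choice and sample size are illustrative assumptions):

```r
# Delta method for g(theta) = log(theta), theta = E[X]
# (simulated Exponential data with true mean theta0 = 2)
set.seed(2)
n <- 2000
x <- rexp(n, rate = 0.5)

theta_hat <- mean(x)
se_theta  <- sd(x) / sqrt(n)

# g'(theta) = 1 / theta, so SE(log theta_hat) ~ se_theta / theta_hat
log_est <- log(theta_hat)
log_se  <- se_theta / theta_hat

# 95% Wald interval for log(theta0)
ci <- log_est + c(-1.96, 1.96) * log_se
```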

M-Estimation Theory

Setup

Estimator $\hat{\theta}_n$ solves: $$\hat{\theta}_n = \arg\max_{\theta \in \Theta} M_n(\theta)$$

where $M_n(\theta) = n^{-1} \sum_{i=1}^n m(O_i; \theta)$

Consistency Conditions

  • Uniform convergence: $\sup_\theta |M_n(\theta) - M(\theta)| \xrightarrow{p} 0$

  • Identification: $M(\theta)$ uniquely maximized at $\theta_0$

  • Compactness: $\Theta$ compact (or identification at distance from boundary)

Result: $\hat{\theta}_n \xrightarrow{p} \theta_0$

Asymptotic Normality Conditions

  • $\theta_0$ interior point of $\Theta$

  • $M(\theta)$ twice differentiable at $\theta_0$

  • $\ddot{M}(\theta_0)$ non-singular

  • $\sqrt{n} \dot{M}_n(\theta_0) \xrightarrow{d} N(0, V)$

Result: $$\sqrt{n}(\hat{\theta}_n - \theta_0) \xrightarrow{d} N(0, [-\ddot{M}(\theta_0)]^{-1} V [-\ddot{M}(\theta_0)]^{-1})$$

Standard Errors

Sandwich estimator: $$\hat{V} = \hat{A}^{-1} \hat{B} \hat{A}^{-1}$$

where:

  • $\hat{A} = -n^{-1} \sum_i \ddot{m}(O_i; \hat{\theta}_n)$ (Hessian)

  • $\hat{B} = n^{-1} \sum_i \dot{m}(O_i; \hat{\theta}_n) \dot{m}(O_i; \hat{\theta}_n)^\top$ (outer product)
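A hand-rolled sketch of the sandwich computation for logistic regression, building $\hat{A}$ and $\hat{B}$ directly from the score and Hessian (simulated data; in practice the `sandwich` package automates this):

```r
# Sandwich variance for logistic regression treated as an M-estimator
set.seed(3)
n <- 1000
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + x))

fit <- glm(y ~ x, family = binomial)
X  <- model.matrix(fit)
mu <- fitted(fit)

# A-hat: average negative Hessian of the log-likelihood
A_hat <- crossprod(X * (mu * (1 - mu)), X) / n
# B-hat: average outer product of the score contributions X_i (y_i - mu_i)
B_hat <- crossprod(X * (y - mu)) / n

V_hat <- solve(A_hat) %*% B_hat %*% solve(A_hat) / n
se_sandwich <- sqrt(diag(V_hat))
```

Under correct specification $\hat{A} \approx \hat{B}$ and the sandwich collapses to the model-based variance; misspecification drives them apart.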

Influence Functions

Definition

The influence function of a functional $T(P)$ at distribution $P$ is: $$\phi(o) = \lim_{\epsilon \to 0} \frac{T((1-\epsilon)P + \epsilon \delta_o) - T(P)}{\epsilon}$$

where $\delta_o$ is point mass at $o$.

Properties

  • Mean zero: $E_P[\phi(O)] = 0$

  • Variance = asymptotic variance: If $\sqrt{n}(\hat{T}_n - T) \xrightarrow{d} N(0, V)$, then $V = E[\phi(O)^2]$

  • Linearization: $\sqrt{n}(\hat{T}_n - T) = \sqrt{n} \mathbb{P}_n[\phi] + o_p(1)$

Examples

| Functional | Influence Function |
|---|---|
| Mean $E[Y]$ | $\phi(y) = y - E[Y]$ |
| Variance $\text{Var}(Y)$ | $\phi(y) = (y - \mu)^2 - \sigma^2$ |
| Quantile $Q_p$ | $\phi(y) = \frac{p - \mathbf{1}(y \leq Q_p)}{f(Q_p)}$ |
| Regression coefficient $\beta$ | $\phi = E[XX^\top]^{-1} X (Y - X^\top \beta)$ |
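The first row is easy to verify numerically: plugging the estimated influence function of the mean into $E[\phi^2]/n$ recovers the usual variance of the sample mean (a sketch with arbitrary simulated data):

```r
# Influence function of the mean: phi(y) = y - E[Y]
set.seed(4)
y <- rnorm(200, mean = 5, sd = 2)
n <- length(y)

phi <- y - mean(y)          # estimated IF values (mean zero by construction)
if_var <- mean(phi^2) / n   # IF-based variance estimate for the sample mean

classical <- var(y) / n     # classical estimate (n - 1 divisor)
```

The two agree up to the $n/(n-1)$ divisor difference.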

Deriving Influence Functions

Method 1: Gateaux derivative (definition)

Method 2: Estimating equation approach. If $\hat{\theta}$ solves $\mathbb{P}_n[\psi(O; \theta)] = 0$, then: $$\phi(O) = -E[\partial_\theta \psi]^{-1} \psi(O; \theta_0)$$

Method 3: Functional delta method For $\psi = g(T_1, T_2, \ldots)$: $$\phi_\psi = \sum_j \frac{\partial g}{\partial T_j} \phi_{T_j}$$

Semiparametric Efficiency

Semiparametric Models

Model $\mathcal{P}$ contains distributions satisfying: $$\theta = \Psi(P), \quad P \in \mathcal{P}$$

The "nuisance" is infinite-dimensional (e.g., unknown baseline distribution).

Tangent Space

Parametric submodels: One-dimensional smooth paths $\{P_t : t \in \mathbb{R}\}$ through $P_0$.

Score: $S = \partial_t \log p_t \big|_{t=0}$

Tangent space $\mathcal{T}$: Closed linear span of all such scores.

Efficiency Bound

The efficient influence function (EIF) is the projection of any influence function onto the tangent space.

Semiparametric efficiency bound: $$V_{eff} = E[\phi_{eff}(O)^2]$$

No regular estimator can have asymptotic variance smaller than $V_{eff}$.

Achieving Efficiency

An estimator is semiparametrically efficient if its influence function equals the EIF: $$\phi_{\hat{\theta}} = \phi_{eff}$$

Strategies:

  • Solve efficient score equation

  • Targeted learning (TMLE)

  • One-step estimator with EIF-based correction

Double Robustness

Concept

An estimator is doubly robust if it is consistent when either:

  • Outcome model correctly specified, OR

  • Treatment model (propensity score) correctly specified

AIPW Estimator

For ATE $\psi = E[Y(1) - Y(0)]$:

$$\hat{\psi}_{DR} = \mathbb{P}_n\left[\frac{A(Y - \hat{\mu}_1(X))}{\hat{\pi}(X)} + \hat{\mu}_1(X)\right] - \mathbb{P}_n\left[\frac{(1-A)(Y - \hat{\mu}_0(X))}{1-\hat{\pi}(X)} + \hat{\mu}_0(X)\right]$$

where:

  • $\hat{\mu}_a(X) = \hat{E}[Y|A=a,X]$ (outcome model)

  • $\hat{\pi}(X) = \hat{P}(A=1|X)$ (propensity score)
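A minimal AIPW implementation matching the display above, on simulated data with known ATE = 2 (the data-generating model and parametric nuisance fits are illustrative choices):

```r
# AIPW (doubly robust) estimator of the ATE
set.seed(5)
n <- 2000
X <- rnorm(n)
A <- rbinom(n, 1, plogis(0.4 * X))   # propensity model
Y <- 1 + 2 * A + X + rnorm(n)        # true ATE = 2
dat <- data.frame(X = X, A = A, Y = Y)

# Nuisance estimates: propensity score and outcome regressions
pi_hat  <- fitted(glm(A ~ X, data = dat, family = binomial))
mu1_hat <- predict(lm(Y ~ X, data = subset(dat, A == 1)), newdata = dat)
mu0_hat <- predict(lm(Y ~ X, data = subset(dat, A == 0)), newdata = dat)

psi_dr <- mean(A * (Y - mu1_hat) / pi_hat + mu1_hat) -
          mean((1 - A) * (Y - mu0_hat) / (1 - pi_hat) + mu0_hat)
```

Deliberately misspecifying one of the two nuisance models (e.g., fitting the outcome on an unrelated covariate) leaves `psi_dr` consistent, which is the double-robustness property in action.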

Why It Works

Bias decomposition: $$\hat{\psi}_{DR} - \psi = \text{(outcome error)} \times \text{(propensity error)} + o_p(n^{-1/2})$$

If either error is zero, bias is zero.

Efficiency Under Double Robustness

When both models correct:

  • Achieves semiparametric efficiency bound

  • Asymptotic variance = $E[\phi_{eff}^2]$

When one model wrong:

  • Still consistent

  • But less efficient than when both correct

Variance Estimation

Analytic (Sandwich)

$$\hat{V} = \frac{1}{n} \sum_{i=1}^n \hat{\phi}(O_i)^2$$

where $\hat{\phi}$ is estimated influence function.

Bootstrap

Nonparametric bootstrap:

  • Resample $n$ observations with replacement

  • Compute $\hat{\theta}^*_b$ for $b = 1, \ldots, B$

  • $\hat{V} = \text{Var}(\hat{\theta}^*_1, \ldots, \hat{\theta}^*_B)$

Bootstrap validity: Requires $\sqrt{n}$-consistent, regular estimators.

Influence Function-Based Bootstrap

More stable than full recomputation: $$\hat{\theta}^*_b = \hat{\theta} + n^{-1} \sum_{i=1}^n (W_i^* - 1) \hat{\phi}(O_i)$$

where $W_i^*$ are bootstrap weights.
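A sketch of this multiplier scheme for the sample mean, using Exp(1) weights (one common choice with mean and variance 1; the data are arbitrary):

```r
# Multiplier bootstrap for the sample mean via its influence function
set.seed(6)
y <- rexp(300)
n <- length(y)
theta_hat <- mean(y)
phi <- y - theta_hat                  # influence function of the mean

B <- 1000
boot_draws <- replicate(B, theta_hat + mean((rexp(n) - 1) * phi))

se_mult     <- sd(boot_draws)
se_analytic <- sd(y) / sqrt(n)        # should be close to se_mult
```

No model refitting occurs inside the loop, which is what makes this cheaper and more stable than full recomputation when $\hat{\phi}$ is expensive to obtain.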

Inference

Confidence Intervals

Wald interval: $$\hat{\theta} \pm z_{1-\alpha/2} \cdot \hat{SE}$$

Percentile bootstrap: $$[\hat{\theta}^{(\alpha/2)}, \hat{\theta}^{(1-\alpha/2)}]$$

BCa bootstrap (bias-corrected accelerated): Corrects for bias and skewness.

Hypothesis Testing

Wald test: $W = (\hat{\theta} - \theta_0)^2 / \hat{V} \xrightarrow{d} \chi^2_1$ under $H_0$

Score test: Based on the score evaluated at the null value.

Likelihood ratio test: $2(\ell(\hat{\theta}) - \ell(\theta_0)) \xrightarrow{d} \chi^2_k$ under $H_0$

Product of Coefficients (Mediation)

Setup

Mediation effect = $\alpha \beta$ (or $\alpha_1 \beta_1 \gamma_2$ for sequential)

Distribution of Products

Not normal: Product of normals is NOT normal.

Exact distribution: Complex (involves Bessel functions for two normals).

Approximations:

  • Sobel test: Normal approximation via delta method

  • PRODCLIN: Distribution of product method (RMediation)

  • Monte Carlo: Simulate from joint distribution

Delta Method Variance

For $\psi = \alpha\beta$: $$Var(\hat{\alpha}\hat{\beta}) \approx \beta^2 Var(\hat{\alpha}) + \alpha^2 Var(\hat{\beta}) + Var(\hat{\alpha})Var(\hat{\beta})$$

The last term is often omitted (as in the Sobel test), but it matters when the effects are small relative to their standard errors.
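The Monte Carlo approach can be sketched as follows; the point estimates and standard errors are hypothetical placeholders, and independence of $\hat{\alpha}$ and $\hat{\beta}$ is assumed:

```r
# Monte Carlo CI for a product alpha * beta: draw from the normal sampling
# distributions of the coefficients and take empirical quantiles
set.seed(7)
alpha_hat <- 0.40; se_alpha <- 0.10   # hypothetical estimates
beta_hat  <- 0.30; se_beta  <- 0.12

draws <- rnorm(1e5, alpha_hat, se_alpha) * rnorm(1e5, beta_hat, se_beta)
mc_ci <- quantile(draws, c(0.025, 0.975))

# First-order delta method (Sobel) SE for comparison; omits the Var*Var term
sobel_se <- sqrt(beta_hat^2 * se_alpha^2 + alpha_hat^2 * se_beta^2)
```

Unlike the symmetric Sobel interval, the Monte Carlo interval reflects the skewness of the product distribution.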

Product of Three

For sequential mediation $\psi = \alpha_1 \beta_1 \gamma_2$:

  • Distribution more complex

  • Monte Carlo or specialized methods needed

  • Your "product of three" manuscript addresses this

Regularity Conditions Checklist

For Consistency

  • Parameter space compact (or bounded away from boundary)

  • Objective function continuous in $\theta$

  • Uniform convergence of criterion

  • Unique maximizer at $\theta_0$

For Asymptotic Normality

  • $\theta_0$ interior point

  • Twice differentiable criterion

  • Non-singular Hessian

  • CLT applies to score

  • Lindeberg/Lyapunov conditions if non-iid

For Efficiency

  • Model correctly specified

  • Nuisance parameters consistently estimated

  • Sufficient smoothness for influence function calculation

  • Rate conditions on nuisance estimation (for doubly robust)

Common Pitfalls

  1. Ignoring Estimation of Nuisance Parameters

Wrong: Treat $\hat{\eta}$ as known when computing variance. Right: Account for $\hat{\eta}$ uncertainty or use cross-fitting.

  2. Slow Nuisance Estimation

For doubly robust estimators, need: $$\|\hat{\mu} - \mu_0\| \cdot \|\hat{\pi} - \pi_0\| = o_p(n^{-1/2})$$

If both nuisances converge faster than $n^{-1/4}$ in $L_2(P)$ norm, this product condition holds.

  3. Bootstrap Failure

Bootstrap can fail for:

  • Non-differentiable functionals

  • Super-efficient estimators

  • Boundary parameters

  4. Underestimating Variance

Sandwich estimator assumes correct influence function. Model misspecification → wrong variance.

Template: Asymptotic Result

\begin{theorem}[Asymptotic Distribution] Under Assumptions \ref{A1}--\ref{An}: \begin{enumerate} \item (Consistency) $\hat{\theta}_n \xrightarrow{p} \theta_0$ \item (Asymptotic normality) $\sqrt{n}(\hat{\theta}_n - \theta_0) \xrightarrow{d} N(0, V)$ \item (Variance) $V = E[\phi(O)^2]$ where $\phi$ is the influence function \item (Variance estimation) $\hat{V} \xrightarrow{p} V$ \end{enumerate} \end{theorem}

\begin{proof} \textbf{Step 1 (Consistency):} [Apply M-estimation or direct argument]

\textbf{Step 2 (Expansion):} Taylor expand around $\theta_0$: \[ 0 = \mathbb{P}_n[\psi(O; \hat{\theta})] = \mathbb{P}_n[\psi(O; \theta_0)] + \mathbb{P}_n[\dot{\psi}(\tilde{\theta})](\hat{\theta} - \theta_0) \]

\textbf{Step 3 (Rearrangement):} \[ \sqrt{n}(\hat{\theta} - \theta_0) = -[\mathbb{P}_n[\dot{\psi}(\tilde{\theta})]]^{-1} \sqrt{n}\mathbb{P}_n[\psi(O; \theta_0)] \]

\textbf{Step 4 (CLT):} $\sqrt{n}\mathbb{P}_n[\psi(O; \theta_0)] \xrightarrow{d} N(0, E[\psi\psi^\top])$ by CLT.

\textbf{Step 5 (Slutsky):} $\mathbb{P}_n[\dot{\psi}] \xrightarrow{p} E[\dot{\psi}]$ by WLLN. Apply Slutsky.

\textbf{Step 6 (Identify $V$):} $V = E[\dot{\psi}]^{-1} E[\psi\psi^\top] E[\dot{\psi}]^{-\top}$. \end{proof}

Integration with Other Skills

This skill works with:

  • proof-architect - For structuring asymptotic proofs

  • identification-theory - Identification precedes estimation/inference

  • simulation-architect - Validate asymptotic approximations

  • methods-paper-writer - Present results in manuscripts

Key References

Bickel

Newey

Robins

van der Vaart, A.W. (1998). Asymptotic Statistics

Tsiatis, A.A. (2006). Semiparametric Theory and Missing Data

Kennedy, E.H. (2016). Semiparametric Theory and Empirical Processes

van der Laan, M.J. & Rose, S. (2011). Targeted Learning

Version: 1.0 Created: 2025-12-08 Domain: Asymptotic Statistics, Semiparametric Inference
