Proof Architect
Structured methodology for constructing and verifying mathematical proofs in statistical research
Use this skill when working on: mathematical proofs, theorem development, derivations, consistency proofs, asymptotic arguments, identification proofs, or verifying proof correctness.
Proof Structure Framework
Standard Proof Components
Every rigorous statistical proof should contain:
1. **Claim Statement** - precise mathematical statement of what is being proved
2. **Assumptions** - all conditions required, clearly enumerated (A1, A2, ...)
3. **Notation** - define all symbols before use
4. **Proof Body** - logical sequence of justified steps
5. **Conclusion** - explicit statement that the claim is established
Proof Skeleton Template
```latex
\begin{theorem}[Name] \label{thm:name}
Under Assumptions \ref{A1}--\ref{An}, [precise claim].
\end{theorem}

\begin{proof}
The proof proceeds in [n] steps.

\textbf{Step 1: [Description].} [Content, with justification for each transition.]

\textbf{Step 2: [Description].} [Content.]

\vdots

\textbf{Step n: Conclusion.} Combining Steps 1--[n-1], we obtain [result],
completing the proof.
\end{proof}
```
Proof Types in Statistical Methodology
Identification Proofs
Goal: Show that a causal or statistical quantity is uniquely determined by the observed data distribution.
Standard Structure:
1. Define the target estimand (e.g., $\psi = E[Y(a)]$)
2. State the identifying assumptions (consistency, positivity, exchangeability)
3. Derive the identification formula
4. Show the formula depends only on observable quantities
Template:
```latex
\begin{theorem}[Identification of $\psi$]
Under Assumptions \ref{A:consistency}--\ref{A:positivity}, the causal effect
$\psi = E[Y(a)]$ is identified by
\[ \psi = \int E[Y \mid A=a, X=x] \, dP(x). \]
\end{theorem}

\begin{proof}
\begin{align*}
E[Y(a)] &= E[E[Y(a) \mid X]] && \text{(law of iterated expectations)} \\
        &= E[E[Y(a) \mid A=a, X]] && \text{(A\ref{A:exchangeability}: $Y(a) \indep A \mid X$)} \\
        &= E[E[Y \mid A=a, X]] && \text{(A\ref{A:consistency}: $Y = Y(A)$)} \\
        &= \int E[Y \mid A=a, X=x] \, dP(x) && \text{(definition)}
\end{align*}
which depends only on the observed data distribution.
\end{proof}
```
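One transition the template leaves implicit is where positivity enters. A short remark (our addition, assuming an amsthm `remark` environment is defined) makes it explicit:

```latex
% Our addition: making the role of positivity explicit.
\begin{remark}
Assumption \ref{A:positivity} ($P(A=a \mid X=x) > 0$ for $P$-almost every $x$)
ensures that $E[Y \mid A=a, X=x]$ is well defined $P$-almost everywhere, so the
identifying integral $\int E[Y \mid A=a, X=x] \, dP(x)$ is meaningful; without
it, the conditioning steps above would be taken on null events.
\end{remark}
```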
Consistency Proofs
Goal: Show that the estimator converges in probability to the true parameter value.
Standard Structure:
1. Define the estimator $\hat{\theta}_n$
2. Define the target parameter $\theta_0$
3. Establish convergence: $\hat{\theta}_n \xrightarrow{p} \theta_0$
Key Tools:
- Law of Large Numbers (LLN)
- Continuous Mapping Theorem
- Slutsky's Theorem
- M-estimation theory
Template:
```latex
\begin{theorem}[Consistency]
Under Assumptions \ref{A1}--\ref{An}, $\hat{\theta}_n \xrightarrow{p} \theta_0$.
\end{theorem}

\begin{proof}
Define $M_n(\theta) = n^{-1} \sum_{i=1}^n m(O_i; \theta)$ and
$M(\theta) = E[m(O; \theta)]$.

\textbf{Step 1: Uniform convergence.} By [ULLN conditions],
$\sup_{\theta \in \Theta} |M_n(\theta) - M(\theta)| \xrightarrow{p} 0$.

\textbf{Step 2: Unique maximum.} $M(\theta)$ is uniquely maximized at $\theta_0$
(by identifiability).

\textbf{Step 3: Conclusion.} By standard M-estimation theory, Steps 1--2 imply
$\hat{\theta}_n \xrightarrow{p} \theta_0$.
\end{proof}
```
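A minimal worked instance of this template (our addition, assuming $E[O^2] < \infty$ and a compact parameter space):

```latex
% Our addition: the sample mean as an M-estimator.
Take $m(O; \theta) = -(O - \theta)^2$. Then $M(\theta) = -E[(O - \theta)^2]$
is uniquely maximized at $\theta_0 = E[O]$ (Step 2), and
$M_n(\theta) = -n^{-1}\sum_{i=1}^n (O_i - \theta)^2$ is maximized at
$\hat{\theta}_n = \bar{O}_n$. On a compact $\Theta$, continuity of $m$ in
$\theta$ and the envelope $\sup_{\theta \in \Theta} |m(O; \theta)| \le
2O^2 + 2\sup_{\theta \in \Theta} \theta^2$ give the uniform LLN of Step 1,
and Step 3 recovers the weak law: $\bar{O}_n \xrightarrow{p} E[O]$.
```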
Asymptotic Normality Proofs
Goal: Establish $\sqrt{n}(\hat{\theta}_n - \theta_0) \xrightarrow{d} N(0, V)$.
Standard Structure:
1. Taylor expand around the true value
2. Apply the CLT to the score/influence function
3. Invert the Hessian/information matrix
4. State the limiting distribution
Key Tools:
- Central Limit Theorem (CLT)
- Delta Method
- Influence function theory
- Semiparametric efficiency theory
Template:
```latex
\begin{theorem}[Asymptotic Normality]
Under Assumptions \ref{A1}--\ref{An},
\[ \sqrt{n}(\hat{\theta}_n - \theta_0) \xrightarrow{d} N(0, V), \]
where $V = E[\phi(O)\phi(O)^\top]$ and $\phi$ is the influence function.
\end{theorem}

\begin{proof}
\textbf{Step 1: Score equation.} $\hat{\theta}_n$ solves
$\mathbb{P}_n[\psi(O; \theta)] = 0$, where $\psi = \partial_\theta m$.

\textbf{Step 2: Taylor expansion.}
\[ 0 = \mathbb{P}_n[\psi(O; \hat{\theta}_n)]
     = \mathbb{P}_n[\psi(O; \theta_0)]
     + \mathbb{P}_n[\dot{\psi}(O; \tilde{\theta})](\hat{\theta}_n - \theta_0). \]

\textbf{Step 3: Rearrangement.}
\[ \sqrt{n}(\hat{\theta}_n - \theta_0)
   = -\left(\mathbb{P}_n[\dot{\psi}]\right)^{-1}
     \sqrt{n}\, \mathbb{P}_n[\psi(O; \theta_0)]. \]

\textbf{Step 4: Apply the CLT.}
$\sqrt{n}\, \mathbb{P}_n[\psi(O; \theta_0)] \xrightarrow{d} N(0, \mathrm{Var}(\psi))$.

\textbf{Step 5: Slutsky.} $\mathbb{P}_n[\dot{\psi}] \xrightarrow{p} E[\dot{\psi}]$
by the WLLN; apply Slutsky's theorem.
\end{proof}
```
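Continuing the same toy example (our addition), the five steps collapse to the classical CLT for the sample mean:

```latex
% Our addition: instantiating the template with $\psi(O; \theta) = O - \theta$.
With $\psi(O; \theta) = O - \theta$, Step 1 gives $\hat{\theta}_n = \bar{O}_n$
and $\dot{\psi} \equiv -1$, so Step 3 reads
$\sqrt{n}(\bar{O}_n - \theta_0) = \sqrt{n}\, \mathbb{P}_n[O - \theta_0]$.
Steps 4--5 then yield
$\sqrt{n}(\bar{O}_n - \theta_0) \xrightarrow{d} N(0, \mathrm{Var}(O))$,
so the influence function is $\phi(O) = O - E[O]$.
```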
Efficiency Proofs
Goal: Show that the estimator achieves the semiparametric efficiency bound.
Standard Structure:
1. Characterize the tangent space
2. Derive the efficient influence function (EIF)
3. Show the estimator's influence function equals the EIF
4. Conclude that the variance achieves the bound
Template:
```latex
\begin{theorem}[Semiparametric Efficiency]
$\hat{\theta}_n$ is semiparametrically efficient, with influence function
\[ \phi(O) = [\text{optimal formula}] \]
achieving the efficiency bound $V_{\text{eff}} = E[\phi(O)^2]$.
\end{theorem}
```
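For orientation, one well-known concrete case (our addition): for $\psi = E[Y(a)]$ under the identification assumptions above, the efficient influence function in the nonparametric model takes the augmented inverse-probability-weighted form:

```latex
% Our addition: the standard EIF for $\psi = E[Y(a)]$, with
% $\mu(a,x) = E[Y \mid A=a, X=x]$ and $\pi_a(x) = P(A=a \mid X=x)$.
\[ \phi(O) = \frac{\mathbb{1}(A=a)}{\pi_a(X)} \bigl( Y - \mu(a, X) \bigr)
           + \mu(a, X) - \psi \]
```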
Double Robustness Proofs
Goal: Show that the estimator is consistent if either nuisance model is correctly specified.
Standard Structure:
1. Write the estimating equation in terms of both nuisance functions
2. Show the bias term is a product of the two nuisance errors
3. Conclude: if either error is zero, the estimator is consistent
Template:
```latex
\begin{theorem}[Double Robustness]
The estimator $\hat{\psi}_{DR}$ is consistent if either:
\begin{enumerate}
  \item the outcome model $\mu(a,x) = E[Y \mid A=a, X=x]$ is correctly specified, or
  \item the propensity score $\pi(x) = P(A=1 \mid X=x)$ is correctly specified.
\end{enumerate}
\end{theorem}

\begin{proof}
The estimating equation has the form
\[ \psi - \hat{\psi}_{DR}
   = E\left[\frac{(A-\pi)(Y-\mu)}{\pi(1-\pi)}\right] + o_p(1). \]
The bias term $(A-\pi)(Y-\mu)$ has zero expectation if either:
\begin{itemize}
  \item $E[A-\pi \mid X] = 0$ (propensity score correctly specified), or
  \item $E[Y-\mu \mid A, X] = 0$ (outcome model correctly specified).
\end{itemize}
\end{proof}
```
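To make the "either model suffices" claim concrete, here is a minimal simulation sketch in Python (our addition, not part of the skill; the data-generating process and variable names are illustrative): the outcome model is deliberately useless, the propensity score is correct, and the AIPW estimator still recovers the truth.

```python
# Our sketch: double robustness with a misspecified outcome model
# but a correctly specified propensity score.
import numpy as np

rng = np.random.default_rng(0)
n = 500_000
X = rng.normal(size=n)
pi = 1.0 / (1.0 + np.exp(-X))               # true propensity P(A=1|X)
A = rng.binomial(1, pi)
Y = 2.0 * A + 3.0 * X + rng.normal(size=n)  # true ATE = 2; X confounds A and Y

# Deliberately misspecified outcome model: ignores the confounder X,
# so mu(a, .) is just the mean outcome in treatment arm a.
mu1 = np.full(n, Y[A == 1].mean())
mu0 = np.full(n, Y[A == 0].mean())

# AIPW (doubly robust) estimator of E[Y(1)] - E[Y(0)],
# plugging in the correct propensity score.
psi1 = A / pi * (Y - mu1) + mu1
psi0 = (1 - A) / (1 - pi) * (Y - mu0) + mu0

print("outcome-model-only estimate:", (mu1 - mu0).mean())  # confounded, far from 2
print("AIPW estimate:", (psi1 - psi0).mean())              # close to 2
```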
Proof Verification Checklist
Level 1: Structure Check
- Claim clearly stated with all conditions
- All notation defined before use
- Logical flow apparent (steps labeled)
- Each step has explicit justification
- Conclusion explicitly stated
Level 2: Step Validation
For each step, verify:
- Mathematical operation is valid
- Cited results apply (check their conditions)
- Inequalities have the correct direction
- Limits/integrals converge
- Dimensions/types match
Level 3: Edge Cases
- Boundary cases handled ($n=1$, $p=0$, etc.)
- Degenerate cases addressed
- Assumptions actually used (not vacuous)
- What happens at assumption boundaries? (see the note below)
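As an illustration of that last item (our addition): a boundary check for the positivity assumption in inverse-probability weighting.

```latex
% Our addition: what happens at the boundary of positivity,
% with $\pi(x) = P(A=a \mid X=x)$.
For the IPW estimator $\hat\psi = \mathbb{P}_n[\mathbb{1}(A=a)\, Y / \pi(X)]$,
the second moment satisfies
\[ E\!\left[ \frac{\mathbb{1}(A=a)\, Y^2}{\pi(X)^2} \right]
   = E\!\left[ \frac{E[Y^2 \mid A=a, X]}{\pi(X)} \right], \]
which diverges as $\inf_x \pi(x) \to 0$: the identification argument still
holds pointwise, but the variance (and any CLT) breaks down at the boundary.
```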
Level 4: Consistency
- Result matches intuition
- Special cases recover known results
- Numerical verification possible? (see the simulation sketch below)
- Consistent with simulation evidence?
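One concrete way to act on the last two items (our sketch; the exponential model, sample size, and $\theta_0$ are placeholders for your own setting): a Monte Carlo check that a claimed limiting distribution matches finite-sample behavior, here via confidence-interval coverage.

```python
# Our sketch: Monte Carlo coverage check for a Wald-type 95% CI,
# here for the mean of an Exponential(theta0) sample.
import numpy as np

rng = np.random.default_rng(1)
n, reps, theta0 = 500, 2000, 1.0
covered = 0
for _ in range(reps):
    x = rng.exponential(theta0, size=n)             # E[X] = theta0
    est, se = x.mean(), x.std(ddof=1) / np.sqrt(n)
    covered += (est - 1.96 * se) <= theta0 <= (est + 1.96 * se)
print("95% CI coverage:", covered / reps)           # should be near 0.95
```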
Common Proof Errors
Technical Errors
| Error | Example | Fix |
| --- | --- | --- |
| Interchanging limits | $\lim \sum \neq \sum \lim$ | Verify DCT/MCT conditions |
| Division by zero | $1/\pi(x)$ when $\pi(x)=0$ | State a positivity assumption |
| Incorrect conditioning | $E[Y \mid A,X] \neq E[Y \mid X]$ | Check the independence structure |
| Wrong norm | $\lVert f \rVert_2$ vs. $\lVert f \rVert_\infty$ | Verify which space applies |
| Missing measurability | Random variable not measurable | State measurability conditions |
Logical Errors
| Error | Example | Fix |
| --- | --- | --- |
| Circular reasoning | Using the result to prove itself | Check logical dependencies |
| Unstated assumption | "Clearly, X holds" | Make all assumptions explicit |
| Incorrect quantifier | $\exists$ vs. $\forall$ | Be precise about scope |
| Missing case | Not handling $\theta = 0$ | Enumerate all cases |
Statistical Errors
| Error | Example | Fix |
| --- | --- | --- |
| Confusing $\xrightarrow{p}$ and $\xrightarrow{d}$ | Different convergence modes | State which mode is meant |
| Ignoring dependence | Applying an iid CLT to dependent data | Check independence |
| Wrong variance | Using the population variance for the sample | Distinguish estimator from parameter |
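On the first row, a standard counterexample (our addition) showing why the two modes must not be conflated:

```latex
% Our addition: convergence in distribution does not imply
% convergence in probability.
Let $Z \sim N(0,1)$ and $X_n = -Z$ for all $n$. Then $X_n \xrightarrow{d} Z$,
since $X_n$ and $Z$ have the same law, yet $X_n \not\xrightarrow{p} Z$:
$P(|X_n - Z| > \varepsilon) = P(2|Z| > \varepsilon)$ does not vanish.
```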
Notation Standards (VanderWeele Convention)
Causal Quantities
| Symbol | Meaning |
| --- | --- |
| $Y(a)$ | Potential outcome under treatment $a$ |
| $Y(a,m)$ | Potential outcome under $A=a$, $M=m$ |
| $M(a)$ | Potential mediator under treatment $a$ |
| $NDE$ | Natural direct effect: $E[Y(1,M(0)) - Y(0,M(0))]$ |
| $NIE$ | Natural indirect effect: $E[Y(1,M(1)) - Y(1,M(0))]$ |
| $TE$ | Total effect: $E[Y(1) - Y(0)] = NDE + NIE$ |
| $P_M$ | Proportion mediated: $NIE/TE$ |
Statistical Quantities
| Symbol | Meaning |
| --- | --- |
| $\theta_0$ | True parameter value |
| $\hat{\theta}_n$ | Estimator based on $n$ observations |
| $\phi(O)$ | Influence function |
| $\mathbb{P}_n$ | Empirical measure |
| $\mathbb{G}_n$ | Empirical process: $\sqrt{n}(\mathbb{P}_n - P)$ |
Convergence
| Symbol | Meaning |
| --- | --- |
| $\xrightarrow{p}$ | Convergence in probability |
| $\xrightarrow{d}$ | Convergence in distribution |
| $\xrightarrow{a.s.}$ | Almost sure convergence |
| $O_p(1)$ | Bounded in probability |
| $o_p(1)$ | Converges to zero in probability |
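If you maintain these conventions in a manuscript, one possible set of preamble macros (our sketch, not a prescribed convention; requires amsmath, and `\indep` matches the macro used in the identification template):

```latex
% Our sketch: preamble macros for the notation above; adapt to your style file.
\newcommand{\indep}{\perp\!\!\!\perp}  % (conditional) independence
\newcommand{\Pn}{\mathbb{P}_n}         % empirical measure
\newcommand{\Gn}{\mathbb{G}_n}         % empirical process
\newcommand{\convp}{\xrightarrow{p}}   % convergence in probability
\newcommand{\convd}{\xrightarrow{d}}   % convergence in distribution
```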
Proof Construction Workflow
Step 1: Understand the Goal
- What exactly needs to be proved?
- What type of proof is this (identification, consistency, etc.)?
- What are the key challenges?
Step 2: Gather Tools
- What theorems/lemmas are available?
- What regularity conditions will be needed?
- Are there similar proofs to reference?
Step 3: Outline Structure
- Break the argument into logical steps
- Identify the key technical step
- Plan how to handle edge cases
Step 4: Write First Draft
- Fill in the details for each step
- Be explicit about every transition
- Note where each condition is used
Step 5: Verify
- Run through the verification checklist
- Check each step independently
- Test special cases
Step 6: Polish
- Make notation consistent
- Add intuitive explanations
- Ensure the assumptions are minimal
Integration with Other Skills
This skill works with:
- `identification-theory` - for causal identification proofs
- `asymptotic-theory` - for inference proofs
- `methods-paper-writer` - for presenting proofs in manuscripts
- `proof-verifier` - for systematic verification
Version: 1.0 | Created: 2025-12-08 | Domain: Mathematical Statistics, Causal Inference