Cross-Disciplinary Ideation

Systematic framework for discovering statistical innovations through cross-field connections

Use this skill when: brainstorming new methods, seeking novel approaches to statistical problems, looking for inspiration from other fields (physics, CS, biology, economics), or wanting to apply techniques from one domain to another.

The Cross-Disciplinary Innovation Framework

Why Cross-Disciplinary?

Many statistical breakthroughs originated elsewhere:

| Statistical Method | Origin Field | Transfer |
|---|---|---|
| MCMC | Physics (Metropolis) | Statistical computation |
| Boosting | Machine learning | Ensemble methods |
| Lasso | Signal processing | Sparse regression |
| Optimal transport | Mathematics | Distribution comparison |
| Neural networks | Neuroscience/CS | Flexible function estimation |
| Causal graphs | Philosophy/AI | Causal inference |

The Innovation Cycle

    Problem in Statistics → Abstract Structure → Search Other Fields
            ↑                                            ↓
      Validate/Adapt ←── Identify Analogues ←── Find Connections

Machine Learning Connections

Deep Learning for Causal Mediation

| ML Method | Statistical Application | Transfer Opportunity |
|---|---|---|
| Double ML | Debiased mediation effects | Nuisance parameter estimation |
| Causal Forests | Heterogeneous mediation | Effect modification detection |
| Neural Networks | Flexible g-computation | Nonparametric mediation |
| VAEs | Latent mediator modeling | Measurement error correction |
| Transformers | Sequential mediation | Temporal pattern learning |
| GNNs | Network mediation | Spillover effect estimation |

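    # Double ML for mediation effect estimation
    # Simplified sketch: a full double ML estimator would also use cross-fitting
    library(glmnet)  # provides cv.glmnet()

    # Estimate nuisance parameters with ML
    estimate_dml_mediation <- function(Y, A, M, X) {
      X <- as.matrix(X)

      # First stage: E[M|A,X]
      AX <- cbind(A = A, X)
      mediator_model <- cv.glmnet(AX, M)
      M_hat <- as.numeric(predict(mediator_model, newx = AX))

      # Second stage: E[Y|A,M,X]
      outcome_model <- cv.glmnet(cbind(A = A, M = M, X), Y)

      # Debiased estimation
      residuals_M <- M - M_hat

      list(
        # coef() on a cv.glmnet fit returns a sparse matrix, so index by row name
        direct = coef(outcome_model)["A", 1],
        indirect_component = residuals_M
      )
    }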

Physics Analogies

Energy-Based Statistical Models

| Statistical Concept | Physics Analogue | Insight |
|---|---|---|
| Negative log-likelihood | Energy | MLE = minimum energy state |
| Posterior | Boltzmann distribution | Temperature = uncertainty |
| Regularization | Physical constraints | Penalties as forces |
| Entropy | Thermodynamic entropy | Information = disorder |
| Diffusion models | Brownian motion | Noise as generative process |
| MCMC | Molecular dynamics | Sampling as physical simulation |

Productive Questions:

"What is the energy landscape of this estimation problem?"
"What physical system has this equilibrium?"
"How would a physicist think about this constraint?"
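The energy analogy can be made concrete with a small sketch. It assumes a standard normal target and a random-walk Metropolis sampler; the names `energy` and `metropolis`, the step size, and the `temperature` argument are illustrative choices rather than part of the framework above.

```r
# Energy = negative log-density (up to a constant); here a standard normal target.
energy <- function(theta) 0.5 * theta^2

# Random-walk Metropolis targeting the Boltzmann density exp(-energy / temperature).
metropolis <- function(energy, n_iter = 5000, init = 0, step = 1, temperature = 1) {
  draws <- numeric(n_iter)
  current <- init
  for (t in seq_len(n_iter)) {
    proposal <- current + rnorm(1, sd = step)
    # Accept with probability min(1, exp(-(E_new - E_old) / T))
    log_accept <- -(energy(proposal) - energy(current)) / temperature
    if (log(runif(1)) < log_accept) current <- proposal
    draws[t] <- current
  }
  draws
}

set.seed(1)
cold <- metropolis(energy, temperature = 1)
hot  <- metropolis(energy, temperature = 4)
c(var_cold = var(cold), var_hot = var(hot))
```

Raising the temperature flattens the energy landscape, so the hot chain should spread over a wider range than the cold one, which is the sense in which temperature plays the role of uncertainty.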

Computer Science Algorithms

Algorithmic Approaches to Statistical Problems

| Algorithm Class | Statistical Application | Key Insight |
|---|---|---|
| Dynamic Programming | Sequential mediation | Bellman equation for path effects |
| Graph Algorithms | DAG analysis | d-separation via path finding |
| Approximation Algs | High-dim inference | Trade exactness for scalability |
| Online Learning | Sequential testing | Adaptive experiment design |
| Randomized Algs | Monte Carlo methods | Probabilistic computation |

    # Dynamic programming for sequential mediation paths
    compute_path_effects <- function(effect_matrix, n_mediators) {
      # effect_matrix[i,j] = effect from node i to node j
      n <- nrow(effect_matrix)

      # Initialize path effects (like shortest path, but products)
      current <- diag(n)        # effects along paths of length 0
      path_effects <- diag(n)   # running total over all path lengths

      # DP recurrence: path[i,j] = sum over k of path[i,k] * edge[k,j]
      # A path through k mediators has k + 1 edges, so allow n_mediators + 1 steps.
      # The previous step is kept separate from the running total, so no path
      # is counted twice.
      for (len in seq_len(n_mediators + 1)) {
        current <- current %*% effect_matrix   # paths with exactly `len` edges
        path_effects <- path_effects + current
      }

      path_effects
    }

Statistics ↔ Computer Science

| Statistical Concept | CS Analogue | Insight |
|---|---|---|
| Estimation | Optimization | Different objectives, shared algorithms |
| Hypothesis testing | Decision theory | Error rates as costs |
| Model selection | Algorithm selection | Bias-variance as time-space |
| Bayesian updating | Online learning | Sequential information |
| Sufficient statistics | Data compression | Minimal representation |
| Concentration inequalities | PAC bounds | Finite-sample guarantees |

Productive Questions:

"What's the computational complexity of this estimator?"
"Is there an online version of this method?"
"What optimization algorithm solves this?"
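As one illustration of the "online version" question, Bayesian updating for a normal mean with known variance can be written as a streaming update. This is a minimal sketch; `update_normal`, the prior values, and the simulated data are assumptions made for the example.

```r
# Posterior for a normal mean with known observation variance, stored as
# (mean, precision). Each call folds one observation into the current state.
update_normal <- function(state, y, obs_var = 1) {
  obs_prec <- 1 / obs_var
  new_prec <- state$prec + obs_prec
  list(
    mean = (state$prec * state$mean + obs_prec * y) / new_prec,
    prec = new_prec
  )
}

set.seed(42)
y <- rnorm(50, mean = 2)
prior <- list(mean = 0, prec = 1)

# Online: process the stream one observation at a time.
online <- Reduce(update_normal, y, prior)

# Batch: closed-form posterior from all observations at once.
batch <- list(
  mean = (prior$prec * prior$mean + sum(y)) / (prior$prec + length(y)),
  prec = prior$prec + length(y)
)

all.equal(online, batch)
```

The sequential and batch posteriors agree up to floating-point error, which is the "sequential information" insight in the table: the running state is a sufficient summary of the data seen so far.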

Statistics ↔ Economics

| Statistical Concept | Economics Analogue | Insight |
|---|---|---|
| Loss function | Utility | Preferences over outcomes |
| MLE/Bayes | Equilibrium | Optimal response |
| Robust statistics | Game theory | Adversarial settings |
| Experimental design | Mechanism design | Incentive-compatible elicitation |
| Instrumental variables | Market instruments | Exogenous variation |
| Regression discontinuity | Policy thresholds | Quasi-experiments |

Productive Questions:

"What are the incentives in this data collection?"
"Is there a game-theoretic interpretation?"
"What market mechanism generates this data?"

Biology Applications

Evolutionary and Systems Biology Connections

| Biological System | Statistical Method | Research Opportunity |
|---|---|---|
| Gene regulatory networks | Causal DAGs | Network mediation methods |
| Mendelian randomization | Instrumental variables | Genetic instruments for mediators |
| Population genetics | Drift models | Selection effects on mediators |
| Systems biology | Structural equations | Multi-level mediation |
| Phylogenetics | Hierarchical models | Evolutionary mediation |

    # Mendelian randomization for mediation
    # Using genetic variants as instruments
    library(AER)  # provides ivreg()

    mr_mediation <- function(snp, exposure, mediator, outcome) {
      # Stage 1: SNP -> Exposure
      gamma_A <- coef(lm(exposure ~ snp))["snp"]

      # Stage 2: SNP -> Mediator (genetic effect on M), adjusting for exposure
      gamma_M <- coef(lm(mediator ~ snp + exposure))["snp"]

      # Stage 3: Instrument-based mediation
      # Indirect via genetic pathway: the mediator is instrumented by the SNP,
      # while the exposure is treated as exogenous
      iv_model <- ivreg(outcome ~ mediator + exposure | snp + exposure)

      list(
        genetic_effect_exposure = gamma_A,
        genetic_effect_mediator = gamma_M,
        iv_mediation_estimate = coef(iv_model)["mediator"] * gamma_M
      )
    }

Statistics ↔ Biology

| Statistical Concept | Biology Analogue | Insight |
|---|---|---|
| Genetic algorithms | Evolution | Optimization by selection |
| Hierarchical models | Phylogenetics | Tree-structured dependence |
| Graphical models | Gene networks | Conditional independence |
| Time series | Population dynamics | Growth and interaction |
| Instrumental variables | Mendelian randomization | Genetic instruments |
| Selection bias | Survivorship | Conditioning on survival |

Productive Questions:

"What evolutionary pressure shapes this distribution?"
"Is there a biological network analog?"
"How does selection affect what we observe?"

Statistics ↔ Mathematics

| Statistical Concept | Math Analogue | Insight |
|---|---|---|
| Distributions | Measures | Abstract probability |
| Convergence | Topology | Modes of convergence |
| Sufficiency | Invariance | Group actions |
| Efficiency | Geometry | Information geometry |
| Optimal transport | Measure theory | Wasserstein distance |
| Kernel methods | Functional analysis | RKHS theory |

Productive Questions:

"What's the geometric structure of this problem?"
"Is there a measure-theoretic generalization?"
"What invariance does this exploit?"
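The optimal transport row has a particularly simple one-dimensional instance, sketched here under the assumption of two samples of equal size; the function name `wasserstein1_1d` is illustrative.

```r
# 1-Wasserstein distance between two equal-size samples on the real line.
# In one dimension the optimal coupling matches order statistics.
wasserstein1_1d <- function(x, y) {
  stopifnot(length(x) == length(y))
  mean(abs(sort(x) - sort(y)))
}

x <- c(0, 1, 3)
y <- c(5, 2, 3)   # the same sample shifted by 2 and shuffled
wasserstein1_1d(x, y)   # 2
```

Because the second sample is a pure shift of the first, the distance equals the size of the shift, which is the geometric reading of "distribution comparison".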

Structured Ideation Process

Step 1: Problem Decomposition

Break the statistical problem into abstract components:

Problem: "Estimate mediation effects with measurement error"

Components:

Causal structure (DAG with mediator)
Latent variable (true M vs observed M*)
Identification (what assumptions needed?)
Estimation (how to account for error?)
Inference (variance under misspecification?)

Step 2: Abstract Pattern Recognition

Identify the mathematical essence:

Abstract patterns in measurement error mediation:

Signal + noise model
Latent variable with proxy
Product of uncertain quantities
Attenuation toward null
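The attenuation pattern can be checked in a short simulation. This sketch assumes a linear model with classical (independent, additive) measurement error on the mediator; all coefficient values and variable names are hypothetical.

```r
set.seed(123)
n <- 20000
a <- 0.5; b <- 0.7; c_direct <- 0.3   # hypothetical true path coefficients

A <- rbinom(n, 1, 0.5)
M <- a * A + rnorm(n)                  # true mediator
Y <- c_direct * A + b * M + rnorm(n)
M_star <- M + rnorm(n)                 # observed proxy: signal + noise

# Product-of-coefficients estimate of the indirect effect for a given mediator.
indirect <- function(mediator) {
  a_hat <- coef(lm(mediator ~ A))[["A"]]
  b_hat <- coef(lm(Y ~ A + mediator))[["mediator"]]
  a_hat * b_hat
}

c(truth = a * b, with_true_M = indirect(M), with_proxy = indirect(M_star))
```

With unit variances for both the mediator residual and the error, the proxy's reliability given A is 0.5, so the M-to-Y coefficient and the product are expected to shrink by roughly half.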

Step 3: Cross-Field Search

For each abstract pattern, search analogues:

| Pattern | Field to Search | Possible Analogues |
|---|---|---|
| Signal + noise | Signal processing | Kalman filter, denoising |
| Latent variable | Factor analysis | EM algorithm, identifiability |
| Product of uncertainties | Physics | Error propagation, Heisenberg |
| Attenuation | Econometrics | Errors-in-variables, IV |
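For the "product of uncertainties" pattern, error propagation from physics corresponds to the first-order delta method, which for an indirect effect a * b gives what the mediation literature often calls the Sobel standard error. A minimal sketch, assuming the two estimates are independent; `product_se` and the input values are illustrative.

```r
# First-order (delta-method) standard error for the product a * b,
# treating the two estimates as independent.
product_se <- function(a, se_a, b, se_b) {
  sqrt(a^2 * se_b^2 + b^2 * se_a^2)
}

product_se(a = 0.5, se_a = 0.1, b = 0.4, se_b = 0.2)
```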

Step 4: Deep Dive on Promising Connections

For each promising analogue:

Understand the source method deeply

What problem does it solve?
What assumptions does it make?
What are its limitations?

Map to target domain

What corresponds to what?
What assumptions translate?
What doesn't transfer?

Identify the gap

What modification is needed?
Is the gap a feature or bug?
Can we fill it?

Step 5: Synthesis and Evaluation

Evaluation Criteria:

□ Does it solve a real problem?
□ Is it novel (not already done)?
□ Are assumptions reasonable?
□ Is it computationally feasible?
□ Can it be proven to work (theory)?
□ Does it work in practice (simulation)?

Ideation Prompts by Problem Type

When Stuck on Identification

"How do economists identify effects in similar settings?"
"What instrumental variable approach might work here?"
"Is there a regression discontinuity analog?"
"What if this were a designed experiment?"

When Stuck on Estimation

"How would a machine learner approach this?"
"Is there an EM algorithm formulation?"
"What loss function captures my goal?"
"Can I frame this as optimization?"

When Stuck on Computation

"What physics simulation technique applies?"
"Is there an approximate algorithm from CS?"
"Can I use stochastic approximation?"
"What variational approach might work?"

When Stuck on Theory

"What's the information-theoretic limit?"
"Is there a minimax lower bound?"
"What geometry characterizes this problem?"
"Can I use empirical process theory?"

When Stuck on Robustness

"What's the worst-case distribution?"
"How would a game theorist think about this?"
"What's the sensitivity to assumptions?"
"Can I bound instead of point estimate?"

Successful Transfer Examples

Example 1: Propensity Scores from Survey Sampling

Source: Survey sampling (Horvitz-Thompson estimator)
Target: Causal inference (propensity score weighting)

Transfer insight:

Selection into treatment ≈ selection into sample
Inverse probability weighting corrects both
Same variance inflation issues

Innovation: Rosenbaum & Rubin (1983) - propensity score methods
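The shared weighting idea can be sketched in a few lines, assuming propensity scores are known or already estimated; `ipw_ate` and the toy data are illustrative.

```r
# Horvitz-Thompson style estimates of E[Y(1)] and E[Y(0)] and their difference,
# given treatment indicator `a` and propensity scores `e`.
ipw_ate <- function(y, a, e) {
  mu1 <- mean(a * y / e)
  mu0 <- mean((1 - a) * y / (1 - e))
  mu1 - mu0
}

# Toy data: four units, propensity 0.5 for everyone.
y <- c(3, 1, 4, 2)
a <- c(1, 0, 1, 0)
e <- rep(0.5, 4)
ipw_ate(y, a, e)   # 2
```

With a constant propensity of 0.5 and balanced groups, the weighted estimate reduces to the plain difference in group means, just as Horvitz-Thompson reduces to the sample mean under equal inclusion probabilities.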

Example 2: Lasso from Signal Processing

Source: Basis pursuit in signal processing
Target: Variable selection in regression

Transfer insight:

Sparse signals ≈ sparse coefficients
L1 penalty induces sparsity
Convex relaxation of L0

Innovation: Tibshirani (1996) - Lasso regression
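How the L1 penalty induces sparsity is easiest to see in the orthonormal-design case, where the lasso solution is the soft-thresholded least-squares coefficient. The sketch below assumes the objective is scaled as (1/2)·RSS + lambda·‖beta‖₁; the function name is illustrative.

```r
# Soft-thresholding: shrink each coefficient toward zero by lambda,
# setting it exactly to zero when its magnitude is below lambda.
soft_threshold <- function(z, lambda) sign(z) * pmax(abs(z) - lambda, 0)

soft_threshold(c(-3, -0.5, 0.2, 2), lambda = 1)   # -2  0  0  1
```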

Example 3: Double Robustness from Missing Data

Source: Missing data augmented IPW
Target: Causal inference estimators

Transfer insight:

Missing outcomes ≈ counterfactual outcomes
Augmentation improves efficiency
Protection against model misspecification

Innovation: Robins et al. - AIPW estimators
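The protection against misspecification can be illustrated with a minimal AIPW sketch. It assumes the propensity scores `e` and the outcome-model predictions `m1` and `m0` are supplied by the user; the function name and toy values are hypothetical.

```r
# Augmented IPW estimate of the average treatment effect.
# m1, m0: outcome-model predictions under treatment and control; e: propensity scores.
aipw_ate <- function(y, a, e, m1, m0) {
  mu1 <- mean(m1 + a * (y - m1) / e)
  mu0 <- mean(m0 + (1 - a) * (y - m0) / (1 - e))
  mu1 - mu0
}

y  <- c(3, 1, 4, 2)
a  <- c(1, 0, 1, 0)
m1 <- c(3, 3, 4, 4)   # outcome model that fits the treated units exactly
m0 <- c(1, 1, 2, 2)   # outcome model that fits the control units exactly
e_wrong <- c(0.9, 0.2, 0.6, 0.3)   # deliberately arbitrary propensity scores

aipw_ate(y, a, e_wrong, m1, m0)   # 2
```

Because the outcome model leaves zero residuals on the observed units, the weighting terms vanish and the estimate equals the mean of `m1 - m0` whatever propensity scores are plugged in. That is one half of the double-robustness property.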

Example 4: Influence Functions from Robustness

Source: Robust statistics (Hampel)
Target: Semiparametric efficiency

Transfer insight:

Influence function measures sensitivity
Also characterizes asymptotic variance
Efficient influence function = optimal

Innovation: Bickel et al. - semiparametric theory

Domain-Specific Prompts for Mediation Research

From Causal Inference Literature

"How do IV methods handle unmeasured confounding? Can this apply to A-M confounding?"
"What do DID approaches suggest for mediation in panel data?"
"How does synthetic control relate to mediation counterfactuals?"

From Machine Learning

"Can representation learning separate direct/indirect pathways?"
"How would a VAE model the mediation structure?"
"What does causal forest suggest for heterogeneous mediation?"

From Econometrics

"How do structural equation models in econ differ from psychology?"
"What do control functions offer for endogeneity in mediators?"
"How does Heckman selection relate to mediator measurement?"

From Biostatistics

"How does survival analysis handle time-varying mediators?"
"What do competing risks suggest for multiple mediators?"
"How does Mendelian randomization inform mediator instruments?"

From Physics/Information Theory

"What does information decomposition say about mediation?"
"How do Markov blankets relate to mediation assumptions?"
"What does the data processing inequality imply?"
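The data processing question can be explored numerically. The sketch below assumes a discrete chain A → M → Y with no direct A → Y path; the probability tables and the `mutual_info` helper are made up for illustration.

```r
# Mutual information (in nats) from a joint probability table.
mutual_info <- function(joint) {
  px <- rowSums(joint)
  py <- colSums(joint)
  terms <- joint * log(joint / outer(px, py))
  sum(terms[joint > 0])
}

p_a <- c(0.5, 0.5)
p_m_given_a <- matrix(c(0.9, 0.1,
                        0.2, 0.8), nrow = 2, byrow = TRUE)  # rows index a
p_y_given_m <- matrix(c(0.7, 0.3,
                        0.4, 0.6), nrow = 2, byrow = TRUE)  # rows index m

joint_am <- p_a * p_m_given_a          # P(A = a, M = m): row a scaled by P(A = a)
joint_ay <- joint_am %*% p_y_given_m   # P(A = a, Y = y) when Y depends on A only via M

c(I_AM = mutual_info(joint_am), I_AY = mutual_info(joint_ay))
```

Under full mediation the outcome cannot carry more information about the treatment than the mediator does, so `I_AY` should not exceed `I_AM`. A violation in real data would hint at a direct path or a broken Markov assumption.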

Innovation Documentation Template

When you discover a promising connection:

Connection: [Source Method] → [Target Application]

Source Domain

Method: [Name and citation]
Problem it solves: [Description]
Key insight: [Core idea]
Assumptions: [What it requires]

Target Domain

Problem: [Statistical problem to solve]
Current approaches: [Existing methods and limitations]
Gap: [What's missing]

Transfer Analysis

Structural correspondence:
- [Source concept] ↔ [Target concept]
- [Source assumption] ↔ [Target assumption]
What transfers directly: [List]
What needs modification: [List]
What doesn't transfer: [List]

Proposed Innovation

Core idea: [How to adapt]
Novel contribution: [What's new]
Theoretical questions: [What to prove]
Empirical questions: [What to simulate]

Feasibility Assessment

Next Steps

[Immediate action]
[Follow-up]
[Validation approach]

Transfer Opportunities

High-Priority Cross-Disciplinary Transfers for Statistical Research

| Source Field | Method/Concept | Target Application | Innovation Potential |
|---|---|---|---|
| ML | Double/debiased ML | Semiparametric mediation | High - removes regularization bias |
| ML | Causal forests | Heterogeneous effects | High - effect modification detection |
| Physics | Diffusion models | Distribution products | Medium - novel density estimation |
| Economics | Control functions | Endogenous mediators | High - relaxes assumptions |
| CS | Sketching algorithms | Large-scale mediation | Medium - computational gains |
| Biology | Network motifs | Mediation topology | Medium - pattern recognition |

Immediate Research Directions

    # Transfer: Control functions from economics to mediation
    # Relaxes sequential ignorability assumption
    control_function_mediation <- function(Y, A, M, X, Z) {
      # Z is instrument for A

      # First stage: A on Z and X
      stage1 <- lm(A ~ Z + X)
      A_residual <- residuals(stage1)

      # Second stage with control function
      # Includes residual to correct for endogeneity
      stage2 <- lm(M ~ A + X + A_residual)

      # Third stage: outcome with control
      stage3 <- lm(Y ~ A + M + X + A_residual)

      list(
        a_to_m = coef(stage2)["A"],
        m_to_y = coef(stage3)["M"],
        indirect = coef(stage2)["A"] * coef(stage3)["M"],
        control_function_coef = coef(stage2)["A_residual"]
      )
    }

Transfer Success Criteria

For any cross-disciplinary transfer, evaluate:

Structural Match: Does the source problem structure map to target?
Assumption Compatibility: Do source assumptions make sense in target?
Computational Feasibility: Is the transferred method tractable?
Novel Contribution: Is this genuinely new in the target field?
Practical Value: Does it solve a real problem researchers face?

Integration with Other Skills

This skill works with:

literature-gap-finder - Identify where innovation is needed
method-transfer-engine - Formalize the transfer
proof-architect - Prove the transferred method works
identification-theory - Check identification in new setting
methods-paper-writer - Write up the innovation

Key References

Cross-Disciplinary Statistics

Efron, B. & Hastie, T. (2016). Computer Age Statistical Inference
Hastie, T., Tibshirani, R., & Friedman, J. (2009). Elements of Statistical Learning
Cover, T.M. & Thomas, J.A. (2006). Elements of Information Theory

Physics-Statistics Connection

MacKay, D.J.C. (2003). Information Theory, Inference, and Learning Algorithms
Jaynes, E.T. (2003). Probability Theory: The Logic of Science

CS-Statistics Connection

Shalev-Shwartz, S. & Ben-David, S. (2014). Understanding Machine Learning
Vershynin, R. (2018). High-Dimensional Probability

Version: 1.0
Created: 2025-12-08
Domain: Research Innovation, Method Development
