Cross-Disciplinary Ideation
Systematic framework for discovering statistical innovations through cross-field connections
Use this skill when: brainstorming new methods, seeking novel approaches to statistical problems, looking for inspiration from other fields (physics, CS, biology, economics), or wanting to apply techniques from one domain to another.
The Cross-Disciplinary Innovation Framework
Why Cross-Disciplinary?
Many statistical breakthroughs originated elsewhere:
Statistical Method Origin Field Transfer
MCMC Physics (Metropolis) Statistical computation
Boosting Machine learning Ensemble methods
Lasso Signal processing Sparse regression
Optimal transport Mathematics Distribution comparison
Neural networks Neuroscience/CS Flexible function estimation
Causal graphs Philosophy/AI Causal inference
The Innovation Cycle
Problem in Statistics → Abstract Structure → Search Other Fields ↑ ↓ Validate/Adapt ←── Identify Analogues ←── Find Connections
Machine Learning Connections
Deep Learning for Causal Mediation
ML Method Statistical Application Transfer Opportunity
Double ML Debiased mediation effects Nuisance parameter estimation
Causal Forests Heterogeneous mediation Effect modification detection
Neural Networks Flexible g-computation Nonparametric mediation
VAEs Latent mediator modeling Measurement error correction
Transformers Sequential mediation Temporal pattern learning
GNNs Network mediation Spillover effect estimation
Double ML for mediation effect estimation
library(DoubleML)
Estimate nuisance parameters with ML
estimate_dml_mediation <- function(Y, A, M, X) {
First stage: E[M|A,X]
mediator_model <- cv.glmnet(cbind(A, X), M) M_hat <- predict(mediator_model, cbind(A, X))
Second stage: E[Y|A,M,X]
outcome_model <- cv.glmnet(cbind(A, M, X), Y)
Debiased estimation
residuals_M <- M - M_hat
list( direct = coef(outcome_model)["A"], indirect_component = residuals_M ) }
Physics Analogies
Energy-Based Statistical Models
Statistical Concept Physics Analogue Insight
Log-likelihood Energy MLE = minimum energy state
Posterior Boltzmann distribution Temperature = uncertainty
Regularization Physical constraints Penalties as forces
Entropy Thermodynamic entropy Information = disorder
Diffusion models Brownian motion Noise as generative process
MCMC Molecular dynamics Sampling as physical simulation
Productive Questions:
-
"What is the energy landscape of this estimation problem?"
-
"What physical system has this equilibrium?"
-
"How would a physicist think about this constraint?"
Computer Science Algorithms
Algorithmic Approaches to Statistical Problems
Algorithm Class Statistical Application Key Insight
Dynamic Programming Sequential mediation Bellman equation for path effects
Graph Algorithms DAG analysis d-separation via path finding
Approximation Algs High-dim inference Trade exactness for scalability
Online Learning Sequential testing Adaptive experiment design
Randomized Algs Monte Carlo methods Probabilistic computation
Dynamic programming for sequential mediation paths
compute_path_effects <- function(effect_matrix, n_mediators) {
effect_matrix[i,j] = effect from node i to node j
n <- nrow(effect_matrix)
Initialize path effects (like shortest path, but products)
path_effects <- matrix(0, n, n) diag(path_effects) <- 1
DP recurrence: path[i,j] = sum over k of path[i,k] * edge[k,j]
for (len in 1:n_mediators) { for (i in 1:n) { for (j in 1:n) { for (k in 1:n) { if (effect_matrix[k, j] != 0) { path_effects[i, j] <- path_effects[i, j] + path_effects[i, k] * effect_matrix[k, j] } } } } }
path_effects }
Statistics ↔ Computer Science
Statistical Concept CS Analogue Insight
Estimation Optimization Different objectives, shared algorithms
Hypothesis testing Decision theory Error rates as costs
Model selection Algorithm selection Bias-variance as time-space
Bayesian updating Online learning Sequential information
Sufficient statistics Data compression Minimal representation
Concentration inequalities PAC bounds Finite-sample guarantees
Productive Questions:
-
"What's the computational complexity of this estimator?"
-
"Is there an online version of this method?"
-
"What optimization algorithm solves this?"
Statistics ↔ Economics
Statistical Concept Economics Analogue Insight
Utility Loss function Preferences over outcomes
Equilibrium MLE/Bayes Optimal response
Game theory Robust statistics Adversarial settings
Mechanism design Experimental design Incentive-compatible elicitation
Instrumental variables Market instruments Exogenous variation
Regression discontinuity Policy thresholds Quasi-experiments
Productive Questions:
-
"What are the incentives in this data collection?"
-
"Is there a game-theoretic interpretation?"
-
"What market mechanism generates this data?"
Biology Applications
Evolutionary and Systems Biology Connections
Biological System Statistical Method Research Opportunity
Gene regulatory networks Causal DAGs Network mediation methods
Mendelian randomization Instrumental variables Genetic instruments for mediators
Population genetics Drift models Selection effects on mediators
Systems biology Structural equations Multi-level mediation
Phylogenetics Hierarchical models Evolutionary mediation
Mendelian randomization for mediation
Using genetic variants as instruments
mr_mediation <- function(snp, exposure, mediator, outcome) {
Stage 1: SNP -> Exposure
gamma_A <- coef(lm(exposure ~ snp))["snp"]
Stage 2: SNP -> Mediator (genetic effect on M)
gamma_M <- coef(lm(mediator ~ snp + exposure))["snp"]
Stage 3: Instrument-based mediation
Indirect via genetic pathway
iv_model <- ivreg(outcome ~ mediator + exposure | snp + exposure)
list( genetic_effect_exposure = gamma_A, genetic_effect_mediator = gamma_M, iv_mediation_estimate = coef(iv_model)["mediator"] * gamma_M ) }
Statistics ↔ Biology
Statistical Concept Biology Analogue Insight
Genetic algorithms Evolution Optimization by selection
Phylogenetics Hierarchical models Tree-structured dependence
Gene networks Graphical models Conditional independence
Population dynamics Time series Growth and interaction
Mendelian randomization Instrumental variables Genetic instruments
Selection bias Survivorship Conditioning on survival
Productive Questions:
-
"What evolutionary pressure shapes this distribution?"
-
"Is there a biological network analog?"
-
"How does selection affect what we observe?"
Statistics ↔ Mathematics
Statistical Concept Math Analogue Insight
Distributions Measures Abstract probability
Convergence Topology Modes of convergence
Sufficiency Invariance Group actions
Efficiency Geometry Information geometry
Optimal transport Measure theory Wasserstein distance
Kernel methods Functional analysis RKHS theory
Productive Questions:
-
"What's the geometric structure of this problem?"
-
"Is there a measure-theoretic generalization?"
-
"What invariance does this exploit?"
Structured Ideation Process
Step 1: Problem Decomposition
Break the statistical problem into abstract components:
Problem: "Estimate mediation effects with measurement error"
Components:
- Causal structure (DAG with mediator)
- Latent variable (true M vs observed M*)
- Identification (what assumptions needed?)
- Estimation (how to account for error?)
- Inference (variance under misspecification?)
Step 2: Abstract Pattern Recognition
Identify the mathematical essence:
Abstract patterns in measurement error mediation:
- Signal + noise model
- Latent variable with proxy
- Product of uncertain quantities
- Attenuation toward null
Step 3: Cross-Field Search
For each abstract pattern, search analogues:
Pattern Field to Search Possible Analogues
Signal + noise Signal processing Kalman filter, denoising
Latent variable Factor analysis EM algorithm, identifiability
Product of uncertainties Physics Error propagation, Heisenberg
Attenuation Econometrics Errors-in-variables, IV
Step 4: Deep Dive on Promising Connections
For each promising analogue:
Understand the source method deeply
-
What problem does it solve?
-
What assumptions does it make?
-
What are its limitations?
Map to target domain
-
What corresponds to what?
-
What assumptions translate?
-
What doesn't transfer?
Identify the gap
-
What modification is needed?
-
Is the gap a feature or bug?
-
Can we fill it?
Step 5: Synthesis and Evaluation
Evaluation Criteria: □ Does it solve a real problem? □ Is it novel (not already done)? □ Are assumptions reasonable? □ Is it computationally feasible? □ Can it be proven to work (theory)? □ Does it work in practice (simulation)?
Ideation Prompts by Problem Type
When Stuck on Identification
-
"How do economists identify effects in similar settings?"
-
"What instrumental variable approach might work here?"
-
"Is there a regression discontinuity analog?"
-
"What if this were a designed experiment?"
When Stuck on Estimation
-
"How would a machine learner approach this?"
-
"Is there an EM algorithm formulation?"
-
"What loss function captures my goal?"
-
"Can I frame this as optimization?"
When Stuck on Computation
-
"What physics simulation technique applies?"
-
"Is there an approximate algorithm from CS?"
-
"Can I use stochastic approximation?"
-
"What variational approach might work?"
When Stuck on Theory
-
"What's the information-theoretic limit?"
-
"Is there a minimax lower bound?"
-
"What geometry characterizes this problem?"
-
"Can I use empirical process theory?"
When Stuck on Robustness
-
"What's the worst-case distribution?"
-
"How would a game theorist think about this?"
-
"What's the sensitivity to assumptions?"
-
"Can I bound instead of point estimate?"
Successful Transfer Examples
Example 1: Propensity Scores from Survey Sampling
Source: Survey sampling (Horvitz-Thompson estimator) Target: Causal inference (propensity score weighting)
Transfer insight:
-
Selection into treatment ≈ selection into sample
-
Inverse probability weighting corrects both
-
Same variance inflation issues
Innovation: Rosenbaum & Rubin (1983) - propensity score methods
Example 2: Lasso from Signal Processing
Source: Basis pursuit in signal processing Target: Variable selection in regression
Transfer insight:
-
Sparse signals ≈ sparse coefficients
-
L1 penalty induces sparsity
-
Convex relaxation of L0
Innovation: Tibshirani (1996) - Lasso regression
Example 3: Double Robustness from Missing Data
Source: Missing data augmented IPW Target: Causal inference estimators
Transfer insight:
-
Missing outcomes ≈ counterfactual outcomes
-
Augmentation improves efficiency
-
Protection against model misspecification
Innovation: Robins et al. - AIPW estimators
Example 4: Influence Functions from Robustness
Source: Robust statistics (Hampel) Target: Semiparametric efficiency
Transfer insight:
-
Influence function measures sensitivity
-
Also characterizes asymptotic variance
-
Efficient influence function = optimal
Innovation: Bickel et al. - semiparametric theory
Domain-Specific Prompts for Mediation Research
From Causal Inference Literature
-
"How do IV methods handle unmeasured confounding? Can this apply to A-M confounding?"
-
"What do DID approaches suggest for mediation in panel data?"
-
"How does synthetic control relate to mediation counterfactuals?"
From Machine Learning
-
"Can representation learning separate direct/indirect pathways?"
-
"How would a VAE model the mediation structure?"
-
"What does causal forest suggest for heterogeneous mediation?"
From Econometrics
-
"How do structural equation models in econ differ from psychology?"
-
"What do control functions offer for endogeneity in mediators?"
-
"How does Heckman selection relate to mediator measurement?"
From Biostatistics
-
"How does survival analysis handle time-varying mediators?"
-
"What do competing risks suggest for multiple mediators?"
-
"How does Mendelian randomization inform mediator instruments?"
From Physics/Information Theory
-
"What does information decomposition say about mediation?"
-
"How do Markov blankets relate to mediation assumptions?"
-
"What does the data processing inequality imply?"
Innovation Documentation Template
When you discover a promising connection:
Connection: [Source Method] → [Target Application]
Source Domain
- Method: [Name and citation]
- Problem it solves: [Description]
- Key insight: [Core idea]
- Assumptions: [What it requires]
Target Domain
- Problem: [Statistical problem to solve]
- Current approaches: [Existing methods and limitations]
- Gap: [What's missing]
Transfer Analysis
-
Structural correspondence:
- [Source concept] ↔ [Target concept]
- [Source assumption] ↔ [Target assumption]
-
What transfers directly: [List]
-
What needs modification: [List]
-
What doesn't transfer: [List]
Proposed Innovation
- Core idea: [How to adapt]
- Novel contribution: [What's new]
- Theoretical questions: [What to prove]
- Empirical questions: [What to simulate]
Feasibility Assessment
- Theoretically sound
- Computationally tractable
- Practically relevant
- Sufficiently novel
- Publishable venue: [Journal]
Next Steps
- [Immediate action]
- [Follow-up]
- [Validation approach]
Transfer Opportunities
High-Priority Cross-Disciplinary Transfers for Statistical Research
Source Field Method/Concept Target Application Innovation Potential
ML Double/debiased ML Semiparametric mediation High - removes regularization bias
ML Causal forests Heterogeneous effects High - effect modification detection
Physics Diffusion models Distribution products Medium - novel density estimation
Economics Control functions Endogenous mediators High - relaxes assumptions
CS Sketching algorithms Large-scale mediation Medium - computational gains
Biology Network motifs Mediation topology Medium - pattern recognition
Immediate Research Directions
Transfer: Control functions from economics to mediation
Relaxes sequential ignorability assumption
control_function_mediation <- function(Y, A, M, X, Z) {
Z is instrument for A
First stage: A on Z and X
stage1 <- lm(A ~ Z + X) A_residual <- residuals(stage1)
Second stage with control function
Includes residual to correct for endogeneity
stage2 <- lm(M ~ A + X + A_residual)
Third stage: outcome with control
stage3 <- lm(Y ~ A + M + X + A_residual)
list( a_to_m = coef(stage2)["A"], m_to_y = coef(stage3)["M"], indirect = coef(stage2)["A"] * coef(stage3)["M"], control_function_coef = coef(stage2)["A_residual"] ) }
Transfer Success Criteria
For any cross-disciplinary transfer, evaluate:
-
Structural Match: Does the source problem structure map to target?
-
Assumption Compatibility: Do source assumptions make sense in target?
-
Computational Feasibility: Is the transferred method tractable?
-
Novel Contribution: Is this genuinely new in the target field?
-
Practical Value: Does it solve a real problem researchers face?
Integration with Other Skills
This skill works with:
-
literature-gap-finder - Identify where innovation is needed
-
method-transfer-engine - Formalize the transfer
-
proof-architect - Prove the transferred method works
-
identification-theory - Check identification in new setting
-
methods-paper-writer - Write up the innovation
Key References
Cross-Disciplinary Statistics
-
Efron, B. & Hastie, T. (2016). Computer Age Statistical Inference
-
Hastie, T., Tibshirani, R., & Friedman, J. (2009). Elements of Statistical Learning
-
Cover, T.M. & Thomas, J.A. (2006). Elements of Information Theory
Physics-Statistics Connection
-
MacKay, D.J.C. (2003). Information Theory, Inference, and Learning Algorithms
-
Jaynes, E.T. (2003). Probability Theory: The Logic of Science
CS-Statistics Connection
-
Shalev-Shwartz, S. & Ben-David, S. (2014). Understanding Machine Learning
-
Vershynin, R. (2018). High-Dimensional Probability
Version: 1.0 Created: 2025-12-08 Domain: Research Innovation, Method Development