Literature Gap Finder
Systematic framework for identifying research opportunities in statistical methodology
Use this skill when: positioning research contributions, finding gaps in methodology literature, identifying unexplored combinations of methods and settings, building literature reviews, or deciding on research directions.
The Gap-Finding Framework
What Makes a Good Research Gap?
A publishable gap must be:

- Real - Not already addressed (check thoroughly!)
- Important - Solves a problem researchers face
- Tractable - Can be solved with available tools
- Novel - Provides new insight, not just a combination of existing pieces
- Timely - Relevant to current research trends
Types of Gaps
| Gap Type | Description | Example |
|---|---|---|
| Method Gap | No method exists for the setting | No mediation analysis for network data |
| Theory Gap | Method exists but lacks theory | Bootstrap for mediation lacks a consistency proof |
| Efficiency Gap | Methods exist but are inefficient | More efficient doubly robust mediation estimators |
| Robustness Gap | Methods fail under violations | Mediation under measurement error |
| Computational Gap | Existing methods don't scale | Mediation with high-dimensional confounders |
| Extension Gap | Existing method needs generalization | Binary → continuous mediator |
Method-Setting Matrix
Systematic Gap Identification Framework
The method-setting matrix is the core tool for finding research gaps systematically:
```r
# Build a method-setting matrix programmatically
create_gap_matrix <- function() {
  methods  <- c("Regression", "Weighting/IPW", "DR/AIPW", "TMLE", "ML-based")
  settings <- c("Binary treatment", "Continuous treatment", "Time-varying",
                "Clustered", "High-dimensional", "Measurement error",
                "Missing data", "Network")

  matrix_data <- expand.grid(method = methods, setting = settings)
  matrix_data$status     <- "unknown"  # To be filled: "developed", "partial", "gap"
  matrix_data$priority   <- NA
  matrix_data$references <- ""

  matrix_data
}
```
```r
# Visualize the gap matrix
visualize_gaps <- function(gap_matrix) {
  library(ggplot2)

  ggplot(gap_matrix, aes(x = method, y = setting, fill = status)) +
    geom_tile(color = "white") +
    scale_fill_manual(values = c(
      "developed" = "#2ecc71",
      "partial"   = "#f39c12",
      "gap"       = "#e74c3c",
      "unknown"   = "#95a5a6"
    )) +
    theme_minimal() +
    labs(title = "Method × Setting Gap Matrix", x = "Method", y = "Setting") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
}
```
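A minimal usage sketch; the cells filled in below are illustrative placeholders, not literature claims:

```r
# Hypothetical example: mark a couple of cells, then plot the heat map
gap_matrix <- create_gap_matrix()
gap_matrix$status[gap_matrix$method == "Regression" &
                    gap_matrix$setting == "Binary treatment"] <- "developed"
gap_matrix$status[gap_matrix$method == "TMLE" &
                    gap_matrix$setting == "Network"] <- "gap"
visualize_gaps(gap_matrix)
```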
Verification Process
Confirming a Gap is Real
Before claiming a gap, verify systematically:
| Step | Action | Tools |
|---|---|---|
| 1 | Search major databases | Google Scholar, Web of Science, Scopus |
| 2 | Search preprint servers | arXiv, bioRxiv, SSRN |
| 3 | Search R packages | CRAN, GitHub, R-universe |
| 4 | Check conference proceedings | ICML, NeurIPS, JSM, ENAR |
| 5 | Search dissertations | ProQuest, university repositories |
| 6 | Email domain experts | 2-3 experts for confirmation |
```r
# Systematic verification checklist
verify_gap <- function(topic, keywords) {
  checklist <- list(
    databases_searched = c("google_scholar", "web_of_science", "pubmed", "scopus"),
    search_terms       = keywords,
    date_range         = paste(Sys.Date() - 365 * 5, "to", Sys.Date()),
    results = list(
      papers_found     = 0,
      closest_related  = c(),
      why_not_the_same = ""
    ),
    expert_consultation = list(
      experts_contacted = c(),
      responses         = c()
    ),
    verification_status = "pending"  # pending, confirmed, rejected
  )

  checklist
}
```
```r
# Document the verification
document_verification <- function(gap_description, search_log) {
  cat("## Gap Verification Report\n\n")
  cat("Gap:", gap_description, "\n\n")
  cat("Search Date:", as.character(Sys.Date()), "\n\n")
  cat("Databases Searched:\n")
  for (db in search_log$databases_searched) {
    cat("- ", db, "\n")
  }
  cat("\nSearch Terms:", paste(search_log$search_terms, collapse = ", "), "\n")
  cat("\nConclusion:", search_log$verification_status, "\n")
}
```
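A hedged usage sketch tying the two helpers together; the topic, keywords, and conclusion below are placeholders:

```r
# Hypothetical verification run for one candidate gap
search_log <- verify_gap(
  topic    = "Mediation analysis with network data",
  keywords = c("mediation", "interference", "network data")
)
search_log$results$papers_found <- 0
search_log$verification_status  <- "confirmed"

document_verification("No mediation estimator for network-dependent data", search_log)
```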
Priority Ranking
Scoring Research Gaps
| Criterion | Weight | Score (1-5) |
|---|---|---|
| Impact (how many benefit?) | 0.25 | |
| Novelty (how new?) | 0.20 | |
| Tractability (can we solve it?) | 0.20 | |
| Timeliness (is it hot now?) | 0.15 | |
| Fit (matches our expertise?) | 0.10 | |
| Publication potential | 0.10 | |
Priority Score = Σ(weight × score)
```r
# Priority scoring function
library(dplyr)  # for case_when()

score_research_gap <- function(
  impact,       # 1-5: How many researchers would benefit
  novelty,      # 1-5: How new/original is this
  tractability, # 1-5: How likely can we solve it
  timeliness,   # 1-5: Is this currently hot
  fit,          # 1-5: Matches our expertise
  publication   # 1-5: Publication potential
) {
  weights <- c(0.25, 0.20, 0.20, 0.15, 0.10, 0.10)
  scores  <- c(impact, novelty, tractability, timeliness, fit, publication)

  priority <- sum(weights * scores)

  list(
    priority_score = priority,
    interpretation = case_when(
      priority >= 4.0 ~ "High priority - pursue immediately",
      priority >= 3.0 ~ "Medium priority - develop further",
      priority >= 2.0 ~ "Low priority - back burner",
      TRUE            ~ "Skip - not worth pursuing"
    ),
    breakdown = data.frame(
      criterion = c("Impact", "Novelty", "Tractability", "Timeliness", "Fit", "Publication"),
      weight    = weights,
      score     = scores,
      weighted  = weights * scores
    )
  )
}
```
```r
# Compare multiple gaps
rank_gaps <- function(gaps_list) {
  scores <- sapply(gaps_list, function(g) g$priority_score)
  order(scores, decreasing = TRUE)
}
```
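A short worked example; the scores are made up to illustrate the weighting:

```r
# Hypothetical scores for two candidate gaps
gap_a <- score_research_gap(impact = 4, novelty = 4, tractability = 3,
                            timeliness = 5, fit = 4, publication = 4)
gap_b <- score_research_gap(impact = 3, novelty = 2, tractability = 5,
                            timeliness = 2, fit = 5, publication = 3)

gap_a$priority_score           # 0.25*4 + 0.20*4 + 0.20*3 + 0.15*5 + 0.10*4 + 0.10*4 = 3.95
rank_gaps(list(gap_a, gap_b))  # returns 1 2: gap_a ranks first
```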
Method × Setting Matrix
The Core Framework
Systematically map methods against settings to find gaps:
| Setting \ Method | Regression | Weighting | DR/TMLE | ML-based |
|---|---|---|---|---|
| Binary A | ✓ | ✓ | ✓ | ✓ |
| Continuous A | ✓ | ? | ✓ | ? |
| Time-varying | ? | ✓ | ✓ | ✗ |
| Clustered | ✓ | ? | ? | ✗ |
| High-dimensional | ✗ | ✗ | ? | ✓ |

✓ = Well-developed ? = Partial/emerging ✗ = Gap
Building Your Matrix
Step 1: Identify Dimensions
For mediation analysis:
| Dimension | Variations |
|---|---|
| Treatment | Binary, continuous, multi-level, time-varying |
| Mediator | Single, multiple, high-dimensional, latent |
| Outcome | Continuous, binary, count, survival, longitudinal |
| Confounding | Measured, unmeasured, time-varying |
| Structure | Single mediator, parallel, sequential, moderated |
| Data | Cross-sectional, longitudinal, clustered, network |
| Assumptions | Standard, relaxed positivity, measurement error |
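The dimension table can be enumerated programmatically; a sketch that transcribes a subset of the dimensions above (the object names are only for illustration):

```r
# Build a grid of candidate settings from a subset of the dimensions above
mediation_dimensions <- list(
  treatment   = c("binary", "continuous", "multi-level", "time-varying"),
  mediator    = c("single", "multiple", "high-dimensional", "latent"),
  outcome     = c("continuous", "binary", "count", "survival", "longitudinal"),
  confounding = c("measured", "unmeasured", "time-varying")
)
setting_grid <- expand.grid(mediation_dimensions, stringsAsFactors = FALSE)
nrow(setting_grid)  # 4 * 4 * 5 * 3 = 240 combinations to screen against the method list
```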
Step 2: List Methods
| Method Family | Specific Methods |
|---|---|
| Regression | Baron-Kenny, product of coefficients, difference method |
| Weighting | IPW, MSM, sequential g-estimation |
| Doubly Robust | AIPW, TMLE, cross-fitted |
| Semiparametric | Influence function-based |
| Bayesian | MCMC, variational |
| Machine Learning | Causal forests, DML, neural networks |
| Bounds | Partial identification, sensitivity |
Step 3: Fill and Analyze
Mark each cell (an R sketch for recording the marks follows this list):

- ✓ (green): Well-established with theory + software
- ? (yellow): Emerging or partial coverage
- ✗ (red): Clear gap
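A minimal sketch for recording these marks in the `create_gap_matrix()` output; `mark_cell()` and the example cell are hypothetical:

```r
# Translate the ✓ / ? / ✗ marks into the status codes used by create_gap_matrix()
mark_cell <- function(gap_matrix, method, setting, symbol) {
  status <- switch(symbol, "✓" = "developed", "?" = "partial", "✗" = "gap")
  gap_matrix$status[gap_matrix$method == method &
                      gap_matrix$setting == setting] <- status
  gap_matrix
}

gap_matrix <- create_gap_matrix()
gap_matrix <- mark_cell(gap_matrix, "DR/AIPW", "Network", "✗")  # illustrative cell
```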
Example: Sequential Mediation Matrix
| Structure \ Method | Product | Weighting | DR | Bounds |
|---|---|---|---|---|
| 2 mediators, linear | ✓ | ✓ | ✓ | ? |
| 2 mediators, nonlinear | ? | ✓ | ? | ✗ |
| 3+ mediators, linear | ? | ? | ✗ | ✗ |
| 3+ mediators, nonlinear | ✗ | ? | ✗ | ✗ |
| With measurement error | ✗ | ✗ | ✗ | ✗ |
| With unmeasured conf. | ✗ | ✗ | ✗ | ? |
Gaps identified:

- DR methods for 3+ mediators
- Any method with measurement error
- Bounds approach underdeveloped
Assumption Relaxation Trees
The Framework
Map how assumptions have been relaxed over time:
```
Standard Mediation (Baron-Kenny 1986)
│
├── No unmeasured confounding assumed
│   ├── Sensitivity analysis (Imai 2010) ──→ E-value (Ding 2016) ──→ [YOUR GAP?]
│   └── Bounds / partial identification  ──→ Sharp bounds?       ──→ [YOUR GAP?]
│
├── Linearity assumed
│   └── Nonparametric (Imai 2010) ──→ Multiple mediators? Longitudinal? ──→ [YOUR GAP?]
│
└── No interaction assumed
    └── VanderWeele 4-way decomposition ──→ Measurement error? ──→ [YOUR GAP?]
```
Building the Tree
Step 1: Identify Original Assumptions
For a classic method, list ALL assumptions:
- Explicit assumptions (stated in the paper)
- Implicit assumptions (unstated but required)
- Computational assumptions (required for implementation)
Step 2: Trace Relaxation History
For each assumption, find papers that:
- Relax it partially
- Relax it completely
- Replace it with a different assumption
- Show consequences of violation
Step 3: Find Unexplored Branches
Look for (a data-frame sketch for tracking these branches follows this list):

- Combinations of relaxations not yet explored
- Relaxations in one method not applied to another
- Partial relaxations that could be completed
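One way to keep the relaxation tree searchable is as a plain data frame, one row per branch; a sketch whose rows transcribe the example tree above:

```r
# Assumption relaxation tree as data: one row per (assumption, relaxation) branch
relaxation_tree <- data.frame(
  assumption  = c("No unmeasured confounding", "No unmeasured confounding",
                  "Linearity", "No interaction"),
  relaxation  = c("Sensitivity analysis (Imai 2010)", "Bounds / partial identification",
                  "Nonparametric (Imai 2010)", "4-way decomposition (VanderWeele)"),
  open_branch = c("Beyond E-value (Ding 2016)?", "Sharp bounds?",
                  "Multiple mediators? Longitudinal?", "Measurement error?"),
  stringsAsFactors = FALSE
)

# Branches still ending in a question mark are candidate gaps
subset(relaxation_tree, grepl("\\?", open_branch))
```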
Example: Positivity Assumption
```
Positivity: P(A=a|X) > ε > 0 for all a, x
│
├── Near-violations                  ──→ Trimming weights      ──→ Truncation?
├── Practical positivity violations  ──→ Overlap assessment    ──→ Diagnostics?
└── Structural violations            ──→ Extrapolation methods ──→ Bounds under violations?
```
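A hedged sketch of the "overlap assessment" branch: fit a propensity model and flag how much of the sample sits near the positivity boundary. The function name and the 0.025 cutoff are illustrative choices, not part of the framework above.

```r
# Near-positivity diagnostic based on estimated propensity scores
check_positivity <- function(data, treatment, covariates, eps = 0.025) {
  f  <- reformulate(covariates, response = treatment)
  ps <- fitted(glm(f, data = data, family = binomial()))
  list(
    ps_range            = range(ps),                      # extremes of estimated P(A = 1 | X)
    prop_near_violation = mean(ps < eps | ps > 1 - eps),  # share of units near the boundary
    n_after_trimming    = sum(ps >= eps & ps <= 1 - eps)  # sample size if trimmed at eps
  )
}
```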
Citation Network Analysis
Forward and Backward Searching
Backward: From a recent key paper, trace what it cites:

- What foundational papers are cited?
- What parallel developments exist?
- What's the intellectual lineage?

Forward: Using Google Scholar's "Cited by":

- Who has built on this work?
- What extensions were made?
- What gaps remain unaddressed?
Key Paper Identification
For any topic, identify:
| Category | Description | How to Find |
|---|---|---|
| Foundational | Original method papers | Most-cited, oldest |
| Textbook | Comprehensive treatments | Citations across subfields |
| Recent reviews | State-of-the-art summaries | "Review" in title, last 5 years |
| Frontier | Latest developments | Top journals, last 2 years |
| Your competition | Groups working on the same gap | Recent similar titles |
Building a Citation Map
```
1986: Baron & Kenny [foundations]
│
├──→ 1990s: SEM extensions
│
└──→ 2004: Robins & Greenland [causal foundations]
     ├──→ 2010: Imai et al. [sensitivity]
     ├──→ 2010: VanderWeele [4-way]
     │    └──→ 2015: Book [comprehensive]
     ├──→ 2014: Tchetgen [semiparametric]
     └──→ 2020s: ML integration [frontier]
```
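The same map can be kept as an edge list so it stays queryable; a sketch assuming the igraph package, with node labels transcribed from the diagram above:

```r
# Citation map as a directed graph: an edge means "builds on"
library(igraph)

edges <- data.frame(
  from = c("SEM extensions (1990s)", "Robins & Greenland", "Imai et al. 2010",
           "VanderWeele 2010", "VanderWeele 2015 book", "Tchetgen 2014",
           "ML integration (2020s)"),
  to   = c("Baron & Kenny 1986", "Baron & Kenny 1986", "Robins & Greenland",
           "Robins & Greenland", "VanderWeele 2010", "Robins & Greenland",
           "Robins & Greenland")
)

citation_graph <- graph_from_data_frame(edges, directed = TRUE)
plot(citation_graph, vertex.size = 8, edge.arrow.size = 0.4)
```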
Gap Verification Checklist
Before claiming a gap, verify:
Literature Search

- Searched Google Scholar with multiple keyword combinations
- Searched arXiv stat.ME and stat.TH
- Searched JSTOR for older statistics journals
- Searched bioRxiv/medRxiv for preprints
- Checked reference lists of review papers
- Checked "cited by" for key papers

Terminology Check

- Same concept might have different names in different fields
- Checked econometrics terminology
- Checked biostatistics terminology
- Checked machine learning terminology
- Checked psychology/SEM terminology

Adjacent Literature

- Checked related but not identical settings
- Method might exist for a similar problem
- Checked if a general framework applies

Working Papers

- Checked key authors' websites
- Checked conference proceedings (JSM, ENAR)
- Asked collaborators/experts

Final Verification

- Gap is not addressed in supplementary materials
- Gap is not an "obvious" extension reviewers will dismiss
- Gap is important enough to publish
Gap Characterization Template
When you identify a gap:
Gap: [Brief Title]
Setting
[Precise description of the setting where the gap exists]
Current State
- What exists: [Methods that partially address this]
- What works: [Aspects of the problem already solved]
- What fails: [Where current methods break down]
The Gap
- Precise statement: [What is missing]
- Why it matters: [Who needs this, for what applications]
- Why it's hard: [Technical challenges]
Evidence of Gap
- Literature search documented
- No existing solution found
- Experts consulted (optional)
Potential Approaches
- [Approach 1]: [Brief description]
  - Pros: [Advantages]
  - Cons: [Challenges]
- [Approach 2]: [Brief description]
  - Pros: [Advantages]
  - Cons: [Challenges]
Related Work
- [Paper 1]: [How it relates, why it doesn't solve gap]
- [Paper 2]: [How it relates, why it doesn't solve gap]
Contribution Positioning
"While [existing work] addresses [related problem], no method currently handles [specific gap]. We propose [approach] which provides [properties]."
Common Gap Patterns in Mediation
Pattern 1: Data Structure Mismatch
Gap template: "[Method] assumes [simple structure], but in [application] data has [complex structure]"
Examples:
- Methods assume iid data, but the data are clustered
- Methods assume cross-sectional data, but the data are longitudinal
- Methods assume complete data, but missingness exists
Pattern 2: Assumption Violation
Gap template: "[Method] requires [assumption], which is violated when [situation]"
Examples:
- Unmeasured mediator-outcome confounding
- Measurement error in the mediator
- Treatment-mediator interaction
Pattern 3: Estimand Ambiguity
Gap template: "When [complication], standard estimands [NDE/NIE] are not well-defined or interpretable"
Examples:
- Post-treatment confounding
- Time-varying treatments/mediators
- Multiple versions of treatment
Pattern 4: Efficiency vs Robustness
Gap template: "Efficient methods require [strong assumptions], while robust methods are inefficient"
Examples:
- Doubly robust methods for mediation
- Semiparametric efficiency in complex settings
- Adaptive methods
Pattern 5: Computational Barrier
Gap template: "Theoretically valid approach exists but [computational limitation]"
Examples:
- High-dimensional settings
- Continuous mediators requiring integration
- Bootstrap in complex models
Research Positioning Strategies
The Contribution Statement
Strong positioning formula:
"Although [Author Year] developed [method] for [setting], their approach [limitation]. In contrast, our method [advantage] while maintaining [property]. Specifically, we contribute: (1) [theoretical contribution], (2) [methodological contribution], (3) [practical contribution]."
Positioning Types
| Position | When to Use | Example Language |
|---|---|---|
| Extension | Build on existing work | "We extend [method] to [new setting]" |
| Synthesis | Combine approaches | "We unify [method A] and [method B]" |
| Alternative | Different approach | "We propose an alternative that [advantage]" |
| Correction | Fix a limitation | "We address the limitation of [method]" |
| Generalization | Broader framework | "We develop a general framework that includes [special cases]" |
Differentiation Matrix
| Dimension | Competitor 1 | Competitor 2 | Our Method |
|---|---|---|---|
| Setting | Binary A only | Any A | Any A |
| Theory | Consistency | Normality | Efficiency |
| Assumptions | Strong | Medium | Weaker |
| Computation | Fast | Slow | Medium |
| Software | R package | None | R + Python |
Integration with Other Skills
This skill works with:
- cross-disciplinary-ideation - Find solutions from other fields for identified gaps
- method-transfer-engine - Transfer methods to fill gaps
- identification-theory - Understand what assumptions are needed
- methods-paper-writer - Write up the gap and solution
Key References
On Finding Research Gaps
- Alvesson, M. & Sandberg, J. (2011). Generating research questions through problematization.
- Sandberg, J. & Alvesson, M. (2011). Ways of constructing research questions.

Mediation Reviews (for gap identification)

- VanderWeele, T.J. (2016). Mediation analysis: A practitioner's guide. Annual Review.
- Nguyen, T.Q. et al. (2021). Clarifying causal mediation analysis. Psychological Methods.

Causal Inference Reviews

- Hernán, M.A. (2018). The C-word: Scientific euphemisms do not improve causal inference.
- Imbens, G.W. (2020). Potential outcome and directed acyclic graph approaches.
Version: 1.0 Created: 2025-12-08 Domain: Research Strategy, Literature Review