simulation-architect

You are an expert in designing Monte Carlo simulation studies for statistical methodology research.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "simulation-architect" with this command: npx skills add data-wise/claude-plugins/data-wise-claude-plugins-simulation-architect

Simulation Architect

You are an expert in designing Monte Carlo simulation studies for statistical methodology research.

Morris et al Guidelines

The ADEMP Framework (Morris et al., 2019, Statistics in Medicine)

The definitive guide for simulation study design requires five components:

Component Question Documentation Required

Aims What are we trying to learn? Clear research questions

Data-generating mechanisms How do we create data? Full DGP specification

Estimands What are we estimating? Mathematical definition

Methods What estimators do we compare? Complete algorithm description

Performance measures How do we evaluate? Bias, variance, coverage

Morris et al. Reporting Checklist

□ Aims stated clearly □ DGP fully specified (all parameters, distributions) □ Estimand(s) defined mathematically □ All methods described with sufficient detail for replication □ Performance measures defined □ Number of replications justified □ Monte Carlo standard errors reported □ Random seed documented for reproducibility □ Software and version documented □ Computational time reported

Replication Counts

How Many Replications Are Needed?

Monte Carlo Standard Error (MCSE) formula:

$$\text{MCSE}(\hat{\theta}) = \frac{\hat{\sigma}}{\sqrt{B}}$$

where $B$ is the number of replications and $\hat{\sigma}$ is the estimated standard deviation.

Recommended Replications by Purpose

Purpose Minimum B Recommended B MCSE for proportion

Exploratory 500 1,000 ~1.4% at 95% coverage

Publication 1,000 2,000 ~1.0% at 95% coverage

Definitive 5,000 10,000 ~0.4% at 95% coverage

Precision 10,000+ 50,000 ~0.2% at 95% coverage

MCSE Calculation

Calculate Monte Carlo standard errors

calculate_mcse <- function(estimates, coverage_indicators = NULL) { B <- length(estimates)

list( # MCSE for mean (bias) mcse_mean = sd(estimates) / sqrt(B),

# MCSE for standard deviation
mcse_sd = sd(estimates) / sqrt(2 * (B - 1)),

# MCSE for coverage (proportion)
mcse_coverage = if (!is.null(coverage_indicators)) {
  p &#x3C;- mean(coverage_indicators)
  sqrt(p * (1 - p) / B)
} else NA

) }

Rule of thumb: B needed for desired MCSE

replications_needed <- function(desired_mcse, estimated_sd) { ceiling((estimated_sd / desired_mcse)^2) }

R Code Templates

Complete Simulation Template

Full simulation study template following Morris et al. guidelines

run_simulation_study <- function( n_sims = 2000, n_vec = c(200, 500, 1000), seed = 42, parallel = TRUE, n_cores = parallel::detectCores() - 1 ) {

set.seed(seed)

Define parameter grid

params <- expand.grid( n = n_vec, effect_size = c(0, 0.14, 0.39), model_spec = c("correct", "misspecified") )

Setup parallel processing

if (parallel) { cl <- parallel::makeCluster(n_cores) doParallel::registerDoParallel(cl) on.exit(parallel::stopCluster(cl)) }

Run simulations

results <- foreach( i = 1:nrow(params), .combine = rbind, .packages = c("tidyverse", "mediation") ) %dopar% {

scenario &#x3C;- params[i, ]
sim_results &#x3C;- replicate(n_sims, {
  data &#x3C;- generate_dgp(scenario)
  estimates &#x3C;- apply_methods(data)
  evaluate_performance(estimates, truth = scenario$effect_size)
}, simplify = FALSE)

summarize_scenario(sim_results, scenario)

}

Add MCSE

results <- add_monte_carlo_errors(results, n_sims)

results }

Summarize with MCSE

add_monte_carlo_errors <- function(results, B) { results %>% mutate( mcse_bias = empirical_se / sqrt(B), mcse_coverage = sqrt(coverage * (1 - coverage) / B), mcse_rmse = rmse / sqrt(2 * B) ) }

Parallel Simulation Template

Memory-efficient parallel simulation

run_parallel_simulation <- function(scenario, n_sims, n_cores = 4) { library(future) library(future.apply)

plan(multisession, workers = n_cores)

results <- future_replicate(n_sims, { data <- generate_dgp(scenario$n, scenario$params) est <- estimate_effect(data) list( estimate = est$point, se = est$se, covered = abs(est$point - scenario$truth) < 1.96 * est$se ) }, simplify = FALSE)

plan(sequential) # Reset

Aggregate

estimates <- sapply(results, [[, "estimate") ses <- sapply(results, [[, "se") covered <- sapply(results, [[, "covered")

list( bias = mean(estimates) - scenario$truth, empirical_se = sd(estimates), mean_se = mean(ses), coverage = mean(covered), mcse_bias = sd(estimates) / sqrt(n_sims), mcse_coverage = sqrt(mean(covered) * (1 - mean(covered)) / n_sims) ) }

Core Principles (Morris et al., 2019)

ADEMP Framework

  • Aims: What question does the simulation answer?

  • Data-generating mechanisms: How is data simulated?

  • Estimands: What is being estimated?

  • Methods: What estimators are compared?

  • Performance measures: How is performance assessed?

Data-Generating Process Design

Standard Mediation DGP

generate_mediation_data <- function(n, params) {

Confounders

X <- rnorm(n)

Treatment (binary)

ps <- plogis(params$gamma0 + params$gamma1 * X) A <- rbinom(n, 1, ps)

Mediator

M <- params$alpha0 + params$alpha1 * A + params$alpha2 * X + rnorm(n, sd = params$sigma_m)

Outcome

Y <- params$beta0 + params$beta1 * A + params$beta2 * M + params$beta3 * X + params$beta4 * A * M + rnorm(n, sd = params$sigma_y)

data.frame(Y = Y, A = A, M = M, X = X) }

DGP Variations to Consider

  • Linearity: Linear vs nonlinear relationships

  • Model specification: Correct vs misspecified

  • Error structure: Homoscedastic vs heteroscedastic

  • Interaction: No interaction vs A×M interaction

  • Confounding: Measured vs unmeasured

  • Treatment: Binary vs continuous

  • Mediator: Continuous vs binary vs count

Parameter Grid Design

Sample Size Selection

Size Label Purpose

100-200 Small Stress test

500 Medium Typical study

1000-2000 Large Asymptotic behavior

5000+ Very large Efficiency comparison

Effect Size Selection

Effect Interpretation

0 Null (Type I error)

0.1 Small

0.3 Medium

0.5 Large

Recommended Grid Structure

params <- expand.grid( n = c(200, 500, 1000, 2000), effect = c(0, 0.14, 0.39, 0.59), # Small/medium/large per Cohen confounding = c(0, 0.3, 0.6), misspecification = c(FALSE, TRUE) )

Performance Metrics

Primary Metrics

Metric Formula Target MCSE Formula

Bias $\bar{\hat\psi} - \psi_0$ ≈ 0 $\sqrt{\text{Var}(\hat\psi)/n_{sim}}$

Empirical SE $\text{SD}(\hat\psi)$ — Complex

Average SE $\bar{\widehat{SE}}$ ≈ Emp SE $\text{SD}(\widehat{SE})/\sqrt{n_{sim}}$

Coverage $\frac{1}{n_{sim}}\sum I(\psi_0 \in CI)$ ≈ 0.95 $\sqrt{p(1-p)/n_{sim}}$

MSE $\text{Bias}^2 + \text{Var}$ Minimize —

Power % rejecting $H_0$ Context-dependent $\sqrt{p(1-p)/n_{sim}}$

Relative Metrics (for method comparison)

  • Relative bias: $\text{Bias}/\psi_0$ (when $\psi_0 \neq 0$)

  • Relative efficiency: $\text{Var}(\hat\psi_1)/\text{Var}(\hat\psi_2)$

  • Relative MSE: $\text{MSE}_1/\text{MSE}_2$

Replication Guidelines

Minimum Replications

Metric Minimum Recommended

Bias 1000 2000

Coverage 2000 5000

Power 1000 2000

Monte Carlo Standard Error

Always report MCSE for key metrics:

  • Coverage MCSE at 95%: $\sqrt{0.95 \times 0.05 / n_{sim}} \approx 0.007$ for $n_{sim}=1000$

  • Need ~2500 reps for MCSE < 0.005

R Implementation Template

#' Run simulation study #' @param scenario Parameter list for this scenario #' @param n_rep Number of replications #' @param seed Random seed run_simulation <- function(scenario, n_rep = 2000, seed = 42) { set.seed(seed)

results <- future_map(1:n_rep, function(i) { # Generate data data <- generate_data(scenario$n, scenario$params)

# Fit methods
fit1 &#x3C;- method1(data)
fit2 &#x3C;- method2(data)

# Extract estimates
tibble(
  rep = i,
  method = c("method1", "method2"),
  estimate = c(fit1$est, fit2$est),
  se = c(fit1$se, fit2$se),
  ci_lower = estimate - 1.96 * se,
  ci_upper = estimate + 1.96 * se
)

}, .options = furrr_options(seed = TRUE)) %>% bind_rows()

Summarize

results %>% group_by(method) %>% summarize( bias = mean(estimate) - scenario$true_value, emp_se = sd(estimate), avg_se = mean(se), coverage = mean(ci_lower <= scenario$true_value & ci_upper >= scenario$true_value), mse = bias^2 + emp_se^2, .groups = "drop" ) }

Results Presentation

Standard Table Format

Table X: Simulation Results (n_rep = 2000)

                Method 1                    Method 2

n Bias SE Cov MSE Bias SE Cov MSE

200 0.02 0.15 0.94 0.023 0.01 0.12 0.95 0.014 500 0.01 0.09 0.95 0.008 0.00 0.08 0.95 0.006 1000 0.00 0.06 0.95 0.004 0.00 0.05 0.95 0.003

Note: Cov = 95% CI coverage. MCSE for coverage ≈ 0.005.

Visualization Guidelines

  • Use faceted plots for multiple scenarios

  • Show confidence bands for metrics

  • Compare methods side-by-side

  • Log scale for MSE if range is large

Checkpoints and Reproducibility

Checkpointing Strategy

Save results incrementally

if (i %% 100 == 0) { saveRDS(results_so_far, file = sprintf("checkpoint_%s_rep%d.rds", scenario_id, i)) }

Reproducibility Requirements

  • Set seed explicitly

  • Record package versions (sessionInfo() )

  • Use furrr_options(seed = TRUE) for parallel

  • Save full results, not just summaries

  • Document any manual interventions

Common Pitfalls

Design Pitfalls

  • Too few replications for coverage assessment

  • Unrealistic parameter combinations

  • Missing null scenario (effect = 0)

  • No misspecification scenarios

Implementation Pitfalls

  • Not setting seeds properly in parallel

  • Ignoring convergence failures

  • Not checking for numerical issues

  • Insufficient burn-in for MCMC methods

Reporting Pitfalls

  • Missing MCSE

  • Not reporting convergence failures

  • Cherry-picking scenarios

  • Inadequate description of DGP

Key References

  • Morris et al 2019

  • Burton et al

  • White

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Research

methods-paper-writer

No summary provided by upstream source.

Repository SourceNeeds Review
General

numerical-methods

No summary provided by upstream source.

Repository SourceNeeds Review
General

proof-architect

No summary provided by upstream source.

Repository SourceNeeds Review