statistician

A specialist skill for statistical method selection, power analysis, uncertainty quantification, and validation of Monte Carlo/MCMC implementations in software projects.


Install skill "statistician" with this command: npx skills add dangeles/claude/dangeles-claude-statistician


Overview

The statistician skill provides statistical expertise for software projects requiring rigorous statistical analysis, simulation validation, or uncertainty quantification. It operates in the design and validation phases, ensuring statistical methods are correctly chosen and implemented.

When to Use This Skill

  • Statistical method selection for data analysis

  • Power analysis and sample size calculations

  • Monte Carlo simulation design and validation

  • MCMC implementation guidance and convergence diagnostics

  • Bootstrap and resampling method specification

  • Confidence interval and hypothesis testing design

  • Performance benchmarking for numeric simulations

Keywords triggering inclusion:

  • "statistics", "statistical", "p-value", "significance"

  • "Monte Carlo", "simulation", "sampling"

  • "MCMC", "Markov chain", "Bayesian"

  • "confidence interval", "uncertainty"

  • "bootstrap", "resampling", "permutation"

  • "power analysis", "sample size", "effect size"

When NOT to Use This Skill

  • Algorithm design and complexity analysis: Use mathematician

  • Code implementation: Use senior-developer

  • Non-statistical numerical methods: Use mathematician

  • Simple descriptive statistics: Use copilot or senior-developer

Responsibilities

What statistician DOES

  • Selects statistical methods appropriate for the problem

  • Performs power analysis and sample size calculations

  • Guides uncertainty quantification approaches

  • Advises on Monte Carlo, bootstrap, MCMC implementations

  • Reviews statistical code for correctness

  • Defines performance benchmarks for numeric simulations

  • Specifies convergence diagnostics for iterative methods

What statistician does NOT do

  • Algorithm design (mathematician responsibility)

  • Implement code (senior-developer responsibility)

  • Make scope decisions (programming-pm responsibility)

  • Non-statistical optimization (mathematician responsibility)

Tools

  • Read: Analyze requirements, examine data characteristics

  • Write: Create statistical specifications, validation criteria

Input Format

From programming-pm

stats_request:
  id: "STATS-001"
  context: string            # Project context and goals
  problem_statement: string  # Statistical question to address

  data_characteristics:
    type: "continuous" | "categorical" | "count" | "time_series"
    sample_size: int | "to be determined"
    distribution: "unknown" | "normal" | "skewed" | etc.
    independence: "independent" | "paired" | "clustered"

  analysis_goals:
    - "Compare two groups for difference in means"
    - "Estimate population parameter with uncertainty"
    - "Validate simulation accuracy"

  constraints:
    significance_level: 0.05
    power_requirement: 0.80
    effect_size_interest: "medium" | specific_value

Output Format

Statistical Specification (Handoff to developer)

stats_handoff:
  request_id: "STATS-001"
  timestamp: ISO8601

  method:
    name: string         # Standard method name
    description: string  # What the method does
    rationale: string    # Why this method was chosen

  assumptions:
    data_requirements:
      - "Continuous outcome variable"
      - "Independent observations"
    distributional:
      - "Approximately normal (n > 30 by CLT)"
    violations_impact:
      - assumption: "Non-normality"
        impact: "Reduced power, inaccurate p-values in small samples"
        mitigation: "Use bootstrap or permutation test"

  implementation_guidance:
    library: "scipy.stats"
    function: "ttest_ind"
    parameters:
      equal_var: false  # Welch's t-test
      alternative: "two-sided"
    code_example: |
      from scipy.stats import ttest_ind
      stat, pvalue = ttest_ind(group1, group2, equal_var=False)

  power_analysis:
    effect_size: 0.5  # Cohen's d
    alpha: 0.05
    power: 0.80
    required_n_per_group: 64
    calculation_method: "statsmodels.stats.power.TTestIndPower"
    interpretation: |
      With 64 subjects per group, we have 80% power to detect a medium
      effect (d = 0.5) at alpha = 0.05.

  validation_criteria:
    diagnostic_checks:
      - name: "Normality check"
        method: "Shapiro-Wilk test or Q-Q plot"
        threshold: "p > 0.05 or visual assessment"
      - name: "Variance homogeneity"
        method: "Levene's test"
        threshold: "p > 0.05 (use Welch if violated)"
    sensitivity_analyses:
      - "Bootstrap confidence interval"
      - "Permutation test for robustness"

  interpretation_guide:
    result_format: |
      t-statistic: {stat:.3f}
      p-value: {pvalue:.4f}
      Effect size (Cohen's d): {d:.3f}
      95% CI for difference: [{lower:.3f}, {upper:.3f}]
    significant_threshold: 0.05
    interpretation_template: |
      The difference between groups was [significant/not significant]
      (t={stat}, p={pvalue}), with a [small/medium/large] effect size (d={d}).

  confidence: "high" | "medium" | "low"
  confidence_notes: string
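Below is a minimal sketch of one way to carry out the handoff above end to end. It assumes `group1` and `group2` are 1-D numeric arrays. The helper names `welch_summary` and `bootstrap_diff_ci` are hypothetical. The sketch runs Welch's t-test with `scipy.stats.ttest_ind` and fills the `interpretation_guide` fields. Cohen's d uses the pooled-SD convention here, though other conventions exist. The bootstrap is a plain numpy percentile bootstrap, standing in for the "Bootstrap confidence interval" sensitivity analysis.

```python
import numpy as np
from scipy import stats


def welch_summary(group1, group2, confidence=0.95):
    """Welch's t-test plus the quantities named in interpretation_guide.

    Cohen's d uses the pooled standard deviation (one common convention);
    the CI for the mean difference uses Welch-Satterthwaite degrees of freedom.
    """
    g1 = np.asarray(group1, dtype=float)
    g2 = np.asarray(group2, dtype=float)
    n1, n2 = g1.size, g2.size
    v1, v2 = g1.var(ddof=1), g2.var(ddof=1)
    diff = g1.mean() - g2.mean()

    stat, pvalue = stats.ttest_ind(g1, g2, equal_var=False)

    pooled_sd = np.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    d = diff / pooled_sd

    # Welch standard error and Welch-Satterthwaite degrees of freedom
    se = np.sqrt(v1 / n1 + v2 / n2)
    df = se**4 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    t_crit = stats.t.ppf(0.5 + confidence / 2, df)
    return {
        "stat": float(stat), "pvalue": float(pvalue), "d": float(d),
        "diff": float(diff), "df": float(df),
        "lower": float(diff - t_crit * se), "upper": float(diff + t_crit * se),
    }


def bootstrap_diff_ci(group1, group2, n_resamples=9999, confidence=0.95, seed=None):
    """Percentile bootstrap CI for the difference in means."""
    rng = np.random.default_rng(seed)
    g1 = np.asarray(group1, dtype=float)
    g2 = np.asarray(group2, dtype=float)
    # resample each group independently, with replacement
    means1 = rng.choice(g1, size=(n_resamples, g1.size), replace=True).mean(axis=1)
    means2 = rng.choice(g2, size=(n_resamples, g2.size), replace=True).mean(axis=1)
    alpha = 1 - confidence
    lower, upper = np.quantile(means1 - means2, [alpha / 2, 1 - alpha / 2])
    return float(lower), float(upper)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    a = rng.normal(10.0, 2.0, size=64)  # illustrative data only
    b = rng.normal(9.0, 3.0, size=64)
    r = welch_summary(a, b)
    print(f"t-statistic: {r['stat']:.3f}")
    print(f"p-value: {r['pvalue']:.4f}")
    print(f"Effect size (Cohen's d): {r['d']:.3f}")
    print(f"95% CI for difference: [{r['lower']:.3f}, {r['upper']:.3f}]")
    print("Bootstrap 95% CI:", bootstrap_diff_ci(a, b, seed=0))
```

The Welch CI and the p-value come from the same t statistic and degrees of freedom. As a result, the CI excludes 0 exactly when p < 0.05, which makes a useful consistency check on the report.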

Monte Carlo Validation Specification

monte_carlo_spec:
  request_id: "STATS-002"

  simulation_design:
    purpose: string   # What the simulation estimates
    estimand: string  # True parameter being estimated
    method: string    # How simulation estimates it

  sample_size:
    n_iterations: 10000
    rationale: "95% CI half-width <= 0.01 for any proportion (worst case p = 0.5 needs n = 9604)"
    formula: "n = (z_alpha/2 / margin_of_error)^2 * p * (1-p)"

  convergence_criteria:
    metric: "standard error of estimate"
    threshold: 0.01
    check_frequency: "every 1000 iterations"
    early_stopping: true

  variance_reduction:
    techniques:
      - name: "Antithetic variates"
        description: "Use negatively correlated pairs"
        expected_reduction: "No variance increase for monotonic functions; size of the reduction is problem-dependent"
      - name: "Control variates"
        description: "Use correlated variable with known mean"

  validation:
    known_result_test:
      description: "Test against case with analytical solution"
      example: "European option with Black-Scholes"
    coverage_test:
      description: "Verify 95% CI captures true value 95% of time"
      n_replications: 1000

  output_requirements:
    point_estimate: true
    standard_error: true
    confidence_interval:
      level: 0.95
      method: "normal approximation or bootstrap percentile"
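The sample-size formula and the SE-based convergence rule can be sketched as follows. This assumes the estimand is a plain mean of i.i.d. draws. `draw_batch` is a hypothetical user-supplied sampler, and `batch_size` and `threshold` mirror `check_frequency` and `threshold` above. With `margin_of_error = 0.01` and the worst case `p = 0.5`, the formula gives 9604, which the spec rounds up to 10,000.

```python
import math

import numpy as np
from scipy.stats import norm


def required_iterations(margin_of_error, p=0.5, confidence=0.95):
    """n = (z_{alpha/2} / margin_of_error)^2 * p * (1 - p); p = 0.5 is the worst case."""
    z = norm.ppf(0.5 + confidence / 2)
    return math.ceil((z / margin_of_error) ** 2 * p * (1 - p))


def run_until_converged(draw_batch, threshold=0.01, batch_size=1000,
                        max_iterations=1_000_000, seed=None):
    """Draw batches until the SE of the running mean falls below threshold.

    draw_batch(rng, size) is a hypothetical sampler returning `size` i.i.d.
    draws of the quantity being averaged (e.g. 0/1 indicators for a proportion).
    """
    rng = np.random.default_rng(seed)
    batches = []
    n = 0
    while True:
        batches.append(np.asarray(draw_batch(rng, batch_size), dtype=float))
        n += batch_size
        draws = np.concatenate(batches)  # simple, not memory-optimal
        se = draws.std(ddof=1) / math.sqrt(n)
        if se < threshold or n >= max_iterations:
            return float(draws.mean()), float(se), n


if __name__ == "__main__":
    print(required_iterations(0.01))  # worst-case proportion, 95% level
    # illustrative estimand: P(Z > 0) for a standard normal Z
    print(run_until_converged(lambda rng, size: rng.standard_normal(size) > 0.0, seed=0))
```

Stopping as soon as an estimated SE crosses the threshold is a sequential rule, and it can make the reported SE slightly optimistic. Fixing n in advance with `required_iterations` avoids that.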

MCMC Validation Specification

mcmc_spec:
  request_id: "STATS-003"

  model:
    likelihood: string
    prior: string
    posterior: "derived analytically or via MCMC"

  sampler:
    algorithm: "Metropolis-Hastings" | "Gibbs" | "HMC" | "NUTS"
    rationale: string
    library: "PyMC" | "Stan" | "custom"

  convergence_diagnostics:
    required:
      - name: "Effective Sample Size (ESS)"
        threshold: "> 400 per parameter"
        method: "arviz.ess"
      - name: "Gelman-Rubin (R-hat)"
        threshold: "< 1.01"
        method: "arviz.rhat"
        note: "Requires multiple chains"
      - name: "Trace plot inspection"
        method: "Visual - should show mixing"
    recommended:
      - name: "Geweke diagnostic"
        method: "Compare first 10% to last 50%"
      - name: "Autocorrelation plot"
        method: "Should decay quickly"

  chain_configuration:
    n_chains: 4
    warmup: 1000
    samples: 2000
    thinning: 1
    rationale: |
      4 chains for R-hat calculation.
      1000 warmup for adaptation.
      2000 samples for ESS > 400 target.

  burn_in:
    method: "adaptive warmup" | "fixed"
    duration: 1000
    validation: "ESS stable after burn-in removal"

  posterior_summary:
    point_estimates: ["mean", "median"]
    uncertainty: ["95% credible interval", "HDI"]
    format: |
      Parameter: {name}
      Mean: {mean:.3f}
      95% HDI: [{hdi_low:.3f}, {hdi_high:.3f}]
      ESS: {ess:.0f}
      R-hat: {rhat:.3f}
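For a quick sanity check outside ArviZ, the classic (non-rank-normalized) split-R-hat can be computed directly from an array of post-warmup draws. This is a sketch, not a replacement for `arviz.rhat`. ArviZ computes a rank-normalized variant, so values can differ slightly near the 1.01 threshold. The function names and the `(n_chains, n_draws)` layout are assumptions. The example sizes follow `chain_configuration` (4 chains, 2000 draws).

```python
import numpy as np


def split_rhat(draws):
    """Classic split-R-hat for one scalar parameter.

    draws: array of shape (n_chains, n_draws), post-warmup.
    Each chain is split in half so that drift within a chain also inflates R-hat.
    """
    x = np.asarray(draws, dtype=float)
    n_chains, n_draws = x.shape
    half = n_draws // 2
    # first and second halves of every chain (middle draw dropped if n_draws is odd)
    halves = np.concatenate([x[:, :half], x[:, n_draws - half:]], axis=0)
    n = halves.shape[1]
    within = halves.var(axis=1, ddof=1).mean()        # W: mean within-chain variance
    between = n * halves.mean(axis=1).var(ddof=1)     # B: between-chain variance
    var_plus = (n - 1) / n * within + between / n     # pooled posterior variance estimate
    return float(np.sqrt(var_plus / within))


def passes_rhat(draws, threshold=1.01):
    """Apply the spec's R-hat threshold to one parameter."""
    return split_rhat(draws) < threshold


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    well_mixed = rng.standard_normal((4, 2000))
    stuck = well_mixed + np.arange(4)[:, None]  # chains centred at different values
    print(split_rhat(well_mixed), passes_rhat(well_mixed))
    print(split_rhat(stuck), passes_rhat(stuck))
```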

Workflow

Standard Statistical Consultation Workflow

  • Receive request from programming-pm with analysis goals

  • Clarify requirements:
      • What is the research question?
      • What are the data characteristics?
      • What decisions depend on the results?

  • Assess assumptions:
      • Data type and distribution
      • Independence structure
      • Sample size adequacy

  • Select method:
      • Appropriate for data characteristics
      • Robust to assumption violations
      • Interpretable for stakeholders

  • Perform power analysis (if applicable)

  • Document specification with validation criteria

  • Deliver handoff to senior-developer

Power Analysis Protocol

For studies requiring sample size determination:

Define effect size of interest:

  • Minimum effect worth detecting

  • Based on practical significance, not just statistical

Specify design parameters:

  • Alpha (typically 0.05)

  • Power (typically 0.80)

  • Test type (one-sided vs two-sided)

Calculate required sample size:

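import math
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n = analysis.solve_power(
    effect_size=0.5,  # Cohen's d
    alpha=0.05,
    power=0.80,
    alternative='two-sided',
)  # solves for nobs1, the per-group size; fractional (about 63.8)
n_per_group = math.ceil(n)  # always round up: 64 per group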

Document assumptions and sensitivity:

  • How does n change with different effect sizes?

  • What if assumptions are violated?
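The sensitivity question "How does n change with different effect sizes?" can also be answered without statsmodels, by reproducing the calculation with scipy's noncentral t distribution. This sketch assumes the pooled-variance two-sample t-test with equal group sizes, which is the model behind `TTestIndPower`, and a two-sided alternative. `power_two_sample_t` and `n_per_group` are hypothetical names. The root search assumes the answer lies between 2 and 10,000 per group.

```python
import math

from scipy import stats
from scipy.optimize import brentq


def power_two_sample_t(d, n_per_group, alpha=0.05):
    """Power of a two-sided pooled-variance t-test with equal group sizes."""
    df = 2 * n_per_group - 2
    ncp = d * math.sqrt(n_per_group / 2)  # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    # probability of landing in either rejection region under the alternative
    return stats.nct.sf(t_crit, df, ncp) + stats.nct.cdf(-t_crit, df, ncp)


def n_per_group(d, alpha=0.05, power=0.80):
    """Fractional n per group at which power equals the target (assumes 2 < n < 10,000)."""
    return brentq(lambda n: power_two_sample_t(d, n, alpha) - power, 2.0, 10_000.0)


if __name__ == "__main__":
    for d in (0.2, 0.5, 0.8):  # Cohen's small / medium / large
        n = n_per_group(d)
        print(f"d = {d}: {n:.1f} -> {math.ceil(n)} per group")
```

For d = 0.5 this should reproduce the 64 per group used above. Because n scales roughly with 1/d², halving the effect of interest roughly quadruples the required sample.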

MCMC Validation Protocol

For Bayesian models using MCMC:

Pre-run checks:

  • Prior predictive simulation (are priors sensible?)

  • Model identifiability (all parameters estimable?)

Run multiple chains (minimum 4)

Post-run diagnostics:

  • R-hat < 1.01 for all parameters

  • ESS > 400 for all parameters

  • Visual trace plot inspection

Sensitivity analysis:

  • Prior sensitivity (do results change with different priors?)

  • Data subset analysis (are results stable?)
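A prior predictive check needs no sampler. Draw parameters from the priors, simulate data from them, and ask whether the simulated data look plausible. The model used here is a hypothetical placeholder for the real model: y ~ Normal(mu, sigma), with mu ~ Normal(0, 10) and sigma ~ HalfNormal(5). The range [-50, 50] is likewise a stand-in for domain knowledge about what values are plausible.

```python
import numpy as np


def prior_predictive(n_draws=1000, n_obs=50, seed=0):
    """Simulate datasets from the prior of a hypothetical normal model."""
    rng = np.random.default_rng(seed)
    mu = rng.normal(0.0, 10.0, size=n_draws)            # mu ~ Normal(0, 10)
    sigma = np.abs(rng.normal(0.0, 5.0, size=n_draws))  # sigma ~ HalfNormal(5)
    # one simulated dataset of n_obs observations per prior draw
    return rng.normal(mu[:, None], sigma[:, None], size=(n_draws, n_obs))


def share_outside(y_sim, low, high):
    """Fraction of simulated observations outside a domain-plausible range."""
    return float(np.mean((y_sim < low) | (y_sim > high)))


if __name__ == "__main__":
    y_sim = prior_predictive()
    print("1%, 50%, 99% quantiles:", np.quantile(y_sim, [0.01, 0.5, 0.99]))
    print("share outside [-50, 50]:", share_outside(y_sim, -50.0, 50.0))
```

A large share of simulated data outside the plausible range suggests the priors are too diffuse. A near-zero spread suggests they are too tight.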

Common Statistical Methods

Comparison Tests

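Scenario | Method | Assumptions | Library
--- | --- | --- | ---
2 groups, continuous | Welch's t-test | Independence, ~normal | scipy.stats.ttest_ind
2 groups, non-normal | Mann-Whitney U | Independence | scipy.stats.mannwhitneyu
2 groups, paired | Paired t-test | Paired, ~normal differences | scipy.stats.ttest_rel
>2 groups | ANOVA / Kruskal-Wallis | ANOVA: independence, ~normal, equal variances; Kruskal-Wallis: independence | scipy.stats.f_oneway / scipy.stats.kruskal
Proportions | Chi-square / Fisher's exact | Chi-square: expected counts ≥ 5 (else Fisher's exact) | scipy.stats.chi2_contingency / scipy.stats.fisher_exact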
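The ">2 groups" and "Proportions" rows can be sketched with the `scipy.stats` functions named in the table. The 2x2 switching rule (fall back to Fisher's exact test when any expected count is below 5) follows the assumptions column. The function names are hypothetical.

```python
import numpy as np
from scipy import stats


def compare_k_groups(*groups):
    """More than two independent groups: one-way ANOVA and its rank-based alternative."""
    _, p_anova = stats.f_oneway(*groups)
    _, p_kw = stats.kruskal(*groups)
    return {"anova_p": float(p_anova), "kruskal_p": float(p_kw)}


def compare_proportions_2x2(table):
    """2x2 table: chi-square test unless an expected count is below 5, then Fisher's exact."""
    table = np.asarray(table)
    # for a 2x2 table chi2_contingency applies Yates' continuity correction by default
    _, p, _, expected = stats.chi2_contingency(table)
    if (expected < 5).any():
        _, p = stats.fisher_exact(table)
        return "fisher_exact", float(p)
    return "chi2_contingency", float(p)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print(compare_k_groups(rng.normal(0, 1, 30), rng.normal(0, 1, 30), rng.normal(0.8, 1, 30)))
    print(compare_proportions_2x2([[1, 9], [8, 2]]))      # small expected counts
    print(compare_proportions_2x2([[30, 70], [50, 50]]))  # large expected counts
```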

Regression Methods

Scenario | Method | Library
--- | --- | ---
Linear relationship | OLS regression | statsmodels.OLS
Binary outcome | Logistic regression | statsmodels.Logit
Count outcome | Poisson/NB regression | statsmodels.GLM
Clustered data | Mixed effects | statsmodels.MixedLM

Bayesian Methods

Scenario | Approach | Library
--- | --- | ---
Parameter estimation | MCMC | PyMC, Stan
Model comparison | WAIC, LOO-CV | arviz
Prediction | Posterior predictive | PyMC

Coordination with mathematician

statistician Handles

  • Statistical validity and assumptions

  • Power analysis and sample size

  • Confidence/credible intervals

  • Hypothesis testing framework

  • MCMC convergence diagnostics

mathematician Handles

  • Algorithm efficiency

  • Numerical stability

  • Computational complexity

  • Optimization algorithms

Example: Bayesian Optimization

  • statistician: Prior specification, acquisition function statistics

  • mathematician: Optimization algorithm, convergence guarantees

Progress Reporting

Update progress file every 15 minutes during active work:

File: /tmp/progress-{request-id}.md

Progress: STATS-001

Status: In Progress | Complete | Blocked
Last Update: 2026-02-03 14:32:15
Completion: 60%

Completed

  • Identified analysis as two-sample comparison
  • Selected Welch's t-test (robust to unequal variance)
  • Completed power analysis (n=64 per group)

In Progress

  • Documenting validation criteria
  • Writing interpretation guide

Blockers

  • None

Estimated Completion

  • 15 minutes remaining
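A small helper could render the progress template above and write it to `/tmp/progress-{request-id}.md`. This is a hypothetical sketch: `write_progress`, its parameters, and the `- ` bullet markup in the file are all assumptions. The directory is a parameter so the file can be redirected, for example in tests.

```python
from datetime import datetime
from pathlib import Path

ALLOWED_STATUSES = {"In Progress", "Complete", "Blocked"}


def write_progress(request_id, status, completion, completed, in_progress,
                   blockers=(), remaining="", directory="/tmp"):
    """Write the progress file for one request (hypothetical helper)."""
    if status not in ALLOWED_STATUSES:
        raise ValueError(f"status must be one of {sorted(ALLOWED_STATUSES)}")

    def section(title, items):
        lines = [f"- {item}" for item in items] or ["- None"]
        return [title, "", *lines, ""]

    lines = [
        f"Progress: {request_id}",
        "",
        f"Status: {status}",
        f"Last Update: {datetime.now():%Y-%m-%d %H:%M:%S}",
        f"Completion: {completion}%",
        "",
        *section("Completed", completed),
        *section("In Progress", in_progress),
        *section("Blockers", blockers),
        *section("Estimated Completion", [remaining] if remaining else []),
    ]
    path = Path(directory) / f"progress-{request_id}.md"
    path.write_text("\n".join(lines), encoding="utf-8")
    return path
```

Calling it again on each 15-minute update simply overwrites the file, so the file always reflects the latest state.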

Example

Request: Validate Monte Carlo Option Pricer

Input:

stats_request:
  id: "STATS-042"
  problem_statement: |
    Validate Monte Carlo simulation for European option pricing.
    Need to ensure estimates are accurate and uncertainty quantified.
  analysis_goals:
    - "Verify simulation converges to Black-Scholes for known cases"
    - "Quantify estimation uncertainty"
    - "Determine required number of paths"

Output:

stats_handoff:
  request_id: "STATS-042"

  method:
    name: "Monte Carlo estimation with variance reduction"
    description: |
      Estimate option price as discounted average payoff.
      Use antithetic variates and control variates for efficiency.
    rationale: |
      Monte Carlo is flexible for complex payoffs.
      Variance reduction is critical for practical path counts.

  validation_criteria:
    known_result_test:
      method: "Compare to Black-Scholes analytical solution"
      acceptance: "Within 3 standard errors of analytical price"
      example:
        S0: 100
        K: 100
        r: 0.05
        sigma: 0.2
        T: 1.0
        bs_price: 10.4506  # analytical
        mc_tolerance: "3 * SE"

    coverage_test:
      method: "Repeated simulation coverage"
      procedure: |
        1. Generate 1000 independent MC estimates
        2. Compute 95% CI for each
        3. Count how many contain true BS price
        4. Accept if coverage in [93%, 97%]

  convergence_criteria:
    metric: "Standard error / estimate"
    threshold: 0.01  # 1% relative error
    formula: "SE = std(payoffs) / sqrt(n_paths)"
    required_paths: |
      For SE/price < 0.01: n = (std/price / 0.01)^2
      For the at-the-money example above, plain MC gives std/price of about 1.4,
      so n is about 20,000; out-of-the-money options have a larger std/price
      and need more, so ~100,000 paths is a common budget for vanilla options.

  variance_reduction:
    antithetic_variates:
      implementation: |
        For each random path Z, also simulate -Z.
        Average payoffs from both.
      expected_benefit: "No variance increase for monotonic payoffs; size of the reduction is problem-dependent"
    control_variates:
      implementation: |
        Use underlying asset price as control.
        E[S_T] = S_0 * exp(r*T) (known under the risk-neutral measure)
      expected_benefit: "60-90% variance reduction (with the optimal coefficient, the fraction removed equals corr(payoff, S_T)^2)"

  output_requirements:
    price_estimate: true
    standard_error: true
    confidence_interval:
      level: 0.95
      method: "normal: estimate +/- 1.96 * SE"
    convergence_plot:
      x: "number of paths"
      y: "running estimate with error bands"

  implementation_guidance:
    library: "numpy for vectorized simulation"
    key_formula: |
      price = exp(-r*T) * mean(payoffs)
      SE = exp(-r*T) * std(payoffs) / sqrt(n)
    code_example: |
      import numpy as np

      def monte_carlo_european(S0, K, r, sigma, T, n_paths, seed=None):
          rng = np.random.default_rng(seed)
          Z = rng.standard_normal(n_paths)
          # terminal price under risk-neutral geometric Brownian motion
          ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * Z)
          payoffs = np.maximum(ST - K, 0.0)  # call
          discount = np.exp(-r * T)
          price = discount * np.mean(payoffs)
          se = discount * np.std(payoffs, ddof=1) / np.sqrt(n_paths)
          return price, se

  confidence: "high"
  confidence_notes: |
    Well-established methodology with analytical validation available.
    Variance reduction techniques are standard practice.
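The validation criteria above can be exercised with a short script. This sketch makes several choices of its own:

- It uses the parameters from the `known_result_test` example.
- It pairs each normal draw with its antithetic partner.
- It uses the discounted terminal price as the control variate, whose risk-neutral mean is S_0.
- It estimates the control coefficient from the same paths. This adds a small bias that is usually negligible at these path counts.

The helper names are hypothetical.

```python
import math

import numpy as np
from scipy.stats import norm

PARAMS = dict(S0=100.0, K=100.0, r=0.05, sigma=0.2, T=1.0)


def black_scholes_call(S0, K, r, sigma, T):
    """Analytical Black-Scholes price of a European call."""
    d1 = (math.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S0 * norm.cdf(d1) - K * math.exp(-r * T) * norm.cdf(d2)


def mc_call_variance_reduced(S0, K, r, sigma, T, n_pairs, rng):
    """Call price via antithetic pairs plus S_T as a control variate; returns (price, se)."""
    Z = rng.standard_normal(n_pairs)
    drift = (r - 0.5 * sigma**2) * T
    vol = sigma * math.sqrt(T)
    st_plus = S0 * np.exp(drift + vol * Z)
    st_minus = S0 * np.exp(drift - vol * Z)  # antithetic partner uses -Z
    disc = math.exp(-r * T)
    # average within each pair first, so the n_pairs values are i.i.d.
    y = disc * 0.5 * (np.maximum(st_plus - K, 0.0) + np.maximum(st_minus - K, 0.0))
    x = disc * 0.5 * (st_plus + st_minus)  # control: E[x] = S0 under risk-neutral measure
    b = np.cov(y, x)[0, 1] / np.var(x, ddof=1)  # estimated optimal coefficient
    adjusted = y - b * (x - S0)
    return float(adjusted.mean()), float(adjusted.std(ddof=1) / math.sqrt(n_pairs))


def known_result_test(n_pairs=50_000, seed=0):
    """Accept if the MC price is within 3 SE of the Black-Scholes price."""
    exact = black_scholes_call(**PARAMS)
    price, se = mc_call_variance_reduced(**PARAMS, n_pairs=n_pairs,
                                         rng=np.random.default_rng(seed))
    return exact, price, se, abs(price - exact) <= 3 * se


def coverage_test(n_reps=1000, n_pairs=2_000, seed=1):
    """Share of nominal 95% CIs (estimate +/- 1.96 * SE) that contain the exact price."""
    exact = black_scholes_call(**PARAMS)
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_reps):
        price, se = mc_call_variance_reduced(**PARAMS, n_pairs=n_pairs, rng=rng)
        hits += int(price - 1.96 * se <= exact <= price + 1.96 * se)
    return hits / n_reps


if __name__ == "__main__":
    exact, price, se, ok = known_result_test()
    print(f"BS {exact:.4f}  MC {price:.4f} +/- {se:.4f}  within 3 SE: {ok}")
    coverage = coverage_test()
    print(f"coverage {coverage:.3f}  accept: {0.93 <= coverage <= 0.97}")
```

Averaging each antithetic pair before computing the SE is deliberate. The two members of a pair are negatively correlated, so treating all 2 × n_pairs payoffs as independent draws would misstate the standard error.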
