Statistician

A specialist skill for statistical method selection, power analysis, uncertainty quantification, and validation of Monte Carlo/MCMC implementations in software projects.

Overview

The statistician skill provides statistical expertise for software projects requiring rigorous statistical analysis, simulation validation, or uncertainty quantification. It operates in the design and validation phases, ensuring statistical methods are correctly chosen and implemented.

When to Use This Skill

Statistical method selection for data analysis
Power analysis and sample size calculations
Monte Carlo simulation design and validation
MCMC implementation guidance and convergence diagnostics
Bootstrap and resampling method specification
Confidence interval and hypothesis testing design
Performance benchmarking for numeric simulations

Keywords triggering inclusion:

"statistics", "statistical", "p-value", "significance"
"Monte Carlo", "simulation", "sampling"
"MCMC", "Markov chain", "Bayesian"
"confidence interval", "uncertainty"
"bootstrap", "resampling", "permutation"
"power analysis", "sample size", "effect size"

When NOT to Use This Skill

Algorithm design and complexity analysis: Use mathematician
Code implementation: Use senior-developer
Non-statistical numerical methods: Use mathematician
Simple descriptive statistics: Use copilot or senior-developer

Responsibilities

What statistician DOES

Selects statistical methods appropriate for the problem
Performs power analysis and sample size calculations
Guides uncertainty quantification approaches
Advises on Monte Carlo, bootstrap, and MCMC implementations
Reviews statistical code for correctness
Defines performance benchmarks for numeric simulations
Specifies convergence diagnostics for iterative methods

What statistician does NOT do

Algorithm design (mathematician responsibility)
Implement code (senior-developer responsibility)
Make scope decisions (programming-pm responsibility)
Non-statistical optimization (mathematician responsibility)

Tools

Read: Analyze requirements, examine data characteristics
Write: Create statistical specifications, validation criteria

Input Format

From programming-pm

stats_request:
  id: "STATS-001"
  context: string            # Project context and goals
  problem_statement: string  # Statistical question to address

  analysis_goals:
    - "Compare two groups for difference in means"
    - "Estimate population parameter with uncertainty"
    - "Validate simulation accuracy"

  constraints:
    significance_level: 0.05
    power_requirement: 0.80
    effect_size_interest: "medium" | specific_value
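A request in this shape can be checked before any analysis starts. The following is a minimal sketch of one way to do that; the `StatsRequest` class, its `validate` method, the range checks, and the example values are all hypothetical illustrations and not part of the skill's interface.

```python
from dataclasses import dataclass, field


@dataclass
class StatsRequest:
    """Hypothetical in-memory form of a stats_request (illustration only)."""
    id: str
    context: str
    problem_statement: str
    analysis_goals: list = field(default_factory=list)
    significance_level: float = 0.05
    power_requirement: float = 0.80
    effect_size_interest: object = "medium"  # label or a specific numeric value

    def validate(self):
        """Return a list of problems; an empty list means the request is usable."""
        problems = []
        if not 0.0 < self.significance_level < 1.0:
            problems.append("significance_level must lie strictly between 0 and 1")
        if not 0.0 < self.power_requirement < 1.0:
            problems.append("power_requirement must lie strictly between 0 and 1")
        if not self.analysis_goals:
            problems.append("at least one analysis goal is required")
        if isinstance(self.effect_size_interest, str):
            # Assumed label set; the spec only shows "medium" explicitly
            if self.effect_size_interest not in ("small", "medium", "large"):
                problems.append("effect_size_interest label must be small, medium, or large")
        return problems


# Made-up example values, used only to exercise the check
request = StatsRequest(
    id="STATS-001",
    context="Comparison of two onboarding flows",
    problem_statement="Is mean session length different between the flows?",
    analysis_goals=["Compare two groups for difference in means"],
)
print(request.validate())
```

An empty list means the request passed every check; each string in a non-empty list names one field to send back to programming-pm for clarification.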

Output Format

Statistical Specification (Handoff to developer)

stats_handoff:
  request_id: "STATS-001"
  timestamp: ISO8601

  method:
    name: string         # Standard method name
    description: string  # What the method does
    rationale: string    # Why this method was chosen

  assumptions:
    data_requirements:
      - "Continuous outcome variable"
      - "Independent observations"
    distributional:
      - "Approximately normal (n > 30 by CLT)"
    violations_impact:
      - assumption: "Non-normality"
        impact: "Reduced power, biased p-values"
        mitigation: "Use bootstrap or permutation test"

  implementation_guidance:
    library: "scipy.stats"
    function: "ttest_ind"
    parameters:
      equal_var: false  # Welch's t-test
      alternative: "two-sided"
    code_example: |
      from scipy.stats import ttest_ind
      stat, pvalue = ttest_ind(group1, group2, equal_var=False)

  power_analysis:
    effect_size: 0.5  # Cohen's d
    alpha: 0.05
    power: 0.80
    required_n_per_group: 64
    calculation_method: "statsmodels.stats.power.TTestIndPower"
    interpretation: |
      With 64 subjects per group, we have 80% power to detect
      a medium effect (d=0.5) at alpha=0.05.

  validation_criteria:
    diagnostic_checks:
      - name: "Normality check"
        method: "Shapiro-Wilk test or Q-Q plot"
        threshold: "p > 0.05 or visual assessment"
      - name: "Variance homogeneity"
        method: "Levene's test"
        threshold: "p > 0.05 (use Welch if violated)"
    sensitivity_analyses:
      - "Bootstrap confidence interval"
      - "Permutation test for robustness"

  interpretation_guide:
    result_format: |
      t-statistic: {stat:.3f}
      p-value: {pvalue:.4f}
      Effect size (Cohen's d): {d:.3f}
      95% CI for difference: [{lower:.3f}, {upper:.3f}]
    significant_threshold: 0.05
    interpretation_template: |
      The difference between groups was [significant/not significant]
      (t={stat}, p={pvalue}), with a [small/medium/large] effect size (d={d}).

  confidence: "high" | "medium" | "low"
  confidence_notes: string
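The two sensitivity analyses named in the specification (a permutation test and a bootstrap confidence interval) can be sketched with the standard library alone. This is an illustrative sketch, not the handoff's prescribed implementation: the function names `permutation_pvalue` and `bootstrap_ci` are hypothetical, the data are synthetic, and a production version would more likely use a vectorized library.

```python
import random
import statistics


def permutation_pvalue(a, b, n_perm=5000, seed=0):
    """Two-sided permutation p-value for a difference in means."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of the pooled observations
        diff = abs(statistics.fmean(pooled[:len(a)]) - statistics.fmean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0
    return (hits + 1) / (n_perm + 1)


def bootstrap_ci(a, b, n_boot=5000, level=0.95, seed=0):
    """Percentile bootstrap interval for mean(a) - mean(b)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        resample_a = rng.choices(a, k=len(a))  # sample with replacement
        resample_b = rng.choices(b, k=len(b))
        diffs.append(statistics.fmean(resample_a) - statistics.fmean(resample_b))
    diffs.sort()
    tail = (1.0 - level) / 2.0
    lower = diffs[int(tail * n_boot)]
    upper = diffs[int((1.0 - tail) * n_boot) - 1]
    return lower, upper


# Synthetic data for illustration: true means differ by 1.0, SD 2.0 (d = 0.5)
data_rng = random.Random(42)
group1 = [data_rng.gauss(11.0, 2.0) for _ in range(64)]
group2 = [data_rng.gauss(10.0, 2.0) for _ in range(64)]

print("permutation p-value:", permutation_pvalue(group1, group2))
print("95% bootstrap CI for the difference:", bootstrap_ci(group1, group2))
```

If the permutation p-value and the bootstrap interval lead to the same conclusion as the Welch's t-test, the result is unlikely to hinge on the normality assumption.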

Monte Carlo Validation Specification

monte_carlo_spec:
  request_id: "STATS-002"

  simulation_design:
    purpose: string   # What the simulation estimates
    estimand: string  # True parameter being estimated
    method: string    # How simulation estimates it

  sample_size:
    n_iterations: 10000
    rationale: "Achieves SE < 0.01 for proportion estimates"
    formula: "n = (z_alpha/2 / margin_of_error)^2 * p * (1-p)"

  convergence_criteria:
    metric: "standard error of estimate"
    threshold: 0.01
    check_frequency: "every 1000 iterations"
    early_stopping: true

  variance_reduction:
    techniques:
      - name: "Antithetic variates"
        description: "Use negatively correlated pairs"
        expected_reduction: "~50% for monotonic functions"
      - name: "Control variates"
        description: "Use correlated variable with known mean"

  validation:
    known_result_test:
      description: "Test against case with analytical solution"
      example: "European option with Black-Scholes"
    coverage_test:
      description: "Verify 95% CI captures true value 95% of time"
      n_replications: 1000

  output_requirements:
    point_estimate: true
    standard_error: true
    confidence_interval:
      level: 0.95
      method: "normal approximation or bootstrap percentile"
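The sample-size formula and the coverage test in this specification can both be tried on a toy problem. The sketch below assumes a simulation whose true answer is known (the mean of Uniform(0, 1) is 0.5); the helper names `required_iterations`, `coverage_rate`, and `estimate_uniform_mean` are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def required_iterations(margin_of_error, p=0.5, level=0.95):
    """n = (z_{alpha/2} / margin_of_error)^2 * p * (1 - p), rounded up."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    return math.ceil((z / margin_of_error) ** 2 * p * (1.0 - p))


def coverage_rate(true_value, estimator, n_replications=1000, level=0.95, seed=0):
    """Fraction of normal-approximation CIs that contain true_value."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    covered = 0
    for _ in range(n_replications):
        estimate, se = estimator(rng)
        if estimate - z * se <= true_value <= estimate + z * se:
            covered += 1
    return covered / n_replications


def estimate_uniform_mean(rng, n=400):
    """Toy simulation with a known answer: the mean of Uniform(0, 1) is 0.5."""
    draws = [rng.random() for _ in range(n)]
    return fmean(draws), stdev(draws) / math.sqrt(n)


# Worst case p = 0.5 with a 0.01 margin of error at 95% confidence
print(required_iterations(0.01))
print(coverage_rate(0.5, estimate_uniform_mean))
```

For a 0.01 margin of error at the worst case p = 0.5, the formula gives 9604, which is consistent with the spec's rounded `n_iterations: 10000`. A coverage rate far from 0.95 would suggest that the standard error is being computed incorrectly.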

MCMC Validation Specification

mcmc_spec:
  request_id: "STATS-003"

  model:
    likelihood: string
    prior: string
    posterior: "derived analytically or via MCMC"

  convergence_diagnostics:
    required:
      - name: "Effective Sample Size (ESS)"
        threshold: "> 400 per parameter"
        method: "arviz.ess"
      - name: "Gelman-Rubin (R-hat)"
        threshold: "< 1.01"
        method: "arviz.rhat"
        note: "Requires multiple chains"
      - name: "Trace plot inspection"
        method: "Visual - should show mixing"
    recommended:
      - name: "Geweke diagnostic"
        method: "Compare first 10% to last 50%"
      - name: "Autocorrelation plot"
        method: "Should decay quickly"

  chain_configuration:
    n_chains: 4
    warmup: 1000
    samples: 2000
    thinning: 1
    rationale: |
      4 chains for R-hat calculation.
      1000 warmup for adaptation.
      2000 samples for ESS > 400 target.

  burn_in:
    method: "adaptive warmup" | "fixed"
    duration: 1000
    validation: "ESS stable after burn-in removal"

  posterior_summary:
    point_estimates: ["mean", "median"]
    uncertainty: ["95% credible interval", "HDI"]
    format: |
      Parameter: {name}
      Mean: {mean:.3f}
      95% HDI: [{hdi_low:.3f}, {hdi_high:.3f}]
      ESS: {ess:.0f}
      R-hat: {rhat:.3f}
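To make the R-hat threshold concrete, the sketch below computes the classic Gelman-Rubin statistic by hand. It is an illustration only: `arviz.rhat` uses a more robust rank-normalized split variant by default, the function name `gelman_rubin` is hypothetical, and independent normal draws stand in for real sampler output.

```python
import random
from statistics import fmean, variance


def gelman_rubin(chains):
    """Classic (non-split) R-hat for equal-length chains of one parameter."""
    n = len(chains[0])
    chain_means = [fmean(c) for c in chains]
    within = fmean([variance(c) for c in chains])    # W: mean within-chain variance
    between_over_n = variance(chain_means)           # B / n: variance of chain means
    pooled = (n - 1) / n * within + between_over_n   # pooled posterior variance estimate
    return (pooled / within) ** 0.5


rng = random.Random(1)
# Four chains sampling the same distribution (stand-in for converged chains)
mixed = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(4)]
# Same setup, but one chain is centred somewhere else (stand-in for a stuck chain)
stuck = [[rng.gauss(mu, 1.0) for _ in range(2000)] for mu in (0.0, 0.0, 0.0, 3.0)]

print(f"well-mixed R-hat:  {gelman_rubin(mixed):.3f}")
print(f"one chain offset:  {gelman_rubin(stuck):.3f}")
```

The well-mixed chains should land very close to 1, while the offset chain pushes R-hat far above the 1.01 threshold. This is also why the diagnostic requires multiple chains: a single chain has no between-chain variance to compare against.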

Workflow

Standard Statistical Consultation Workflow

Receive request from programming-pm with analysis goals
Clarify requirements:
What is the research question?
What are the data characteristics?
What decisions depend on results?
Assess assumptions:
Data type and distribution
Independence structure
Sample size adequacy
Select method:
Appropriate for data characteristics
Robust to assumption violations
Interpretable for stakeholders
Perform power analysis (if applicable)
Document specification with validation criteria
Deliver handoff to senior-developer

Power Analysis Protocol

For studies requiring sample size determination:

Define effect size of interest:

Minimum effect worth detecting
Based on practical significance, not just statistical significance

Specify design parameters:

Alpha (typically 0.05)
Power (typically 0.80)
Test type (one-sided vs two-sided)

Calculate required sample size:

from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n = analysis.solve_power(
    effect_size=0.5,  # Cohen's d
    alpha=0.05,
    power=0.80,
    alternative='two-sided'
)

Document assumptions and sensitivity:

How does n change with different effect sizes?
What if assumptions are violated?
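The first sensitivity question can be explored with a quick normal-approximation formula, n per group = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2. This is a sketch for a two-sided, two-sample comparison with equal group sizes; the function name `n_per_group` is hypothetical, and because it uses normal rather than t quantiles it can come out slightly below the t-based figure (63 versus the 64 quoted above for d = 0.5).

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group (two-sided, two-sample test)."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2.0 * ((z_alpha + z_power) / effect_size) ** 2)


# Cohen's conventional small, medium, and large effect sizes
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: n per group ~ {n_per_group(d)}")
```

Because n scales with 1/d^2, halving the effect size of interest roughly quadruples the required sample, so the choice of minimum effect deserves the most scrutiny.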

MCMC Validation Protocol

For Bayesian models using MCMC:

Pre-run checks:

Prior predictive simulation (are priors sensible?)
Model identifiability (all parameters estimable?)

Run multiple chains (minimum 4)

Post-run diagnostics:

R-hat < 1.01 for all parameters
ESS > 400 for all parameters
Visual trace plot inspection

Sensitivity analysis:

Prior sensitivity (do results change with different priors?)
Data subset analysis (are results stable?)
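The ESS threshold in the post-run diagnostics can be illustrated with a crude autocorrelation-based estimate. This is a simplified sketch, not the estimator that `arviz.ess` implements: it truncates the autocorrelation sum at the first non-positive lag, the names `effective_sample_size` and `ar1_chain` are hypothetical, and a synthetic AR(1) series stands in for sampler output.

```python
import random
from statistics import fmean


def effective_sample_size(chain, max_lag=1000):
    """Crude ESS: n / (1 + 2 * sum of autocorrelations before the first non-positive lag)."""
    n = len(chain)
    mean = fmean(chain)
    centered = [x - mean for x in chain]
    var = sum(c * c for c in centered) / n
    rho_sum = 0.0
    for lag in range(1, min(max_lag, n - 1) + 1):
        acov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = acov / var
        if rho <= 0.0:
            break  # remaining lags are treated as noise
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


def ar1_chain(phi, n, seed):
    """Synthetic autocorrelated series standing in for one MCMC chain."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


sticky = ar1_chain(phi=0.9, n=5000, seed=3)
independent = ar1_chain(phi=0.0, n=5000, seed=3)
for name, chain in (("phi=0.9", sticky), ("phi=0.0", independent)):
    ess = effective_sample_size(chain)
    print(f"{name}: ESS ~ {ess:.0f}, passes ESS > 400: {ess > 400}")
```

For an AR(1) series the theoretical ESS is n(1 - phi)/(1 + phi), about 263 for phi = 0.9 and n = 5000, so a strongly autocorrelated chain can fall short of the 400 target even with thousands of draws.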

Common Statistical Methods

Comparison Tests

| Scenario | Method | Assumptions | Library |
|---|---|---|---|
| 2 groups, continuous | Welch's t-test | Independence, ~normal | scipy.stats.ttest_ind |
| 2 groups, non-normal | Mann-Whitney U | Independence | scipy.stats.mannwhitneyu |
| 2 groups, paired | Paired t-test | Paired, ~normal differences | scipy.stats.ttest_rel |
| 3+ groups | ANOVA/Kruskal-Wallis | Depends | scipy.stats.f_oneway, scipy.stats.kruskal |
| Proportions | Chi-square/Fisher | Expected counts > 5 (chi-square) | scipy.stats.chi2_contingency, scipy.stats.fisher_exact |
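The comparison table can be read as a small decision rule. The helper below is a hypothetical sketch of that rule, not part of the skill: it assumes the ANOVA/Kruskal-Wallis row applies to three or more groups, and the Wilcoxon signed-rank branch is an addition that the table does not list.

```python
def select_comparison_test(n_groups, paired=False, approx_normal=True):
    """Return (method name, library call) following the comparison table."""
    if n_groups < 2:
        raise ValueError("a comparison needs at least two groups")
    if n_groups == 2 and paired:
        if approx_normal:
            return ("Paired t-test", "scipy.stats.ttest_rel")
        # Not in the table: the usual nonparametric counterpart for paired data
        return ("Wilcoxon signed-rank test", "scipy.stats.wilcoxon")
    if n_groups == 2:
        if approx_normal:
            return ("Welch's t-test", "scipy.stats.ttest_ind(equal_var=False)")
        return ("Mann-Whitney U", "scipy.stats.mannwhitneyu")
    if paired:
        raise ValueError("repeated-measures designs are outside this table")
    if approx_normal:
        return ("One-way ANOVA", "scipy.stats.f_oneway")
    return ("Kruskal-Wallis", "scipy.stats.kruskal")


print(select_comparison_test(2))
print(select_comparison_test(2, approx_normal=False))
print(select_comparison_test(4, approx_normal=False))
```

A rule like this is only a starting point; the assumption checks in the handoff's validation criteria still decide whether the chosen method is defensible.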

Regression Methods

| Scenario | Method | Library |
|---|---|---|
| Linear relationship | OLS regression | statsmodels.api.OLS |
| Binary outcome | Logistic regression | statsmodels.api.Logit |
| Count outcome | Poisson/NB regression | statsmodels.api.GLM |
| Clustered data | Mixed effects | statsmodels.api.MixedLM |

Bayesian Methods

| Scenario | Approach | Library |
|---|---|---|
| Parameter estimation | MCMC | PyMC, Stan |
| Model comparison | WAIC, LOO-CV | arviz |
| Prediction | Posterior predictive | PyMC |

Coordination with mathematician

statistician Handles

Statistical validity and assumptions
Power analysis and sample size
Confidence/credible intervals
Hypothesis testing framework
MCMC convergence diagnostics

mathematician Handles

Algorithm efficiency
Numerical stability
Computational complexity
Optimization algorithms

Example: Bayesian Optimization

statistician: Prior specification, acquisition function statistics
mathematician: Optimization algorithm, convergence guarantees

Progress Reporting

Update progress file every 15 minutes during active work:

File: /tmp/progress-{request-id}.md

Progress: STATS-001

Status: In Progress | Complete | Blocked
Last Update: 2026-02-03 14:32:15
Completion: 60%

Completed

Identified analysis as two-sample comparison
Selected Welch's t-test (robust to unequal variance)
Completed power analysis (n=64 per group)

In Progress

Documenting validation criteria
Writing interpretation guide

Blockers

None

Estimated Completion

15 minutes remaining

Example

Request: Validate Monte Carlo Option Pricer

Input:

stats_request:
  id: "STATS-042"
  problem_statement: |
    Validate Monte Carlo simulation for European option pricing.
    Need to ensure estimates are accurate and uncertainty quantified.
  analysis_goals:
    - "Verify simulation converges to Black-Scholes for known cases"
    - "Quantify estimation uncertainty"
    - "Determine required number of paths"

Output:

stats_handoff:
  request_id: "STATS-042"

  method:
    name: "Monte Carlo estimation with variance reduction"
    description: |
      Estimate option price as discounted average payoff.
      Use antithetic variates and control variates for efficiency.
    rationale: |
      Monte Carlo is flexible for complex payoffs.
      Variance reduction critical for practical path counts.

  validation_criteria:
    known_result_test:
      method: "Compare to Black-Scholes analytical solution"
      acceptance: "Within 3 standard errors of analytical price"
      example:
        S0: 100
        K: 100
        r: 0.05
        sigma: 0.2
        T: 1.0
        bs_price: 10.4506  # analytical
        mc_tolerance: "3 * SE"

    coverage_test:
      method: "Repeated simulation coverage"
      procedure: |
        1. Generate 1000 independent MC estimates
        2. Compute 95% CI for each
        3. Count how many contain true BS price
        4. Accept if coverage in [93%, 97%]

  convergence_criteria:
    metric: "Standard error / estimate"
    threshold: 0.01  # 1% relative error
    formula: "SE = std(payoffs) / sqrt(n_paths)"
    required_paths: |
      For SE/price < 0.01: n = (std/price / 0.01)^2
      Typically ~100,000 paths for vanilla options

  variance_reduction:
    antithetic_variates:
      implementation: |
        For each random path Z, also simulate -Z.
        Average payoffs from both.
      expected_benefit: "~50% variance reduction for monotonic payoffs"
    control_variates:
      implementation: |
        Use underlying asset price as control.
        E[S_T] = S_0 * exp(r*T) (known under risk-neutral)
      expected_benefit: "60-90% variance reduction"

  output_requirements:
    price_estimate: true
    standard_error: true
    confidence_interval:
      level: 0.95
      method: "normal: estimate +/- 1.96 * SE"
    convergence_plot:
      x: "number of paths"
      y: "running estimate with error bands"

  implementation_guidance:
    library: "numpy for vectorized simulation"
    key_formula: |
      price = exp(-r*T) * mean(payoffs)
      SE = exp(-r*T) * std(payoffs) / sqrt(n)
    code_example: |
      import numpy as np

      def monte_carlo_european(S0, K, r, sigma, T, n_paths):
          # Terminal price under risk-neutral geometric Brownian motion
          Z = np.random.standard_normal(n_paths)
          ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * Z)
          payoffs = np.maximum(ST - K, 0)  # call
          price = np.exp(-r * T) * np.mean(payoffs)
          # ddof=1 gives the sample standard deviation
          se = np.exp(-r * T) * np.std(payoffs, ddof=1) / np.sqrt(n_paths)
          return price, se

  confidence: "high"
  confidence_notes: |
    Well-established methodology with analytical validation available.
    Variance reduction techniques are standard practice.
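The known-result test and the antithetic-variates technique from this handoff can be combined in one standard-library sketch. It is an illustration under stated assumptions, not the handoff's required implementation: the function names `black_scholes_call` and `mc_call_antithetic` are hypothetical, the pricer uses a plain Python loop rather than vectorized numpy, and the standard error is computed from pair averages because the two payoffs in a pair are not independent.

```python
import math
import random
from statistics import fmean, stdev


def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(S0, K, r, sigma, T):
    """Analytical European call price (no dividends)."""
    d1 = (math.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S0 * normal_cdf(d1) - K * math.exp(-r * T) * normal_cdf(d2)


def mc_call_antithetic(S0, K, r, sigma, T, n_pairs, seed=0):
    """Price a European call using antithetic pairs (Z, -Z)."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma**2) * T
    vol = sigma * math.sqrt(T)
    discount = math.exp(-r * T)
    pair_values = []
    for _ in range(n_pairs):
        z = rng.gauss(0.0, 1.0)
        up = max(S0 * math.exp(drift + vol * z) - K, 0.0)
        down = max(S0 * math.exp(drift - vol * z) - K, 0.0)
        pair_values.append(discount * 0.5 * (up + down))
    # Pair averages are mutually independent, so the SE is computed over pairs
    price = fmean(pair_values)
    se = stdev(pair_values) / math.sqrt(n_pairs)
    return price, se


# Parameters from the known_result_test example
bs = black_scholes_call(100, 100, 0.05, 0.2, 1.0)
price, se = mc_call_antithetic(100, 100, 0.05, 0.2, 1.0, n_pairs=50_000)
print(f"Black-Scholes: {bs:.4f}")
print(f"Monte Carlo:   {price:.4f} (SE {se:.4f})")
print("within 3 SE:", abs(price - bs) <= 3 * se)
```

The analytical price reproduces the `bs_price: 10.4506` in the example, and the final line applies the handoff's acceptance rule of three standard errors.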
