Statistical Analysis
Comprehensive statistical testing, power analysis, and experimental design for reproducible research.
When to Use
- Conducting statistical hypothesis tests (t-tests, ANOVA, chi-square)
- Performing regression or correlation analyses
- Running Bayesian statistical analyses
- Checking statistical assumptions and diagnostics
- Calculating effect sizes and conducting power analyses
- Reporting statistical results in APA format
- Planning experiments with proper power calculations
- Helping with the ANALYSIS phase of a research project
Workflow Decision Tree
```
START
│
├─ Need to SELECT a statistical test?
│  └─ See "Test Selection Guide"
│
├─ Ready to check ASSUMPTIONS?
│  └─ See "Assumption Checking"
│
├─ Ready to run ANALYSIS?
│  └─ See "Running Statistical Tests"
│
└─ Need to REPORT results?
   └─ See "Reporting Results (APA)"
```
Test Selection Guide
Quick Reference: Choosing the Right Test
Comparing Two Groups:
| Data Type | Distribution | Design | Test |
|---|---|---|---|
| Continuous | Normal | Independent | Independent t-test |
| Continuous | Non-normal | Independent | Mann-Whitney U |
| Continuous | Normal | Paired | Paired t-test |
| Continuous | Non-normal | Paired | Wilcoxon signed-rank |
| Binary | | | Chi-square / Fisher's exact |
Comparing 3+ Groups:
| Data Type | Distribution | Design | Test |
|---|---|---|---|
| Continuous | Normal | Independent | One-way ANOVA |
| Continuous | Non-normal | Independent | Kruskal-Wallis |
| Continuous | Normal | Paired | Repeated measures ANOVA |
| Continuous | Non-normal | Paired | Friedman test |
Relationships:
| Analysis | Distribution / Use Case | Test |
|---|---|---|
| Two continuous variables | Normal | Pearson correlation |
| Two continuous variables | Non-normal | Spearman correlation |
| Continuous outcome + predictor(s) | Prediction | Linear regression |
| Binary outcome + predictor(s) | Classification | Logistic regression |
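The group-comparison tables above amount to a small lookup keyed on the number of groups, the distribution, and the design. A minimal sketch of that lookup for continuous outcomes follows; the function name `choose_group_test` and its arguments are hypothetical, introduced only for illustration.

```python
def choose_group_test(n_groups: int, normal: bool, paired: bool) -> str:
    """Return the test suggested by the selection tables for continuous data.

    Hypothetical helper: keys are (normal, paired), values are test names
    copied from the two-group and 3+ group tables.
    """
    if n_groups < 2:
        raise ValueError("Group comparisons need at least two groups")

    two_groups = {
        (True, False): "Independent t-test",
        (False, False): "Mann-Whitney U",
        (True, True): "Paired t-test",
        (False, True): "Wilcoxon signed-rank",
    }
    three_plus_groups = {
        (True, False): "One-way ANOVA",
        (False, False): "Kruskal-Wallis",
        (True, True): "Repeated measures ANOVA",
        (False, True): "Friedman test",
    }
    table = two_groups if n_groups == 2 else three_plus_groups
    return table[(normal, paired)]


print(choose_group_test(2, normal=True, paired=False))
print(choose_group_test(4, normal=False, paired=True))
```

The lookup only encodes the tables; whether a sample counts as "normal" is still a judgment made with the assumption checks in the next section.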
Assumption Checking
ALWAYS check assumptions before interpreting test results.
Key Assumptions to Check
```python
import numpy as np
import scipy.stats as stats

# `data`, `group1`, and `group2` are placeholders for your own 1-D numeric samples.
data = np.asarray(data)  # boolean-mask indexing below needs an array

# 1. Normality test (Shapiro-Wilk)
stat, p = stats.shapiro(data)
print(f"Shapiro-Wilk: W={stat:.3f}, p={p:.3f}")
if p < 0.05:
    print("⚠️ Normality assumption violated - consider non-parametric test")

# 2. Homogeneity of variance (Levene's test)
stat, p = stats.levene(group1, group2)
print(f"Levene's: F={stat:.3f}, p={p:.3f}")
if p < 0.05:
    print("⚠️ Variance assumption violated - use Welch's t-test")

# 3. Outlier detection (IQR method)
Q1, Q3 = np.percentile(data, [25, 75])
IQR = Q3 - Q1
outliers = data[(data < Q1 - 1.5 * IQR) | (data > Q3 + 1.5 * IQR)]
print(f"Outliers detected: {len(outliers)}")
```
What to Do When Assumptions Are Violated
| Assumption | Situation | Solution |
|---|---|---|
| Normality (mild, n > 30) | Proceed | Parametric tests are robust |
| Normality (severe) | Transform | Use a log/sqrt transform or a non-parametric test |
| Homogeneity of variance | t-test | Use Welch's t-test |
| Homogeneity of variance | ANOVA | Use Welch's ANOVA |
| Linearity (regression) | Violated | Add polynomial terms or use a GAM |
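To make the Welch recommendation concrete, the following is a minimal standard-library sketch of the arithmetic behind Welch's t-test: the statistic uses each group's own variance, and the degrees of freedom come from the Welch-Satterthwaite approximation. The function name `welch_t` and the sample values are illustrative assumptions, not part of any library.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test on two samples."""
    na, nb = len(a), len(b)
    # Squared standard error of each group mean (sample variance / n)
    se2_a = variance(a) / na
    se2_b = variance(b) / nb
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite degrees of freedom (usually not an integer)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df


# Illustrative data with clearly unequal spread
t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(f"Welch t = {t:.3f}, df = {df:.2f}")
```

In practice `scipy.stats.ttest_ind(a, b, equal_var=False)` performs this test and also returns the p-value; the sketch only shows where the fractional degrees of freedom come from.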
Running Statistical Tests
Python Libraries
```python
import scipy.stats as stats   # Core statistical tests
import statsmodels.api as sm  # Regression, diagnostics
import pingouin as pg         # User-friendly testing
import numpy as np
import pandas as pd
```
Common Analyses
T-Test with Complete Reporting
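```python
import pingouin as pg

# Independent t-test with effect size.
# `group_a` and `group_b` are placeholders for your own samples.
# Column names below can differ between pingouin versions; check result.columns.
result = pg.ttest(group_a, group_b, correction='auto')
print(f"t({result['dof'].values[0]:.0f}) = {result['T'].values[0]:.2f}, "
      f"p = {result['p-val'].values[0]:.3f}, "
      f"d = {result['cohen-d'].values[0]:.2f}")
```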
One-Way ANOVA with Post-Hoc
```python
import pingouin as pg

# ANOVA (`df` is a long-format DataFrame with 'score' and 'group' columns)
aov = pg.anova(dv='score', between='group', data=df, detailed=True)
print(f"F = {aov['F'].values[0]:.2f}, p = {aov['p-unc'].values[0]:.3f}, "
      f"η²_p = {aov['np2'].values[0]:.3f}")

# Post-hoc if significant
if aov['p-unc'].values[0] < 0.05:
    posthoc = pg.pairwise_tukey(dv='score', between='group', data=df)
    print(posthoc[['A', 'B', 'diff', 'p-tukey']])
```
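For readers who want to see where the F statistic and effect size come from, here is a minimal standard-library sketch of the one-way ANOVA arithmetic: variation is split into between-group and within-group sums of squares. The function name `one_way_anova` and the toy data are assumptions for illustration; in a one-way design partial η² equals η², which is what the sketch returns.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within, eta_squared) for a list of samples."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)

    # Between-group SS: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group SS: spread of observations around their own group mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_stat, df_between, df_within, eta_squared


f_stat, df1, df2, eta2 = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f"F({df1}, {df2}) = {f_stat:.2f}, η² = {eta2:.3f}")
```

The sketch omits the p-value, which requires the F distribution (for example `scipy.stats.f.sf(f_stat, df1, df2)`).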
Linear Regression with Diagnostics
```python
import statsmodels.api as sm

# Fit model (`predictors` and `outcome` are placeholders for your own data)
X = sm.add_constant(predictors)
model = sm.OLS(outcome, X).fit()
print(model.summary())

# Key outputs
print(f"R² = {model.rsquared:.3f}, Adjusted R² = {model.rsquared_adj:.3f}")
print(f"F({model.df_model:.0f}, {model.df_resid:.0f}) = {model.fvalue:.2f}, "
      f"p = {model.f_pvalue:.4f}")
```
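The snippet above fits the model, but the residual diagnostics are left to the summary table. As a rough illustration of what two of those numbers mean, the sketch below fits a single-predictor regression with the standard library and computes R² and the Durbin-Watson statistic from the residuals. The function name `simple_ols_diagnostics` and the data are hypothetical.

```python
def simple_ols_diagnostics(x, y):
    """Fit y = intercept + slope * x and return basic residual diagnostics."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    ss_res = sum(e ** 2 for e in residuals)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)

    # Durbin-Watson ranges from 0 to 4; values near 2 suggest little
    # lag-1 autocorrelation in the residuals.
    dw = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, n)) / ss_res
    return {"slope": slope, "intercept": intercept,
            "r_squared": 1 - ss_res / ss_tot, "durbin_watson": dw}


diag = simple_ols_diagnostics([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print({k: round(v, 3) for k, v in diag.items()})
```

For real models, statsmodels provides ready-made checks such as `statsmodels.stats.stattools.durbin_watson` and `statsmodels.stats.diagnostic.het_breuschpagan`, which operate on `model.resid`.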
Correlation with Confidence Intervals
```python
import pingouin as pg

# Pearson correlation with CI (`x` and `y` are placeholders for your own data)
result = pg.corr(x, y, method='pearson')
ci_low, ci_high = result['CI95%'].values[0]
print(f"r = {result['r'].values[0]:.3f}, "
      f"p = {result['p-val'].values[0]:.3f}, "
      f"95% CI [{ci_low:.3f}, {ci_high:.3f}]")
```
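A common way to obtain a confidence interval for Pearson's r is the Fisher z-transformation: transform r with `atanh`, build a normal-theory interval with standard error 1/√(n − 3), and transform back with `tanh`. The sketch below is a standard-library version of that calculation; the function name `pearson_ci` is an assumption, and it is not claimed to be the exact routine any particular library uses.

```python
import math
from statistics import NormalDist


def pearson_ci(r, n, confidence=0.95):
    """Approximate CI for Pearson's r via the Fisher z-transformation."""
    z = math.atanh(r)                 # Fisher z
    se = 1 / math.sqrt(n - 3)         # standard error on the z scale
    crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


low, high = pearson_ci(r=0.45, n=100)
print(f"95% CI [{low:.2f}, {high:.2f}]")
```

Because the interval is built on the z scale, it is asymmetric around r and always stays inside (−1, 1).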
Effect Sizes
Always report effect sizes alongside p-values.
Quick Reference: Effect Size Benchmarks
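| Test | Effect Size | Small | Medium | Large |
|---|---|---|---|---|
| T-test | Cohen's d | 0.20 | 0.50 | 0.80 |
| ANOVA | η²_p (partial eta²) | 0.01 | 0.06 | 0.14 |
| Correlation | r | 0.10 | 0.30 | 0.50 |
| Regression | R² | 0.02 | 0.13 | 0.26 |
| Chi-square | Cramér's V (when min(rows, columns) - 1 = 2) | 0.07 | 0.21 | 0.35 |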
Important: These are guidelines only. Practical significance depends on context.
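As an illustration of how Cohen's d relates to the benchmarks, here is a minimal standard-library sketch that computes d with a pooled standard deviation and attaches the benchmark label. The function names and the "negligible" label for values below 0.20 are assumptions made for this example.

```python
import math
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled SD."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def label_d(d):
    """Map |d| onto the small/medium/large benchmarks (guidelines only)."""
    size = abs(d)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "medium"
    if size >= 0.20:
        return "small"
    return "negligible"  # assumed label for effects below the 'small' benchmark


d = cohens_d([2, 4, 6, 8], [1, 3, 5, 7])
print(f"d = {d:.2f} ({label_d(d)})")
```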
Power Analysis
A Priori Power Analysis (Before Study)
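```python
import math
from statsmodels.stats.power import tt_ind_solve_power, FTestAnovaPower

# T-test: required n for d=0.5, power=0.80, alpha=0.05
n = tt_ind_solve_power(effect_size=0.5, alpha=0.05, power=0.80, ratio=1.0)
print(f"Required n per group: {math.ceil(n)}")  # round up, never down

# ANOVA: required n for f=0.25, 3 groups.
# FTestAnovaPower takes `k_groups` and solves for the TOTAL sample size,
# so divide by the number of groups to get n per group.
k_groups = 3
power_anova = FTestAnovaPower()
n_total = power_anova.solve_power(effect_size=0.25, k_groups=k_groups,
                                  alpha=0.05, power=0.80)
print(f"Required n per group: {math.ceil(n_total / k_groups)}")
```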
Sensitivity Analysis (After Study)
```python
from statsmodels.stats.power import tt_ind_solve_power

# What effect could we detect with n=50 per group?
detectable_d = tt_ind_solve_power(effect_size=None, nobs1=50, alpha=0.05,
                                  power=0.80, ratio=1.0)
print(f"Minimum detectable effect: d = {detectable_d:.2f}")
```
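The two-sample power calculations can be sanity-checked by hand with the normal approximation n ≈ 2·((z₁₋α/₂ + z_power) / d)² per group, which can also be inverted to give the minimum detectable d for a fixed n. The sketch below uses only the standard library; the function names are hypothetical. Because it ignores the t distribution, it tends to come out slightly more optimistic than the statsmodels solver.

```python
import math
from statistics import NormalDist


def _z_sum(alpha, power):
    """z for a two-sided alpha plus z for the target power."""
    nd = NormalDist()
    return nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)


def approx_n_per_group(d, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sample test."""
    return math.ceil(2 * (_z_sum(alpha, power) / d) ** 2)


def approx_detectable_d(n_per_group, alpha=0.05, power=0.80):
    """Normal-approximation minimum detectable d for equal group sizes."""
    return _z_sum(alpha, power) * math.sqrt(2 / n_per_group)


print(f"Approximate n per group for d = 0.5: {approx_n_per_group(0.5)}")
print(f"Approximate detectable d at n = 50: {approx_detectable_d(50):.2f}")
```

Treat these as back-of-the-envelope checks; report the t-based values from the power solver in a preregistration or methods section.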
Reporting Results (APA Format)
Templates for Common Tests
Independent T-Test:
Group A (n = 48, M = 75.2, SD = 8.5) scored significantly higher than Group B (n = 52, M = 68.3, SD = 9.2), t(98) = 3.82, p < .001, d = 0.77, 95% CI [0.36, 1.18].
One-Way ANOVA:
A one-way ANOVA revealed a significant main effect of treatment on test scores, F(2, 147) = 8.45, p < .001, η²_p = .10. Post hoc comparisons using Tukey's HSD indicated that Condition A (M = 78.2, SD = 7.3) differed significantly from Condition B (M = 71.5, SD = 8.1, p = .002).
Pearson Correlation:
There was a significant positive correlation between study hours and exam scores, r(98) = .45, p < .001, 95% CI [.28, .59].
Multiple Regression:
Multiple regression was conducted with exam scores as the outcome. The model was significant, F(3, 146) = 45.2, p < .001, R² = .48. Study hours (β = .35, p < .001) and prior GPA (β = .28, p < .001) were significant predictors.
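Two APA conventions in the templates above are easy to get wrong when formatting by hand: p-values drop the leading zero, and values below .001 are reported as "p < .001". A minimal sketch of helpers that apply both rules follows; the names `apa_p` and `apa_t` are hypothetical, and the numbers in the usage line are taken from the t-test template.

```python
def apa_p(p):
    """Format a p-value APA-style: no leading zero, floor at '< .001'."""
    if p < 0.001:
        return "p < .001"
    return "p = " + f"{p:.3f}".lstrip("0")


def apa_t(t, df, p, d):
    """Format an independent t-test result as in the template above."""
    return f"t({df:.0f}) = {t:.2f}, {apa_p(p)}, d = {d:.2f}"


print(apa_t(t=3.82, df=98, p=0.0002, d=0.77))
print(apa_p(0.032))
```

Cohen's d keeps its leading zero because it can exceed 1, whereas p and r cannot.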
Integration with RA Workflow
During PLANNING Phase
- Help determine appropriate sample sizes with power analysis
- Suggest statistical approaches for research design
During ANALYSIS Phase
- Run assumption checks on collected data
- Perform planned statistical analyses
- Generate effect sizes and confidence intervals
During WRITING Phase
- Format results for methods and results sections
- Generate APA-formatted statistical reports
- Connect to /write_methods and /write_results skills
Essential Reporting Elements
Always include:
- Descriptive statistics: M, SD, n for all groups
- Test statistics: Name, statistic value, df, exact p-value
- Effect sizes: With confidence intervals when possible
- Assumption checks: What was tested, results, any corrections
- All planned analyses: Including non-significant findings
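The first item on the checklist, descriptive statistics for every group, can be produced with a few lines of standard-library Python. The sketch below assumes the data are held in a dict mapping group names to lists of scores; the function name `describe_groups` and the example values are hypothetical.

```python
from statistics import mean, stdev


def describe_groups(groups):
    """Return n, M, and SD (sample SD) for each named group."""
    return {
        name: {"n": len(values), "M": mean(values), "SD": stdev(values)}
        for name, values in groups.items()
    }


summary = describe_groups({"A": [72, 75, 78], "B": [65, 70, 69]})
for name, s in summary.items():
    print(f"Group {name}: n = {s['n']}, M = {s['M']:.2f}, SD = {s['SD']:.2f}")
```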