Statistical Analysis

Comprehensive statistical testing, power analysis, and experimental design for reproducible research.

When to Use

Conducting statistical hypothesis tests (t-tests, ANOVA, chi-square)
Performing regression or correlation analyses
Running Bayesian statistical analyses
Checking statistical assumptions and diagnostics
Calculating effect sizes and conducting power analyses
Reporting statistical results in APA format
Planning experiments with proper power calculations
Helping with the ANALYSIS phase of a research project

Workflow Decision Tree

START
│
├─ Need to SELECT a statistical test?
│  └─ See "Test Selection Guide"
│
├─ Ready to check ASSUMPTIONS?
│  └─ See "Assumption Checking"
│
├─ Ready to run ANALYSIS?
│  └─ See "Running Statistical Tests"
│
└─ Need to REPORT results?
   └─ See "Reporting Results (APA Format)"

Test Selection Guide

Quick Reference: Choosing the Right Test

Comparing Two Groups:

| Data Type | Distribution | Design | Test |
|---|---|---|---|
| Continuous | Normal | Independent | Independent t-test |
| Continuous | Non-normal | Independent | Mann-Whitney U |
| Continuous | Normal | Paired | Paired t-test |
| Continuous | Non-normal | Paired | Wilcoxon signed-rank |
| Binary | - | - | Chi-square / Fisher's exact |
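For the binary row, a minimal sketch with SciPy might look like the following. It assumes a common rule of thumb that the table does not state: fall back to Fisher's exact test when any expected cell count is below 5. The helper name `compare_binary` and the counts are hypothetical.

```python
import numpy as np
from scipy import stats


def compare_binary(table):
    """Pick chi-square or Fisher's exact test for a 2x2 table of counts."""
    table = np.asarray(table)
    # chi2_contingency also returns the expected counts under independence.
    chi2, p_chi2, dof, expected = stats.chi2_contingency(table)
    if (expected < 5).any():
        # Assumed rule of thumb: small expected counts -> use the exact test.
        _, p_exact = stats.fisher_exact(table)
        return "Fisher's exact", p_exact
    return "Chi-square", p_chi2


# Hypothetical counts: rows = groups, columns = outcome (yes, no)
for counts in ([[12, 5], [3, 14]], [[3, 1], [1, 3]]):
    name, p = compare_binary(counts)
    print(f"{name}: p = {p:.3f}")
```

Note that `chi2_contingency` applies Yates' continuity correction to 2x2 tables by default; pass `correction=False` if that is not wanted.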

Comparing 3+ Groups:

| Data Type | Distribution | Design | Test |
|---|---|---|---|
| Continuous | Normal | Independent | One-way ANOVA |
| Continuous | Non-normal | Independent | Kruskal-Wallis |
| Continuous | Normal | Paired | Repeated measures ANOVA |
| Continuous | Non-normal | Paired | Friedman test |
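The group-comparison rows for continuous outcomes can also be expressed as a small lookup, which is handy when the choice has to be made programmatically. This is only a sketch of the tables in this guide; the function name `choose_group_test` and its arguments are hypothetical.

```python
def choose_group_test(n_groups: int, normal: bool, paired: bool) -> str:
    """Return the test this guide suggests for a continuous outcome."""
    if n_groups < 2:
        raise ValueError("need at least two groups to compare")
    # Keys are (normal, paired), mirroring the Distribution and Design columns.
    two_groups = {
        (True, False): "Independent t-test",
        (False, False): "Mann-Whitney U",
        (True, True): "Paired t-test",
        (False, True): "Wilcoxon signed-rank",
    }
    three_plus_groups = {
        (True, False): "One-way ANOVA",
        (False, False): "Kruskal-Wallis",
        (True, True): "Repeated measures ANOVA",
        (False, True): "Friedman test",
    }
    table = two_groups if n_groups == 2 else three_plus_groups
    return table[(normal, paired)]


print(choose_group_test(2, normal=True, paired=False))
print(choose_group_test(4, normal=False, paired=True))
```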

Relationships:

| Variables | Distribution / Goal | Test |
|---|---|---|
| Two continuous variables | Normal | Pearson correlation |
| Two continuous variables | Non-normal | Spearman correlation |
| Continuous outcome + predictor(s) | Prediction | Linear regression |
| Binary outcome + predictor(s) | Classification | Logistic regression |

Assumption Checking

ALWAYS check assumptions before interpreting test results.

Key Assumptions to Check

    import numpy as np
    import scipy.stats as stats

    # data, group1, and group2 are assumed to be 1-D NumPy arrays of observations.

    # 1. Normality Test (Shapiro-Wilk)
    stat, p = stats.shapiro(data)
    print(f"Shapiro-Wilk: W={stat:.3f}, p={p:.3f}")
    if p < 0.05:
        print("⚠️ Normality assumption violated - consider non-parametric test")

    # 2. Homogeneity of Variance (Levene's test)
    stat, p = stats.levene(group1, group2)
    print(f"Levene's: F={stat:.3f}, p={p:.3f}")
    if p < 0.05:
        print("⚠️ Variance assumption violated - use Welch's t-test")

    # 3. Outlier Detection (IQR method)
    Q1, Q3 = np.percentile(data, [25, 75])
    IQR = Q3 - Q1
    outliers = data[(data < Q1 - 1.5 * IQR) | (data > Q3 + 1.5 * IQR)]
    print(f"Outliers detected: {len(outliers)}")

What to Do When Assumptions Are Violated

| Assumption Violated | Context | Solution |
|---|---|---|
| Normality | Mild violation, n > 30 | Proceed; parametric tests are robust |
| Normality | Severe violation | Transform (log/sqrt) or use a non-parametric test |
| Homogeneity of variance | t-test | Use Welch's t-test |
| Homogeneity of variance | ANOVA | Use Welch's ANOVA |
| Linearity | Regression | Add polynomial terms or use a GAM |
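As an illustration of the t-test and severe non-normality remedies, the sketch below runs Welch's t-test and the Mann-Whitney U test with SciPy. The two groups are simulated with deliberately unequal spread; they are not real data.

```python
import numpy as np
from scipy import stats

# Simulated groups (illustration only) with clearly unequal variances.
rng = np.random.default_rng(1)
group1 = rng.normal(loc=50, scale=5, size=40)
group2 = rng.normal(loc=53, scale=15, size=40)

# Welch's t-test: equal_var=False drops the equal-variance assumption.
welch = stats.ttest_ind(group1, group2, equal_var=False)
print(f"Welch's t = {welch.statistic:.2f}, p = {welch.pvalue:.3f}")

# Mann-Whitney U: rank-based alternative when normality is badly violated.
mwu = stats.mannwhitneyu(group1, group2, alternative="two-sided")
print(f"Mann-Whitney U = {mwu.statistic:.1f}, p = {mwu.pvalue:.3f}")
```

SciPy has no dedicated Welch's ANOVA function; pingouin provides one as `pg.welch_anova`.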

Running Statistical Tests

Python Libraries

    import numpy as np
    import pandas as pd
    import pingouin as pg           # User-friendly testing
    import scipy.stats as stats     # Core statistical tests
    import statsmodels.api as sm    # Regression, diagnostics

Common Analyses

T-Test with Complete Reporting

    import pingouin as pg

    # Independent t-test with effect size.
    # group_a and group_b are 1-D arrays of scores; correction='auto' applies
    # Welch's correction when the group sizes are unequal.
    result = pg.ttest(group_a, group_b, correction='auto')
    print(f"t({result['dof'].values[0]:.0f}) = {result['T'].values[0]:.2f}, "
          f"p = {result['p-val'].values[0]:.3f}, "
          f"d = {result['cohen-d'].values[0]:.2f}")

One-Way ANOVA with Post-Hoc

    import pingouin as pg

    # ANOVA (df is a long-format DataFrame with 'score' and 'group' columns)
    aov = pg.anova(dv='score', between='group', data=df, detailed=True)
    print(f"F = {aov['F'].values[0]:.2f}, p = {aov['p-unc'].values[0]:.3f}, "
          f"η²_p = {aov['np2'].values[0]:.3f}")

    # Post-hoc if significant
    if aov['p-unc'].values[0] < 0.05:
        posthoc = pg.pairwise_tukey(dv='score', between='group', data=df)
        print(posthoc[['A', 'B', 'diff', 'p-tukey']])

Linear Regression with Diagnostics

    import statsmodels.api as sm

    # Fit model (predictors: DataFrame or 2-D array; outcome: 1-D array)
    X = sm.add_constant(predictors)
    model = sm.OLS(outcome, X).fit()
    print(model.summary())

    # Key outputs
    print(f"R² = {model.rsquared:.3f}, Adjusted R² = {model.rsquared_adj:.3f}")
    print(f"F({model.df_model:.0f}, {model.df_resid:.0f}) = {model.fvalue:.2f}, "
          f"p = {model.f_pvalue:.4f}")

Correlation with Confidence Intervals

    import pingouin as pg

    # Pearson correlation with CI (x and y are 1-D arrays of equal length)
    result = pg.corr(x, y, method='pearson')
    print(f"r = {result['r'].values[0]:.3f}, "
          f"p = {result['p-val'].values[0]:.3f}, "
          f"95% CI [{result['CI95%'].values[0][0]:.3f}, {result['CI95%'].values[0][1]:.3f}]")

Effect Sizes

Always report effect sizes alongside p-values.

Quick Reference: Effect Size Benchmarks

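| Test | Effect Size | Small | Medium | Large |
|---|---|---|---|---|
| T-test | Cohen's d | 0.20 | 0.50 | 0.80 |
| ANOVA | η²_p (partial eta²) | 0.01 | 0.06 | 0.14 |
| Correlation | r | 0.10 | 0.30 | 0.50 |
| Regression | R² | 0.02 | 0.13 | 0.26 |
| Chi-square | Cramér's V | 0.07 | 0.21 | 0.35 |

Important: These are guidelines only. Practical significance depends on context. The Cramér's V thresholds also depend on table size: the values shown apply when min(rows, columns) - 1 = 2, while for a 2 × 2 table the corresponding thresholds are 0.10, 0.30, and 0.50.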
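When a library does not report the effect size directly, it can be computed by hand. The sketch below shows Cohen's d (pooled standard deviation) and Cramér's V (from the uncorrected chi-square statistic). The function names and the toy inputs are hypothetical.

```python
import numpy as np
from scipy import stats


def cohens_d(a, b):
    """Cohen's d for two independent groups, using the pooled SD."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)


def cramers_v(table):
    """Cramér's V for an r x c contingency table of counts."""
    table = np.asarray(table)
    # correction=False: use the plain chi-square statistic, without Yates' correction.
    chi2 = stats.chi2_contingency(table, correction=False)[0]
    n = table.sum()
    k = min(table.shape) - 1
    return np.sqrt(chi2 / (n * k))


print(f"d = {cohens_d([2, 4, 6], [1, 3, 5]):.2f}")
print(f"V = {cramers_v([[10, 0], [0, 10]]):.2f}")
```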

Power Analysis

A Priori Power Analysis (Before Study)

    import math
    from statsmodels.stats.power import tt_ind_solve_power, FTestAnovaPower

    # T-test: Required n for d=0.5, power=0.80, alpha=0.05
    n = tt_ind_solve_power(effect_size=0.5, alpha=0.05, power=0.80, ratio=1.0)
    print(f"Required n per group: {math.ceil(n)}")

    # ANOVA: Required n for f=0.25, 3 groups
    # Note: the keyword is k_groups, and solve_power returns the TOTAL sample
    # size across all groups, not the per-group n.
    power_anova = FTestAnovaPower()
    n_total = power_anova.solve_power(effect_size=0.25, k_groups=3, alpha=0.05, power=0.80)
    print(f"Required total N: {math.ceil(n_total)} "
          f"(about {math.ceil(n_total / 3)} per group)")

Sensitivity Analysis (After Study)

    from statsmodels.stats.power import tt_ind_solve_power

    # What effect could we detect with n=50 per group?
    detectable_d = tt_ind_solve_power(effect_size=None, nobs1=50, alpha=0.05,
                                      power=0.80, ratio=1.0)
    print(f"Minimum detectable effect: d = {detectable_d:.2f}")
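To see how power depends on sample size, one option is to evaluate power over a few candidate values of n. The sketch below uses `TTestIndPower.power` from statsmodels; the candidate sample sizes are arbitrary choices for illustration.

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Power of a two-sided independent t-test for d=0.5 at several per-group sizes.
powers = {}
for n_per_group in (20, 40, 64, 100):
    powers[n_per_group] = analysis.power(effect_size=0.5, nobs1=n_per_group,
                                         alpha=0.05, ratio=1.0)
    print(f"n per group = {n_per_group:>3}: power = {powers[n_per_group]:.2f}")
```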

Reporting Results (APA Format)

Templates for Common Tests

Independent T-Test:

Group A (n = 48, M = 75.2, SD = 8.5) scored significantly higher than Group B (n = 52, M = 68.3, SD = 9.2), t(98) = 3.82, p < .001, d = 0.77, 95% CI [0.36, 1.18].

One-Way ANOVA:

A one-way ANOVA revealed a significant main effect of treatment on test scores, F(2, 147) = 8.45, p < .001, η²_p = .10. Post hoc comparisons using Tukey's HSD indicated that Condition A (M = 78.2, SD = 7.3) differed significantly from Condition B (M = 71.5, SD = 8.1, p = .002).

Pearson Correlation:

There was a significant positive correlation between study hours and exam scores, r(98) = .45, p < .001, 95% CI [.28, .59].

Multiple Regression:

Multiple regression was conducted with exam scores as the outcome. The model was significant, F(3, 146) = 45.2, p < .001, R² = .48. Study hours (β = .35, p < .001) and prior GPA (β = .28, p < .001) were significant predictors.
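The templates above follow two APA conventions that are easy to get wrong in code: p-values drop the leading zero, and very small values are reported as p < .001. A minimal formatting sketch follows; the helper names `apa_p` and `apa_t_test` are hypothetical.

```python
def apa_p(p: float) -> str:
    """Format a p-value APA style: no leading zero, floor at p < .001."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if p < 0.001:
        return "p < .001"
    text = f"{p:.3f}"
    if text.startswith("0"):
        text = text[1:]  # drop the leading zero, e.g. "0.045" -> ".045"
    return f"p = {text}"


def apa_t_test(t: float, df: float, p: float, d: float) -> str:
    """Assemble the statistics portion of a t-test report."""
    return f"t({df:g}) = {t:.2f}, {apa_p(p)}, d = {d:.2f}"


print(apa_t_test(t=3.82, df=98, p=0.0002, d=0.77))
print(apa_p(0.002))
```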

Integration with RA Workflow

During PLANNING Phase

Help determine appropriate sample sizes with power analysis
Suggest statistical approaches for research design

During ANALYSIS Phase

Run assumption checks on collected data
Perform planned statistical analyses
Generate effect sizes and confidence intervals

During WRITING Phase

Format results for methods and results sections
Generate APA-formatted statistical reports
Connect to /write_methods and /write_results skills

Essential Reporting Elements

Always include:

Descriptive statistics: M, SD, n for all groups
Test statistics: Name, statistic value, df, exact p-value
Effect sizes: With confidence intervals when possible
Assumption checks: What was tested, results, any corrections
All planned analyses: Including non-significant findings
