Statistical Analysis (Universal)

Overview

This skill enables you to perform rigorous statistical analyses including t-tests, ANOVA, correlation analysis, hypothesis testing, and multiple testing corrections. Unlike cloud-hosted solutions, this skill uses standard Python statistical libraries (scipy, statsmodels, numpy) and executes locally in your environment, making it compatible with ALL LLM providers including GPT, Gemini, Claude, DeepSeek, and Qwen.

When to Use This Skill

Compare means between groups (t-tests, ANOVA)
Test for correlations between variables
Perform hypothesis testing with p-value calculation
Apply multiple testing corrections (FDR, Bonferroni)
Calculate statistical summaries and confidence intervals
Test for normality and distribution fitting
Perform non-parametric tests (Mann-Whitney, Kruskal-Wallis)

How to Use

Step 1: Import Required Libraries

import numpy as np import pandas as pd from scipy import stats from scipy.stats import ttest_ind, mannwhitneyu, pearsonr, spearmanr from scipy.stats import f_oneway, kruskal, chi2_contingency from statsmodels.stats.multitest import multipletests from statsmodels.stats.proportion import proportions_ztest import warnings warnings.filterwarnings('ignore')

Step 2: Two-Sample t-Test

Compare means between two groups

group1, group2: arrays of numeric values

Perform independent t-test

t_statistic, p_value = ttest_ind(group1, group2)

print(f"t-statistic: {t_statistic:.4f}") print(f"p-value: {p_value:.4e}")

if p_value < 0.05: print("✅ Significant difference between groups (p < 0.05)") else: print("❌ No significant difference (p >= 0.05)")

With equal variance assumption check

Levene's test for equal variances

_, levene_p = stats.levene(group1, group2) if levene_p < 0.05: # Use Welch's t-test (unequal variances) t_stat, p_val = ttest_ind(group1, group2, equal_var=False) print(f"Welch's t-test p-value: {p_val:.4e}") else: print("Equal variances assumed")

Step 3: One-Way ANOVA

Compare means across multiple groups

groups: list of arrays, e.g., [group1, group2, group3]

Perform one-way ANOVA

f_statistic, p_value = f_oneway(*groups)

print(f"F-statistic: {f_statistic:.4f}") print(f"p-value: {p_value:.4e}")

if p_value < 0.05: print("✅ Significant difference between groups (p < 0.05)") print("Note: Use post-hoc tests to identify which groups differ") else: print("❌ No significant difference between groups")

Post-hoc pairwise t-tests with Bonferroni correction

from itertools import combinations

group_names = ['Group A', 'Group B', 'Group C'] pairwise_results = []

for (name1, data1), (name2, data2) in combinations(zip(group_names, groups), 2): _, p = ttest_ind(data1, data2) pairwise_results.append({ 'comparison': f'{name1} vs {name2}', 'p_value': p })

Apply Bonferroni correction

pairwise_df = pd.DataFrame(pairwise_results) n_tests = len(pairwise_df) pairwise_df['p_adjusted'] = pairwise_df['p_value'] * n_tests pairwise_df['p_adjusted'] = pairwise_df['p_adjusted'].clip(upper=1.0)

print("\nPairwise Comparisons (Bonferroni-corrected):") print(pairwise_df)

Step 4: Correlation Analysis

Pearson correlation (linear relationships)

r_pearson, p_pearson = pearsonr(variable1, variable2)

print(f"Pearson correlation: r = {r_pearson:.4f}, p = {p_pearson:.4e}")

Spearman correlation (monotonic relationships, robust to outliers)

r_spearman, p_spearman = spearmanr(variable1, variable2)

print(f"Spearman correlation: ρ = {r_spearman:.4f}, p = {p_spearman:.4e}")

Interpretation

if abs(r_pearson) < 0.3: strength = "weak" elif abs(r_pearson) < 0.7: strength = "moderate" else: strength = "strong"

direction = "positive" if r_pearson > 0 else "negative" print(f"Interpretation: {strength} {direction} correlation")

if p_pearson < 0.05: print("✅ Statistically significant (p < 0.05)") else: print("❌ Not statistically significant")

Step 5: Multiple Testing Correction

Scenario: Testing 1000 genes for differential expression

p_values: array of p-values from individual tests

Method 1: Benjamini-Hochberg FDR correction (recommended)

reject_fdr, p_adjusted_fdr, _, _ = multipletests(p_values, alpha=0.05, method='fdr_bh')

Method 2: Bonferroni correction (more conservative)

reject_bonf, p_adjusted_bonf, _, _ = multipletests(p_values, alpha=0.05, method='bonferroni')

Create results DataFrame

results_df = pd.DataFrame({ 'gene': gene_names, 'p_value': p_values, 'q_value_fdr': p_adjusted_fdr, 'p_adjusted_bonferroni': p_adjusted_bonf, 'significant_fdr': reject_fdr, 'significant_bonf': reject_bonf })

Summary

print(f"Original significant (p < 0.05): {(p_values < 0.05).sum()}") print(f"Significant after FDR correction: {reject_fdr.sum()}") print(f"Significant after Bonferroni correction: {reject_bonf.sum()}")

Save results

results_df.to_csv('statistical_results.csv', index=False) print("✅ Results saved to: statistical_results.csv")

Step 6: Non-Parametric Tests

Use when data is not normally distributed

Mann-Whitney U test (alternative to t-test)

u_statistic, p_value_mw = mannwhitneyu(group1, group2, alternative='two-sided')

print(f"Mann-Whitney U test:") print(f"U-statistic: {u_statistic:.4f}") print(f"p-value: {p_value_mw:.4e}")

Kruskal-Wallis H test (alternative to ANOVA)

h_statistic, p_value_kw = kruskal(*groups)

print(f"\nKruskal-Wallis H test:") print(f"H-statistic: {h_statistic:.4f}") print(f"p-value: {p_value_kw:.4e}")

Advanced Features

Normality Testing

from scipy.stats import shapiro, normaltest, kstest

Test if data follows normal distribution

Shapiro-Wilk test (best for n < 5000)

stat_sw, p_sw = shapiro(data) print(f"Shapiro-Wilk test: W={stat_sw:.4f}, p={p_sw:.4e}")

D'Agostino-Pearson test

stat_dp, p_dp = normaltest(data) print(f"D'Agostino-Pearson test: stat={stat_dp:.4f}, p={p_dp:.4e}")

Interpretation

if p_sw < 0.05: print("❌ Data does NOT follow normal distribution (p < 0.05)") print("→ Recommendation: Use non-parametric tests (Mann-Whitney, Kruskal-Wallis)") else: print("✅ Data appears normally distributed (p >= 0.05)") print("→ OK to use parametric tests (t-test, ANOVA)")

Chi-Square Test for Contingency Tables

Test independence between categorical variables

contingency_table: 2D array (rows=categories1, columns=categories2)

Example: Cell type distribution across conditions

contingency_table = np.array([ [50, 30, 20], # Condition A: T cells, B cells, NK cells [40, 45, 15], # Condition B [35, 25, 40] # Condition C ])

chi2, p_value, dof, expected = chi2_contingency(contingency_table)

print(f"Chi-square statistic: {chi2:.4f}") print(f"p-value: {p_value:.4e}") print(f"Degrees of freedom: {dof}") print(f"\nExpected frequencies:\n{expected}")

if p_value < 0.05: print("✅ Significant association between variables (p < 0.05)") else: print("❌ No significant association")

Confidence Intervals

from scipy.stats import t as t_dist

def calculate_confidence_interval(data, confidence=0.95): """Calculate confidence interval for mean""" n = len(data) mean = np.mean(data) std_err = stats.sem(data) # Standard error of mean

# t-distribution critical value
t_crit = t_dist.ppf((1 + confidence) / 2, df=n-1)

margin_error = t_crit * std_err
ci_lower = mean - margin_error
ci_upper = mean + margin_error

return mean, ci_lower, ci_upper

Usage

mean, ci_low, ci_high = calculate_confidence_interval(data, confidence=0.95)

print(f"Mean: {mean:.4f}") print(f"95% CI: [{ci_low:.4f}, {ci_high:.4f}]")

Effect Size Calculation

def cohens_d(group1, group2): """Calculate Cohen's d effect size""" n1, n2 = len(group1), len(group2) var1, var2 = np.var(group1, ddof=1), np.var(group2, ddof=1)

# Pooled standard deviation
pooled_std = np.sqrt(((n1-1)*var1 + (n2-1)*var2) / (n1+n2-2))

# Cohen's d
d = (np.mean(group1) - np.mean(group2)) / pooled_std

return d

Usage

effect_size = cohens_d(group1, group2) print(f"Cohen's d: {effect_size:.4f}")

Interpretation

if abs(effect_size) < 0.2: print("Effect size: negligible") elif abs(effect_size) < 0.5: print("Effect size: small") elif abs(effect_size) < 0.8: print("Effect size: medium") else: print("Effect size: large")

Common Use Cases

Differential Gene Expression Statistical Testing

Compare gene expression between two conditions

gene_expression_df: rows=genes, columns=samples

condition_labels: array indicating which condition each sample belongs to

results = []

for gene in gene_expression_df.index: # Get expression values for each condition cond1_expr = gene_expression_df.loc[gene, condition_labels == 'Condition1'] cond2_expr = gene_expression_df.loc[gene, condition_labels == 'Condition2']

# t-test
t_stat, p_val = ttest_ind(cond1_expr, cond2_expr)

# Log2 fold change
log2fc = np.log2(cond2_expr.mean() / cond1_expr.mean())

results.append({
    'gene': gene,
    'log2FC': log2fc,
    'p_value': p_val,
    'mean_cond1': cond1_expr.mean(),
    'mean_cond2': cond2_expr.mean()
})

deg_results = pd.DataFrame(results)

Apply FDR correction

_, deg_results['q_value'], _, _ = multipletests( deg_results['p_value'], alpha=0.05, method='fdr_bh' )

Filter significant genes

significant_genes = deg_results[ (deg_results['q_value'] < 0.05) & (abs(deg_results['log2FC']) > 1) ]

print(f"✅ Identified {len(significant_genes)} differentially expressed genes") print(f" - Upregulated: {(significant_genes['log2FC'] > 1).sum()}") print(f" - Downregulated: {(significant_genes['log2FC'] < -1).sum()}")

Save

significant_genes.to_csv('deg_results.csv', index=False)

Cluster Enrichment Analysis

Test if a cell type is enriched in a specific cluster

total_cells: total number of cells

cluster_cells: number of cells in cluster

celltype_total: total cells of this type

celltype_in_cluster: cells of this type in cluster

from scipy.stats import fisher_exact

Create contingency table

contingency = [ [celltype_in_cluster, cluster_cells - celltype_in_cluster], # In cluster [celltype_total - celltype_in_cluster, total_cells - cluster_cells - (celltype_total - celltype_in_cluster)] # Not in cluster ]

odds_ratio, p_value = fisher_exact(contingency, alternative='greater')

print(f"Odds ratio: {odds_ratio:.4f}") print(f"p-value: {p_value:.4e}")

if p_value < 0.05 and odds_ratio > 1: print(f"✅ Cell type is significantly ENRICHED in cluster (p < 0.05)") elif p_value < 0.05 and odds_ratio < 1: print(f"⚠️ Cell type is significantly DEPLETED in cluster (p < 0.05)") else: print("❌ No significant enrichment/depletion")

Batch Effect Detection

Test if there's a batch effect using ANOVA

gene_expression: DataFrame with genes as rows, samples as columns

batch_labels: array indicating batch for each sample

batch_effect_results = []

for gene in gene_expression.index: # Get expression values for each batch batches = [ gene_expression.loc[gene, batch_labels == batch] for batch in np.unique(batch_labels) ]

# ANOVA test
f_stat, p_val = f_oneway(*batches)

batch_effect_results.append({
    'gene': gene,
    'f_statistic': f_stat,
    'p_value': p_val
})

batch_df = pd.DataFrame(batch_effect_results)

Apply FDR correction

_, batch_df['q_value'], _, _ = multipletests(batch_df['p_value'], alpha=0.05, method='fdr_bh')

Count genes with batch effects

genes_with_batch_effect = (batch_df['q_value'] < 0.05).sum()

print(f"Genes with significant batch effect: {genes_with_batch_effect} ({genes_with_batch_effect/len(batch_df)*100:.1f}%)")

if genes_with_batch_effect > len(batch_df) * 0.1: print("⚠️ WARNING: Strong batch effect detected (>10% genes affected)") print("→ Recommendation: Apply batch correction (ComBat, Harmony, etc.)") else: print("✅ Minimal batch effect")

Best Practices

Check Assumptions: Always test normality before using parametric tests (t-test, ANOVA)
Multiple Testing: Apply FDR or Bonferroni correction when testing many hypotheses
Effect Size: Report effect sizes (Cohen's d) alongside p-values
Sample Size: Ensure adequate sample size for statistical power
Outliers: Check for and handle outliers appropriately
Non-Parametric Alternatives: Use when assumptions are violated (Mann-Whitney instead of t-test)
Report Details: Always report test used, test statistic, p-value, and correction method
Visualization: Combine statistical tests with visualizations (box plots, violin plots)

Troubleshooting

Issue: "Warning: p-value is very small"

Solution: This is normal for highly significant results. Report as p < 0.001 or use scientific notation

if p_value < 0.001: print(f"p < 0.001") else: print(f"p = {p_value:.4f}")

Issue: "Division by zero in effect size calculation"

Solution: Check for zero variance (all values identical)

if np.std(group1) == 0 or np.std(group2) == 0: print("Cannot calculate effect size: zero variance in one or both groups") else: d = cohens_d(group1, group2)

Issue: "Test fails with NaN values"

Solution: Remove or impute NaN values before testing

Remove NaN

group1_clean = group1[~np.isnan(group1)] group2_clean = group2[~np.isnan(group2)]

Or filter in DataFrame

df_clean = df.dropna(subset=['column_name'])

Issue: "Insufficient sample size warning"

Solution: Minimum sample sizes for reliable tests:

t-test: n ≥ 30 per group (or ≥ 5 if normally distributed)
ANOVA: n ≥ 20 per group
Correlation: n ≥ 30 total

if len(group1) < 30 or len(group2) < 30: print("⚠️ Warning: Small sample size. Results may not be reliable.") print("Consider using non-parametric tests or collecting more data.")

Technical Notes

Libraries: Uses scipy.stats and statsmodels (widely supported, stable)
Execution: Runs locally in the agent's sandbox
Compatibility: Works with ALL LLM providers (GPT, Gemini, Claude, DeepSeek, Qwen, etc.)
Performance: Most tests complete in milliseconds; large-scale testing (>10K genes) takes 1-5 seconds
Precision: Uses double-precision floating point (numpy default)
Corrections: FDR (Benjamini-Hochberg) recommended for genomics; Bonferroni for small numbers of tests

References

scipy.stats documentation: https://docs.scipy.org/doc/scipy/reference/stats.html
statsmodels documentation: https://www.statsmodels.org/stable/index.html
Multiple testing: https://www.statsmodels.org/stable/generated/statsmodels.stats.multitest.multipletests.html
Statistical testing guide: https://docs.scipy.org/doc/scipy/tutorial/stats.html

data-stats-analysis

Safety Notice

Copy this and send it to your AI assistant to learn

Compare means between two groups

group1, group2: arrays of numeric values

Perform independent t-test

With equal variance assumption check

Levene's test for equal variances

Compare means across multiple groups

groups: list of arrays, e.g., [group1, group2, group3]

Perform one-way ANOVA

Post-hoc pairwise t-tests with Bonferroni correction

Apply Bonferroni correction

Pearson correlation (linear relationships)

Spearman correlation (monotonic relationships, robust to outliers)

Interpretation

Scenario: Testing 1000 genes for differential expression

p_values: array of p-values from individual tests

Method 1: Benjamini-Hochberg FDR correction (recommended)

Method 2: Bonferroni correction (more conservative)

Create results DataFrame

Summary

Save results

Use when data is not normally distributed

Mann-Whitney U test (alternative to t-test)

Kruskal-Wallis H test (alternative to ANOVA)

Test if data follows normal distribution

Shapiro-Wilk test (best for n < 5000)

D'Agostino-Pearson test

Interpretation

Test independence between categorical variables

contingency_table: 2D array (rows=categories1, columns=categories2)

Example: Cell type distribution across conditions

Usage

Usage

Interpretation

Compare gene expression between two conditions

gene_expression_df: rows=genes, columns=samples

condition_labels: array indicating which condition each sample belongs to

Apply FDR correction

Filter significant genes

Save

Test if a cell type is enriched in a specific cluster

total_cells: total number of cells

cluster_cells: number of cells in cluster

celltype_total: total cells of this type

celltype_in_cluster: cells of this type in cluster

Create contingency table

Test if there's a batch effect using ANOVA

gene_expression: DataFrame with genes as rows, samples as columns

batch_labels: array indicating batch for each sample

Apply FDR correction

Count genes with batch effects

Remove NaN

Or filter in DataFrame

Source Transparency

Related Skills

bulk-rna-seq-deseq2-analysis-with-omicverse

single-cell-downstream-analysis

string-protein-interaction-analysis-with-omicverse

gsea-enrichment-analysis