data-analysis

Generate statistical analysis code with 4-round review. Select appropriate statistical tests, interpret results, and produce analysis reports with p-values, effect sizes, and confidence intervals. Use when analyzing experimental data for a paper.

Safety Notice

This listing is imported from skills.sh public index metadata. Review the upstream SKILL.md and repository scripts before running them.

Install skill "data-analysis" with this command: npx skills add lingzhi227/agent-research-skills/lingzhi227-agent-research-skills-data-analysis

Data Analysis

Generate rigorous statistical analysis code with multi-round review.

Input

  • $0 — Data source (CSV, JSON, pickle, or experiment logs)
  • $1 — Research goal or hypothesis to test

References

  • 4-round code review prompts: ~/.claude/skills/data-analysis/references/review-prompts.md

Scripts

Statistical summary and comparison

python ~/.claude/skills/data-analysis/scripts/stat_summary.py --input results.csv --compare method --metric accuracy --output summary.json
python ~/.claude/skills/data-analysis/scripts/stat_summary.py --input results.csv --describe

The script detects data types, recommends tests, runs comparisons, and outputs effect sizes and significance stars. It requires numpy and scipy.
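As a rough illustration of the kind of comparison described above (this is not the contents of stat_summary.py, which are not shown here), the sketch below computes Cohen's d with a pooled standard deviation alongside a Welch's t-test. The helper name `compare_groups`, the choice of Welch's test, and the input numbers are all assumptions made for this example.

```python
import numpy as np
from scipy import stats


def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    na, nb = a.size, b.size
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return float((a.mean() - b.mean()) / np.sqrt(pooled_var))


def compare_groups(a, b):
    """Hypothetical helper: effect size plus the p-value of Welch's t-test."""
    result = stats.ttest_ind(a, b, equal_var=False)
    return {"cohens_d": cohens_d(a, b), "p_value": float(result.pvalue)}


# Placeholder numbers for illustration only, not real results.
print(compare_groups([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]))
```

Reporting the effect size next to the p-value keeps the magnitude of a difference visible even when the sample is small.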

Format p-values

python ~/.claude/skills/data-analysis/scripts/format_pvalue.py --values "0.001 0.05 0.23" --format stars
python ~/.claude/skills/data-analysis/scripts/format_pvalue.py --csv results.csv --column pvalue --format latex

The script formats p-values as stars, LaTeX notation, or plain text. It uses only the Python standard library.
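For readers unfamiliar with star notation, here is a minimal stdlib-only sketch of the idea. The thresholds (0.001, 0.01, 0.05 with strict inequalities) and the "ns" label follow a common convention and are assumptions; format_pvalue.py may use different cutoffs, and the function names below are hypothetical.

```python
def pvalue_stars(p: float) -> str:
    """Map a p-value to significance stars (assumed conventional thresholds)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"p-value out of range: {p}")
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "ns"  # not significant at the 0.05 level


def format_pvalues(values: str) -> list[str]:
    """Parse a whitespace-separated string of p-values and annotate each one."""
    return [f"{float(v):g} {pvalue_stars(float(v))}" for v in values.split()]


print(format_pvalues("0.001 0.05 0.23"))
```

Whether a value sitting exactly on a threshold earns the star is a convention worth stating explicitly in any report that uses this notation.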

Workflow

Step 1: Generate Analysis Code

Structure the code with these sections:

  1. # IMPORT — pandas, numpy, scipy, statsmodels, sklearn
  2. # LOAD DATA — Load from original data files
  3. # DATASET PREPARATIONS — Missing values, units, exclusion criteria
  4. # DESCRIPTIVE STATISTICS — Summary tables if needed
  5. # PREPROCESSING — Dummy variables, normalization
  6. # ANALYSIS — Statistical tests per hypothesis
  7. # SAVE ADDITIONAL RESULTS — Extra results to pickle
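The section layout above could look like the following skeleton. It is a sketch under stated assumptions: the column names `method` and `accuracy` echo the example command earlier, the inline CSV is a made-up stand-in for a real data file, and the output file name is hypothetical.

```python
# IMPORT
import io
import pickle
import tempfile
from pathlib import Path

import pandas as pd
from scipy import stats

# LOAD DATA
# Stand-in for pd.read_csv("results.csv"); these rows are made-up placeholders.
RAW = io.StringIO(
    "method,accuracy\n"
    "baseline,0.71\nbaseline,0.69\nbaseline,\nbaseline,0.73\n"
    "proposed,0.78\nproposed,0.80\nproposed,0.77\nproposed,0.79\n"
)
df = pd.read_csv(RAW)

# DATASET PREPARATIONS
# Exclusion criterion assumed for this sketch: drop runs with no recorded accuracy.
df = df.dropna(subset=["accuracy"])

# DESCRIPTIVE STATISTICS
summary = df.groupby("method")["accuracy"].agg(["count", "mean", "std"])
print(summary)

# PREPROCESSING
# Nothing needed here: one categorical factor and one numeric metric.

# ANALYSIS
baseline = df.loc[df["method"] == "baseline", "accuracy"]
proposed = df.loc[df["method"] == "proposed", "accuracy"]
result = stats.ttest_ind(proposed, baseline, equal_var=False)  # Welch's t-test

# SAVE ADDITIONAL RESULTS
additional_results = {
    "n_rows": int(len(df)),
    "t_statistic": float(result.statistic),
    "p_value": float(result.pvalue),
}
out_path = Path(tempfile.gettempdir()) / "additional_results.pkl"  # hypothetical name
with open(out_path, "wb") as f:
    pickle.dump(additional_results, f)
```

Keeping the section comments verbatim makes it easier for each review round to locate the part of the code it is meant to check.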

Step 2: 4-Round Code Review

  1. Round 1 — Code Flaws: Mathematical/statistical errors, wrong calculations, trivial tests
  2. Round 2 — Data Handling: Missing values, units, preprocessing, test choice
  3. Round 3 — Per-Table: Sensible values, measures of uncertainty, missing data
  4. Round 4 — Cross-Table: Completeness, consistency, missing variables

Step 3: Produce Results

  • Every nominal value must be reported with a measure of uncertainty (CI, STD, or p-value)
  • Statistical tests must be appropriate for the data type
  • Results must match actual data — never hallucinate
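One way to attach uncertainty to a reported mean is a t-based confidence interval, sketched below. The helper name `mean_ci` and the 95% default are assumptions for illustration, and the input values are placeholders rather than real results.

```python
import numpy as np
from scipy import stats


def mean_ci(values, confidence=0.95):
    """Return (mean, lower, upper) for a t-based confidence interval of the mean."""
    x = np.asarray(values, dtype=float)
    n = x.size
    mean = float(x.mean())
    sem = float(stats.sem(x))  # standard error of the mean, ddof=1
    half_width = float(stats.t.ppf((1 + confidence) / 2, df=n - 1)) * sem
    return mean, mean - half_width, mean + half_width


# Placeholder values for illustration only.
mean, low, high = mean_ci([1, 2, 3, 4, 5])
print(f"mean = {mean:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

The t distribution is used instead of the normal because the standard deviation is estimated from the sample, which matters most for small n.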

Allowed Packages

pandas, numpy, scipy, statsmodels, sklearn, pickle

Statistical Test Selection

Data Type | Test
--- | ---
Two groups, normal | Independent t-test
Two groups, non-normal | Mann-Whitney U
Paired samples | Paired t-test / Wilcoxon
Multiple groups | ANOVA / Kruskal-Wallis
Categorical | Chi-square / Fisher's exact
Correlation | Pearson / Spearman
Regression | OLS / Logistic / Mixed effects
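The first two rows of the table can be sketched as a small dispatch function. The source does not say how normality should be judged; using a Shapiro-Wilk test at alpha = 0.05 is an assumption made here, as is the helper name `compare_two_groups`.

```python
from scipy import stats


def compare_two_groups(a, b, alpha=0.05):
    """Pick an independent t-test or Mann-Whitney U based on a normality check.

    Assumption: a group counts as "normal" when Shapiro-Wilk does not reject
    normality at the given alpha.
    """
    looks_normal = all(stats.shapiro(group).pvalue > alpha for group in (a, b))
    if looks_normal:
        name = "Independent t-test"
        result = stats.ttest_ind(a, b)
    else:
        name = "Mann-Whitney U"
        result = stats.mannwhitneyu(a, b, alternative="two-sided")
    return name, float(result.statistic), float(result.pvalue)


# Placeholder data for illustration only; the first group has an extreme outlier.
skewed = [1, 2, 1, 2, 1, 2, 1, 2, 1, 200]
other = [3, 4, 3, 4, 3, 4, 3, 4, 3, 5]
print(compare_two_groups(skewed, other))
```

A normality test on small samples has little power, so inspecting the data and considering the measurement scale remain worthwhile alongside an automated check like this.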

Rules

  • Always report p-values for statistical tests
  • Account for relevant confounding variables
  • Use built-in package functionality (e.g., formula = "y ~ a * b" for interactions)
  • Do not manually implement available statistical functions
  • Access dataframes using string-based column names, not integer indices
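Several of these rules can be seen together in a short statsmodels sketch: the interaction comes from the formula rather than a hand-built product column, columns are addressed by name, and every coefficient is reported with a p-value and confidence interval. The data is synthetic and generated only for this illustration; the variable names `a`, `b`, and `y` mirror the formula in the rule above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data for illustration only: y depends on a, b, and their product.
rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.normal(size=200), "b": rng.normal(size=200)})
noise = rng.normal(scale=0.1, size=200)
df["y"] = 1.0 + 2.0 * df["a"] + 3.0 * df["b"] + 4.0 * df["a"] * df["b"] + noise

# "a * b" expands to a + b + a:b, so the interaction term is not built by hand.
model = smf.ols(formula="y ~ a * b", data=df).fit()

# Report each coefficient with its p-value and 95% confidence interval,
# using named columns throughout.
ci = model.conf_int(alpha=0.05)
ci.columns = ["ci_low", "ci_high"]
report = pd.DataFrame({"coef": model.params, "p_value": model.pvalues}).join(ci)
print(report)
```

Letting the formula interface build the design matrix also keeps term names such as `a:b` consistent between the fitted model and the reported table.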

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals. None of these has a summary from the upstream source, and each carries the trust labels "Repository Source" and "Needs Review".

  • paper-to-code (Coding)
  • code-debugging (Coding)
  • experiment-code (Coding)
  • literature-review (Research)