experiment-design

Design experiment plans with progressive stages — initial implementation, baseline tuning, creative research, and ablation studies. Plan baselines, datasets, hyperparameter sweeps, and evaluation metrics. Use when planning experiments for a research paper.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "experiment-design" with this command: npx skills add lingzhi227/agent-research-skills/lingzhi227-agent-research-skills-experiment-design

Experiment Design

Design structured, progressive experiment plans for research papers.

Input

  • $0 — Research idea, plan, or method description

References

  • 4-stage progressive experiment prompts: ~/.claude/skills/experiment-design/references/stage-prompts.md

Scripts

Generate experiment design

python ~/.claude/skills/experiment-design/scripts/design_experiments.py --plan research_plan.json --output experiment_design.json
python ~/.claude/skills/experiment-design/scripts/design_experiments.py --method "contrastive learning" --task classification --format markdown

Generates baselines, an ablation matrix, a hyperparameter grid, and a metric selection. The script uses only the Python standard library.
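One common way to lay out an ablation matrix is leave-one-out: a full configuration plus one row per removed component. The sketch below illustrates that idea only; it is not taken from design_experiments.py, whose actual output layout should be checked upstream. The function name build_ablation_matrix and the "enabled" key are hypothetical.

```python
def build_ablation_matrix(components):
    """Leave-one-out ablation rows (illustrative layout, not the script's own)."""
    # First row: the full method with every component switched on.
    rows = [{"name": "full", "enabled": {c: True for c in components}}]
    # One additional row per component, with only that component switched off.
    for removed in components:
        rows.append({
            "name": f"without_{removed}",
            "enabled": {c: c != removed for c in components},
        })
    return rows


matrix = build_ablation_matrix(["component_A", "component_B"])
for row in matrix:
    print(row["name"], row["enabled"])
```

With two components this yields three rows: the full method and one ablation per component.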

4-Stage Progressive Framework (from AI-Scientist-v2)

Stage 1: Initial Implementation

  • Focus on getting a basic working implementation
  • Use a simple dataset
  • Aim for basic functional correctness
  • Completion: at least one working (non-buggy) implementation

Stage 2: Baseline Tuning

  • Tune hyperparameters (learning rate, epochs, batch size)
  • Do NOT change model architecture
  • Test on at least TWO datasets
  • Completion: stable training curves, improvement over Stage 1

Stage 3: Creative Research

  • Explore novel improvements and insights
  • Be creative and think outside the box
  • Test on at least THREE datasets
  • Completion: demonstrated novel improvement

Stage 4: Ablation Studies

  • Systematic component analysis
  • Each ablation tests a different aspect
  • Use the same datasets as in Stage 3
  • Completion: all planned ablations done
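The dataset requirements of the four stages can be checked mechanically before any run starts. The following is a minimal sketch under stated assumptions: only the stage name initial_implementation appears in this skill's output format; the other stage keys, the MIN_DATASETS table, and check_stage_datasets are hypothetical names chosen for illustration.

```python
# Minimum dataset counts read off the stage descriptions:
# Stage 1 uses a simple dataset, Stage 2 at least two, Stage 3 at least three.
MIN_DATASETS = {
    "initial_implementation": 1,
    "baseline_tuning": 2,      # hypothetical key name
    "creative_research": 3,    # hypothetical key name
}


def check_stage_datasets(datasets_by_stage):
    """Return a list of human-readable violations; an empty list means the plan passes."""
    problems = []
    for stage, minimum in MIN_DATASETS.items():
        count = len(datasets_by_stage.get(stage, []))
        if count < minimum:
            problems.append(f"{stage}: needs at least {minimum} dataset(s), got {count}")
    # Stage 4 must reuse exactly the Stage 3 datasets.
    stage3 = set(datasets_by_stage.get("creative_research", []))
    stage4 = set(datasets_by_stage.get("ablation_studies", []))
    if stage4 != stage3:
        problems.append("ablation_studies: must use the same datasets as creative_research")
    return problems


plan = {
    "initial_implementation": ["Dataset1"],
    "baseline_tuning": ["Dataset1", "Dataset2"],
    "creative_research": ["Dataset1", "Dataset2", "Dataset3"],
    "ablation_studies": ["Dataset1", "Dataset2"],  # deliberately missing Dataset3
}
problems = check_stage_datasets(plan)
print(problems)
```

Here the check flags only the ablation stage, because it drops one of the Stage 3 datasets.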

Output Format

{
  "stages": [
    {
      "name": "initial_implementation",
      "goals": ["Basic working baseline", "Simple dataset"],
      "max_iterations": 5,
      "completion_criteria": "Working implementation with non-zero accuracy"
    }
  ],
  "baselines": ["Method A", "Method B"],
  "datasets": ["Dataset1", "Dataset2", "Dataset3"],
  "metrics": ["accuracy", "F1", "inference_time"],
  "ablation_components": ["component_A", "component_B"],
  "hyperparameter_grid": {
    "lr": [1e-4, 1e-3, 1e-2],
    "batch_size": [32, 64, 128]
  },
  "num_seeds": 3
}
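To see how many runs a design like this implies, the grid and seed count can be expanded into a flat list of run configurations. This is a sketch rather than part of the skill: expand_runs is a hypothetical helper, and numbering seeds from 0 to num_seeds - 1 is an assumption.

```python
import itertools
import json

# A trimmed copy of the output format above, keeping only the fields used here.
design = json.loads("""
{
  "hyperparameter_grid": {"lr": [1e-4, 1e-3, 1e-2], "batch_size": [32, 64, 128]},
  "num_seeds": 3
}
""")


def expand_runs(design):
    """Cartesian product of the grid, repeated once per seed."""
    grid = design["hyperparameter_grid"]
    names = list(grid)
    runs = []
    for values in itertools.product(*(grid[name] for name in names)):
        for seed in range(design["num_seeds"]):  # assumed seed numbering: 0..num_seeds-1
            run = dict(zip(names, values))
            run["seed"] = seed
            runs.append(run)
    return runs


runs = expand_runs(design)
print(len(runs))  # 3 learning rates x 3 batch sizes x 3 seeds = 27
print(runs[0])
```

Counting runs up front makes it easier to judge whether the grid fits the compute budget before Stage 2 begins.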

Rules

  • Always start simple (Stage 1) before moving on to complex experiments
  • Each stage builds on the best result from the previous stage
  • Run every configuration with multiple seeds (num_seeds) so results can be reported with their variance and support significance claims
  • Document every experiment run in notes.txt
  • Generate figures for training curves and comparisons
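Multi-seed results are usually reported as a mean and a sample standard deviation. The sketch below shows one way to produce such a line for notes.txt. The helper names, the run name, and the three scores are placeholders for illustration, not results. A mean and standard deviation alone do not establish significance; a proper test across methods is still needed.

```python
import statistics


def summarize_seeds(scores):
    """Mean and sample standard deviation across seeds (0.0 if only one seed)."""
    mean = statistics.mean(scores)
    std = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return mean, std


def format_note(run_name, scores):
    """One line suitable for appending to notes.txt."""
    mean, std = summarize_seeds(scores)
    return f"{run_name}: {mean:.3f} +/- {std:.3f} over {len(scores)} seeds"


# Placeholder scores for three seeds of a hypothetical run.
line = format_note("baseline_lr1e-3", [0.80, 0.82, 0.84])
print(line)  # baseline_lr1e-3: 0.820 +/- 0.020 over 3 seeds
```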
