---
name: aris-autonomous-ml-research
description: Autonomous ML research workflows using ARIS (Auto-Research-In-Sleep) — Markdown-only skills for cross-model paper review, idea discovery, experiment automation, and paper writing with Claude Code, Codex, or any LLM agent.
triggers:
  - "set up ARIS for autonomous research"
  - "run research pipeline while I sleep"
  - "automate ML paper writing with Claude Code"
  - "cross-model review loop for my paper"
  - "use ARIS to find research ideas"
  - "run experiment automation with ARIS"
  - "set up auto paper review workflow"
  - "write rebuttal with ARIS"
---

# ARIS — Auto-Research-In-Sleep

> Skill by [ara.so](https://ara.so) — Daily 2026 Skills collection.

ARIS is a **zero-dependency, Markdown-only** autonomous ML research system. Every "skill" is a plain `SKILL.md` file that any LLM agent can read and execute. It orchestrates cross-model collaboration — one model executes research (Claude Code, Codex, etc.) while another acts as adversarial reviewer (GPT-5.4, Gemini, GLM, MiniMax, etc.) to break self-play blind spots.

**Core value**: going from research direction → paper ideas → experiments → written paper → rebuttal, autonomously, overnight.

---

## Installation

### 1. Clone the Repository

```bash
git clone https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep.git
cd Auto-claude-code-research-in-sleep
```

No pip install, no Docker, no daemon. The entire system is Markdown files.

### 2. Install Claude Code (Primary Agent)

```bash
npm install -g @anthropic-ai/claude-code
```

### 3. Install Codex MCP (Cross-Model Reviewer)

```bash
npm install -g @openai/codex
```

Configure Claude Code to use the Codex MCP server by adding the following to `~/.claude/settings.json`:

```json
{
  "mcpServers": {
    "codex": {
      "command": "codex",
      "args": ["mcp"],
      "env": {
        "OPENAI_API_KEY": "$OPENAI_API_KEY"
      }
    }
  }
}
```

### 4. Copy Skills into Claude Code

```bash
# Copy all skills to Claude Code's custom skills directory
cp -r skills/claude-code/ ~/.claude/skills/

# Or symlink to stay up to date
ln -s $(pwd)/skills/claude-code ~/.claude/skills/aris
```

### 5. Set Environment Variables

```bash
# Required for Claude Code
export ANTHROPIC_API_KEY=your_anthropic_key

# Required for cross-model review (GPT-5.4 as reviewer)
export OPENAI_API_KEY=your_openai_key

# Optional: alternative reviewer models (no OpenAI needed)
export LLM_REVIEWER_BASE_URL=https://api.minimax.chat/v1
export LLM_REVIEWER_API_KEY=your_minimax_key
export LLM_REVIEWER_MODEL=MiniMax-M2.7
```

## Alternative Model Combinations (No Claude/OpenAI Required)

ARIS works with any OpenAI-compatible API. Configure the `llm-chat` MCP server:

```json
{
  "mcpServers": {
    "llm-chat": {
      "command": "node",
      "args": ["mcp-servers/llm-chat/index.js"],
      "env": {
        "LLM_BASE_URL": "$LLM_REVIEWER_BASE_URL",
        "LLM_API_KEY": "$LLM_REVIEWER_API_KEY",
        "LLM_MODEL": "$LLM_REVIEWER_MODEL"
      }
    }
  }
}
```

Tested combinations:

| Executor | Reviewer | Config |
| --- | --- | --- |
| Claude Code | GPT-5.4 xhigh | Default |
| Codex CLI | Gemini | Guide |
| Claude Code | MiniMax-M2.7 | `LLM_BASE_URL=https://api.minimax.chat/v1` |
| Claude Code | GLM-5 | `LLM_BASE_URL=https://open.bigmodel.cn/api/paas/v4` |
| MiniMax-M2.7 | GLM-5 | Guide |
| Codex CLI | Claude | Swap executor/reviewer |

## Core Workflows

### Workflow 0: Full Pipeline (Start Here)

```
/research-pipeline "factorized gap in discrete diffusion LMs"
```

With a reference paper and base repo:

```
/research-pipeline "improve method X" — ref paper: https://arxiv.org/abs/2406.04329, base repo: https://github.com/org/project
```

ARIS will:

1. Read the paper → find weaknesses
2. Clone the codebase
3. Generate ideas that fix those weaknesses using that code
4. Run experiments
5. Write the paper

Parameters:

```
/research-pipeline "topic"
  — ref paper: <arxiv_url>       # Optional: paper to improve
  — base repo: <github_url>      # Optional: codebase to build on
  — venue: ICML                  # Target venue (default: ICML)
  — compact: true                # Lean summaries for short-context models
```

### Workflow 1: Idea Discovery

```
/idea-discovery "discrete diffusion language models"
```

Scans the literature, identifies gaps, generates novel research directions, scores each idea for novelty and feasibility, and outputs a ranked proposal list.
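As a rough illustration of the ranking step, here is a minimal sketch of scoring and sorting candidate ideas; the field names and the 50/50 weighting are hypothetical, not the skill's actual schema.

```python
# Hypothetical sketch of the idea-ranking step; not the skill's real schema.
from dataclasses import dataclass

@dataclass
class Idea:
    title: str
    novelty: float      # 0-10, judged against the literature scan
    feasibility: float  # 0-10, judged against available code and compute

    @property
    def score(self) -> float:
        # Illustrative equal weighting of novelty and feasibility
        return 0.5 * self.novelty + 0.5 * self.feasibility

ideas = [
    Idea("Reweight the factorized loss", novelty=7.0, feasibility=8.5),
    Idea("New sampler for discrete diffusion", novelty=8.5, feasibility=5.0),
]
for rank, idea in enumerate(sorted(ideas, key=lambda i: i.score, reverse=True), 1):
    print(f"{rank}. {idea.title} (score {idea.score:.1f})")
```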


### Workflow 1.5: Experiment Bridge

```
/experiment-bridge "run ablation on temperature scaling" — code review: true
```

Cross-model code review before GPU deployment (enabled by default). Catches bugs, confirms experimental validity, then runs.

```python
# Example: what experiment-bridge automates
# 1. Claude Code writes the training script
# 2. GPT-5.4 reviews the code (code-review gate)
# 3. If approved → submits to the GPU cluster
# 4. Monitors via the W&B API
import wandb

api = wandb.Api()
runs = api.runs("your-entity/your-project")
for run in runs:
    print(run.name, run.summary.get("val_loss"))
```

### Workflow 2: Paper Writing

```
/paper-writing "results/" — venue: NeurIPS
```

Generates a LaTeX paper from experiment results. Anti-hallucination is enforced: every citation is checked against DBLP, then CrossRef, and tagged `[VERIFY]` if neither confirms it.
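The verification chain can be approximated locally. Below is a minimal sketch using the public DBLP and CrossRef search APIs; it is not ARIS's internal implementation, and a real check would also compare the returned titles.

```python
# Minimal sketch of the DBLP → CrossRef → [VERIFY] chain via public APIs.
# Not ARIS's internal code; a real check would also compare returned titles.
import requests

def verify_citation(title: str) -> str:
    # 1. Try DBLP's publication search
    r = requests.get("https://dblp.org/search/publ/api",
                     params={"q": title, "format": "json"}, timeout=10)
    hits = r.json().get("result", {}).get("hits", {})
    if int(hits.get("@total", 0)) > 0:
        return "verified:dblp"
    # 2. Fall back to CrossRef's works search
    r = requests.get("https://api.crossref.org/works",
                     params={"query.title": title, "rows": 1}, timeout=10)
    if r.json().get("message", {}).get("items"):
        return "verified:crossref"
    # 3. Neither source confirms it → tag for manual review
    return "[VERIFY]"

print(verify_citation("Attention Is All You Need"))
```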

Venue templates available: ICML, NeurIPS, ICLR, CVPR, ACL, AAAI, ACM MM


### Workflow 3: Auto Review Loop

```
/auto-review "paper.pdf"
```

The core ARIS loop (sketched in code below):

1. Claude Code reads the paper
2. GPT-5.4 reviews it as an adversarial critic
3. Claude Code rewrites based on the critique
4. The score is tracked across rounds (target: 8/10, "clear accept")
5. The loop repeats until convergence or max rounds

```
Score progression: 5.2 → 6.1 → 7.3 → 8.0 ✓
```
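In pseudocode, the loop has roughly this shape; `executor` and `reviewer` are hypothetical stand-ins for the Claude Code session and the MCP-bridged critic, not a real ARIS API.

```python
# Rough shape of the auto-review loop. `executor` and `reviewer` stand in
# for the Claude Code session and the MCP-bridged critic (hypothetical API).
TARGET = 8.0  # "clear accept"

def auto_review(paper: str, executor, reviewer, max_rounds: int = 5) -> str:
    for round_num in range(1, max_rounds + 1):
        critique, score = reviewer.review(paper)    # adversarial critique
        print(f"round {round_num}: score {score:.1f}")
        if score >= TARGET:                         # converged
            break
        paper = executor.rewrite(paper, critique)   # address the critique
    return paper
```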

### Workflow 4: Rebuttal

```
/rebuttal "paper/ + reviews" — venue: ICML, character limit: 5000
```

Parameters:

| Parameter | Default | Description |
| --- | --- | --- |
| venue | ICML | Target venue |
| character limit | required | Hard limit for submission |
| quick mode | false | Stop after parsing + strategy (no draft) |
| auto experiment | false | Auto-run supplementary experiments |
| max stress test rounds | 1 | GPT-5.4 stress-test iterations |
| max followup rounds | 3 | Per-reviewer follow-up limit |

Three safety gates (the rebuttal won't finalize if any fails):

- 🔒 No fabrication — every claim maps to a paper, review, or user-confirmed result
- 🔒 No overpromise — every promise is user-approved
- 🔒 Full coverage — every reviewer concern is tracked

Outputs:

- `PASTE_READY.txt` — exact character count, paste directly to the venue
- `REBUTTAL_DRAFT_rich.md` — extended version for manual editing
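A quick local sanity check before pasting, assuming the default output name above:

```python
# Double-check the paste-ready rebuttal against the venue's hard limit.
LIMIT = 5000  # match the `character limit` you passed to /rebuttal

text = open("PASTE_READY.txt", encoding="utf-8").read()
print(f"{len(text)}/{LIMIT} characters")
assert len(text) <= LIMIT, "over the limit; rerun with more stress-test rounds"
```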

### Bonus: Slides and Poster

```
# Conference presentation
/paper-slides "paper/"     # → Beamer PDF + PPTX + speaker notes + Q&A prep

# Conference poster
/paper-poster "paper/"     # → A0/A1 poster PDF + editable PPTX + SVG
```

## Standalone Skills

These skills can be invoked independently or as part of the core workflows:

| Skill | Command | Description |
| --- | --- | --- |
| Research Refine | `/research-refine` | Turn vague ideas into anchored proposals |
| Experiment Plan | `/experiment-plan` | Claim-driven experiment roadmaps |
| Training Check | `/training-check` | Validate training runs before full launch |
| Result to Claim | `/result-to-claim` | Convert raw results to paper claims |
| Ablation Planner | `/ablation-planner` | Design ablation study structure |
| Formula Derivation | `/formula-derivation` | Derive and verify research formulas |
| Grant Proposal | `/grant-proposal` | Write grant proposals from research |
| Paper Illustration | `/paper-illustration` | Generate figures (Gemini-powered) |
| Citation Claw | `/citation-claw` | Verify and format citations |

## Session Recovery & Compact Mode

For short-context models or after an interruption:

```
/research-pipeline "topic" — compact: true
```

Generates lean summary files at each checkpoint. Resume after an interruption:

```
/research-refine — resume: true
```

ARIS auto-checkpoints the research-refine workflow and resumes from the last completed phase.
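Conceptually, the checkpointing amounts to recording the last completed phase and skipping past it on resume. A hypothetical illustration follows; the file name and phase names are made up, not ARIS's actual checkpoint format.

```python
# Hypothetical illustration of phase checkpointing; the file name and
# phase names are made up, not ARIS's actual checkpoint format.
import json
import os

PHASES = ["anchor", "survey", "proposal", "refine"]
CKPT = "checkpoint.json"

def run_phase(phase: str) -> None:
    print("running phase:", phase)  # stand-in for the real work

done = json.load(open(CKPT))["completed"] if os.path.exists(CKPT) else []
for phase in PHASES:
    if phase in done:
        continue                    # resume: skip already-finished phases
    run_phase(phase)
    done.append(phase)
    with open(CKPT, "w") as f:
        json.dump({"completed": done}, f)  # checkpoint after each phase
```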


## Codex CLI Native Skills

The full skill set is available for OpenAI Codex without Claude Code:

```bash
cd skills/skills-codex/
codex "run idea-discovery on discrete diffusion"
```

## MCP Server: llm-chat

The `llm-chat` MCP server bridges any OpenAI-compatible API as a reviewer. Start it manually for debugging:

```bash
cd mcp-servers/llm-chat/
node index.js
```

Environment variables:

```bash
export LLM_BASE_URL=https://api.openai.com/v1   # Any OpenAI-compatible endpoint
export LLM_API_KEY=$OPENAI_API_KEY
export LLM_MODEL=gpt-4o                          # Any model name
```
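Conceptually, the bridge just forwards chat completions to whatever endpoint you point it at. Here is a sketch of that call using the official `openai` Python client (the client is not part of ARIS):

```python
# Sketch of what llm-chat does conceptually: call any OpenAI-compatible
# endpoint as a reviewer. Uses the official `openai` client, not ARIS code.
import os

from openai import OpenAI

client = OpenAI(
    base_url=os.environ["LLM_BASE_URL"],  # e.g. https://api.minimax.chat/v1
    api_key=os.environ["LLM_API_KEY"],
)
resp = client.chat.completions.create(
    model=os.environ["LLM_MODEL"],
    messages=[
        {"role": "system", "content": "You are an adversarial paper reviewer."},
        {"role": "user", "content": "Review this abstract: ..."},
    ],
)
print(resp.choices[0].message.content)
```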

## Free Tier via ModelScope

Zero-cost option; you only need a free ModelScope token (see `docs/MODELSCOPE_GUIDE.md` for the full guide):

```bash
export MODELSCOPE_API_KEY=your_modelscope_token
export LLM_BASE_URL=https://api-inference.modelscope.cn/v1
export LLM_MODEL=Qwen/Qwen2.5-72B-Instruct
```

## Input Templates

Templates for every workflow live in `templates/`:

```bash
ls templates/
# idea-discovery.md
# experiment-bridge.md
# paper-writing.md
# auto-review.md
# rebuttal.md
# research-refine.md
```

Use them to structure your inputs:

```bash
cat templates/rebuttal.md
# Fill in: paper path, review text, venue, character limit
# Then: /rebuttal [filled template]
```

## Directory Structure

```
Auto-claude-code-research-in-sleep/
├── skills/
│   ├── claude-code/          # Claude Code SKILL.md files
│   ├── skills-codex/         # Codex CLI native skills
│   ├── idea-discovery/
│   ├── experiment-bridge/
│   ├── paper-writing/
│   ├── auto-review/
│   ├── rebuttal/             # each skill is a single readable SKILL.md
│   ├── paper-slides/
│   ├── paper-poster/
│   ├── research-refine/
│   ├── formula-derivation/
│   └── ...
├── mcp-servers/
│   └── llm-chat/             # Universal reviewer bridge
├── templates/                # Input templates for every workflow
├── docs/
│   ├── CURSOR_ADAPTATION.md
│   ├── TRAE_ARIS_RUNBOOK_EN.md
│   ├── ANTIGRAVITY_ADAPTATION.md
│   ├── MODELSCOPE_GUIDE.md
│   ├── MiniMax-GLM-Configuration.md
│   └── CODEX_GEMINI_REVIEW_GUIDE.md
└── README.md
```

## Troubleshooting

**Cross-model review not triggering:**

- Check that the MCP server is running: `codex mcp` or `node mcp-servers/llm-chat/index.js`
- Verify `OPENAI_API_KEY` or `LLM_API_KEY` is set
- Check the Claude Code MCP config in `~/.claude/settings.json`

**W&B metrics not loading:**

```python
import os

import wandb

# Ensure you're logged in
wandb.login(key=os.environ["WANDB_API_KEY"])
api = wandb.Api()
# Use the full entity/project path
runs = api.runs("your-entity/your-project")
```

**Context window exceeded mid-workflow:**

```
/research-pipeline "topic" — compact: true
```

Then resume with `— resume: true` on the next interrupted skill.

**Citation hallucination warnings (`[VERIFY]` tags):** These are intentional: ARIS flags unverified citations rather than silently hallucinating. Manually verify flagged citations before submission.

**Rebuttal exceeds character limit:** Increase `max stress test rounds`; each round trims the draft:

```
/rebuttal "paper/ + reviews" — character limit: 5000, max stress test rounds: 3
```

**ModelScope free-tier rate limits:** Add a delay between skill calls (see the sketch below) or switch to a paid endpoint for overnight runs.
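For the delay, a simple exponential backoff wrapper is enough. A minimal sketch, with illustrative names rather than an ARIS API:

```python
# Hypothetical wrapper: space out reviewer calls to stay under the
# ModelScope free-tier rate limit. Names are illustrative, not ARIS API.
import time

def call_with_backoff(fn, max_retries=5, base_delay=2.0):
    """Retry fn() with exponential backoff on rate-limit errors."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception as err:  # e.g. an HTTP 429 from the endpoint
            if attempt == max_retries - 1:
                raise
            delay = base_delay * (2 ** attempt)
            print(f"rate limited ({err}); retrying in {delay:.0f}s")
            time.sleep(delay)
```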


## Why Two Models (Not One, Not Four)

- 1 model self-reviewing → local minima, blind spots (a stochastic bandit)
- 2 models cross-reviewing → adversarial critique breaks the blind spots (an adversarial bandit)
- 4+ models → diminishing returns, 2-4× API cost, coordination overhead

Claude Code provides fast, fluid execution; GPT-5.4/Gemini/GLM provide slower, more deliberate critique. Speed × rigor beats either model alone.


## Community & Citation

```bibtex
@software{aris2026,
  title  = {ARIS: Auto-Research-In-Sleep},
  author = {wanshuiyin},
  year   = {2026},
  url    = {https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep}
}
```

Join the community: GitHub Discussions.

Papers accepted using ARIS: a CS conference (8/10, "clear accept") and AAAI 2026 Main Technical Track (7/10, "good paper, accept").
