# Autoresearch Skill

Run autonomous iteration loops: Goal → Metric → Loop (make change → verify → keep/revert → repeat).
## Core Protocol

SETUP:
1. Define GOAL (what to improve)
2. Define METRIC (how to measure success)
3. Define SCOPE (what can be modified)
4. Establish BASELINE (current metric value)

LOOP (forever or N iterations):
1. Review current state + history + results log
2. Pick the next change (based on what worked, what failed, what's untried)
3. Make ONE focused change
4. Commit the change (for rollback)
5. Run mechanical verification (tests, benchmarks, scores)
6. If improved → keep. If worse → revert. If error → fix or skip.
7. Log the result
8. Repeat until the goal is reached or max iterations hit
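The keep/revert core of steps 4-6 can be sketched as a small shell helper. This is a hypothetical sketch, not the skill's implementation: `run_metric` stands in for your own verification command, and the metric is assumed to be a number where higher is better.

```shell
# decide BEFORE AFTER -> "keep" if the metric improved, else "revert".
# Assumes higher is better; flip the comparison for cost-style metrics.
decide() {
  local before="$1" after="$2"
  if awk -v b="$before" -v a="$after" 'BEGIN { exit !(a > b) }'; then
    echo keep
  else
    echo revert
  fi
}

# One iteration: commit the change, measure, keep or roll back.
# `run_metric` is a placeholder for your benchmark/test command.
iterate_once() {
  local baseline="$1" after
  git commit -qam "experiment"           # step 4: commit for rollback
  after=$(run_metric)                    # step 5: mechanical verification
  if [ "$(decide "$baseline" "$after")" = keep ]; then
    echo "keep ($baseline -> $after)"    # step 6: improved, keep it
  else
    git revert --no-edit HEAD >/dev/null # step 6: worse, revert
    echo "revert ($baseline -> $after)"
  fi
}
```

`decide` is pure, so it can be reused by whatever drives the loop (a script, a subagent, or you by hand).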
## Principles

- One change per iteration — atomic changes. If it breaks, you know why.
- Mechanical verification only — no subjective "looks good." Use metrics.
- Automatic rollback — failed changes revert instantly.
- Git is memory — each experiment is committed. `git revert` preserves history.
- Simplicity wins — equal results + less code = KEEP.
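The "git is memory" principle as a self-contained sketch: every experiment is a commit, and a failed experiment is undone with `git revert`, which adds a new commit rather than erasing history. The repo, file name, and metric values here are throwaway examples.

```shell
set -e
repo=$(mktemp -d)                         # scratch repo for the demo
cd "$repo"
git init -q
git config user.email "auto@example.invalid"
git config user.name "autoresearch"

echo "minScore: 0.5" > config.yml         # illustrative config file
git add config.yml
git commit -qm "baseline"

echo "minScore: 0.3" > config.yml         # iteration 1: the experiment
git commit -qam "iter 1: lower minScore"

# Verification came back worse -> revert, but keep the record:
git revert --no-edit HEAD >/dev/null
```

After the revert, `git log` still shows all three commits, so the failed experiment remains visible to later iterations reviewing history.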
## Quick Start

- Goal: Improve memory search Top-1 hit rate from 65% to 75%
- Metric: Benchmark score (`openclaw cron runs --id <job-id> --limit 1`)
- Scope: `~/.openclaw/workspace/MEMORY.md`, `~/.openclaw/openclaw.json`
- Max Iterations: 5

Then run the loop manually or spawn a subagent to execute it.
## Usage Patterns

### Pattern 1: Manual Loop (Interactive)

For simple tasks, run the loop yourself:

Iteration 1:
- Change: [describe what you'll change]
- Verify: [run verification]
- Result: [keep/revert + reason]
- Log entry
### Pattern 2: Spawn Subagent (Autonomous)

For longer tasks, spawn a subagent with the loop instructions:

`sessions_spawn` with:
- task: Full autoresearch loop specification
- timeoutSeconds: 600 (10 min per iteration)
- mode: `run` (one-shot) or `session` (persistent)
### Pattern 3: Background Process

For very long loops, use `exec` with background continuation:

`exec` with:
- command: The optimization script
- background: true
- yieldMs: 60000 (check every minute)
## Verification Commands

| Domain | Verify Command |
|---|---|
| Memory search | `openclaw cron runs --id <job-id> --limit 1` |
| Tests | `npm test`, `pytest`, `cargo test` |
| Build | `npm run build`, `cargo build` |
| Lint | `eslint .`, `ruff check .` |
| Benchmarks | `npm run bench`, custom benchmark script |
| Coverage | `npm test -- --coverage` |
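Whatever the domain, every verify command can be normalized to a pass/fail signal via its exit code. A minimal hypothetical wrapper:

```shell
# Run any verify command; print "pass" or "fail" based on its exit code.
# Output is silenced so the loop only sees the verdict.
verify() {
  if "$@" >/dev/null 2>&1; then
    echo pass
  else
    echo fail
  fi
}

# Usage:  verify npm test        verify cargo build
```

This keeps the loop logic metric-agnostic: the loop decides keep/revert, the wrapper hides which tool produced the signal.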
## Logging Format

Track iterations in TSV format:

    iteration  change              metric_before  metric_after  delta  status    description
    0          baseline            65.0           65.0          0.0    baseline  initial state
    1          lowered minScore    65.0           70.0          +5.0   keep      improved retrieval
    2          tried larger model  70.0           68.0          -2.0   revert    worse, reverted
    3          added corpus entry  70.0           72.0          +2.0   keep      filled gap
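Appending a row (and computing the delta) fits in one small helper. The `results.tsv` filename and field order below mirror the example table and are assumptions, not a fixed convention:

```shell
# Append one iteration to the TSV log; delta = after - before, signed.
LOG=${LOG:-results.tsv}

log_iteration() {
  local iter="$1" change="$2" before="$3" after="$4" status="$5" desc="$6"
  local delta
  delta=$(awk -v b="$before" -v a="$after" 'BEGIN { printf "%+.1f", a - b }')
  printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
    "$iter" "$change" "$before" "$after" "$delta" "$status" "$desc" >> "$LOG"
}

# Usage:  log_iteration 1 "lowered minScore" 65.0 70.0 keep "improved retrieval"
```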
## Subagent Template

When spawning a subagent for autoresearch, use this template:

    GOAL: [what to improve]
    METRIC: [how to measure]
    VERIFICATION: [command to run]
    SCOPE: [files that can be modified]
    MAX_ITERATIONS: [number]

    CONSTRAINTS:
    - [resource limits]
    - [safety rules]
    - [reversibility requirements]

    APPROACH:
    1. Establish baseline
    2. For each iteration:
       a. Identify next change
       b. Make ONE atomic change
       c. Run verification
       d. Compare to baseline
       e. Keep if improved, revert if worse
       f. Log result
    3. Report final results
## Common Patterns

### Improving Benchmark Scores

- Goal: Improve benchmark score
- Metric: Benchmark output
- Changes: Config tweaks, corpus improvements, model changes
- Iterations: 5-10

### Fixing Tests

- Goal: All tests passing
- Metric: Number of failing tests
- Changes: Fix one test at a time
- Iterations: Until zero failures

### Reducing Bundle Size

- Goal: Bundle < 100KB
- Metric: Build output size
- Changes: Remove dependencies, tree-shake, minify
- Iterations: Until target met

### Increasing Coverage

- Goal: Coverage > 80%
- Metric: Coverage percentage
- Changes: Add tests for uncovered lines
- Iterations: Until target met
## Failure Handling

| Failure | Response |
|---|---|
| Syntax error | Fix immediately; don't count it as an iteration |
| Runtime error | Attempt a fix (max 3 tries), then move on |
| Resource exhaustion | Revert, try a smaller variant |
| Timeout | Revert, simplify the approach |
| External dependency failed | Skip, log, try a different approach |
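The "max 3 tries" policy for runtime errors is a simple bounded retry; a sketch (the helper name is made up):

```shell
# Try a command up to 3 times; succeed on the first pass, give up after 3.
retry3() {
  local attempt
  for attempt in 1 2 3; do
    "$@" && return 0
  done
  return 1   # caller should then skip/revert and log, per the table above
}

# Usage:  retry3 npm test || echo "moving on"
```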
## Stopping Conditions

- Goal metric reached
- Max iterations hit
- No improvement for 3 consecutive iterations
- User interrupt (Ctrl+C or `/stop`)
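The "no improvement for 3 consecutive iterations" condition needs a small stall counter; a sketch under the same keep/revert vocabulary used above:

```shell
# Feed each iteration's outcome (keep|revert) to note_result; it answers
# whether the loop should continue or stop after 3 straight non-improvements.
STALL=0
note_result() {
  if [ "$1" = keep ]; then
    STALL=0                    # any improvement resets the counter
  else
    STALL=$((STALL + 1))
  fi
  if [ "$STALL" -ge 3 ]; then echo stop; else echo continue; fi
}
```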
## References

For advanced patterns, see:

- `references/workflows.md` — Multi-step workflows
- `references/metrics.md` — Common metric patterns