autoresearch-pro

Automatically improve OpenClaw skills, prompts, or articles through iterative mutation-testing loops. Inspired by Karpathy's autoresearch. Use when user says 'optimize [skill]', 'autoresearch [skill]', 'improve my skill', 'optimize this prompt', 'improve my prompt', 'polish this article', 'improve this article', or explicitly requests quality improvement for any text-based content. Supports three modes: skill (SKILL.md files), prompt (any prompt text), and article (any document).

Safety Notice

This item is sourced from the public archived skills repository. Treat as untrusted until reviewed.

Install skill "autoresearch-pro" with this command: npx skills add 0xcjl/autoresearch-pro

Overview

Automatically improve any OpenClaw skill, prompt, or article through iterative mutation-testing: small edits → run test cases → score with checklist → keep improvements, discard regressions.

Inspired by Karpathy/autoresearch.

Supports three optimization modes:

| Mode | Input | Output |
| --- | --- | --- |
| Skill | Path to a skill directory | Improved SKILL.md |
| Prompt | A prompt text string | Improved prompt |
| Article | An article/document text | Improved article |

Workflow

Step 1 — Identify Mode and Input

Ask the user to confirm:

  • Mode 1 — Skill: User says "optimize [skill-name]" or provides a skill path
  • Mode 2 — Prompt: User says "optimize this prompt" or pastes a prompt
  • Mode 3 — Article: User says "improve this article" or pastes article text

For Skill mode, resolve the skill path to ~/.openclaw/skills/<skill-name>/SKILL.md. For Prompt/Article mode, keep the text in context (do not write to disk unless needed).
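The path resolution for Skill mode can be sketched as below. This is a minimal illustration, not the skill's own code: the function name `resolve_skill_path` and the `home` parameter are hypothetical, and only the `~/.openclaw/skills/<skill-name>/SKILL.md` layout comes from the text above.

```python
from pathlib import Path
from typing import Optional


def resolve_skill_path(skill_name: str, home: Optional[Path] = None) -> Path:
    """Return the expected SKILL.md location for a skill name.

    Hypothetical helper: only the directory layout is taken from the document.
    """
    # Reject anything that is not a plain directory name, so a name such as
    # "../other" cannot point outside the skills folder.
    if not skill_name or "/" in skill_name or "\\" in skill_name or skill_name in {".", ".."}:
        raise ValueError(f"not a plain skill name: {skill_name!r}")
    base = home if home is not None else Path.home()
    return base / ".openclaw" / "skills" / skill_name / "SKILL.md"
```

The `home` argument exists only to make the helper easy to test; callers would normally omit it and check `path.exists()` before reading the file.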

Step 2 — Generate Checklist (10 Questions)

Read the target content first. Then generate 10 diverse, specific yes/no checklist questions relevant to the content type:

For Skill mode:

| # | Dimension | What to Check |
| --- | --- | --- |
| 1 | Description clarity | Is the frontmatter description precise and actionable? |
| 2 | Trigger coverage | Does it cover the main real-world use cases? |
| 3 | Workflow structure | Are steps clearly sequenced and unambiguous? |
| 4 | Error guidance | Does it handle error states and edge cases? |
| 5 | Tool usage accuracy | Are tool names and parameters correct for OpenClaw? |
| 6 | Example quality | Do examples reflect real usage patterns? |
| 7 | Conciseness | Is content free of redundant repetition? |
| 8 | Freedom calibration | Is instruction specificity appropriate? |
| 9 | Reference quality | Are references and links accurate? |
| 10 | Completeness | Are all sections filled with real content? |

For Prompt mode (10 tailored questions):

| # | Dimension | What to Check |
| --- | --- | --- |
| 1 | Goal clarity | Does the prompt state a clear, specific goal? |
| 2 | Role/tone | Is the desired role or tone specified? |
| 3 | Input format | Is the input format clearly described? |
| 4 | Output format | Is the expected output format specified? |
| 5 | Constraints | Are key constraints and boundaries stated? |
| 6 | Context sufficiency | Is enough context provided to avoid hallucination? |
| 7 | Edge cases | Does it handle ambiguous or edge case inputs? |
| 8 | Conciseness | Is it free of redundant or contradictory instructions? |
| 9 | Actionability | Are instructions concrete and actionable vs. vague? |
| 10 | Completeness | Are all necessary elements for the task present? |

For Article mode (10 tailored questions):

| # | Dimension | What to Check |
| --- | --- | --- |
| 1 | Title quality | Does the title clearly convey the main value? |
| 2 | Opening hook | Does the opening grab attention and set expectations? |
| 3 | Logical structure | Are ideas logically organized (not random)? |
| 4 | Argument clarity | Are claims supported with evidence or reasoning? |
| 5 | Conciseness | Is unnecessary padding or repetition removed? |
| 6 | Transition flow | Do paragraphs/sections flow smoothly? |
| 7 | Closing strength | Does the conclusion summarize and inspire action? |
| 8 | Tone consistency | Is the tone consistent throughout? |
| 9 | Readability | Is sentence/paragraph length varied appropriately? |
| 10 | Audience match | Does language match the target audience level? |

Present the 10 questions, numbered 1-10. Ask the user to select which ones to activate (e.g., "use questions 1, 3, 5, 7"). Default: use all 10 if the user doesn't specify.
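One way to turn the user's reply into a list of active questions is sketched below. The function name `parse_selection` is hypothetical, and treating the word "all" as "activate everything" is an assumption; the fall-back to all 10 questions is the default stated above.

```python
import re
from typing import List


def parse_selection(reply: str, total: int = 10) -> List[int]:
    """Extract active question numbers from a reply like 'use questions 1, 3, 5, 7'."""
    everything = list(range(1, total + 1))
    # Assumption: a reply containing the word "all" (e.g. "use all 10") means all questions.
    if re.search(r"\ball\b", reply, flags=re.IGNORECASE):
        return everything
    numbers = [int(token) for token in re.findall(r"\d+", reply)]
    # Drop out-of-range numbers and duplicates, and return them in ascending order.
    chosen = sorted({n for n in numbers if 1 <= n <= total})
    # Default from the document: use every question when none are named.
    return chosen or everything
```

This sketch reads "1-5" as the two numbers 1 and 5, not as a range, so range syntax would need extra handling if users are expected to type it.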

Step 3 — Prepare Test Cases

  • Skill mode: Generate 3-5 realistic prompts a user would send when using the skill
  • Prompt mode: Generate 3-5 test inputs that the prompt would process
  • Article mode: Generate 3-5 ways the article might be read or consumed

Store test cases in context — do not write to disk.

Step 4 — Run Autoresearch Loop

Loop configuration:

  • Rounds per batch: 30
  • Max total rounds: 100
  • Pause: After every 30 rounds, show summary and ask user to continue or stop
  • Stop conditions: User says stop, OR 100 rounds completed
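The configuration above implies that the last batch is shorter than the others, because 100 is not a multiple of 30. A small sketch makes the pause points explicit; `batch_ranges` is a hypothetical helper name, and the defaults are the values from the list above.

```python
from typing import List, Tuple


def batch_ranges(max_rounds: int = 100, batch_size: int = 30) -> List[Tuple[int, int]]:
    """Return inclusive (first_round, last_round) pairs, one per batch."""
    if max_rounds < 1 or batch_size < 1:
        raise ValueError("max_rounds and batch_size must be positive")
    ranges = []
    start = 1
    while start <= max_rounds:
        # The final batch is cut off at max_rounds.
        end = min(start + batch_size - 1, max_rounds)
        ranges.append((start, end))
        start = end + 1
    return ranges


print(batch_ranges())  # [(1, 30), (31, 60), (61, 90), (91, 100)]
```

With these defaults the user is asked to continue after rounds 30, 60, and 90, and the run ends after round 100.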

Per-round procedure:

  1. Mutate: Make ONE small edit to the target content:

    • Skill mode: edit SKILL.md
    • Prompt mode: edit the prompt string
    • Article mode: edit the article text
  2. Test: For each test case, simulate what output the content would produce.

  3. Score: Apply each active checklist question (0 or 1 per question). Score = (passed / total) × 100.

  4. Decide: If new score ≥ best score → keep the mutation. If lower → revert.

  5. Log: Round number, mutation type, score, keep/revert decision.
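The score and keep/revert rules of a single round can be sketched as follows. This is an illustration under stated assumptions: `score` and `run_round` are hypothetical names, and each checklist question is modelled as a plain Python predicate on the text, whereas in the skill the questions are judged against simulated test-case output.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for one yes/no checklist question.
Check = Callable[[str], bool]


def score(results: List[bool]) -> float:
    """Score = (passed / total) x 100, as defined in the Score step."""
    if not results:
        raise ValueError("at least one checklist question must be active")
    return sum(results) / len(results) * 100


def run_round(
    best_text: str,
    best_score: float,
    mutate: Callable[[str], str],
    checks: List[Check],
) -> Tuple[str, float, str]:
    """Apply one mutation, score it, and keep it only if the score does not drop."""
    candidate = mutate(best_text)
    new_score = score([check(candidate) for check in checks])
    if new_score >= best_score:  # ties are kept, per the Decide step
        return candidate, new_score, "keep"
    return best_text, best_score, "revert"


# Toy example: two checks, and a mutation of type D (tighten vague language).
checks = [lambda t: "must" in t, lambda t: len(t) < 50]
text, best, decision = run_round(
    "try to be brief", 50.0, lambda t: t.replace("try to", "must"), checks
)
print(decision, best)  # keep 100.0
```

Because ties count as improvements, neutral edits accumulate over a run; a stricter `>` comparison would keep only strict gains.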

Mutation types (pick one per round):

| Type | Description |
| --- | --- |
| A | Add a constraint rule |
| B | Strengthen trigger/coverage |
| C | Add a concrete example |
| D | Tighten vague language |
| E | Improve error/edge case handling |
| F | Remove redundant content |
| G | Improve transitions |
| H | Expand a thin section |
| I | Add cross-reference |
| J | Adjust degree-of-freedom |

Step 5 — Report Results

After each batch (30 rounds):

Batch N (rounds X-Y):
  Best score: XX%
  Mutations kept: N  |  Reverted: N
  Most effective types: [list top 2-3]
Accumulated improvements: [summary]
Continue? (yes/stop)
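The first four lines of the batch report can be derived from the round log. The sketch below assumes a hypothetical log format (one dict per round with `round`, `type`, `score`, and `decision` keys) and ranks "most effective" types by how many of their mutations were kept; the document does not define that ranking, so treat it as one possible reading.

```python
from collections import Counter
from typing import Dict, List


def summarize_batch(log: List[Dict], batch_no: int, baseline_score: float) -> str:
    """Render the batch header lines from a list of per-round log entries."""
    if not log:
        raise ValueError("the batch log is empty")
    kept = [entry for entry in log if entry["decision"] == "keep"]
    # The best score never falls below the score the batch started with.
    best = max([baseline_score] + [entry["score"] for entry in kept])
    # Assumption: effectiveness = number of kept mutations per type.
    top_types = [t for t, _ in Counter(e["type"] for e in kept).most_common(3)]
    first, last = log[0]["round"], log[-1]["round"]
    return "\n".join([
        f"Batch {batch_no} (rounds {first}-{last}):",
        f"  Best score: {best:.0f}%",
        f"  Mutations kept: {len(kept)}  |  Reverted: {len(log) - len(kept)}",
        f"  Most effective types: {', '.join(top_types) or 'none'}",
    ])
```

The "Accumulated improvements" line is left out on purpose, since it is a free-text summary and not something to compute from the log.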

After full completion:

  • Original score vs. final score
  • Top 3 most impactful mutations
  • Final improved content (inline or diff)
  • File path (skill mode only)

Mutation Strategy Reference

High-impact, low-risk changes:

  • Adding explicit constraints where the content is vague
  • Expanding coverage to cover edge cases
  • Adding concrete examples to abstract instructions
  • Tightening soft language ("try to" → "must")

Avoid in one round:

  • Large rewrites of entire sections
  • Multiple unrelated changes at once
  • Changing fundamental scope or purpose

See references/mutation_strategies.md for the full strategy guide.


Mode Selection Quick Reference

| User says | Mode |
| --- | --- |
| "optimize [skill]" / "autoresearch [skill]" | Skill |
| "optimize this prompt" / "improve my prompt" | Prompt |
| "polish this article" / "improve this article" | Article |
| "optimize this document" | Article |

Default to Prompt mode if the input is a text string without a skill path.
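The routing in this table, including the Prompt default, can be approximated with a keyword check on the user's request. This is a rough heuristic sketch, not the skill's actual logic: `detect_mode` and the `has_skill_path` flag are hypothetical, and the keyword list is inferred from the trigger phrases above.

```python
def detect_mode(request: str, has_skill_path: bool = False) -> str:
    """Guess the optimization mode from the user's request text.

    `has_skill_path` is a hypothetical flag the caller sets when the request
    names an installed skill or includes a skill path.
    """
    text = request.lower()
    if has_skill_path or "autoresearch" in text or "skill" in text:
        return "skill"
    if "prompt" in text:
        return "prompt"
    if "article" in text or "document" in text:
        return "article"
    # Default from the document: plain text without a skill path is a prompt.
    return "prompt"
```

Pass only the user's instruction to this function, not pasted content, since an article that mentions "skill" would otherwise be misrouted; Step 1 still asks the user to confirm the mode.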

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
