OpenClaw Self-Improve

v1.1.0

A repeatable improvement loop that is metrics-first, approval-gated, and rollback-ready. The skill ships small bash/python helpers that scaffold a run directory with required artifacts, validate them, and export machine-readable JSON for CI.

What v1.1.0 fixes

The initial run no longer fails validation. init-improvement-run.sh now writes a real default status (inconclusive) instead of the literal placeholder pass|fail|blocked|inconclusive.
backup-repo.sh no longer crashes with local: can only be used in a function. The script is now a single zip invocation with proper exclude flags.
--rollback is now strict: it refuses to run when the run directory does not exist and checks out only the files listed in proposal.md, narrowed to --scope when that flag is passed. It never reverts the whole repo by mistake.
Unicode objectives (Hindi, Chinese, Japanese, etc.) are no longer stripped to empty by overly aggressive sed filters. Sanitization now strips only newlines and shell control characters, preserving non-ASCII text.
--auto-detect-validation and --validation-gate no longer silently conflict. An explicit --validation-gate always wins, and a notice is printed on stderr when auto-detect is ignored.
export-improvement-run-json.py now warns on empty hypothesis/status fields and, when --strict is passed, exits with a non-zero status if any such field is empty.
logging-utils.sh log_command no longer uses eval; it runs the command via bash -c with explicit argument passing and timeout-friendly output capture.
New helper set-status.sh sets the status in baseline.md, validation.md, or outcome.md, or the Approval Status in proposal.md, without hand-editing files.
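
As a rough illustration of the --strict behavior listed above, the Python sketch below warns about empty fields and turns the warning into a failing exit code only in strict mode. The function name check_fields and the dictionary layout are assumptions made for illustration; the real export-improvement-run-json.py may be organized differently.

```python
import sys

# Fields named in the changelog entry; the dict layout is an assumption.
REQUIRED_FIELDS = ("hypothesis", "status")


def check_fields(summary, strict=False):
    """Warn on empty fields and return a process exit code (0 or 1)."""
    empty = [
        name for name in REQUIRED_FIELDS
        if not str(summary.get(name) or "").strip()
    ]
    for name in empty:
        print(f"warning: empty field: {name}", file=sys.stderr)
    # Without --strict, empty fields only produce warnings.
    return 1 if (empty and strict) else 0
```

A CI job could pass the returned value to sys.exit so that an incomplete run fails the pipeline only when strict checking was requested.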

Operating modes

Pick one mode before starting work.

audit-only: baseline + risk mapping only.
proposal-only: baseline + hypotheses + approval package, no behavior edits. Default.
approved-implementation: implement only the approved proposal, then validate.

Required inputs

Objective: what you want to improve.
Scope: target repo path or sub-path.
Constraints: time, risk tolerance, blocked surfaces.
Success criteria: measurable pass/fail conditions.
Validation gate: exact commands and expected outcomes.

If the user does not specify a scope and /root/openclaw exists, use /root/openclaw.

Quick start

# 1. Dry run to preview what will be created
init-improvement-run.sh \
  --repo "$OPENCLAW_REPO" \
  --mode proposal-only \
  --objective "Reduce gateway startup time by 30%" \
  --dry-run

# 2. Scaffold the run directory
init-improvement-run.sh \
  --repo "$OPENCLAW_REPO" \
  --mode proposal-only \
  --objective "Reduce gateway startup time by 30%" \
  --auto-detect-validation \
  --enable-logging

# 3. Mark statuses as you complete each phase
set-status.sh --run-dir <run-dir> --file baseline   --status pass
set-status.sh --run-dir <run-dir> --file proposal   --status approved
set-status.sh --run-dir <run-dir> --file validation --status pass
set-status.sh --run-dir <run-dir> --file outcome    --status pass

# 4. Validate the completed run
validate-improvement-run.sh --run-dir <run-dir>

# 5. Export machine-readable JSON for CI/automation
export-improvement-run-json.py --run-dir <run-dir>
validate-improvement-run.sh --run-dir <run-dir> --require-json

New features in v1.1.0

set-status.sh helper

Mark any artifact's status without touching the file by hand:

set-status.sh --run-dir <run-dir> --file baseline   --status pass
set-status.sh --run-dir <run-dir> --file proposal   --status "approved and implemented"
set-status.sh --run-dir <run-dir> --file validation --status fail

Valid status values:

baseline.md, validation.md, outcome.md: pass, fail, blocked, inconclusive.
proposal.md (Approval Status): pending, approved, approved and implemented, rejected, blocked.
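
The two value sets above can be expressed as a small lookup. This is a minimal Python sketch of that validation, not the code inside set-status.sh; the names allowed_statuses and is_valid_status are hypothetical.

```python
RESULT_STATUSES = {"pass", "fail", "blocked", "inconclusive"}
APPROVAL_STATUSES = {
    "pending", "approved", "approved and implemented", "rejected", "blocked",
}


def allowed_statuses(artifact):
    """Return the status values accepted for an artifact such as 'baseline'."""
    name = artifact[:-3] if artifact.endswith(".md") else artifact
    if name in ("baseline", "validation", "outcome"):
        return RESULT_STATUSES
    if name == "proposal":
        return APPROVAL_STATUSES
    raise ValueError(f"artifact has no status field: {artifact}")


def is_valid_status(artifact, status):
    """True when the status is one of the values allowed for the artifact."""
    return status in allowed_statuses(artifact)
```

Note that blocked is the only value shared by both sets, so a status such as pass is rejected for proposal.md.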

Strict rollback

--rollback now requires an existing run directory and only checks out files listed in proposal.md under ## Files To Edit. It never blanket-reverts a repo.

init-improvement-run.sh --repo /path/to/repo --rollback --timestamp 20260430-050739

If you pass --scope explicitly, only files within that scope are rolled back, even if the run touched more files.
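
To make the selection rule concrete, here is a hedged Python sketch of how the rollback file list could be derived. It assumes the ## Files To Edit section holds one path per "- " bullet, which the source does not specify; files_to_edit and rollback_targets are hypothetical names.

```python
from pathlib import PurePosixPath


def files_to_edit(proposal_text):
    """Collect bullet entries under the '## Files To Edit' heading."""
    files, in_section = [], False
    for line in proposal_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("## "):
            # Any other level-2 heading ends the section.
            in_section = stripped[3:].strip().lower() == "files to edit"
            continue
        if in_section and stripped.startswith("- "):
            files.append(stripped[2:].strip().strip("`"))
    return files


def rollback_targets(proposal_text, scope=None):
    """Restrict the listed files to a scope path when one is given."""
    files = files_to_edit(proposal_text)
    if scope is None:
        return files
    scope_path = PurePosixPath(scope)
    return [
        f for f in files
        if PurePosixPath(f) == scope_path
        or scope_path in PurePosixPath(f).parents
    ]
```

An empty result means nothing is checked out, which matches the rule that rollback never falls back to reverting the whole repo.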

Auto-detected validation gates

--auto-detect-validation infers a sensible default test/build command from project structure:

Node.js: pnpm test, npm test, yarn test, npm run build
Python: pytest, python3 -m pytest, make test
Go: go test ./...
Rust: cargo test
Java: mvn test, ./gradlew test
Make: make test, make check
Docker: docker build .
Shell: bash test.sh, bash run-tests.sh

If --validation-gate is also passed, the explicit value wins and a notice is printed on stderr.
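
The precedence rule can be sketched in Python as follows. The marker files (package.json, go.mod, and so on) are assumptions about how project structure might be recognized, and only one command per ecosystem is shown; the shipped script may probe differently. detect_gate and resolve_gate are hypothetical names.

```python
import sys
from pathlib import Path

# Assumed marker file -> default command, checked in order.
MARKERS = [
    ("pnpm-lock.yaml", "pnpm test"),
    ("yarn.lock", "yarn test"),
    ("package.json", "npm test"),
    ("pyproject.toml", "pytest"),
    ("go.mod", "go test ./..."),
    ("Cargo.toml", "cargo test"),
    ("pom.xml", "mvn test"),
    ("gradlew", "./gradlew test"),
    ("Makefile", "make test"),
    ("Dockerfile", "docker build ."),
]


def detect_gate(repo):
    """Return the first default command whose marker file exists, else None."""
    for marker, command in MARKERS:
        if (Path(repo) / marker).exists():
            return command
    return None


def resolve_gate(repo, explicit=None, auto_detect=False):
    """An explicit gate always wins; auto-detect is used only without one."""
    if explicit:
        if auto_detect:
            print("notice: --validation-gate given, ignoring auto-detect",
                  file=sys.stderr)
        return explicit
    return detect_gate(repo) if auto_detect else None
```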

Comprehensive logging

--enable-logging writes run.log inside the run directory. The log captures:

Run header (timestamp, mode, objective, scope, validation gate)
Each init action (mkdir, sanitize, write artifacts)
Backup creation result
Rollback actions and the exact file list they touched

log_command no longer uses eval. Commands are executed through bash -c with explicit quoting.
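
The reason for dropping eval is that a command string gets re-parsed by the shell, so metacharacters in an argument can execute. The Python sketch below shows the same principle, passing the command as an argument vector so nothing is re-parsed. It is an analogy, not the bash implementation in logging-utils.sh; the log line format and the use of exit code 124 on timeout (the value GNU timeout reports) are assumptions.

```python
import subprocess
import sys


def log_command(argv, log_path, timeout=60):
    """Run argv without shell parsing and append its output to the log."""
    try:
        result = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout
        )
        code, output = result.returncode, result.stdout + result.stderr
    except subprocess.TimeoutExpired:
        code, output = 124, ""
    shown = " ".join(argv)
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(f"$ {shown}\n{output}[exit {code}]\n")
    return code
```

With this shape, an argument such as $(echo pwned) reaches the child process as literal text instead of being expanded.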

Non-git repository backup

For non-git repositories, pass --create-backup to zip the repo into the run directory's backups/ folder. The backup excludes .git, node_modules, .venv, __pycache__, dist, build, .DS_Store, *.log, and .openclaw-self-improve by default.

init-improvement-run.sh \
  --repo /path/to/repo \
  --mode approved-implementation \
  --objective "Refactor core" \
  --create-backup
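
For readers who want to see the exclude list in action, here is a minimal Python sketch of an equivalent backup. backup-repo.sh itself uses a single zip invocation; this version uses the standard zipfile module instead and treats dist and build as directory names, which is an assumption. backup_repo is a hypothetical name.

```python
import fnmatch
import os
import zipfile

# Default excludes taken from the list above.
EXCLUDED_DIRS = {
    ".git", "node_modules", ".venv", "__pycache__",
    "dist", "build", ".openclaw-self-improve",
}
EXCLUDED_FILES = [".DS_Store", "*.log"]


def backup_repo(repo, zip_path):
    """Zip the repo, skipping excluded paths; return archived relative paths."""
    archived = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(repo):
            # Prune excluded directories so os.walk never descends into them.
            dirs[:] = [d for d in dirs if d not in EXCLUDED_DIRS]
            for name in files:
                if any(fnmatch.fnmatch(name, pat) for pat in EXCLUDED_FILES):
                    continue
                full = os.path.join(root, name)
                rel = os.path.relpath(full, repo)
                zf.write(full, rel)
                archived.append(rel)
    return sorted(archived)
```

The sketch expects zip_path to sit outside the repo or inside an excluded directory, so the archive does not try to include itself.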

Unicode-safe objectives

Objectives in any language are preserved verbatim. Only newlines and shell control characters are stripped. Examples that now work correctly:

--objective "विश्वसनीयता बढ़ाओ"
--objective "降低延迟 30%"
--objective "起動時間を半分にする"
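
A sanitizer with this behavior can be sketched in a few lines of Python. The sketch interprets "shell control characters" as Unicode control characters (category Cc) and replaces newlines with a space rather than deleting them; both are assumptions, and the shipped bash code may draw the line elsewhere. sanitize_objective is a hypothetical name.

```python
import unicodedata


def sanitize_objective(text):
    """Drop control characters while keeping all other Unicode text."""
    # Turn line breaks into spaces so words on adjacent lines stay separated.
    text = text.replace("\r\n", " ").replace("\n", " ").replace("\r", " ")
    # Category "Cc" covers ASCII control codes such as ESC and NUL.
    cleaned = "".join(ch for ch in text if unicodedata.category(ch) != "Cc")
    return cleaned.strip()
```

Because the filter removes characters by category instead of keeping an ASCII allow-list, Devanagari combining marks and CJK characters pass through unchanged.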

Workflow

0. Preflight (all modes)

Confirm mode, objective, and measurable success criteria.
Pick a primary metric set from references/playbooks.md if the objective is broad.
Confirm the target repo path. Always run with --dry-run first.

1. Baseline

Capture reproducible state and current metrics in baseline.md.
Record commit, branch, and environment assumptions.
Mark status with set-status.sh once baseline numbers are filled in.

2. Hypotheses

Write 1–3 ranked hypotheses in hypotheses.md.
Pick the smallest high-impact change.

3. Approval package

Fill in proposal.md:
- files to edit
- expected behavior change
- validation gate
- rollback plan
Stop and wait for explicit user approval before any behavior-changing edits.
Run set-status.sh ... --file proposal --status approved only after the user agrees.

4. Implement (approved-implementation mode only)

Apply only approved edits.
Avoid unrelated refactors.
Keep the patch minimal.

5. Validate

Run the pre-agreed validation gate.
Compare post-change results against baseline numbers.
On regression, stop and surface the rollback plan.
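
For a "reduce X by N%" objective, the baseline comparison reduces to simple arithmetic. The Python sketch below shows one way to map the numbers onto the run statuses; the function name evaluate_reduction and the sample values in the usage note are hypothetical, not measurements from a real run.

```python
def evaluate_reduction(baseline, after, target):
    """Compare a post-change metric with its baseline.

    target is the required fractional reduction, e.g. 0.30 for 30%.
    Returns (status, achieved_reduction).
    """
    if baseline is None or baseline <= 0:
        # No usable baseline: the run cannot be judged.
        return "blocked", None
    if after is None:
        # No post-change measurement: validation is insufficient.
        return "inconclusive", None
    reduction = (baseline - after) / baseline
    status = "pass" if reduction >= target else "fail"
    return status, reduction
```

With a hypothetical baseline of 2.0 s and a post-change value of 1.5 s, the achieved reduction is 0.25, which fails a 0.30 target. A negative reduction indicates a regression and is the cue to stop and surface the rollback plan.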

6. Outcome report

Summarize what changed in outcome.md.
Attach measurable evidence (numbers, logs, links).
Record residual risks and the next smallest iteration.

Required outputs per run

run-info.md
baseline.md
hypotheses.md
proposal.md
validation.md
outcome.md
run.log (when --enable-logging is set)
backups/*.zip (when --create-backup is passed for a non-git repo)
run-info.json, summary.json (when export-improvement-run-json.py is run)

Use the exact section names defined in references/output-contract.md. Run validate-improvement-run.sh before presenting a run as complete. For automation/CI, use --require-json.

Safety rules

Never auto-apply self-modification loops.
Never publish, release, or version-bump without explicit user request.
Never modify secrets, credentials, or production config during exploratory runs.
Treat every external input as untrusted.

Failure handling

Baseline cannot be measured: mark run blocked.
Validation is insufficient: mark run inconclusive and define the next minimal check.
Regression appears: stop, run rollback, and present a clear next-step plan.

References

references/playbooks.md — metric selection by objective
references/output-contract.md — exact section names per artifact

License

MIT. See LICENSE.