OpenClaw / ClawLite Self-Improvement
Use this skill to turn mistakes, corrections, blockers, and better approaches into durable operating knowledge.
What problem this solves
AI ops often repeat the same failures because mistakes stay in chat history instead of becoming system rules. This skill creates a lightweight improvement loop:
- log failures and learnings
- separate errors from feature requests
- run small eval-driven experiments on repeated failures
- classify harness/runtime failures instead of blaming vague “model issues”
- generate daily agent scorecards from real evidence chains
- promote important patterns into AGENTS.md / TOOLS.md / SOUL.md
- write operator notes into Obsidian vault
- support stricter acceptance via Karen / Mission Control
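For example, one pass through this loop, using the scripts documented below, might look like the following sketch (the quoted strings are illustrative placeholders, not real entries):

```bash
# 1. Capture the failure the moment it happens.
node {baseDir}/scripts/log-learning.mjs error "Deploy closeout skipped" "Lane marked done with no receipt link" "Require proof link before closeout"

# 2. If the same failure repeats, open an eval-driven experiment.
node {baseDir}/scripts/log-learning.mjs experiment "Closeouts without proof" "3 of 5 closeouts lacked receipts" "Add receipt URL field to closeout checklist"
```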
When to use
Use this skill when the user asks:
- "make the agent improve itself"
- "capture learnings"
- "log mistakes so we do not repeat them"
- "record blockers / corrections / feature gaps"
- "build a self-improving OpenClaw workflow"
- "operationalize lessons learned"
- "test whether this new rule actually helps"
- "run an eval loop on this workflow/skill/SOP"
- "should we keep this new guardrail or discard it"
- "why did the agents fail today"
- "why is daily marketing not closing automatically"
- "classify OpenClaw harness failures"
- "generate agent delivery scorecard"
Files this skill uses
- `.learnings/LEARNINGS.md`
- `.learnings/ERRORS.md`
- `.learnings/FEATURE_REQUESTS.md`
- `.learnings/EXPERIMENTS.md`
- `memory/harness-backlog-latest.md`
- `mission-control/data/delivery-receipts/agent-scorecard-YYYY-MM-DD.md`
- Optional export under `.learnings/exports/obsidian/` by default, or `OBSIDIAN_LEARNINGS_DIR` if explicitly configured
Safety boundaries
- Local-file workflow only, no network I/O
- Promotion can append to `AGENTS.md`, `TOOLS.md`, or `SOUL.md`
- Always review promotion targets first, or run `scripts/promote-learning.mjs ... --dry-run`
- `OBSIDIAN_LEARNINGS_DIR` should only point at a path you intend to modify
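For example, a promotion can be previewed before anything is written (the rule text here is a placeholder):

```bash
# Preview only: nothing is appended to AGENTS.md / TOOLS.md / SOUL.md
node {baseDir}/scripts/promote-learning.mjs workflow "Never mark a deploy closed without a receipt link" --dry-run
```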
Command examples
node {baseDir}/scripts/log-learning.mjs learning "Summary" "Details" "Suggested action"
node {baseDir}/scripts/log-learning.mjs error "Summary" "Error details" "Suggested fix"
node {baseDir}/scripts/log-learning.mjs feature "Capability name" "User context" "Suggested implementation"
node {baseDir}/scripts/log-learning.mjs experiment "Target problem" "Baseline failure" "Single mutation to test"
node {baseDir}/scripts/log-experiment.mjs "Target problem" "Baseline failure" "Single mutation" "eval1|eval2|eval3" "Result summary" "testing"
node {baseDir}/scripts/promote-learning.mjs workflow "Rule text"
node {baseDir}/scripts/analyze-openclaw-failures.mjs --output /Users/m1/.openclaw/workspace/memory/harness-backlog-latest.md
node {baseDir}/scripts/daily-agent-scorecard.mjs --output /Users/m1/.openclaw/workspace/mission-control/data/delivery-receipts/agent-scorecard-$(date +%F).md
node {baseDir}/scripts/daily-agent-scorecard.mjs --repair --output /Users/m1/.openclaw/workspace/mission-control/data/delivery-receipts/agent-scorecard-$(date +%F).md
Categories
learning
Use for:
- user corrections
- better recurring workflows
- tool gotchas
- operational lessons
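A user correction, for instance, could be logged like this (argument order follows the command examples above; the strings are illustrative):

```bash
node {baseDir}/scripts/log-learning.mjs learning \
  "User wants receipts before closeout" \
  "Lane was closed without a proof link and the user corrected the workflow" \
  "Attach the receipt URL before marking any lane done"
```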
error
Use for:
- command failures
- integration failures
- runtime blockers
- broken release / deploy behavior
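For example (again with placeholder strings, in the same argument order as the command examples):

```bash
node {baseDir}/scripts/log-learning.mjs error \
  "Scorecard run failed" \
  "daily-agent-scorecard.mjs exited non-zero because the delivery-receipts directory was missing" \
  "Create the delivery-receipts directory before the daily run"
```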
feature
Use for:
- missing capability requests
- operator workflow gaps
- recurring requests that deserve a build item
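For example:

```bash
node {baseDir}/scripts/log-learning.mjs feature \
  "Weekly delivery rollup" \
  "Operator repeatedly asks for a 7-day view across daily scorecards" \
  "Aggregate daily scorecards into one weekly summary"
```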
experiment
Use for:
- repeated failures that need a tested guardrail
- checklist/SOP/schema changes that should be validated before broad promotion
- keep/discard decisions on new operating rules
- binary eval loops for skills, workflows, receipts, summaries, or deploy closeout rules
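A full experiment entry carries its binary evals as a pipe-separated list plus a status, matching the `log-experiment.mjs` signature shown in the command examples (all values here are placeholders):

```bash
node {baseDir}/scripts/log-experiment.mjs \
  "Closeouts without proof" \
  "3 of 5 closeouts last week had no receipt link" \
  "Require a receipt URL field in the closeout checklist" \
  "closeout has receipt URL|URL resolves|Karen accepts without rework" \
  "Pending first re-check" \
  "testing"
```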
harness
Use for:
- gateway, channel, provider, tool, session, or platform failures
- repeated "agent did not respond / did not finish / forgot identity" incidents
- daily workflow failures where Mission Control says one thing but proof chains say another
- scorecards that compare agent delivery against real receipts, URLs, and closeout evidence
Default failure taxonomy:
- `NetworkPolicyBlocked` - provider/tool blocked by local or external network policy
- `GatewayUnavailable` - gateway process, port, websocket, or reachability failure
- `SessionContextRot` - stale session, stale skill snapshot, identity drift, or outdated config context
- `SkillMissing` - expected skill absent from installed path or session snapshot
- `ToolInvalidArguments` - malformed tool/edit call or bad argument shape
- `ProviderError` - provider/model/API failure not caused by network policy
- `ExternalPlatformBlocked` - X/LinkedIn/Facebook/Feishu/etc. platform/API/login/visibility blocker
- `HumanApprovalRequired` - real approval boundary for external, destructive, production, money, or ambiguous action
Harness workflow:
- Scan logs and receipts with `scripts/analyze-openclaw-failures.mjs`.
- Generate same-day agent scorecard with `scripts/daily-agent-scorecard.mjs`.
- Run `scripts/daily-agent-scorecard.mjs --repair` to create/update recovery tickets for failed, blocked, or pending lanes.
- Convert repeated classes into an `error`, `experiment`, or promoted rule.
- Do not call a workflow closed until the scorecard has proof links or explicit blocker evidence.
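Chained together, a daily harness pass might look like this; the three commands are taken directly from the command examples above, with only the ordering, comments, and workspace variable added:

```bash
WS=/Users/m1/.openclaw/workspace

# 1. Classify harness/runtime failures from logs and receipts.
node {baseDir}/scripts/analyze-openclaw-failures.mjs --output "$WS/memory/harness-backlog-latest.md"

# 2. Build the same-day scorecard from real evidence chains.
node {baseDir}/scripts/daily-agent-scorecard.mjs --output "$WS/mission-control/data/delivery-receipts/agent-scorecard-$(date +%F).md"

# 3. Create/update recovery tickets for failed, blocked, or pending lanes.
node {baseDir}/scripts/daily-agent-scorecard.mjs --repair --output "$WS/mission-control/data/delivery-receipts/agent-scorecard-$(date +%F).md"
```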
Repair loop rules:
- Every failed/blocked/pending lane should have a `failureClass`, `repairState`, `nextAction`, `repeatCount7d`, and evidence.
- `ProofMissing`, `UpstreamMissing`, and `HumanApprovalRequired` must not be blindly retried.
- Repeated `agent + lane + failureClass` failures within 7 days should become `EXPERIMENT_REQUIRED`.
- Recovery tickets should be written under `mission-control/data/recovery-tickets-v3/YYYY-MM-DD/`.
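Purely as a sketch, a recovery ticket carrying those fields might look like this. The field names come from the rules above, but the file layout and state values are assumptions, not the real format:

```bash
# Hypothetical ticket shape -- the field layout and state vocabulary are assumed.
cat <<'EOF' > mission-control/data/recovery-tickets-v3/YYYY-MM-DD/marketing-lane.md
failureClass: GatewayUnavailable
repairState: pending
nextAction: EXPERIMENT_REQUIRED   # 3 repeats in 7 days triggers an experiment
repeatCount7d: 3
evidence: gateway log excerpt + scorecard row link
EOF
```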
Promotion targets
- `AGENTS.md` → workflow / delegation / execution rules
- `TOOLS.md` → tool gotchas, secrets locations, environment routing rules
- `SOUL.md` → behavior / communication / non-negotiable principles
- Obsidian vault → reusable operator log and content proof asset
Karen / Mission Control compatibility
This skill is designed to work with stricter ops governance:
- Karen can reference learnings when repeated failures happen
- Mission Control can treat promoted learnings as new operating rules
- recurring blockers can be elevated from chat into tracked operational knowledge
- experiments can test whether a new summary contract, receipt rule, or deploy closeout guardrail actually reduced the failure pattern
Eval loop rule
When a repeated failure is being turned into a new rule, SOP, or checklist, do not just log it. Also:
- define 3-5 binary evals
- record the baseline failure state
- change one thing at a time
- re-check the same evals
- classify the change as keep / discard / partial_keep
Use {baseDir}/references/eval-loop.md for the experiment format and examples.
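After re-checking the same evals, the experiment can be updated with its final classification. Assuming the last argument is the experiment status (as in the `"testing"` command example above), a keep decision might be recorded like this:

```bash
node {baseDir}/scripts/log-experiment.mjs \
  "Closeouts without proof" \
  "3 of 5 closeouts had no receipt link" \
  "Require a receipt URL field in the closeout checklist" \
  "closeout has receipt URL|URL resolves|Karen accepts without rework" \
  "5 of 5 closeouts passed all three evals after the change" \
  "keep"
```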
Output goal
A good use of this skill should produce one of:
- a durable learning entry
- a durable error entry
- a durable feature request entry
- a durable experiment entry with binary evals
- a promoted rule in AGENTS.md / TOOLS.md / SOUL.md
- an Obsidian vault operations note
Important limits
- Logging is not the same as fixing.
- Do not treat a learning entry as closure for a broken deliverable.
- Use this skill to reduce repeated mistakes, not to excuse them.
References
- `{baseDir}/references/schema.md`
- `{baseDir}/references/promotion-guide.md`
- `{baseDir}/references/eval-loop.md`
- `{baseDir}/references/examples.md`
- `{baseDir}/references/decision-rules.md`