ln-814-optimization-executor

Multi-file hypothesis testing with keep/discard loop, compound baselines, and experiment logging

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "ln-814-optimization-executor" with this command: npx skills add levnikolaevich/claude-code-skills/levnikolaevich-claude-code-skills-ln-814-optimization-executor

Paths: File paths (shared/, references/, ../ln-*) are relative to skills repo root. If not found at CWD, locate this SKILL.md directory and go up one level for repo root.

Type: L3 Worker
Category: 8XX Optimization

Executes optimization hypotheses from the researcher using a keep/discard autoresearch loop. Supports multi-file changes, compound baselines, and any optimization type (algorithm, architecture, query, caching, batching).


Overview

| Aspect | Details |
|--------|---------|
| Input | .optimization/{slug}/context.md OR conversation context (standalone invocation) |
| Output | Optimized code on isolated branch, per-hypothesis results, experiment log |
| Pattern | Strike-first: apply all → test → measure. Bisect only on failure. A/B only for contested alternatives |

Workflow

Phases: Pre-flight → Baseline → Strike-First Execution → Report → Gap Analysis


Phase 0: Pre-flight Checks

Slug Resolution

  • If invoked via Agent with contextStore containing slug — use directly.
  • If invoked standalone — derive slug from context_file path or ask user.
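A minimal sketch of the standalone case, assuming the context file follows the `.optimization/{slug}/context.md` layout used elsewhere in this skill. The function name `derive_slug` is hypothetical; a return value of `None` stands for "ask the user".

```python
from pathlib import PurePosixPath
from typing import Optional


def derive_slug(context_file: str) -> Optional[str]:
    """Return the {slug} component of a .optimization/{slug}/... path, or None."""
    parts = PurePosixPath(context_file).parts
    for i, part in enumerate(parts):
        # The slug is the directory directly under ".optimization",
        # and at least one more component (the file) must follow it.
        if part == ".optimization" and i + 2 < len(parts):
            return parts[i + 1]
    return None  # caller falls back to asking the user
```

For example, `derive_slug(".optimization/checkout-latency/context.md")` yields `"checkout-latency"`, while a path outside `.optimization/` yields `None`.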

Step 1: Load Context

Read .optimization/{slug}/context.md from project root. Contains problem statement, profiling results, research hypotheses, and target metric.

If file not found: check conversation context for the same data (standalone invocation).

Step 2: Pre-flight Validation

| Check | Required | Action if Missing |
|-------|----------|-------------------|
| Hypotheses provided (H1..H7) | Yes | Block: nothing to execute |
| Test infrastructure | Yes | Block (see ci_tool_detection.md) |
| Git clean state | Yes | Block (need clean baseline for revert) |
| Worktree isolation | Yes | Create per git_worktree_fallback.md |
| E2E safety test | No (recommended) | Read from context; WARN if null (full test suite as fallback gate) |

MANDATORY READ: Load shared/references/git_worktree_fallback.md — use optimization rows.

MANDATORY READ: Load shared/references/ci_tool_detection.md — use Test Frameworks + Benchmarks sections.

E2E Safety Test

Read e2e_test_command from context file (discovered by profiler during test discovery phase).

| Source | Action |
|--------|--------|
| Context has e2e_test_command | Use as functional safety gate in Phase 2 |
| Context has e2e_test_command = null | WARN: full test suite serves as fallback gate |
| Standalone (no context) | User must provide test command; block if missing |

Phase 1: Establish Baseline

Reuse baseline from performance map (already measured with real metrics).

From Context File

Read performance_map.baseline and performance_map.test_command from .optimization/{slug}/context.md.

| Field | Source |
|-------|--------|
| test_command | Discovered/created test command |
| baseline | Multi-metric snapshot: wall time, CPU, memory, I/O |

Verification Run

Run test_command once to confirm baseline is still valid (code unchanged since profiling):

| Step | Action |
|------|--------|
| 1 | Run test_command |
| 2 | IF result within 10% of baseline.wall_time_ms → baseline confirmed |
| 3 | IF result diverges > 10% → re-measure (3 runs, median) as new baseline |
| 4 | IF test FAILS → BLOCK: "test fails on unmodified code" |
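The 10% check and the re-measure fallback can be sketched as follows. This is an illustration under assumptions: `run_once` is a hypothetical callable that runs test_command and returns wall time in milliseconds, and the test-failure case (step 4) is left to the caller.

```python
from statistics import median
from typing import Callable, Tuple


def verify_baseline(
    baseline_ms: float,
    run_once: Callable[[], float],  # hypothetical: runs test_command, returns wall time in ms
    tolerance: float = 0.10,
    remeasure_runs: int = 3,
) -> Tuple[float, str]:
    """Confirm the stored baseline, or replace it with the median of fresh runs."""
    result = run_once()
    # Within 10% of the stored value: keep the profiler's baseline.
    if abs(result - baseline_ms) <= tolerance * baseline_ms:
        return baseline_ms, "confirmed"
    # Diverged: take 3 more runs and use their median as the new baseline.
    samples = [run_once() for _ in range(remeasure_runs)]
    return median(samples), "re-measured"
```

The sketch treats a divergence of exactly 10% as "within"; the source does not say which side the boundary falls on.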

Phase 2: Strike-First Execution

MANDATORY READ: Load optimization_categories.md for pattern reference during implementation.

Apply maximum changes at once. Only fall back to A/B testing where sources genuinely disagree on approach.

Step 1: Triage Hypotheses

Split hypotheses from researcher into two groups:

| Group | Criteria | Action |
|-------|----------|--------|
| Uncontested | Clear best approach, no conflicting alternatives | Apply directly in the strike |
| Contested | Multiple approaches exist (e.g., source A says cache, source B says batch) OR conflicts_with another hypothesis | A/B test each alternative on top of full implementation |

Most hypotheses should be uncontested — the researcher already ranked them by evidence.
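A minimal sketch of the triage split. The `Hypothesis` shape is an assumption for illustration: only `conflicts_with` is named in the text above, while `id` and `alternatives` are hypothetical field names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Hypothesis:
    id: str
    # Hypothetical field: competing approaches proposed for the same problem.
    alternatives: List[str] = field(default_factory=list)
    # IDs of other hypotheses this one conflicts with.
    conflicts_with: List[str] = field(default_factory=list)


def triage(hypotheses: List[Hypothesis]) -> Tuple[List[Hypothesis], List[Hypothesis]]:
    """Split hypotheses into (uncontested, contested) groups."""
    uncontested: List[Hypothesis] = []
    contested: List[Hypothesis] = []
    for h in hypotheses:
        # Contested: more than one approach on the table, or a declared conflict.
        if len(h.alternatives) > 1 or h.conflicts_with:
            contested.append(h)
        else:
            uncontested.append(h)
    return uncontested, contested
```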

Step 2: Strike (Apply All Uncontested)

1. APPLY all uncontested hypotheses at once (all file edits)
2. VERIFY: Run full test suite
   IF tests FAIL:
     - IF fixable (typo, missing import) → fix & re-run ONCE
     - IF fundamental → BISECT (see Step 4)
3. E2E GATE (if e2e_test_command not null):
   IF FAIL → BISECT
4. MEASURE: 5 runs, median
5. COMPARE: improvement vs baseline
   IF improvement meets target → DONE. Commit all:
     git add {all_files}
     git commit -m "perf: apply optimizations H1,H2,H3,... (+{improvement}%)"
   IF no improvement → BISECT
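The measure-and-compare decision (steps 4 and 5) can be sketched like this, assuming a lower-is-better metric in milliseconds and an absolute target value. The function name is hypothetical. The steps above do not say what happens when the strike improves on the baseline but misses the target; the third return value is an assumption covering that case.

```python
from statistics import median
from typing import Sequence


def strike_outcome(baseline_ms: float, runs_ms: Sequence[float], target_ms: float) -> str:
    """Classify a strike from its measurement runs (5 runs in the workflow above)."""
    result = median(runs_ms)
    if result <= target_ms:
        return "done"    # target met: commit all changes
    if result >= baseline_ms:
        return "bisect"  # no improvement over baseline
    # Assumption: improved but short of target; not specified in the steps above.
    return "improved_below_target"
```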

Step 3: Contested Alternatives (A/B on top of strike)

For each contested pair/group, with ALL uncontested changes already applied:

FOR each contested hypothesis group:
  1. Apply alternative A → test → measure (5 runs, median)
  2. Revert alternative A, apply alternative B → test → measure
  3. KEEP the winner. Commit.
  4. Winner becomes part of the baseline for next contested group.
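Choosing the winner of a contested group might look like the sketch below. It assumes every alternative has already passed the tests, that lower measurements are better, and that the caller collected the runs; `pick_winner` and the dictionary layout are hypothetical.

```python
from statistics import median
from typing import Dict, Sequence


def pick_winner(measurements: Dict[str, Sequence[float]]) -> str:
    """Return the alternative with the lowest median measurement."""
    medians = {name: median(runs) for name, runs in measurements.items()}
    return min(medians, key=medians.get)
```

The winner's median would then serve as the baseline when measuring the next contested group.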

Step 4: Bisect (only on strike failure)

If strike fails tests or shows no improvement:

1. Revert all changes: git checkout -- . && git clean -fd
2. Binary search: apply first half of hypotheses → test
   - IF passes → problem in second half
   - IF fails → problem in first half
3. Narrow down to the breaking hypothesis
4. Remove it from strike, re-apply remaining → test → measure
5. Log removed hypothesis with reason
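The binary search in steps 2 and 3 can be sketched as below. It rests on simplifying assumptions: there is exactly one breaking hypothesis, hypotheses apply independently of one another, and `passes` is a hypothetical callable that reverts the tree, applies only the given subset, and runs the tests.

```python
from typing import Callable, List, Sequence


def find_breaking(
    hypotheses: Sequence[str],
    passes: Callable[[List[str]], bool],  # hypothetical: revert, apply subset, run tests
) -> str:
    """Narrow a non-empty hypothesis list down to the single breaking one."""
    candidates = list(hypotheses)
    while len(candidates) > 1:
        mid = len(candidates) // 2
        first_half = candidates[:mid]
        if passes(first_half):
            # First half is clean, so the problem is in the second half.
            candidates = candidates[mid:]
        else:
            # First half fails, so the problem is in the first half.
            candidates = first_half
    return candidates[0]
```

With several interacting culprits this search can settle on only one of them, so re-running the tests after removal (step 4) remains necessary.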

Scope Rules

| Rule | Description |
|------|-------------|
| File scope | Multiple files allowed (not limited to single function) |
| Signature changes | Allowed if tests still pass |
| New files | Allowed (cache wrapper, batch adapter, utility) |
| New dependencies | Allowed if already in project ecosystem (e.g., using configured Redis) |
| Time budget | 45 minutes total |

Revert Protocol

| Scope | Command |
|-------|---------|
| Full revert | git checkout -- . && git clean -fd (safe in worktree) |
| Single hypothesis | git checkout -- {files} (only during bisect) |

Safety Rules

| Rule | Description |
|------|-------------|
| Traceability | Commit message lists all applied hypothesis IDs |
| Isolation | All work in isolated worktree; never modify main worktree |
| Bisect only on failure | Do NOT test hypotheses individually unless strike fails or alternatives genuinely conflict |
| Crash triage | Runtime crash → fix once if trivial (typo, import), else bisect to find cause |

Phase 3: Report Results

Report Schema

| Field | Description |
|-------|-------------|
| baseline | Original measurement (metric + value) |
| final | Final measurement after optimizations |
| total_improvement_pct | Overall percentage improvement |
| target_met | Boolean: whether the target metric was reached |
| strike_result | clean (all applied) / bisected (some removed) / failed |
| hypotheses_applied | List of hypothesis IDs applied in strike |
| hypotheses_removed | List removed during bisect (with reasons) |
| contested_results | Per-contested group: alternatives tested, winner, measurement |
| branch | Worktree branch name |
| files_modified | All changed files |
| e2e_test | { command, source, baseline_passed, final_passed } or null |

Results Comparison (mandatory)

Show baseline vs final for EVERY metric from performance_map.baseline. Include both percentage and multiplier.

| Metric | Baseline | After Strike | Improvement |
|--------|----------|-------------|-------------|
| Wall time | 7280ms | 3800ms | 47.8% (1.9x) |
| CPU time | 850ms | 720ms | 15.3% (1.2x) |
| Memory peak | 256MB | 245MB | 4.3% (1.0x) |
| HTTP round-trips | 13 | 2 | 84.6% (6.5x) |

Target: 5000ms → Achieved: 3800ms ✓ TARGET MET
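The "percentage and multiplier" pair in the Improvement column can be computed as in this sketch, assuming a lower-is-better metric. The helper name `format_improvement` is hypothetical.

```python
def format_improvement(baseline: float, after: float) -> str:
    """Format improvement as 'pct% (Nx)' for a lower-is-better metric."""
    pct = (baseline - after) / baseline * 100  # share of the baseline removed
    multiplier = baseline / after              # how many times faster/smaller
    return f"{pct:.1f}% ({multiplier:.1f}x)"


print(format_improvement(7280, 3800))  # wall time row: 47.8% (1.9x)
```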

Per-Function Delta (if instrumentation available)

If instrumented_files from context is non-empty, run test_command once more AFTER strike to capture per-function timing with the same instrumentation the profiler placed:

| Function | Before (ms) | After (ms) | Delta |
|----------|------------|------------|-------|
| mt_translate | 3500 | 450 | -87% (7.8x) |
| tikal_extract | 2800 | 2800 | 0% (unchanged) |

Then clean up: git checkout -- {instrumented_files} — remove all profiling instrumentation before final commit.

Present both tables to user. This is the primary deliverable — numbers the user sees first.

Experiment Log

Write to {project_root}/.optimization/{slug}/ln-814-log.tsv:

| Column | Description |
|--------|-------------|
| timestamp | ISO 8601 |
| phase | strike / bisect / contested |
| hypotheses | Comma-separated IDs applied in this round |
| baseline_ms | Baseline before this round |
| result_ms | Measurement after changes |
| improvement_pct | Percentage change |
| status | applied / removed / alternative_a / alternative_b |
| commit | Git commit hash |
| files | Comma-separated modified files |
| e2e_status | pass / fail / skipped |

Append to existing file if present (enables tracking across multiple runs).
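A minimal sketch of the append behaviour using Python's standard `csv` module. Writing a header row when the file is new or empty is an assumption; the column list above does not say whether the log carries a header. `append_log_row` is a hypothetical helper name.

```python
import csv
from pathlib import Path
from typing import Dict

COLUMNS = [
    "timestamp", "phase", "hypotheses", "baseline_ms", "result_ms",
    "improvement_pct", "status", "commit", "files", "e2e_status",
]


def append_log_row(log_path: Path, row: Dict[str, str]) -> None:
    """Append one experiment row to the TSV log, creating it if needed."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    # Assumption: emit the header only when the file is new or empty.
    write_header = not log_path.exists() or log_path.stat().st_size == 0
    with log_path.open("a", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=COLUMNS, delimiter="\t")
        if write_header:
            writer.writeheader()
        writer.writerow(row)
```

Because the delimiter is a tab, the comma-separated values in the hypotheses and files columns need no quoting.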


Phase 4: Gap Analysis (If Target Not Met)

If target metric not reached after all hypotheses:

| Section | Content |
|---------|---------|
| Achievement | What was achieved (original → final, improvement %) |
| Remaining bottlenecks | From time map: which steps still dominate |
| Remaining cycles | If coordinator runs multi-cycle: "{remaining} optimization cycles available for remaining bottlenecks" |
| Infrastructure recommendations | If bottleneck requires infra changes (scaling, caching layer, CDN) |
| Further research | Optimization directions not explored in this run |

Error Handling

| Error | Recovery |
|-------|----------|
| Strike fails all tests | Bisect to find breaking hypothesis, remove it, retry |
| Strike shows no improvement | Bisect to identify ineffective hypotheses |
| Measurement inconsistent (high variance) | Increase runs to 10, use median |
| Worktree creation fails | Fall back to branch per git_worktree_fallback.md |
| Time budget exceeded | Stop loop, report partial results with hypotheses remaining |
| Multi-file revert fails | git checkout -- . in worktree (safe: the worktree is isolated) |

References

  • optimization_categories.md — optimization pattern checklist
  • shared/references/ci_tool_detection.md (test + benchmark detection)
  • shared/references/git_worktree_fallback.md (worktree isolation)

Definition of Done

  • Baseline established using same metric type as observed problem
  • Hypotheses triaged: uncontested vs contested
  • Strike applied: all uncontested hypotheses implemented at once
  • Tests pass after strike
  • Contested alternatives A/B tested on top of full implementation
  • Bisect performed only if strike fails (not preemptively)
  • E2E safety test passes (or documented as unavailable)
  • Experiment log written to .optimization/{slug}/ln-814-log.tsv
  • Report returned with baseline, final, improvement%, strike result
  • All changes on isolated branch, pushed to remote
  • Gap analysis provided if target metric not met

Version: 2.0.0
Last Updated: 2026-03-14
