Paths: File paths (shared/, references/, ../ln-*) are relative to the skills repo root. If they are not found at CWD, locate this SKILL.md directory and go up one level to reach the repo root.

ln-814-optimization-executor

Type: L3 Worker
Category: 8XX Optimization

Executes optimization hypotheses from the researcher using a keep/discard autoresearch loop. Supports multi-file changes, compound baselines, and any optimization type (algorithm, architecture, query, caching, batching).

Overview

Aspect	Details
Input	`.optimization/{slug}/context.md` OR conversation context (standalone invocation)
Output	Optimized code on isolated branch, per-hypothesis results, experiment log
Pattern	Strike-first: apply all → test → measure. Bisect only on failure. A/B only for contested alternatives

Workflow

Phases: Pre-flight → Baseline → Strike-First Execution → Report → Gap Analysis

Phase 0: Pre-flight Checks

Slug Resolution

If invoked via Agent with a contextStore containing the slug, use it directly.
If invoked standalone, derive the slug from the context_file path or ask the user.
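
For the standalone case, deriving the slug from a path could look like the sketch below. It assumes the path has the `.optimization/{slug}/context.md` shape used elsewhere in this document; `derive_slug` is a hypothetical helper name, and a `None` result stands for "ask the user".

```python
from pathlib import PurePosixPath
from typing import Optional


def derive_slug(context_file: str) -> Optional[str]:
    """Return {slug} from a path shaped like .optimization/{slug}/context.md."""
    # Normalize Windows separators so the same logic works on any platform.
    parts = PurePosixPath(context_file.replace("\\", "/")).parts
    if len(parts) >= 3 and parts[-1] == "context.md" and parts[-3] == ".optimization":
        return parts[-2]
    # Unexpected shape: the caller should fall back to asking the user.
    return None
```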

Step 1: Load Context

Read .optimization/{slug}/context.md from project root. Contains problem statement, profiling results, research hypotheses, and target metric.

If file not found: check conversation context for the same data (standalone invocation).

Step 2: Pre-flight Validation

Check	Required	Action if Missing
Hypotheses provided (H1..H7)	Yes	Block — nothing to execute
Test infrastructure	Yes	Block (see ci_tool_detection.md)
Git clean state	Yes	Block (need clean baseline for revert)
Worktree isolation	Yes	Create per git_worktree_fallback.md
E2E safety test	No (recommended)	Read from context; WARN if null — full test suite as fallback gate

MANDATORY READ: Load shared/references/git_worktree_fallback.md and use the optimization rows.

MANDATORY READ: Load shared/references/ci_tool_detection.md and use the Test Frameworks + Benchmarks sections.
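
As an illustration of the "Git clean state" check, one possible approach is to treat empty `git status --porcelain` output as clean. This is a sketch, not necessarily how the skill performs the check; `is_clean` and `git_tree_is_clean` are hypothetical names.

```python
import subprocess


def is_clean(porcelain_output: str) -> bool:
    """A tree is clean when `git status --porcelain` prints nothing."""
    return porcelain_output.strip() == ""


def git_tree_is_clean(repo_dir: str = ".") -> bool:
    # --porcelain emits one line per modified, staged, or untracked path.
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return is_clean(result.stdout)
```

Separating the parsing from the subprocess call keeps the decision logic testable without a repository.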

E2E Safety Test

Read e2e_test_command from context file (discovered by profiler during test discovery phase).

Source	Action
Context has `e2e_test_command`	Use as functional safety gate in Phase 2
Context has `e2e_test_command = null`	WARN: full test suite serves as fallback gate
Standalone (no context)	User must provide test command; block if missing

Phase 1: Establish Baseline

Reuse baseline from performance map (already measured with real metrics).

From Context File

Read performance_map.baseline and performance_map.test_command from .optimization/{slug}/context.md.

Field	Source
`test_command`	Discovered/created test command
`baseline`	Multi-metric snapshot: wall time, CPU, memory, I/O

Verification Run

Run test_command once to confirm baseline is still valid (code unchanged since profiling):

Step	Action
1	Run `test_command`
2	IF result within 10% of `baseline.wall_time_ms` → baseline confirmed
3	IF result diverges > 10% → re-measure (3 runs, median) as new baseline
4	IF test FAILS → BLOCK: "test fails on unmodified code"
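
The verification steps above can be sketched as follows. The sketch assumes `run_once` is a caller-supplied function (hypothetical) that runs `test_command`, returns wall time in milliseconds, and raises if the test fails, which corresponds to the BLOCK case. It also assumes the three re-measurement runs are fresh runs that do not reuse the first result; the source does not specify this.

```python
from statistics import median
from typing import Callable


def verify_baseline(
    baseline_ms: float,
    run_once: Callable[[], float],
    tolerance: float = 0.10,
    remeasure_runs: int = 3,
) -> float:
    """Return the baseline to use: the stored one, or a re-measured median."""
    first = run_once()  # raises on test failure (step 4: BLOCK)
    if abs(first - baseline_ms) <= tolerance * baseline_ms:
        return baseline_ms  # step 2: baseline confirmed
    # Step 3: divergence above tolerance, so take the median of fresh runs.
    return median([run_once() for _ in range(remeasure_runs)])
```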

Phase 2: Strike-First Execution

MANDATORY READ: Load optimization_categories.md for pattern reference during implementation.

Apply as many changes as possible at once. Fall back to A/B testing only where sources genuinely disagree on approach.

Step 1: Triage Hypotheses

Split hypotheses from researcher into two groups:

Group	Criteria	Action
Uncontested	Clear best approach, no conflicting alternatives	Apply directly in the strike
Contested	Multiple approaches exist (e.g., source A says cache, source B says batch) OR `conflicts_with` another hypothesis	A/B test each alternative on top of full implementation

Most hypotheses should be uncontested — the researcher already ranked them by evidence.
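
A minimal sketch of the triage split, under assumptions: the `Hypothesis` record and its `alternatives` field are hypothetical, and only `conflicts_with` is named in this document. A hypothesis is treated as contested if it lists conflicts, is listed as a conflict by another hypothesis, or carries more than one alternative approach.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Hypothesis:
    id: str
    conflicts_with: List[str] = field(default_factory=list)
    alternatives: List[str] = field(default_factory=list)  # assumed field


def triage(hypotheses: List[Hypothesis]) -> Tuple[List[Hypothesis], List[Hypothesis]]:
    """Split hypotheses into (uncontested, contested), preserving order."""
    # IDs that some other hypothesis names in its conflicts_with list.
    named_in_conflict = {cid for h in hypotheses for cid in h.conflicts_with}
    uncontested, contested = [], []
    for h in hypotheses:
        is_contested = (
            bool(h.conflicts_with)
            or h.id in named_in_conflict
            or len(h.alternatives) > 1
        )
        (contested if is_contested else uncontested).append(h)
    return uncontested, contested
```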

Step 2: Strike (Apply All Uncontested)

1. APPLY all uncontested hypotheses at once (all file edits)
2. VERIFY: Run full test suite
   IF tests FAIL:
     - IF fixable (typo, missing import) → fix & re-run ONCE
     - IF fundamental → BISECT (see Step 4)
3. E2E GATE (if e2e_test_command not null):
   IF FAIL → BISECT
4. MEASURE: 5 runs, median
5. COMPARE: improvement vs baseline
   IF improvement meets target → DONE. Commit all:
     git add {all_files}
     git commit -m "perf: apply optimizations H1,H2,H3,... (+{improvement}%)"
   IF no improvement → BISECT
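
The MEASURE and COMPARE steps can be illustrated with the sketch below. It assumes the target is a wall-time ceiling in milliseconds and that lower is better. The steps above do not say what happens when the strike improves on the baseline but misses the target, so the sketch returns a separate label for that case and leaves the decision to the caller. Function names are hypothetical.

```python
from statistics import median
from typing import Sequence


def improvement_pct(baseline_ms: float, result_ms: float) -> float:
    """Percentage reduction relative to the baseline (positive means faster)."""
    return (baseline_ms - result_ms) / baseline_ms * 100.0


def strike_decision(baseline_ms: float, samples_ms: Sequence[float], target_ms: float) -> str:
    result = median(samples_ms)  # e.g. the 5 measured runs
    if result <= target_ms:
        return "done"  # target met: commit all
    if result >= baseline_ms:
        return "bisect"  # no improvement
    return "improved_below_target"  # case not covered by the steps above
```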

Step 3: Contested Alternatives (A/B on top of strike)

For each contested pair/group, with ALL uncontested changes already applied:

FOR each contested hypothesis group:
  1. Apply alternative A → test → measure (5 runs, median)
  2. Revert alternative A, apply alternative B → test → measure
  3. KEEP the winner. Commit.
  4. Winner becomes part of the baseline for next contested group.
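
Choosing the winner of a contested group could be as simple as comparing medians. This sketch assumes a lower-is-better metric and that both alternatives already passed the tests; `pick_winner` and the alternative labels are hypothetical.

```python
from statistics import median
from typing import Dict, List


def pick_winner(measurements: Dict[str, List[float]]) -> str:
    """Return the alternative whose median measurement is lowest."""
    return min(measurements, key=lambda name: median(measurements[name]))
```

For example, `pick_winner({"A_cache": [...], "B_batch": [...]})` returns the label to keep; the other alternative stays reverted.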

Step 4: Bisect (only on strike failure)

If the strike fails tests, fails the E2E gate, or shows no improvement:

1. Revert all changes: git checkout -- . && git clean -fd
2. Binary search: apply first half of hypotheses → test
   - IF passes → problem in second half
   - IF fails → problem in first half
3. Narrow down to the breaking hypothesis
4. Remove it from strike, re-apply remaining → test → measure
5. Log removed hypothesis with reason
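
The binary search in steps 2 and 3 can be sketched as below. It assumes exactly one breaking hypothesis and that hypotheses can be applied independently of each other; with several interacting culprits a plain bisect like this is not sufficient. `passes` is a hypothetical callback that reverts the tree, applies the given subset, and returns whether the tests pass.

```python
from typing import Callable, List, Sequence


def find_breaking(hypotheses: Sequence[str], passes: Callable[[List[str]], bool]) -> str:
    """Narrow down to the single hypothesis that makes the tests fail."""
    if not hypotheses:
        raise ValueError("nothing to bisect")
    candidates = list(hypotheses)
    while len(candidates) > 1:
        mid = len(candidates) // 2
        first_half = candidates[:mid]
        if passes(first_half):
            candidates = candidates[mid:]  # problem is in the second half
        else:
            candidates = first_half  # problem is in the first half
    return candidates[0]
```

The returned ID is the one to remove from the strike before re-applying the rest.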

Scope Rules

Rule	Description
File scope	Multiple files allowed (not limited to single function)
Signature changes	Allowed if tests still pass
New files	Allowed (cache wrapper, batch adapter, utility)
New dependencies	Allowed if already in project ecosystem (e.g., using configured Redis)
Time budget	45 minutes total

Revert Protocol

Scope	Command
Full revert	`git checkout -- . && git clean -fd` (safe in worktree)
Single hypothesis	`git checkout -- {files}` (only during bisect)

Safety Rules

Rule	Description
Traceability	Commit message lists all applied hypothesis IDs
Isolation	All work in isolated worktree; never modify main worktree
Bisect only on failure	Do NOT test hypotheses individually unless strike fails or alternatives genuinely conflict
Crash triage	Runtime crash → fix once if trivial (typo, import), else bisect to find cause

Phase 3: Report Results

Report Schema

Field	Description
baseline	Original measurement (metric + value)
final	Final measurement after optimizations
total_improvement_pct	Overall percentage improvement
target_met	Boolean — did we reach the target metric?
strike_result	`clean` (all applied) / `bisected` (some removed) / `failed`
hypotheses_applied	List of hypothesis IDs applied in strike
hypotheses_removed	List removed during bisect (with reasons)
contested_results	Per-contested group: alternatives tested, winner, measurement
branch	Worktree branch name
files_modified	All changed files
e2e_test	`{ command, source, baseline_passed, final_passed }` or null

Results Comparison (mandatory)

Show baseline vs final for EVERY metric from performance_map.baseline. Include both percentage and multiplier.

| Metric | Baseline | After Strike | Improvement |
|--------|----------|-------------|-------------|
| Wall time | 7280ms | 3800ms | 47.8% (1.9x) |
| CPU time | 850ms | 720ms | 15.3% (1.2x) |
| Memory peak | 256MB | 245MB | 4.3% (1.0x) |
| HTTP round-trips | 13 | 2 | 84.6% (6.5x) |

Target: 5000ms → Achieved: 3800ms ✓ TARGET MET
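
The Improvement column combines a percentage reduction and a multiplier. A small sketch of that arithmetic, assuming lower-is-better metrics and one decimal place as in the table above (`format_improvement` is a hypothetical helper):

```python
def format_improvement(baseline: float, after: float) -> str:
    """Format e.g. 7280 -> 3800 as '47.8% (1.9x)'."""
    pct = (baseline - after) / baseline * 100.0  # reduction relative to baseline
    multiplier = baseline / after                # how many times smaller
    return f"{pct:.1f}% ({multiplier:.1f}x)"
```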

Per-Function Delta (if instrumentation available)

If instrumented_files from context is non-empty, run test_command once more AFTER strike to capture per-function timing with the same instrumentation the profiler placed:

| Function | Before (ms) | After (ms) | Delta |
|----------|------------|------------|-------|
| mt_translate | 3500 | 450 | -87% (7.8x) |
| tikal_extract | 2800 | 2800 | 0% (unchanged) |

Then clean up: git checkout -- {instrumented_files} — remove all profiling instrumentation before final commit.

Present both tables to user. This is the primary deliverable — numbers the user sees first.

Experiment Log

Write to {project_root}/.optimization/{slug}/ln-814-log.tsv:

Column	Description
timestamp	ISO 8601
phase	`strike` / `bisect` / `contested`
hypotheses	Comma-separated IDs applied in this round
baseline_ms	Baseline before this round
result_ms	Measurement after changes
improvement_pct	Percentage change
status	`applied` / `removed` / `alternative_a` / `alternative_b`
commit	Git commit hash
files	Comma-separated modified files
e2e_status	pass / fail / skipped

Append to existing file if present (enables tracking across multiple runs).
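
Appending a row to the log could look like the following sketch. The column names come from the table above; writing a header row when the file is new is an assumption, as are the helper names.

```python
import csv
import os
from datetime import datetime, timezone
from typing import Dict

COLUMNS = [
    "timestamp", "phase", "hypotheses", "baseline_ms", "result_ms",
    "improvement_pct", "status", "commit", "files", "e2e_status",
]


def now_iso() -> str:
    """Current UTC time in ISO 8601 format."""
    return datetime.now(timezone.utc).isoformat()


def append_log_row(log_path: str, row: Dict[str, str]) -> None:
    os.makedirs(os.path.dirname(log_path) or ".", exist_ok=True)
    # Assumption: a header row is written once, when the file is new or empty.
    write_header = not os.path.exists(log_path) or os.path.getsize(log_path) == 0
    with open(log_path, "a", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(
            fh, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
        )
        if write_header:
            writer.writeheader()
        writer.writerow(row)
```

Opening in append mode keeps earlier runs intact, which is what enables tracking across multiple runs.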

Phase 4: Gap Analysis (If Target Not Met)

If the target metric is not reached after all hypotheses have been executed:

Section	Content
Achievement	What was achieved (original → final, improvement %)
Remaining bottlenecks	From time map: which steps still dominate
Remaining cycles	If coordinator runs multi-cycle: "{remaining} optimization cycles available for remaining bottlenecks"
Infrastructure recommendations	If bottleneck requires infra changes (scaling, caching layer, CDN)
Further research	Optimization directions not explored in this run

Error Handling

Error	Recovery
Strike fails all tests	Bisect to find breaking hypothesis, remove it, retry
Strike shows no improvement	Bisect to identify ineffective hypotheses
Measurement inconsistent (high variance)	Increase runs to 10, use median
Worktree creation fails	Fall back to branch per git_worktree_fallback.md
Time budget exceeded	Stop loop, report partial results with hypotheses remaining
Multi-file revert fails	`git checkout -- . && git clean -fd` in worktree (safe: worktree is isolated)
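
The table does not define "high variance". One possible reading, shown here as an assumption only, is to compare the coefficient of variation of the samples against a threshold; the 10% default and the name `needs_more_runs` are illustrative choices, not values from this skill.

```python
from statistics import mean, stdev
from typing import Sequence


def needs_more_runs(samples_ms: Sequence[float], max_cv: float = 0.10) -> bool:
    """True when sample spread (stdev / mean) exceeds the assumed threshold."""
    return stdev(samples_ms) / mean(samples_ms) > max_cv
```

When this returns True, the recovery in the table applies: increase to 10 runs and use the median.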

References

optimization_categories.md (optimization pattern checklist)
shared/references/ci_tool_detection.md (test + benchmark detection)
shared/references/git_worktree_fallback.md (worktree isolation)

Definition of Done

Version: 2.0.0 Last Updated: 2026-03-14