Loop - Reflex Iteration Workflow

Run a result-focused iterative loop with strict phase contracts, evaluation rules, and persistent state between sessions.

Terminology:

loop = one full execution for a task alias (stored in run.json, identified by run_id)
iteration = one cycle inside that loop

Core Idea

Each iteration executes 6 phases with parallel execution where possible:

PLAN - short plan for current iteration
PRODUCE - produce one artifact.md ← runs in parallel with PREPARE
PREPARE - generate check scripts and test definitions from rules ← runs in parallel with PRODUCE
EVALUATE - run prepared checks + content rules against artifact, score result. Uses parallel Task agents for independent check groups
CRITIQUE - precise issues + fixes (only if fail)
REFINE - rewrite artifact using critique (only if fail)

         PLAN
           │
    ┌──────┴──────┐
    ↓             ↓          ← parallel (Task tool)
 PRODUCE      PREPARE
 (artifact)   (checks)
    ↓             ↓
    └──────┬──────┘
           ↓
       EVALUATE              ← parallel check execution (Task tool)
       ┌───┼───┐
       ↓   ↓   ↓
      exec content aggregate
       └───┼───┘
           ↓
       CRITIQUE (if fail)
           ↓
       REFINE (if fail)

Stop when quality is good enough, no major issues remain, or iteration limit is reached.

Persistence Contract

Use exactly 3+1 files for state (where current.json exists only while a loop is active):

.ai-factory/evolution/current.json
.ai-factory/evolution/<task-alias>/run.json
.ai-factory/evolution/<task-alias>/history.jsonl
.ai-factory/evolution/<task-alias>/artifact.md

Do not create extra index files or per-iteration folder trees unless user explicitly asks.

File Roles

current.json: pointer to active loop only; delete it when loop becomes completed/stopped/failed
run.json: single source of truth for current loop state
history.jsonl: append-only event log (one JSON object per line)
artifact.md: single source of truth for artifact content (written after PRODUCE and REFINE phases, never duplicated in run.json)

Command Modes

Parse $ARGUMENTS:

status - show active loop status from current.json and stop
resume [alias] - continue active loop or loop by alias
stop [reason] - stop active loop with reason (user_stop if omitted)
new <task> or no mode + task text - start new loop
list - list all task aliases with status (running/stopped/completed/failed)
history [alias] - show event history for a loop (default: active loop)
clean [alias|--all] - remove loop files for a stopped/completed/failed loop (requires user confirmation, always confirm before deleting)

If no task and no active loop exists, ask user for task prompt.

Step 0: Load Context

Read these files if present:

.ai-factory/DESCRIPTION.md
.ai-factory/ARCHITECTURE.md
.ai-factory/RULES.md

Use them to keep outputs aligned with project conventions.

Read .ai-factory/skill-context/aif-loop/SKILL.md — MANDATORY if the file exists.

This file contains project-specific rules accumulated by /aif-evolve from patches, codebase conventions, and tech-stack analysis. These rules are tailored to the current project.

How to apply skill-context rules:

Treat them as project-level overrides for this skill's general instructions
When a skill-context rule conflicts with a general rule written in this SKILL.md, the skill-context rule wins (more specific context takes priority — same principle as nested CLAUDE.md files)
When there is no conflict, apply both: general rules from SKILL.md + project rules from skill-context
Do NOT ignore skill-context rules even if they seem to contradict this skill's defaults — they exist because the project's experience proved the default insufficient
CRITICAL: skill-context rules apply to ALL outputs of this skill — including the generated artifact, run state, and evaluation criteria. If a skill-context rule says "artifact MUST include X" or "evaluation MUST check Y" — you MUST comply. Producing loop outputs that violate skill-context rules is a bug.

Enforcement: After generating any output artifact, verify it against all skill-context rules. If any rule is violated — fix the output before presenting it to the user.

Step 0.1: Handle Non-Iteration Commands

If command is status, stop, list, history, or clean, execute and stop:

status: read current.json; if file exists, read pointed run.json and display alias | status | iteration | phase | current_step | last_score | updated_at; if file is missing, report that no loop is active
stop [reason]: stop active running loop only; set run.json.status = "stopped" and run.json.stop.reason = <reason or "user_stop">, append stopped event to history.jsonl, then delete current.json (active pointer cleared) and exit
list: scan .ai-factory/evolution/ directories, read each run.json, display table of alias | status | iteration | last_score | updated_at
history [alias]: read history.jsonl for the alias (or active loop), display formatted event timeline
clean [alias|--all]: show what will be deleted, ask for explicit user confirmation via AskUserQuestion, then delete loop directory. Only clean stopped/completed/failed loops — refuse to clean running loops. Update current.json if needed.

Step 1: Initialize or Resume Loop

1.1 Ensure directories

mkdir -p .ai-factory/evolution

1.2 Alias and IDs (new loop)

Generate:

task_alias: lowercase hyphen slug (3-64 chars)
run_id: <task_alias>-<yyyyMMdd-HHmmss>

1.3 Write `current.json`

{
  "active_run_id": "courses-api-ddd-20260218-120000",
  "task_alias": "courses-api-ddd",
  "status": "running",
  "updated_at": "2026-02-18T12:00:00Z"
}

1.4 Write initial `run.json`

{
  "run_id": "courses-api-ddd-20260218-120000",
  "task_alias": "courses-api-ddd",
  "status": "running",
  "iteration": 1,
  "max_iterations": 4,
  "phase": "A",
  "current_step": "PLAN",
  "task": {
    "prompt": "OpenAPI 3.1 spec + DDD notes + JSON examples",
    "ideal_result": "..."
  },
  "criteria": {
    "name": "loop_default_v1",
    "version": 1,
    "phase": {
      "A": { "threshold": 0.8, "active_levels": ["A"] },
      "B": { "threshold": 0.9, "active_levels": ["A", "B"] }
    },
    "rules": []
  },
  "plan": [],
  "prepared_checks": null,
  "evaluation": null,
  "critique": null,
  "stop": { "passed": false, "reason": "" },
  "last_score": 0,
  "stagnation_count": 0,
  "created_at": "2026-02-18T12:00:00Z",
  "updated_at": "2026-02-18T12:00:00Z"
}

1.5 Resume Logic

When resuming a loop:

Read run.json to get current_step and iteration
Read last event from history.jsonl to confirm consistency
If run.json.current_step indicates a phase was interrupted:
- Re-execute from that phase (do not skip)
- PRODUCE_PREPARE: always re-run both PRODUCE and PREPARE (idempotent — artifact overwrites, checks regenerate)
If run.json.status is stopped, completed, or failed, inform user and suggest new (for failed runs, also show the last phase_error event from history.jsonl so user understands what went wrong)

Step 2: Interactive Setup (new loop)

Quick mode (default, confirmation-first)

If the task prompt contains enough context to infer task type and ideal result:

Auto-detect task type from prompt (API spec, code, docs, config)
Load matching template from references/CRITERIA-TEMPLATES.md
Draft inferred rules, phase thresholds (fallback: A=0.8, B=0.9), and max iterations (default: 4)
Show inferred settings as a draft summary
Always ask explicit confirmation of success criteria (rules/thresholds) via AskUserQuestion, even if criteria were already present in the task text
Always ask explicit confirmation of max iterations via AskUserQuestion, even if iteration count was already present in the task text
If user changes either criteria or max iterations, update the draft and re-confirm both fields
Start iteration 1 only after both confirmations are explicit
If task type cannot be auto-detected (ambiguous or mixed prompt), fall through to full setup immediately

Full setup

Critical guardrail:

Always re-ask and explicitly confirm success criteria and max iterations, even if both are already written in the task prompt.

Ask concise setup questions before first iteration:

Task type - what kind of artifact? (API spec, code, docs, config, other) - used to load template from references/CRITERIA-TEMPLATES.md
Ideal result definition
Mandatory checks (tests, schema/contract, specific requirements)
Quality threshold (A/B phases)
Max iterations (default: 4)
What counts as a major issue
Explicit confirmation: "Confirm these success criteria?"
Explicit confirmation: "Confirm max iterations = N?"

Generate evaluation rules from answers:

Load matching template from references/CRITERIA-TEMPLATES.md as starting point
Add task-specific rules based on ideal result and mandatory checks
Let user review and adjust rules before starting

Persist answers and generated rules inside run.json.criteria (snapshot for reproducibility).

Never treat criteria or iteration limits parsed from task text as final until the user explicitly confirms both.

Normalization rules before persisting:

run.json.max_iterations is the single source of truth for iteration limit
every rule must be expanded to full RULE-SCHEMA format (id, description, severity, weight, phase, check)
if template shorthand omitted weight, derive from severity (fail=2, warn=1, info=0)

Step 3: Phase Contracts

Before running phases, load:

references/PHASE-CONTRACTS.md - strict I/O contracts for each phase
references/RULE-SCHEMA.md - rule format and score calculation

3.1 Phases

PLAN - generates iteration plan (sequential)
PRODUCE - generates artifact (parallel with PREPARE)
PREPARE - generates check scripts/definitions from rules + task prompt (parallel with PRODUCE)
EVALUATE - runs prepared checks + content rules, aggregates score (parallel check groups via Task)
CRITIQUE - identifies issues with fix instructions (sequential, only on fail)
REFINE - applies fixes to artifact (sequential, only on fail)

3.2 Parallel Execution Model

Two levels of parallelism via Task tool:

Inter-phase: PRODUCE and PREPARE run as parallel Task agents after PLAN completes. Both depend only on PLAN output.
Intra-phase: EVALUATE spawns parallel Task agents for independent check groups (executable checks via Bash, content rules via Read/Grep). Aggregates results into final score.

3.3 Phase Output Format

Each phase produces its defined output (see PHASE-CONTRACTS.md). No envelope wrapping. No router output.

Step 4: Iteration Execution

For each iteration:

Set run.json.current_step = "PLAN", run PLAN phase
Set run.json.current_step = "PRODUCE_PREPARE", launch both as parallel Task agents:
- Task A (PRODUCE): generates artifact → writes to artifact.md
- Task B (PREPARE): generates check scripts/definitions from rules + plan
- Wait for both to complete
Set run.json.current_step = "EVALUATE", run EVALUATE phase:
- Spawn parallel Task agents for independent check groups:
  - Executable checks (compile, lint, tests) → Task with Bash
  - Content rules (structure, completeness, style) → Task with Read/Grep
- Aggregate results into score
If passed=false:
- Set run.json.current_step = "CRITIQUE", run CRITIQUE phase
- Set run.json.current_step = "REFINE", run REFINE phase
- Write updated artifact to artifact.md
- Increment iteration and continue
If phase=A and passed=true:
- Switch to phase=B, activate B-level rules
- Set run.json.current_step = "PREPARE", re-run PREPARE with phase=B to materialize B-level checks (no PLAN/PRODUCE — artifact already passed A)
- Set run.json.current_step = "EVALUATE", run EVALUATE against the same artifact with B-level prepared checks
- If B evaluation also passes → stop with success (threshold_reached)
- If B evaluation fails → continue to CRITIQUE → REFINE, then increment iteration
If phase=B and passed=true:
- Stop with success (threshold_reached)

Fallback to Sequential

If Task tool is unavailable or returns errors, fall back to sequential execution: PLAN → PRODUCE → PREPARE → EVALUATE → CRITIQUE → REFINE. The loop must work without parallelism.

Step 5: Stop Conditions

Stop when any condition is met:

phase=B and passed=true (reason=threshold_reached)
no fail-severity rules failed in current evaluation (reason=no_major_issues) — even if score is below threshold, the artifact has no blocking issues and only warn/info remain
iteration >= run.max_iterations (reason=iteration_limit)
explicit user stop (reason=user_stop)
stagnation detected (reason=stagnation)

Stagnation rule

Track score progress:

delta = score - last_score
if delta < 0.02 and there are no severity fail blockers, increment stagnation_count
if stagnation_count >= 2, stop with stagnation

Step 6: Persistence Writes (every step)

After each phase output:

Update run.json (including current_step)
Append event to history.jsonl
Update current.json.updated_at
Write artifact.md to disk after PRODUCE and REFINE phases
Before REFINE overwrites artifact.md, save a SHA-256 hash of the previous artifact in the refinement_done event payload as "previous_artifact_hash" (enables integrity verification without bloating history)

Event names:

run_started
plan_created
artifact_created
checks_prepared
evaluation_done
critique_done
refinement_done
phase_switched
iteration_advanced
phase_error
stopped
failed

history.jsonl example line:

{"ts":"2026-02-18T12:01:10Z","run_id":"courses-api-ddd-20260218-120000","iteration":1,"phase":"A","step":"EVALUATE","event":"evaluation_done","status":"ok","payload":{"score":0.72,"passed":false}}

Step 7: Post-Loop

After the loop stops (any reason):

Display final state summary (iteration, max_iterations, phase, final score, stop reason)
If stop reason = iteration_limit and latest evaluation has passed=false, include mandatory distance-to-success details:
- active phase threshold and final score
- numeric gap to threshold (threshold - score, floor at 0)
- remaining failed fail-severity rule count + blocking rule IDs
- rules progress (passed_rules / total_rules)
Ask user where to save the final artifact (default: keep in .ai-factory/evolution/<alias>/artifact.md)
Offer to copy artifact to a user-specified path
Suggest next skills based on artifact type:
- API spec -> /aif-plan to implement it
- Code -> /aif-verify to check it
- Docs -> /aif-docs to integrate it
Update run.json.status based on stop reason, and if current.json points to this loop, delete current.json (no active loop remains):

Stop reason	Status
`threshold_reached`	`completed`
`no_major_issues`	`completed`
`user_stop`	`stopped`
`iteration_limit`	`stopped`
`stagnation`	`stopped`
`phase_error`	`failed`

Step 8: Response Format to User

Show a compact summary after each iteration — do NOT dump full run.json or artifact.md content into the conversation. The artifact is already on disk; duplicating it wastes context.

Iteration summary format

── Iteration {N}/{max} | Phase {A|B} | Score: {score} | {PASS|FAIL} ──
Plan: {1-line summary of plan focus}
Hash: {first 8 chars of artifact SHA-256}
Changed: {list of added/modified sections, or "initial generation"}
Failed: {comma-separated rule IDs, or "none"}
Warnings: {comma-separated rule IDs, or "none"}
Artifact: .ai-factory/evolution/<alias>/artifact.md

Hash — lets the user verify which version they're looking at without reading the full artifact
Changed — shows what actually moved between iterations so regressions are visible from the summary alone

If passed=false, append a compact critique summary (rule ID + 1-line fix instruction per issue). Do not repeat the full artifact or full evaluation object.

When the loop terminates with reason=iteration_limit and passed=false, append a compact distance_to_success block to the final response.

Full output exceptions

Show the full artifact content (not just summary) in these cases only:

Loop termination — the final iteration always outputs the complete artifact
Phase A → B transition — show the phase-A-passing artifact in full once at the transition boundary for visibility (B-level evaluation still runs immediately per Step 4)
Explicit user request — user asks to see the full artifact mid-loop

Step 9: Context Management

The loop generates significant context per iteration (subagent results, evaluation data, critique). After several iterations the conversation context grows large, degrading LLM quality.

All loop state is persisted to disk — clearing context loses nothing. The resume command fully reconstructs state from files.

When to recommend context clear

Recommend clearing context to the user in these situations:

After iteration 2 — the midpoint of a default 4-iteration loop
On Phase A → B transition — natural boundary, new evaluation scope begins
After any iteration where iteration >= 3 — context is already heavy

How to recommend

After the iteration summary, append:

💡 Context is growing. Recommended: /clear then /aif-loop resume
   All state is saved on disk — nothing will be lost.

Do not force or auto-clear. The user decides. If the user ignores the recommendation, continue normally.

Error Recovery

Invalid phase output

If a phase produces output that does not match its contract:

Log the error to history.jsonl with event phase_error
Retry the phase once with the same inputs
If retry also fails, stop the loop with reason=phase_error and display the error

Corrupted `run.json`

If run.json is missing or unparseable:

Read history.jsonl to reconstruct the last known state
Rebuild run.json from the most recent events (last iteration, phase, score, etc.)
If history.jsonl is also missing/empty, inform user and suggest starting a new loop

Important Rules

run.json is the only source of current state truth (does NOT store artifact content)
artifact.md on disk is the single source of truth for artifact content — never duplicate it in run.json
history.jsonl is append-only; do not edit old events
Keep loop fast: short plans, targeted critique, minimal rewrites
Do not create extra files beyond the 3+1 persistence files
Evaluator must remain strict and non-creative
Refiner changes only what is needed to pass failed rules
Start simple and add complexity only when metrics show need
Retry failed phases exactly once before stopping
Use compact iteration summaries by default (Step 8). Full artifact output is allowed only in Step 8 exceptions; never dump full run.json into conversation.
Recommend context clear at strategic points (Step 9) — after iteration 2, on phase transition, or when iteration >= 3

Examples

/aif-loop new OpenAPI 3.1 spec + DDD notes + JSON examples
/aif-loop resume
/aif-loop resume courses-api-ddd
/aif-loop status
/aif-loop stop
/aif-loop list
/aif-loop history
/aif-loop history courses-api-ddd
/aif-loop clean courses-api-ddd
/aif-loop clean --all