agent-teams



Agent Teams Workflow

When to Use

Use this skill when coordinating multiple Claude Code agents to implement features in parallel using the Agent Teams feature. Covers:

  • Feature doc format and stack-aware lifecycle

  • Three-role separation: test-writer, builder, reviewer (order depends on stack)

  • File ownership rules to prevent conflicts

  • Hook-based quality gates (TaskCompleted, TeammateIdle, Stop)

  • Fast verification for rapid feedback, full verification for completion gates

  • Progress dashboard (feature-docs/STATUS.md) for zero-context recovery

  • Stuck detection and time blindness mitigation

  • Coordination protocol with kickoff prompts

  • Bootstrap and retrofit prompts for new and existing projects

Defer to other skills for:

  • git-workflow skill: Branch naming, commit message conventions, PR creation

  • testing-playwright skill: Frontend E2E test patterns (Playwright-specific)

  • testing-pytest skill: Python test patterns (pytest-specific)

  • testing-rust skill: Rust test patterns (cargo test-specific)

This workflow is adapted from Anthropic's "Building a C compiler with a team of parallel Claudes" (Feb 2026). The key insight: the quality of the testing harness determines the quality of the output.

  1. Settings Configuration

Add to .claude/settings.json:

```json
{
  "$schema": "https://json.schemastore.org/claude-code-settings.json",
  "env": {
    "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1"
  },
  "teammateMode": "tmux"
}
```

| Setting | Values | What it does |
|---|---|---|
| CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS | "1" | Enables the agent teams feature |
| teammateMode | "auto", "tmux", "in-process" | Controls how teammates are displayed |

Display modes:

  • auto (default) — uses split panes if already in tmux, in-process otherwise

  • tmux — forces split-pane mode; each teammate gets its own tmux pane

  • in-process — all teammates share the main terminal; use Shift+Down to cycle between them

Override per-session: claude --teammate-mode in-process

  2. Core Principles

Verification Oracle (Stack-Dependent)

The workflow uses different verification strategies depending on the stack:

Python/Rust — Tests as Oracle (TDD): The test-writer agent reads feature docs and writes failing tests. The builder agent implements code to make those tests pass. Nobody grades their own homework — the agent that writes tests never writes implementation, and the agent that implements never modifies tests.

Frontend — Interface as Oracle (Build-First): For frontend projects, the user-visible interface is the stable contract — not internal component APIs. The builder implements directly from the feature doc's acceptance criteria. The test-writer then writes Playwright E2E tests that verify the implementation matches the spec. Tests should PASS (not fail). Vibe-coded UIs change constantly — components get restructured, hooks get refactored, state management evolves. Unit tests against internal APIs break with every refactor. But the user-facing behavior (what they click, what they see) is stable. E2E tests verify that stable contract.

In both models, separation of concerns is preserved: the agent that builds never writes tests, and the agent that writes tests never modifies implementation.

Minimal Context Pollution

LLMs degrade as context fills with irrelevant information. Every hook and agent instruction is designed to produce minimal, structured output:

  • Test results print summary lines only, not full stack traces

  • Errors use a consistent format: ERROR [CATEGORY]: one-line description

  • Verbose output goes to agent_logs/, never to stdout

  • scripts/verify.sh logs full output to agent_logs/ and pipes through tail -10

  • Agent test commands use quiet reporters (-q, no --reporter=verbose)

  • Stop hook truncates output to 20 lines
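The rules above can be sketched as a verify script. This is a minimal illustration assuming a bash environment, not the skill's actual scripts/verify.sh; the step commands are placeholders (true) standing in for the stack's real typecheck, lint, and test tools:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of scripts/verify.sh: send full output to agent_logs/,
# surface one summary line per step, and show only the log tail on failure.
mkdir -p agent_logs
LOG="agent_logs/verify-$(date +%s).log"

run_step() {
  local name="$1"; shift
  if "$@" >>"$LOG" 2>&1; then
    echo "OK   $name"            # one summary line, no noise
  else
    echo "FAIL $name"
    tail -10 "$LOG"              # last 10 lines only, per the rules above
    return 1
  fi
}

run_step typecheck true &&       # placeholder for tsc --noEmit / mypy / cargo check
run_step lint      true &&       # placeholder for eslint / ruff / clippy
run_step tests     true          # placeholder for vitest run / pytest -q / cargo test -q
```

The same shape with only the typecheck step gives a fast-verify variant.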

Fast Verification

The Stop hook runs scripts/fast-verify.sh (type check only) on every response where files changed. This catches type errors quickly without running the full suite. The full verify pipeline (scripts/verify.sh) runs only on TaskCompleted.

This mirrors Carlini's --fast mode: quick smoke checks during work, comprehensive validation only at completion gates.

Time Blindness Mitigation

LLMs cannot self-regulate time. The TeammateIdle hook detects features stuck in building/ for over 30 minutes (using file modification time) and warns the user. This prevents agents from spinning indefinitely on hard problems.
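A minimal sketch of that stuck check, assuming GNU-style find (the -mmin test uses file modification time) and the feature-docs/building/ layout this workflow uses:

```shell
# Hypothetical sketch of the stuck check a TeammateIdle hook might perform:
# list feature docs sitting in building/ untouched for over 30 minutes.
stuck_features() {
  find feature-docs/building -name '*.md' -mmin +30 2>/dev/null
}

# A hook could then warn: stuck_features | sed 's/^/WARN stuck >30min: /'
```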

Progress Dashboard

Agents start each session with zero context. feature-docs/STATUS.md is updated by every agent after each stage transition. It shows what's in flight, what's blocked, and what's done — enabling any agent to orient quickly.
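As an illustration, a minimal STATUS.md might look like the sketch below; the exact columns and layout are this example's invention, not mandated by the skill:

```
# Feature Status

| Feature           | Stage    | Owner       | Blocked on |
|-------------------|----------|-------------|------------|
| 001-user-auth     | review   | reviewer    | -          |
| 002-cart-redesign | building | builder     | -          |
| 003-config        | ready    | -           | 001        |
```

Each agent rewrites its feature's row after every stage transition.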

File Ownership

Feature docs declare which files each feature affects. No agent touches files owned by another in-progress feature. This prevents the problem Carlini identified: agents hitting the same bug, fixing it, and overwriting each other's changes.

Ownership is convention-based (declared in feature doc frontmatter), not technically enforced. Agents must check feature-docs/testing/ and feature-docs/building/ for overlapping affected-files before starting work.

CI as Regression Gate

The TaskCompleted hook runs the full verify pipeline (scripts/verify.sh) before any task can be marked done. An agent cannot ship code that breaks existing tests. This is enforced at the hook level (deterministic) rather than in prompts (probabilistic).

Human-in-the-Loop for Subjective Work

Tests verify functional correctness, but some decisions are subjective. For frontend projects, visual/style work requires human review loops with screenshots. The workflow splits into:

  • Feature work: Fully autonomous. Human writes spec, agents handle the pipeline (frontend: build → E2E test → review, Python/Rust: test → build → review).

  • Style work (frontend only): Human-in-the-loop. Agent makes changes, generates screenshots, pauses for human feedback. Approved screenshots become visual regression baselines.

  3. Team Lifecycle

Step 1 — Create a Team

One team per feature or work unit. Creates config at ~/.claude/teams/{team-name}/ and a task list at ~/.claude/tasks/{team-name}/.

TeamCreate { team_name: "feat-user-auth" }

Step 2 — Spawn Teammates

Use the Agent tool with team_name to add teammates. Each spawned teammate appears in its own tmux pane automatically.

Agent { team_name: "feat-user-auth", name: "test-writer", subagent_type: "test-writer", prompt: "Pick up feature-docs/ready/003-user-auth.md", mode: "auto" }

| Parameter | Required | Purpose |
|---|---|---|
| team_name | Yes | Which team this teammate joins |
| name | Yes | Human-readable name for messaging and task assignment |
| subagent_type | Yes | Agent type — custom agents from .claude/agents/ or built-in types |
| prompt | Yes | The task description / instructions |
| mode | No | Permission mode ("auto" for autonomous, "plan" for approval) |

Step 3 — Coordinate with Tasks

Create structured work items that teammates can claim and track:

TaskCreate { subject: "Write failing tests for auth module", description: "Read feature doc acceptance criteria, write pytest tests..." }

Assign and track:

TaskUpdate { taskId: "1", owner: "test-writer", status: "in_progress" }

TaskUpdate { taskId: "2", addBlockedBy: ["1"] }

Step 4 — Communicate

Send direct messages to teammates:

SendMessage { type: "message", recipient: "test-writer", content: "Tests look good. Moving to builder phase.", summary: "Tests approved" }

Broadcast to all (use sparingly — costs scale with team size):

SendMessage { type: "broadcast", content: "Blocking issue found — stop all work.", summary: "Critical blocker found" }

Step 5 — Shut Down and Clean Up

Gracefully terminate each teammate, then delete the team:

SendMessage { type: "shutdown_request", recipient: "test-writer", content: "All tasks complete, shutting down." }

After all teammates have shut down:

TeamDelete {}

  4. Ideation Phase (Pre-Ready)

Before a feature enters the agent pipeline, it goes through an ideation phase where the human explores, researches, and shapes the idea. Source feature-docs/new-feature.md to start (or resume) the guided workflow.

Ideation happens in feature-docs/ideation/ with one subfolder per feature:

```
feature-docs/ideation/
  CLAUDE.md              # Auto-discovered guide for all ideation folders
  001-user-auth/
    README.md            # Status tracking + progress log
    code-review.md       # Analysis of existing code to change
    api-research.md      # How other projects solve this
    design-notes.md      # Data flow, component tree, schema
    spike-results.md     # Quick experiments
  002-cart-redesign/
    README.md
    current-analysis.md
    competitor-notes.md
```

Starting or Resuming Ideation

Source feature-docs/new-feature.md — it handles both cases:

  • New feature: Asks what you want to build, creates the ideation folder, walks you through validation (code review, research, design), saves artifacts as you go

  • Resume: Scans for folders with status: in-progress, reads all artifacts, summarises where you left off, continues from open questions

Status Tracking

Each ideation folder's README.md has YAML frontmatter:


```yaml
---
feature: user-auth
status: in-progress   # or: complete, shipped
created: 2025-01-15
---
```

The ## Progress section tracks dated entries across sessions:

2025-01-15 — Initial exploration

  • What we did: Reviewed existing auth code, identified session management gap
  • Decisions made: Use httpOnly cookies, not localStorage
  • Open questions: Which OAuth provider to use later?

2025-01-16 — API design

  • What we did: Designed login/logout endpoints, drafted store structure
  • Decisions made: Separate auth store from user profile store
  • Open questions: How to handle token refresh?

What Goes in an Ideation Folder

There are no format rules — use whatever helps you think:

  • Code reviews — Analysis of existing code the feature will touch

  • Research notes — API docs, how other projects solve this, trade-offs

  • Design sketches — Data flow diagrams, component trees, schema changes

  • Spike results — Quick experiments to validate an approach

  • Conversation logs — Key decisions and reasoning from Claude sessions

Distilling into a Feature Doc

When the feature is clear enough to write testable acceptance criteria, say "create the feature" during your ideation session. The prompt will:

  • Read all files in the ideation folder

  • Synthesise the summary from across all artifacts

  • Extract testable behaviours as GIVEN/WHEN/THEN acceptance criteria

  • Identify affected files from code reviews and design notes

  • Flag gaps (missing error cases, unresolved decisions, no affected files)

  • Save the final doc to feature-docs/ready/<feature-name>.md

  • Set ideation-ref in the feature doc frontmatter pointing back to the ideation folder

  • Update the ideation README status to complete

The ideation folder stays as an archive. Agents never read ideation folders — only the distilled feature doc in ready/. The ideation-ref field lets agents optionally check the ideation folder for additional context.

When the feature later completes the full pipeline (reviewer approves, doc moves to completed/), the coordinator updates the ideation README status from complete to shipped and appends a final progress entry noting pipeline completion. This is handled by the coordinator's "After reviewer approves" checklist in implement-feature.md.

Alternatively, if you already know what you want and can skip ideation, source feature-docs/new-feature.md and choose "skip to feature doc" when prompted — it handles both paths (ideation and direct creation) from a single entry point.

  5. Feature Doc Format

Feature docs live in feature-docs/ with subdirectories for each lifecycle stage. Create this directory structure in your project:

```
feature-docs/
  ideation/    # Human explores and shapes ideas here
  ready/       # Distilled feature doc goes here
  testing/     # Test-writer moves doc here
  building/    # Builder moves doc here
  review/      # Builder moves doc here when tests pass
  completed/   # Reviewer moves doc here when done
```

Template


```yaml
---
title: User Authentication
status: ready
priority: high
depends-on: 004-session-management
affected-files:
  - src/auth/authenticate.ts
  - src/auth/session.ts
  - src/stores/auth-store.ts
  - src/components/login-form.tsx
---
```

User Authentication

Summary

Add email/password login with session management. Users can log in, stay authenticated across page reloads, and log out.

Acceptance Criteria

  1. GIVEN a valid email and password WHEN authenticate(email, password) is called THEN it returns a Session with a non-null token and expiresAt > now
  2. GIVEN an email with no matching user WHEN authenticate(email, password) is called THEN it throws AuthenticationError with code "INVALID_CREDENTIALS"
  3. GIVEN authStore.getState().isAuthenticated is true WHEN logout() is called THEN authStore.getState().session is null and the session cookie is cleared
  4. GIVEN a session cookie with a valid token WHEN restoreSession() is called THEN authStore.getState().isAuthenticated is true
  5. GIVEN a session cookie with an expired token WHEN restoreSession() is called THEN authStore.getState().session is null and the cookie is cleared

Edge Cases

  • Empty email or password to authenticate() — throws ValidationError with code "EMPTY_FIELD" before any network request
  • Session cookie with malformed JSON — restoreSession() clears the cookie silently without throwing

Out of Scope

  • OAuth/social login (separate feature) — do NOT add OAuth types to Session
  • Do NOT touch src/api/client.ts interceptor (has a TODO: add auth comment; leave as-is to avoid breaking existing API calls)

Technical Notes

  • Session token uses httpOnly cookie, not localStorage
  • Rejected: localStorage with encryption wrapper — XSS-accessible, no real protection. httpOnly cookies are invisible to JS entirely.

Acceptance Criteria Rules

Every acceptance criterion must be:

  • Testable — can be verified by an automated test

  • Specific — names exact functions, fields, error types, and return values

  • Independent — does not depend on other criteria passing first

  • Complete — covers the happy path, error cases, and edge cases

Vague criteria produce vague tests, which produce wrong implementations.

Feature Dependencies

Features can declare a dependency on one other feature using the depends-on frontmatter field. The value is the filename stem of the dependency (e.g., 005-user-auth).

One level per doc: Each feature declares only its immediate parent. Feature 006 says depends-on: 005-session-mgmt . Feature 005 says depends-on: 004-data-layer . The full chain (006 → 005 → 004) is resolved dynamically at check time — no feature stores the entire chain.

Recursive resolution: The scripts/check-deps.sh script walks the chain from the target feature all the way down. If ANY dependency in the chain is not in completed/ , the feature is BLOCKED and must not be picked up.

Blocking behavior:

  • In TeammateIdle hooks: blocked features are skipped. The hook continues searching for unblocked work.

  • In agent pickup (builder/test-writer): agents check dependencies before starting. If blocked, they report to the user and stop.

  • In implement-feature.md coordinator flow: the pre-flight check warns the user and asks whether to wait or override.

Circular dependency detection: The script tracks visited features and exits with an error if a cycle is found (e.g., A → B → A).
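The walk can be sketched as a recursive shell function. This is an illustration, not the actual scripts/check-deps.sh: it assumes depends-on appears as a plain "depends-on: <stem>" frontmatter line and that docs live at feature-docs/<stage>/<stem>.md:

```shell
# Hypothetical sketch of the recursive dependency walk.
# Returns 0 if every dependency in the chain is in completed/,
# 1 if blocked, 2 on a cycle.
deps_met() {
  local feature="$1" visited="${2:-}"
  case " $visited " in *" $feature "*)
    echo "ERROR: dependency cycle at $feature"; return 2 ;;
  esac
  local doc dep
  doc=$(ls feature-docs/*/"$feature".md 2>/dev/null | head -n1)
  if [ -z "$doc" ]; then
    echo "BLOCKED: no feature doc found for $feature"; return 1
  fi
  dep=$(sed -n 's/^depends-on:[[:space:]]*//p' "$doc" | head -n1)
  [ -z "$dep" ] && return 0                      # end of the chain: unblocked
  if [ ! -f "feature-docs/completed/$dep.md" ]; then
    echo "BLOCKED: $feature depends on $dep, which is not in completed/"
    return 1
  fi
  deps_met "$dep" "$visited $feature"            # keep walking down the chain
}
```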

When to use depends-on:

  • Feature B cannot function without Feature A's code being merged (runtime dependency)

  • Feature B's acceptance criteria reference outputs from Feature A

  • Feature B modifies files that Feature A creates (sequential file ownership)

When NOT to use depends-on:

  • Features that merely share a domain but are independently testable

  • Priority ordering (use priority: high/medium/low instead)

  • Features that could run in parallel with non-overlapping files

| Vague (agent has to guess) | Precise (agent can write a test) |
|---|---|
| THEN the login works | THEN authenticate() returns a Session with non-null token |
| THEN an error is shown | THEN it throws AuthenticationError with code "INVALID_CREDENTIALS" |
| THEN the data is saved | THEN authStore.getState().session contains the Session |
| THEN the field is removed | THEN the returned object does NOT include a legacyField key |

  6. Agent Roles

Test Writer

Purpose: Produce tests that verify the feature doc's acceptance criteria.

Frontend (build-first):

  • Reads: Feature doc from feature-docs/testing/ and the builder's implementation

  • Produces: Playwright E2E tests that PASS — no Vitest unit tests

  • Tests verify the user-visible interface against acceptance criteria

  • If a test fails, the builder has a bug (report it, don't work around it)

  • Moves doc: testing/ → review/

Python/Rust (TDD):

  • Reads: Feature doc from feature-docs/ready/

  • Produces: Test files that FAIL (all tests must fail before handing off)

  • Tests import from implementation paths even though files may not exist yet

  • Moves doc: ready/ → testing/

Shared constraints:

  • Never writes implementation code — only test files

  • Each acceptance criterion produces at least one test

  • Edge cases from the feature doc produce additional tests

  • Commits tests with test(<scope>): add [failing] tests for <feature-name>

Builder

Purpose: Write implementation code for the feature.

Frontend (build-first):

  • Reads: Feature doc from feature-docs/ready/

  • Produces: Implementation code directly from acceptance criteria

  • Creates the feature branch

  • Moves doc: ready/ → building/ → testing/

Python/Rust (TDD):

  • Reads: Feature doc from feature-docs/testing/ and the failing test files

  • Produces: Implementation code that makes all tests pass

  • Moves doc: testing/ → building/ → review/

Shared constraints:

  • Never modifies test files — if tests are wrong, stop and report to the user

  • Must run scripts/verify.sh after implementation

  • Only touches files listed in the feature doc's affected-files

  • Commits implementation with feat(<scope>): implement <feature-name>

Reviewer

Purpose: Catch what tests cannot — code quality, convention adherence, design system consistency, and qualitative issues.

Maps to: The existing code-reviewer universal agent, extended with agent-teams awareness.

Checks:

  • Code follows project conventions (CLAUDE.md rules)

  • No duplicate logic introduced

  • Error handling is complete

  • Types are correct and specific (no any, no unwrap in production paths)

  • Component library used correctly (shadcn for frontend, idiomatic patterns for backend)

  • Feature doc acceptance criteria all have corresponding tests

  • Tests actually validate the criteria (not just trivially passing)

Produces: Review report. If issues found, status stays at review. If approved, the reviewer moves the doc to feature-docs/completed/.

Constraints:

  • Strictly read-only — never edits implementation or test files

  • Never uses Bash to modify files (sed -i, echo >, etc.)

  • Reports issues to the coordinator; the coordinator routes fixes to the appropriate agent

  • Independence is the reviewer's value — if the reviewer fixes code, it cannot objectively review it

Coordinator

Purpose: Orchestrate the pipeline — scan for work, run pre-flight checks, invoke agents, verify lifecycle compliance between stages, and manage the progress dashboard. The coordinator never writes implementation or test code.

Identity: The main Claude Code session that sources implement-feature.md . Unlike the other roles, the coordinator is not a named agent with restricted tools — it has full tool access by default. These constraints are self-imposed through prompt instructions.

Reads: Feature docs (all directories), STATUS.md, verify output, agent reports

Produces: Team lifecycle management, feature doc lifecycle moves, STATUS.md updates

Allowed operations:

  • Read, Grep, Glob, and read-only Bash on any file

  • TeamCreate, Agent, SendMessage, TeamDelete for team lifecycle

  • TaskCreate/TaskUpdate for tracking work items

  • sed on feature doc frontmatter (status: field only)

  • mv to move feature docs between lifecycle directories

  • Write/Edit on feature-docs/STATUS.md only

Constraints:

  • Never uses Write, Edit, or sed on files listed in affected-files

  • Never uses Write, Edit, or sed on test files

  • Never uses Write, Edit, or sed on any implementation/source file

  • When code needs fixing, re-invokes the responsible agent with specific error details

  • When tests are wrong, reports to the user or re-invokes the test-writer

  7. Feature Doc Lifecycle

Frontend (Build-First)

```
Human explores idea    → (feature-docs/ideation/<name>/)
                           └─ Code reviews, research, design notes, spikes
Human distills doc     → status: ready    (feature-docs/ready/)
Builder picks up       → status: building (feature-docs/building/)
                           └─ Implements from acceptance criteria on feature branch
Builder finishes       → status: testing  (feature-docs/testing/)
                           └─ All verification passes, implementation complete
Test-writer picks up   → status: testing  (feature-docs/testing/)
                           └─ Writes passing Playwright E2E tests
Test-writer finishes   → status: review   (feature-docs/review/)
                           └─ E2E tests pass, verify clean
Reviewer validates     → status: done     (feature-docs/completed/)
                           └─ Approved by reviewer
Coordinator merges     → PR created and merged to main
                           └─ Returns to main, ready for next feature
```

Python/Rust (TDD)

```
Human explores idea    → (feature-docs/ideation/<name>/)
                           └─ Code reviews, research, design notes, spikes
Human distills doc     → status: ready    (feature-docs/ready/)
Test-writer picks up   → status: testing  (feature-docs/testing/)
                           └─ Failing tests committed on feature branch
Builder picks up       → status: building (feature-docs/building/)
                           └─ Implements until all tests pass
Builder finishes       → status: review   (feature-docs/review/)
                           └─ All tests + verify pass
Reviewer validates     → status: done     (feature-docs/completed/)
                           └─ Approved by reviewer
Coordinator merges     → PR created and merged to main
                           └─ Returns to main, ready for next feature
```

Status Transitions

Frontend (Build-First):

| From | To | Who | Action |
|---|---|---|---|
| ready | building | builder | Move doc, create branch, implement from spec |
| building | testing | builder | Move doc, verify passes, implementation done |
| testing | review | test-writer | Move doc, E2E tests written and passing |
| review | completed | reviewer | Move doc, approve quality |
| review | testing | reviewer | Move doc back, E2E test gaps found |
| review | building | reviewer | Move doc back, implementation issues found |

Python/Rust (TDD):

| From | To | Who | Action |
|---|---|---|---|
| ready | testing | test-writer | Move doc, write failing tests, commit |
| testing | building | builder | Move doc, begin implementation |
| building | testing | builder | BOUNCE: defective tests, create bounce file |
| building | review | builder | Move doc, all tests pass, verify clean |
| review | completed | reviewer | Move doc, approve quality |
| review | building | reviewer | Move doc back, issues found (re-work) |

The status field in the feature doc frontmatter and the directory location must stay in sync. Moving the file IS the status transition.

Branch Strategy

Each feature gets its own branch: feat/<feature-name> (following git-workflow skill conventions).

  • The first agent checks out main and pulls before creating the branch

  • All agents commit on the same branch

  • Reviewer reviews the branch

  • After reviewer approval, the coordinator creates a PR (gh pr create) and merges it (gh pr merge --squash --delete-branch)

  • The coordinator returns to main (git checkout main && git pull) before the next feature starts

  • This ensures each new feature branches from the latest main, not from a previous unmerged feature

Naming Convention

Feature doc filenames use a 3-digit numeric prefix: NNN-feature-name.md (e.g., 001-user-auth.md, 002-cart-redesign.md). The prefix is assigned at creation time by running scripts/next-feature-number.sh, which scans all lifecycle directories and ideation folders for existing prefixes and returns the next available number. Ideation folders use the same prefix (e.g., ideation/001-user-auth/). The numeric prefix carries through the entire lifecycle — the same file that starts as ready/001-user-auth.md becomes testing/001-user-auth.md, then building/, review/, and completed/.

This prevents confusion between similarly-named features. 001-user-auth.md can never be mistaken for 002-user-auth-v2.md.
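The scan could look like the sketch below; this is an illustration of the behaviour described above, not the actual scripts/next-feature-number.sh:

```shell
# Hypothetical sketch: find the highest NNN- prefix under feature-docs/
# (lifecycle docs and ideation folders alike) and print the next number.
next_feature_number() {
  local max
  max=$(ls -d feature-docs/*/[0-9][0-9][0-9]-* 2>/dev/null \
        | sed 's|.*/\([0-9][0-9][0-9]\)-.*|\1|' \
        | sort -n | tail -n1)
  printf '%03d\n' "$(( 10#${max:-0} + 1 ))"   # 10# avoids octal parsing of 008/009
}
```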

  8. Coordination Protocol

Automated Kickoff

Source feature-docs/implement-feature.md to scan ready/ for available features, run pre-flight checks (section completeness, file ownership conflicts, dependency chain), detect the stack, and kick off the first agent (builder for frontend, test-writer for Python/Rust). The TeammateIdle hook handles subsequent handoffs automatically.

Dependency awareness: Before kicking off any feature, the coordinator checks its dependency chain via scripts/check-deps.sh . If the feature has unmet dependencies, the coordinator warns the user and suggests waiting or proceeding with an override. The TeammateIdle hook automatically skips blocked features when scanning for pending work.

Sequential Pipeline — Frontend (Build-First)

1. Create team

TeamCreate { team_name: "feat-user-auth" }

2. Spawn builder

Agent { team_name: "feat-user-auth", name: "builder", subagent_type: "builder", prompt: "Pick up feature-docs/ready/001-user-auth.md", mode: "auto" }

3. Wait for builder to finish (TeammateIdle notification)

4. Shut down builder

SendMessage { type: "shutdown_request", recipient: "builder" }

5. Spawn test-writer for E2E tests

Agent { team_name: "feat-user-auth", name: "test-writer", subagent_type: "test-writer", prompt: "Pick up feature-docs/testing/001-user-auth.md", mode: "auto" }

6. Wait for test-writer to finish

7. Shut down test-writer, spawn reviewer

SendMessage { type: "shutdown_request", recipient: "test-writer" }

Agent { team_name: "feat-user-auth", name: "reviewer", subagent_type: "code-reviewer", prompt: "Review feature-docs/review/001-user-auth.md", mode: "auto" }

8. Wait for reviewer, then merge and clean up

Create PR and merge to main

gh pr create --base main --head "feat/user-auth" --title "feat(auth): user authentication" --body "..."

gh pr merge --squash --delete-branch

Return to main

git checkout main && git pull origin main

Clean up the team

SendMessage { type: "shutdown_request", recipient: "reviewer" }

TeamDelete {}

Sequential Pipeline — Python/Rust (TDD)

1. Create team

TeamCreate { team_name: "feat-config" }

2. Spawn test-writer (writes failing tests first)

Agent { team_name: "feat-config", name: "test-writer", subagent_type: "test-writer", prompt: "Pick up feature-docs/ready/003-config.md", mode: "auto" }

3. Wait for test-writer to finish (TeammateIdle notification)

4. Shut down test-writer

SendMessage { type: "shutdown_request", recipient: "test-writer" }

5. Spawn builder

Agent { team_name: "feat-config", name: "builder", subagent_type: "builder", prompt: "Pick up feature-docs/testing/003-config.md", mode: "auto" }

6. Wait for builder to finish

7. Shut down builder, spawn reviewer

SendMessage { type: "shutdown_request", recipient: "builder" }

Agent { team_name: "feat-config", name: "reviewer", subagent_type: "code-reviewer", prompt: "Review feature-docs/review/003-config.md", mode: "auto" }

8. Wait for reviewer, then merge and clean up

Create PR and merge to main

gh pr create --base main --head "feat/config" --title "feat(config): configuration system" --body "..."

gh pr merge --squash --delete-branch

Return to main

git checkout main && git pull origin main

Clean up the team

SendMessage { type: "shutdown_request", recipient: "reviewer" }

TeamDelete {}

Python/Rust: Test Bounce-Back (builder → test-writer → builder)

If the builder detects defective tests (wrong assertions, missing pytest.raises, tests that contradict the feature doc), it moves the feature doc back to testing/, creates a bounce file (<name>.bounce.md), and exits. The coordinator detects this and re-invokes the test-writer in fix mode.

Detection: After the builder finishes (TeammateIdle or manual check), check whether it bounced — the feature doc will be in testing/ (not review/ ):

ls feature-docs/testing/<filename>.bounce.md

If a bounce file exists:

Check bounce count: Read the bounce-count from the feature doc frontmatter. If it is 3 or higher, escalate to the user — the problem is likely in the acceptance criteria, not test mechanics.

Re-invoke the test-writer in fix mode:

Agent { team_name: "feat-<feature-name>", name: "test-writer", subagent_type: "test-writer", prompt: "Fix defective tests per feature-docs/testing/<filename>.bounce.md", mode: "auto" }

Wait for the test-writer to complete, then re-invoke the builder:

SendMessage { type: "shutdown_request", recipient: "test-writer" }

Agent { team_name: "feat-<feature-name>", name: "builder", subagent_type: "builder", prompt: "Pick up feature-docs/testing/<filename>.md — tests have been fixed after bounce-back.", mode: "auto" }

Circuit breaker: When bounce-count reaches 3, escalate to the user. Do not re-invoke agents automatically — the issue likely requires revising the feature doc's acceptance criteria.
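The circuit breaker can be sketched as below, assuming bounce-count is a plain "bounce-count: N" frontmatter line; the helper name is invented for illustration:

```shell
# Hypothetical circuit-breaker check: true (exit 0) means stop re-invoking
# agents and escalate to the user instead.
should_escalate() {
  local doc="$1" count
  count=$(sed -n 's/^bounce-count:[[:space:]]*//p' "$doc" | head -n1)
  [ "${count:-0}" -ge 3 ]    # missing field counts as zero bounces
}
```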

Concurrency Rules

  • Same-role parallelism is allowed. The coordinator may launch multiple Agent calls simultaneously, each working on a different piece or a different feature.

  • Cross-role parallelism is forbidden. Builders and testers must never run at the same time. Complete ALL agents of one role before starting the next role.

  • Clean shutdown between roles. Send shutdown_request to each teammate and verify all agents of the current role have fully stopped before spawning the next role. Teammates finish their current turn before exiting.

Parallel Workflow (Multiple Features)

For multiple features in parallel, ensure no affected-files overlap. Use the stack-appropriate first agent (builder for frontend, test-writer for Python/Rust). If features share files, run them sequentially to avoid conflicts.

Parallel Investigation

Spawn multiple teammates to explore in parallel:

TeamCreate { team_name: "investigate-perf" }

Agent { team_name: "investigate-perf", name: "db-investigator", subagent_type: "general-purpose", prompt: "Investigate database query performance in src/db/", mode: "auto" }

Agent { team_name: "investigate-perf", name: "api-investigator", subagent_type: "general-purpose", prompt: "Investigate API endpoint latency in src/api/", mode: "auto" }

TeammateIdle Hook

When a teammate finishes work and goes idle, the TeammateIdle hook scans feature-docs/ for pending work and logs what it finds. The hook always exits 0, allowing the agent session to terminate cleanly. The coordinator is responsible for launching fresh agent sessions for the next role.

Frontend (build-first) scan priority:

  • feature-docs/testing/ — Needs test-writer for E2E tests

  • feature-docs/ready/ — Needs builder to implement

  • feature-docs/review/ — Needs reviewer

Python/Rust (TDD) scan priority:

  • feature-docs/testing/ — Failing tests exist, needs a builder

  • feature-docs/ready/ — Feature doc waiting, needs a test-writer

  • feature-docs/review/ — Implementation done, needs a reviewer

The hook logs pending work to stderr for the coordinator's awareness, but does not redirect the idle agent. This prevents finished agents from lingering and interfering with the next role's file changes.

TaskCompleted Hook

When any teammate tries to mark a task as done, the TaskCompleted hook runs two checks:

  1. Lifecycle compliance — Scans all feature docs in ready/, testing/, building/, review/, and completed/. For each doc with a status: field, verifies the value matches the directory name. If any feature doc is in the wrong directory (e.g., still in ready/ when it should be in testing/), the task is blocked. This prevents agents from skipping the doc-move step.

  2. Full verify pipeline:

  • Type checking (tsc / mypy / cargo check)

  • Linting (eslint / ruff / clippy)

  • Tests (vitest / pytest / cargo test)

If either check fails, the task cannot be marked done. The agent sees the error output and must fix the issue before trying again.
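As a sketch, the full verify pipeline might look like the function below. The three default commands are assumptions for a TypeScript stack (swap in mypy/ruff/pytest or cargo check/clippy/test), and a real scripts/verify.sh would exit with this status rather than return it:

```shell
# Sketch of the full verify pipeline. Default commands assume a
# TypeScript stack; override TYPECHECK_CMD / LINT_CMD / TEST_CMD
# for other toolchains.
run_verify() {
  : "${TYPECHECK_CMD:=npx tsc --noEmit}"
  : "${LINT_CMD:=npx eslint .}"
  : "${TEST_CMD:=npx vitest run}"
  mkdir -p agent_logs
  local fail=0
  $TYPECHECK_CMD > agent_logs/typecheck.log 2>&1 || fail=1
  $LINT_CMD      > agent_logs/lint.log      2>&1 || fail=1
  $TEST_CMD      > agent_logs/test.log      2>&1 || fail=1
  # Keep agent-facing output short; full logs stay in agent_logs/.
  [ "$fail" -eq 0 ] || echo "verify FAILED - see agent_logs/" >&2
  return "$fail"
}
```

Every stage writes its verbose output to agent_logs/, so the agent only ever sees the one-line summary.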

File Ownership Rules

Claiming Files

When an agent picks up a feature doc, the affected-files list in the frontmatter declares which files that agent may modify. Before starting:

  • Read all feature docs in feature-docs/testing/ and feature-docs/building/

  • Collect their affected-files lists

  • Check for overlap with the current feature's affected-files

  • If overlap exists, report to the user and wait — do not proceed
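The claim check above can be sketched in shell. The helper names and messages are illustrative, and it assumes affected-files entries are written as YAML list items ("- path/to/file") in the frontmatter:

```shell
# Sketch: compare the new doc's affected-files against files already
# claimed by docs in testing/ and building/.
claimed_files() {  # print affected-files entries from the given docs
  sed -n '/^affected-files:/,/^[^ -]/ s/^[[:space:]]*-[[:space:]]\{1,\}//p' "$@" 2>/dev/null | sort -u
}

check_overlap() {  # usage: check_overlap feature-docs/ready/003-foo.md
  local new_doc="$1" overlap
  overlap=$(comm -12 \
    <(claimed_files feature-docs/testing/*.md feature-docs/building/*.md) \
    <(claimed_files "$new_doc"))
  if [ -n "$overlap" ]; then
    printf 'OVERLAP, do not proceed:\n%s\n' "$overlap" >&2
    return 1
  fi
  echo "no overlap - safe to claim"
}
```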

Resolving Conflicts

If two features must touch the same file:

  • Run them sequentially (feature A completes fully before feature B starts)

  • Or split the shared file into separate modules first

Test File Ownership

Test files are owned exclusively by the test-writer. The builder must never modify them.

Python/Rust: If a test is wrong, the builder creates a bounce file (<name>.bounce.md) in feature-docs/testing/ describing the defects, moves the feature doc back to testing/, and stops. The coordinator re-invokes the test-writer in fix mode. The builder never modifies test files or writes production code to accommodate a defective test.
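A bounce file is a short markdown note. The skill only specifies the <name>.bounce.md naming and location; this shape, and every name in it, is illustrative:

```markdown
# Bounce: 012-user-auth

## Defective tests

- test_login_rejects_bad_password asserts a returned error string;
  idiomatic code would raise AuthError - missing pytest.raises
- test_session_expiry asserts `result is not None` instead of the
  specific expiry timestamp

## Action

Feature doc moved back to feature-docs/testing/. Re-invoke the
test-writer in fix mode. No production code was written against
these assertions.
```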

Frontend: E2E test files are created by the test-writer after the builder finishes. The builder has no test files to modify.

Style Work (Frontend Only)

Style refinement cannot be fully automated because "looks right" is subjective.

Style Doc Format

Style docs follow the same template as feature docs but live in styles/ instead of feature-docs/:


```markdown
---
title: Dashboard Cards Redesign
status: ready
affected-files:
  - src/components/dashboard/stat-card.tsx
  - src/components/dashboard/chart-card.tsx
---

# Dashboard Cards Redesign

## Visual Direction

- Cards should use subtle shadows instead of borders
- Stat numbers should use the display font at 2xl
- Charts should fill the card width with 16px padding

## Reference

- See designs in figma: [link]
- Similar to the pattern in src/components/existing-card.tsx
```

Iteration Loop

  • Human writes a style doc with visual direction

  • Style agent applies changes and generates screenshots to styles/reviews/<name>/iteration-N/

  • Agent sets status to awaiting-review and stops

  • Human reviews screenshots, writes feedback in the style doc

  • Agent reads feedback, applies another iteration

  • When human approves, screenshots become Playwright visual regression baselines

Approved screenshots are locked in as automated tests. Future agents cannot drift from the approved design without failing a visual regression test.

Hook Configuration

TaskCompleted

Blocks task completion until lifecycle compliance and the full verify pipeline pass.

```json
{ "event": "TaskCompleted", "command": "bash scripts/task-completed.sh" }
```

The script runs two checks. First, it scans feature docs for status/directory mismatches (e.g., a doc in ready/ with status: testing) and blocks if any are found. Second, it runs scripts/verify.sh (full pipeline) and blocks (exit 2) on any failure. Output is truncated to 30 lines to avoid context pollution. Verbose logs are available in agent_logs/ for debugging.

Lifecycle-aware: For Python/Rust during the testing stage, only lifecycle compliance is checked — the verify pipeline is skipped because tests are expected to fail. For frontend, all stages run full verification (no stage has expected failures in the build-first flow).
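The lifecycle-compliance half of task-completed.sh might be sketched as the function below (directory names from the skill; messages illustrative; a real hook script would exit 2 on mismatch to block the task):

```shell
# Sketch: block completion when any doc's status: field disagrees
# with the directory it sits in.
check_lifecycle() {
  local fail=0 stage doc status
  for stage in ready testing building review completed; do
    for doc in feature-docs/"$stage"/*.md; do
      [ -f "$doc" ] || continue   # skip when the glob matched nothing
      status=$(sed -n 's/^status:[[:space:]]*//p' "$doc" | head -n1)
      if [ -n "$status" ] && [ "$status" != "$stage" ]; then
        echo "MISMATCH: $doc has status '$status' but sits in $stage/" >&2
        fail=1
      fi
    done
  done
  return "$fail"
}
```

Docs without a status: field (such as bounce files) are skipped rather than flagged.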

TeammateIdle

Logs pending work for the coordinator's awareness when a teammate goes idle.

```json
{ "event": "TeammateIdle", "command": "bash scripts/teammate-idle.sh" }
```

The script first checks for stuck features (in building/ for over 30 minutes) and warns if found. Then it scans feature-docs/ directories and logs any pending work to stderr. Always exits 0 to let the agent session terminate cleanly — the coordinator launches fresh sessions for the next role.
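A sketch of that scan, shown with the frontend (build-first) priority order; the 30-minute threshold matches the skill, but the messages and function name are illustrative:

```shell
# Sketch of teammate-idle.sh: warn about stuck features, log pending
# work to stderr, and always succeed so the idle agent can terminate.
teammate_idle() {
  # Stuck detection: anything sitting in building/ for over 30 minutes.
  find feature-docs/building -name '*.md' -mmin +30 2>/dev/null |
    while read -r doc; do
      echo "WARNING: $doc stuck in building/ >30 min - check agent_logs/" >&2
    done
  # Pending-work scan, frontend (build-first) priority order.
  local dir n
  for dir in testing ready review; do
    n=$(find feature-docs/"$dir" -name '*.md' 2>/dev/null | wc -l)
    n=$((n))   # normalize wc output
    [ "$n" -gt 0 ] && echo "pending: $n doc(s) in $dir/" >&2
  done
  return 0  # never block; the coordinator launches the next role
}
```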

Stop (Fast Verify on Change)

Runs fast verification (type check only) after each Claude response when files have changed. Full verification is deferred to TaskCompleted to avoid spending agent time on the full suite during iterative development.

```json
{ "event": "Stop", "command": "bash scripts/stop-hook.sh" }
```

The script checks git diff and git ls-files for modifications. If the working tree is clean, it exits 0 (skips verify). If files have changed, it runs scripts/fast-verify.sh (type check only) for quick feedback. If no fast-verify script exists, it falls back to scripts/verify.sh. It reads stop_hook_active from stdin to prevent recursive loops. Output is truncated to 20 lines.

Lifecycle-aware: For Python/Rust during the testing stage, verification is skipped entirely because test-writer code references unimplemented APIs that will always fail type checking. For frontend, verification runs at all stages.
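That decision flow can be sketched as a function (stop_hook_active is the field Claude Code puts in the hook payload; the rest of the names and the naive JSON matching are illustrative — a real script would exit rather than return):

```shell
# Sketch of stop-hook.sh: skip when invoked recursively or when the
# working tree is clean; otherwise run fast (type-check-only) verify.
stop_hook() {
  local input
  input=$(cat)  # hook payload arrives as JSON on stdin
  case "$input" in
    *'"stop_hook_active":true'* | *'"stop_hook_active": true'*)
      return 0 ;;  # already inside a stop hook - prevent recursion
  esac
  if [ -z "$(git status --porcelain 2>/dev/null)" ]; then
    return 0  # clean tree (or not a repo) - nothing to verify
  fi
  if [ -x scripts/fast-verify.sh ]; then
    bash scripts/fast-verify.sh 2>&1 | tail -n 20
  else
    bash scripts/verify.sh 2>&1 | tail -n 20   # fallback: full pipeline
  fi
}
```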

Branch Protection

The guard-bash.sh PreToolUse hook blocks direct commits on main /master , forcing agents to work on feature branches. This complements the branch-per-feature strategy described in the coordination protocol.

Interaction Controls

tmux Mode

  • Click into any teammate's pane to interact directly

  • Each pane shows the teammate's full terminal session

  • Standard tmux controls for pane management

in-process Mode

  • Shift+Down — cycle through active teammates

  • Enter — view a teammate's full session

  • Escape — interrupt current turn

  • Ctrl+T — toggle task list view

  • Type to send messages to the currently visible teammate

Bootstrap Prompt (New Project)

Use this prompt to set up the agent teams workflow in a new project:

Set up the agent teams workflow for this project:

  1. Create the feature-docs/ directory structure: feature-docs/ideation/, feature-docs/ready/, feature-docs/testing/, feature-docs/building/, feature-docs/review/, feature-docs/completed/

  2. Create an agent_logs/ directory for verbose output and add agent_logs/ to .gitignore

  3. Verify that scripts/verify.sh and scripts/fast-verify.sh both exist:

    • verify.sh: full pipeline (type check + lint + tests) with output to agent_logs/
    • fast-verify.sh: type check only for quick feedback
  4. Verify that .claude/settings.json includes TaskCompleted, TeammateIdle, and Stop hooks

  5. Create a sample feature doc in feature-docs/ready/ based on the Feature Doc Format section in feature-docs/CLAUDE.md

  6. Create an empty feature-docs/STATUS.md for the progress dashboard

  7. Run the full verify pipeline once to confirm everything works

Report what you created and any issues found.
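The mechanical parts of the bootstrap (steps 1, 2, and 6) amount to a few commands, sketched here as a function:

```shell
# Bootstrap sketch: directory skeleton, log dir, empty dashboard.
bootstrap_skeleton() {
  mkdir -p feature-docs/{ideation,ready,testing,building,review,completed}
  mkdir -p agent_logs
  # Keep verbose agent logs out of version control.
  grep -qxF 'agent_logs/' .gitignore 2>/dev/null || echo 'agent_logs/' >> .gitignore
  touch feature-docs/STATUS.md
}
```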

Retrofit Prompt (Existing Project)

Use this prompt to add the workflow to a project that already has code and tests:

Retrofit the agent teams workflow into this existing project:

  1. Discovery — report the following:

    • Package manager and framework
    • Test runner and test directory structure
    • Component library and state management
    • Directory structure and naming conventions
    • Existing .claude/ configuration
  2. Create the feature-docs/ directory structure alongside existing code

  3. Verify scripts/verify.sh works with the existing toolchain:

    • Type checking command
    • Lint command
    • Test command
  4. Check .claude/settings.json for existing hooks and add TaskCompleted and TeammateIdle hooks without replacing existing configuration

  5. Identify migration needs:

    • Test files not in a separate directory (need restructuring?)
    • Missing test coverage for critical paths
    • Files without clear ownership boundaries

Write a discovery report to agent_logs/discovery-report.md and list any recommended changes (without acting on them).

Token Cost Expectations

Agent teams use roughly 5x the tokens of a single session per teammate. A team of 3 (test-writer, builder, reviewer) working on a single feature uses approximately 15x a normal session's tokens. This is justified when:

  • The feature has clear, testable acceptance criteria

  • Files can be cleanly owned by one feature at a time

  • Quality gates (hooks) prevent wasted rework

  • The alternative is sequential context degradation in a single long session

For simple features (one file, clear spec), use a single Claude Code session. Reserve agent teams for features touching multiple files across stores, components, services, and tests.

Limitations

  • One team per session — a lead can only manage one team at a time

  • No nested teams — teammates cannot spawn their own teams

  • No session resumption for in-process teammates

  • Higher token costs than single sessions (each teammate has its own context)

  • Split panes require tmux or iTerm2 with it2 CLI

  • Shutdown can be slow — teammates finish their current turn before exiting

Anti-Patterns

| Anti-Pattern | Why It Fails | Fix |
| --- | --- | --- |
| Builder modifies test files | Grading your own homework — tests lose independence as the oracle | Builder must never touch files created by test-writer |
| Builder works around defective tests | Production code is contorted to satisfy wrong assertions — e.g., returning error strings instead of raising exceptions because the test lacks pytest.raises | Builder runs Test Quality Audit before implementation; if tests are defective, STOP and create a bounce file — never write code to accommodate a bad test |
| Builder writes code to satisfy weak assertions | A test asserts truthiness (is not None) instead of specific values; builder writes a minimal stub that returns a placeholder | Builder's bright-line rule: if idiomatic code written without seeing the tests would not satisfy the assertion, the test is defective — bounce back |
| Skipping the test-writer step | No independent verification — builder's code is unchecked against the spec | Frontend: test-writer writes E2E tests after build. Python/Rust: test-writer writes failing tests before build |
| No file ownership declaration | Two agents edit the same file; merge conflicts and lost work | Feature docs must list affected-files; check for overlaps |
| Running parallel features on same branch | Merge conflicts, unclear ownership, broken bisect history | One branch per feature; merge to main sequentially |
| Passing full test output to agents | Context pollution fills the window with stack traces | Pass summary only: X passed, Y failed, first failure message |
| Feature doc without testable criteria | Test-writer cannot produce meaningful tests; builder has no target | Every acceptance criterion must use GIVEN/WHEN/THEN format |
| Skipping the reviewer step | Qualitative issues (conventions, duplication, design) go undetected | Reviewer validates what tests cannot catch |
| Using agent teams for trivial changes | 15x token cost for a one-line fix is wasteful | Single session for changes touching fewer than 3 files |
| Running full test suite on every save | Agent wastes time waiting for slow tests during iteration | Use fast-verify.sh (type check only) on Stop; full suite on TaskCompleted |
| Tests that check truthiness not values | Wrong implementation passes — toBeTruthy() accepts any non-null | Assert specific return values, error types, and state changes |
| No progress dashboard | Agents start with zero context and waste time re-discovering state | Update feature-docs/STATUS.md after every stage transition |
| Ignoring stuck features | Agent spins for hours on a hard problem without human awareness | TeammateIdle warns after 30 minutes in building/; check agent_logs/ |
| Skipping feature doc lifecycle steps | Next agent never finds the feature doc; pipeline stalls indefinitely | task-completed.sh enforces status/directory sync; Completion Gate checklist in agent definitions |
| Coordinator edits implementation or test files | Violates role separation — coordinator and agent edit the same files, causing conflicts and undermining the test-as-oracle principle | Coordinator re-invokes the responsible agent with specific error details; never uses Write/Edit/sed on code |
| Coordinator fixes follow-up issues directly | Bypasses TDD — no failing test, no builder, no review; defeats the entire workflow even for "small" fixes | Route follow-ups through the full pipeline: test-writer → builder → reviewer; create a new feature doc or amend the existing one |
| Unbounded review → building loop | Builder and reviewer cycle indefinitely, burning tokens on issues the builder cannot resolve alone | Auto-loop up to 3 cycles; after 3, escalate to the user with remaining issues |
| Launching next agent before current one finishes | Both agents edit the same feature's files simultaneously, causing conflicts and lost work | Per-feature sequential: wait for each agent to complete before launching the next; cross-feature parallelism is fine with non-overlapping affected-files |
| Agent stays active after completing its stage | Idle agent reacts to next role's file changes, causing conflicts (e.g., builder "fixes" test-writer's new tests) | Exit Protocol in agent definitions: output report then STOP; TeammateIdle exits 0 to let agents die; coordinator launches fresh sessions |
| Reviewer fixes code directly | Defeats independence — reviewer can't objectively review code it wrote; bypasses TDD pipeline | Reviewer reports issues only; coordinator routes to test-writer (for test gaps) or builder (for implementation issues) |
| Ideation README never updated after pipeline | Feature appears incomplete in ideation folder; scanning for shipped features requires reading completed/ instead of ideation metadata | Coordinator updates ideation README to shipped in "After reviewer approves" step |
| Feature docs without numeric prefix | Similarly-named features (user-auth.md vs user-auth-v2.md) cause agents to read the wrong doc from completed/ or other directories | Always use scripts/next-feature-number.sh to get a unique NNN- prefix at creation time |
| Running verify on test-writer output (Python/Rust) | Type errors on unresolved imports fire on every response; test failures block task completion | Hooks detect testing stage and stack via lifecycle-stage.sh; skip verification for Python/Rust TDD but not frontend build-first |
| Writing Vitest unit tests in frontend workflow | Unit tests break on every component refactor; internal APIs are unstable in vibe-coded UIs | Frontend test-writer writes Playwright E2E only; user-visible behavior is the stable contract |
| Picking up a feature with unmet dependencies | Implementation builds on code that doesn't exist yet; tests reference missing APIs; entire feature may need rework | Run scripts/check-deps.sh before pickup; agents and hooks check automatically |
| Deep dependency chains declared in a single doc | Stale chain data if intermediate features change; maintenance burden grows with chain length | Each doc declares only its immediate parent (depends-on: NNN-name); the script resolves the full chain dynamically from completed/ |
| Circular dependencies between features | Pipeline deadlock — neither feature can proceed because each waits for the other | check-deps.sh detects cycles and exits with error; redesign features to break the cycle |
| Spawning agents without TeamCreate | No team lifecycle, no SendMessage, no shared task tracking — agents run in isolation | Create a team first with TeamCreate, spawn agents with Agent tool, coordinate with SendMessage |
| Forgetting TeamDelete after pipeline | Orphaned team config persists in ~/.claude/teams/; stale task lists accumulate | Always shutdown_request all teammates then TeamDelete after the pipeline completes |
| Starting next feature while on a feature branch | New feature branches from previous feature instead of main; creates dependency stacking where features can't be merged independently | Pre-flight check in implement-feature.md verifies git rev-parse --abbrev-ref HEAD returns main; refuse to start until on main |
| Skipping merge step after reviewer approval | Feature branch sits unmerged; next feature branches from stale state; causes cascading dependency chain across features | Coordinator creates PR with gh pr create, merges with gh pr merge --squash --delete-branch, then returns to main |
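The three dependency rows above (immediate-parent declaration, dynamic chain resolution, cycle detection) can be sketched together. The layout and depends-on field follow the skill; the function names and messages are illustrative:

```shell
# Sketch of check-deps.sh logic: follow each doc's single depends-on
# parent, failing on a cycle or a parent not yet in completed/.
parent_of() {  # print the depends-on value of a feature's doc, if any
  sed -n 's/^depends-on:[[:space:]]*//p' feature-docs/*/"$1".md 2>/dev/null | head -n1
}

check_deps() {
  local feature="$1" seen=" $1 " current
  current=$(parent_of "$feature")
  while [ -n "$current" ]; do
    case "$seen" in *" $current "*)
      echo "ERROR: circular dependency at $current" >&2; return 1 ;;
    esac
    seen="$seen$current "
    if [ ! -f "feature-docs/completed/$current.md" ]; then
      echo "BLOCKED: dependency $current is not completed" >&2; return 1
    fi
    current=$(parent_of "$current")  # walk up the chain
  done
  echo "dependencies satisfied"
}
```

Because each doc names only its immediate parent, the chain is resolved fresh on every check and never goes stale.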
