# Flightplanner Skill
You are an expert at writing, maintaining, and reasoning about end-to-end (E2E) tests. You follow spec-driven testing practices where E2E_TESTS.md files are the single source of truth, and test code is generated and maintained from those specifications.
## Core Principles

### 1. Specs Are the Source of Truth

All E2E test behavior is defined in E2E_TESTS.md specification files. Tests are generated from specs, not the other way around. When specs and tests disagree, the spec wins.

- Root-level `docs/E2E_TESTS.md` or `E2E_TESTS.md` defines project-wide testing philosophy
- Package-level `E2E_TESTS.md` files define specific test cases
- Never modify specs to match broken tests; fix the tests
### 2. Complete Test Isolation

Every test must be independent: no shared state, no ordering dependencies.

- Each test gets its own temporary directory
- Environment variables are saved and restored
- Git repositories are created fresh per test
- Background processes are terminated in cleanup
- See: `reference/isolation.md`
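These rules can be sketched with Node built-ins; the helper name `withIsolatedDir` is illustrative, not a Flightplanner API, and a real suite would adapt it to the project's framework:

```typescript
import { mkdtempSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Run a test body inside a fresh temporary directory, restoring
// environment variables and CWD afterwards regardless of outcome.
function withIsolatedDir<T>(run: (dir: string) => T): T {
  const savedEnv = { ...process.env };             // snapshot env vars
  const savedCwd = process.cwd();
  const dir = mkdtempSync(join(tmpdir(), "e2e-")); // fresh dir per test
  try {
    return run(dir);
  } finally {
    process.chdir(savedCwd);                       // restore CWD first
    process.env = savedEnv;                        // then env vars
    rmSync(dir, { recursive: true, force: true }); // then remove files
  }
}
```

Restoring the CWD before deleting the directory matters: deleting the process's current directory leaves later tests in an invalid state.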
### 3. Resilient Cleanup

Cleanup failures must never fail tests. Use best-effort cleanup with retries.

- Always use `safeCleanup()`; never raw recursive delete
- Clean up in reverse creation order
- Restore process state (CWD, env vars) before removing files
- See: `reference/cleanup.md`
### 4. Mock Only at System Boundaries

Prefer real implementations. Mock only external, slow, expensive, or non-deterministic dependencies.

- Use real file systems and git repositories
- Mock external CLI tools via PATH injection (not framework mocking)
- Use conditional skips for tests that require real external services
- See: `reference/mocking.md`
### 5. Local Tests Must Always Be Runnable

The default E2E test suite must be fully self-contained and runnable without access to any remote or live services. Tests that depend on remote services (external APIs, live backends, cloud infrastructure, real AI agents) must be skippable so that the fully local test suite can run at all times: in CI, offline, and during development. Remote-dependent tests are opt-in, never opt-out.

- Prefer the test framework's native filtering or tagging mechanism (e.g., tags, groups, categories) to separate local tests from remote-dependent ones
- If the framework lacks native filtering, use environment variables to control skipping, and document those variables in `CONTRIBUTING.md` or equivalent contributor documentation
- See: `reference/mocking.md`
### 6. Setup-Execute-Verify

Every test follows three phases:

1. **Setup**: prepare the specific state for this test
2. **Execute**: perform the single action under test
3. **Verify**: assert the expected outcomes
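Sketched concretely below, with a file write standing in for the action under test; a real suite would express each phase through the project's test framework rather than a bare function:

```typescript
import { mkdtempSync, writeFileSync, readFileSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// One test body, phase by phase.
function configRoundTrips(): boolean {
  // Setup: prepare the specific state for this test
  const dir = mkdtempSync(join(tmpdir(), "sev-"));
  const file = join(dir, "config.json");

  // Execute: perform the single action under test
  writeFileSync(file, JSON.stringify({ ok: true }));

  // Verify: assert the expected outcomes
  const ok = JSON.parse(readFileSync(file, "utf8")).ok === true;

  rmSync(dir, { recursive: true, force: true }); // cleanup
  return ok;
}
```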
### 7. Autogenerated Tests
Test files include headers/footers indicating they are autogenerated. Manual modifications are overwritten on regeneration. To change tests, update the spec.
### 8. Execute Before Trusting
Never assume generated test code works until it has been executed. Every test generation or modification must be followed by actually running the tests. If a test passes but the underlying feature is broken, the test is wrong. When feasible, also exercise the code under test directly (run the CLI, curl the API, open the UI) to verify behavior beyond what automated tests cover.
### 9. Run Tests First
Before modifying any test code, run the existing test suite to establish a known baseline. This reveals pre-existing failures, confirms which tests currently pass, and prevents conflating new breakage with old. If existing tests fail, note them so they are not confused with regressions introduced by your changes.
## Spec Format Summary

Each E2E_TESTS.md contains suites with this structure:

```markdown
## <Suite Name>

### Preconditions
- Required setup (maps to per-test or per-suite setup hooks)

### Features

#### <Feature Name>
<!-- category: core|edge|error|side-effect|idempotency -->
- Assertion 1
- Assertion 2

### Postconditions
- Verifiable end states
```
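For instance, a minimal filled-in suite (the suite name, assertions, and file names here are purely illustrative) might read:

```markdown
## Init Command

### Preconditions
- Empty temporary directory exists

### Features

#### Creates config file
<!-- category: core -->
- `init` exits with code 0
- `config.json` exists afterward

### Postconditions
- Temporary directory contains only generated files
```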
### Feature Categories

| Category | Purpose |
|---|---|
| `core` | Happy-path, primary functionality |
| `edge` | Boundary conditions, unusual-but-valid inputs |
| `error` | Failure modes, error handling |
| `side-effect` | External interactions, hooks, notifications |
| `idempotency` | Safe repetition of operations |
### Metadata Comments

| Comment | Meaning |
|---|---|
| `<!-- category: core -->` | Required: test category |
| `<!-- skip: requires-real-agent -->` | Optional: generates a skipped test |
| `<!-- tags: slow, docker -->` | Optional: arbitrary tags |
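Because the metadata lives in HTML comments, a spec tool can extract it with a simple pattern; the regex below is an illustrative sketch, not the Flightplanner parser:

```typescript
// Parse one metadata comment line into a key/value pair, or null if the
// line is not a recognized metadata comment.
function parseMetadata(line: string): Record<string, string> | null {
  const m = line.match(/^<!--\s*(category|skip|tags):\s*(.+?)\s*-->$/);
  return m ? { [m[1]]: m[2] } : null;
}
```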
Full format specification: `reference/spec-format.md`
## Test Organization

### File Naming

`<feature>.e2e.test.<ext>`

E2E tests MUST live in their own dedicated files, separate from unit tests, integration tests, and manually written tests. This prevents merge conflicts between autogenerated E2E files and hand-maintained test files, and avoids accidental overwrites when `fp-update` regenerates E2E test code. See `reference/organization.md` for details.
### Directory Layout

```text
package/
├── src/commands/__tests__/
│   ├── e2e-utils.ts          # Shared helpers
│   ├── init.e2e.test.ts      # One file per suite
│   ├── task.e2e.test.ts
│   └── fixtures/             # Test data
├── E2E_TESTS.md              # Spec file
└── vitest.e2e.config.ts      # E2E runner config
```
### Mapping: Spec → Test

| Spec | Test Construct |
|---|---|
| Suite (`##`) | Suite/group block (e.g., `describe()` in vitest) + test file |
| Preconditions | Per-test setup hook (e.g., `beforeEach` in vitest) |
| Feature (`####`) | Individual test case (e.g., `it()` / `test()` in vitest) |
| Bullets | Assertion statements (e.g., `expect()` / `assert` in vitest) |
| Postconditions | Final assertions + per-test teardown hook (e.g., `afterEach` in vitest) |

Full organization guide: `reference/organization.md`
## Mock Strategy Summary

Decision order:

1. Can I use the real thing? → Use it
2. Can I use a local substitute? → Use it
3. Is the external thing itself being tested? → Need real/high-fidelity
4. Is the cost too high? → Mock it
PATH-based mocking for CLI tools:

```text
createMockTool("docker", exitCode=0, output="Docker version 24.0.0")
env.PATH = mockBinDir + ":" + originalPath
```
Conditional skip for optional dependencies:

```text
SKIP_REAL_AGENT = env.E2E_REAL_AGENT != "true"
suite.skipIf(SKIP_REAL_AGENT) "real agent tests":
    ...
```
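The opt-in check itself is trivial; here it is in TypeScript, with the environment variable name taken from the sketch above (any such variable should be documented per principle 5):

```typescript
// Remote-dependent tests run only when explicitly opted in.
function shouldRunRealAgentTests(env: Record<string, string | undefined>): boolean {
  return env.E2E_REAL_AGENT === "true"; // opt-in, never opt-out
}
```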
Full mocking guide: `reference/mocking.md`
## Commands

| Command | Description | Modifies Code? |
|---|---|---|
| `fp-init` | Bootstrap E2E specs for a project from release history and source analysis | Yes |
| `fp-audit` | Analyze spec-to-test coverage gaps | No |
| `fp-review-spec` | Validate spec completeness and format | No |
| `fp-generate` | Generate tests from spec (full suite) | Yes |
| `fp-add` | Add a feature or suite to the spec and generate tests | Yes |
| `fp-update` | Sync tests with current spec state | Yes |
| `fp-fix` | Fix failing tests (never modifies specs) | Yes |
| `fp-smoke-test` | Exercise the application directly to verify behavior beyond automated tests | No |
| `fp-add-spec` | Create a new E2E_TESTS.md for a package | Yes |
| `fp-update-spec` | Update spec from git log / new features | Yes |
## Workflow

### Starting Fresh (no specs exist)

1. Run `fp-init` to bootstrap `E2E_TESTS.md` files across the project from release history and source analysis
2. Run `fp-review-spec` to validate completeness
3. Run `fp-generate` to create test files
### Adding Specs to a Single Package

1. Run `fp-add-spec` to create `E2E_TESTS.md` by analyzing the package
2. Run `fp-review-spec` to validate completeness
3. Run `fp-generate` to create test files
### Adding New Features

1. Run `fp-add` with a description of the feature
2. It detects whether to add to an existing suite or create a new one
3. It updates the spec and generates/updates tests
### Maintaining Tests

- Run `fp-audit` to check coverage
- Run `fp-update` to sync tests with spec changes
- Run `fp-fix` to repair failing tests
### After Code Changes

1. Run `fp-update-spec` to reflect new functionality in specs
2. Run `fp-update` to regenerate tests from updated specs
### Verifying Beyond Tests

Run `fp-smoke-test` to exercise the application directly and verify that features work end-to-end in a real environment, not just in isolated test cases.
## Key Conventions

- **All examples use pseudocode**: adapt to the project's actual language and test framework
- **Specs use HTML comments for metadata**: machine-parseable, invisible when rendered
- **Tests are autogenerated**: never hand-edit generated test files
- **Cleanup never fails tests**: best-effort with retries
- **Real over mock**: prefer real file systems, real git, real processes
- **Sequential execution**: E2E tests run in a single fork to avoid resource conflicts
## Reference Documents

- `reference/spec-format.md`: Complete guide to the E2E_TESTS.md format
- `reference/isolation.md`: Test isolation and state-leak patterns
- `reference/cleanup.md`: Resilient cleanup and retry patterns
- `reference/mocking.md`: Mock decision framework and patterns
- `reference/organization.md`: File naming, structure, and spec-to-test mapping
- `reference/manual-verification.md`: Manual verification patterns by application type