# Testing Patterns

A pragmatic approach to testing that emphasises:

- Live testing over mocks
- Agent execution to preserve context
- YAML specs as documentation and tests
- Persistent results committed to git
## Philosophy

This is not traditional TDD. Instead:

- Test in production/staging with good logging
- Use agents to run tests (keeps main context clean)
- Define tests declaratively in YAML (human-readable, version-controlled)
- Focus on integration (real servers, real data)
## Why Agent-Based Testing?

Running 50 tests in the main conversation would consume your entire context window. By delegating to a sub-agent:

- Main context stays clean for development
- The agent can run many tests without context pressure
- Results come back as a summary
- Failed tests get detailed investigation
## Commands

| Command | Purpose |
|---------|---------|
| `/create-tests` | Discover project, generate test specs + testing agent |
| `/run-tests` | Execute tests via agent(s), report results |
| `/coverage` | Generate coverage report and identify uncovered code paths |
Quick workflow:

```
/create-tests             → Generates tests/specs/*.yaml + .claude/agents/test-runner.md
/run-tests                → Spawns agent, runs all tests, saves results
/run-tests api            → Run only specs matching "api"
/run-tests --failed       → Re-run only failed tests
/coverage                 → Run tests with coverage, analyse gaps
/coverage --threshold 80  → Fail if below 80%
```
## Getting Started in a New Project

This skill provides the pattern and format. Claude designs the actual tests based on your project context.

What happens when you ask "Create tests for this project":

1. **Discovery** - Claude examines the project:
   - What MCP servers are configured?
   - What APIs or tools exist?
   - What does the code do?
2. **Test Design** - Claude creates project-specific tests:
   - Test cases for the actual tools/endpoints
   - Expected values based on real behavior
   - Edge cases relevant to this domain
3. **Structure** - Using patterns from this skill:
   - YAML specs in the `tests/` directory
   - Optional testing agent in `.claude/agents/`
   - Results saved to `tests/results/`
**Example:**

```
You: "Create tests for this MCP server"

Claude: [Discovers this is a Google Calendar MCP]
        [Sees tools: calendar_events, calendar_create, calendar_delete]
        [Designs test cases:]

tests/calendar-events.yaml:
  - list_upcoming_events  (expect: array, count_gte 0)
  - search_by_keyword     (expect: contains search term)
  - invalid_date_range    (expect: error status)

tests/calendar-mutations.yaml:
  - create_event          (expect: success, returns event_id)
  - delete_nonexistent    (expect: error, contains "not found")
```
The skill teaches Claude:

- How to structure YAML test specs
- What validation rules are available
- How to create testing agents
- When to use parallel execution

Your project provides:

- What to actually test
- Expected values and behaviors
- Domain-specific edge cases
## YAML Test Spec Format

```yaml
name: Feature Tests
description: What these tests validate

# Optional: defaults applied to all tests
defaults:
  tool: my_tool_name
  timeout: 5000

tests:
  - name: test_case_name
    description: Human-readable purpose
    tool: tool_name        # Override default if needed
    params:
      action: search
      query: "test input"
    expect:
      contains: "expected substring"
      not_contains: "should not appear"
      status: success
```
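As a sketch of how a runner might consume this format, here is one way to merge spec-level `defaults` into each test. The spec is shown already parsed (e.g. by `yaml.safe_load`); the `expand_tests` helper is illustrative, not part of this skill:

```python
# Illustrative sketch, not this skill's actual runner: merge a spec's
# `defaults` into each test case; per-test keys override the defaults.

def expand_tests(spec: dict) -> list[dict]:
    """Return each test with the spec-level defaults filled in."""
    defaults = spec.get("defaults", {})
    return [{**defaults, **test} for test in spec.get("tests", [])]

# Spec shown already parsed (e.g. via yaml.safe_load)
spec = {
    "name": "Feature Tests",
    "defaults": {"tool": "my_tool_name", "timeout": 5000},
    "tests": [
        {"name": "basic", "params": {"query": "test input"}},
        {"name": "slow", "timeout": 30000},
    ],
}
tests = expand_tests(spec)
# tests[0] picks up tool/timeout from defaults; tests[1] overrides timeout
```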
## Validation Rules

| Rule | Description | Example |
|------|-------------|---------|
| `contains` | Response contains string | `contains: "from:john"` |
| `not_contains` | Response doesn't contain string | `not_contains: "error"` |
| `matches` | Regex pattern match | `matches: "after:\d{4}"` |
| `json_path` | Check value at JSON path | `json_path: "$.results[0].name"` |
| `equals` | Exact value match | `equals: "success"` |
| `status` | Check success/error | `status: success` |
| `count_gte` | Array length >= N | `count_gte: 1` |
| `count_eq` | Array length == N | `count_eq: 5` |
| `type` | Value type check | `type: array` |

See `references/validation-rules.md` for complete documentation.
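To make the rule semantics concrete, here is a minimal, hypothetical checker for a subset of these rules. `json_path` and `status` are omitted since they depend on the response envelope and a JSONPath library; the `check` function and its dispatch are illustrative, not the skill's actual implementation:

```python
# Hypothetical validator for a subset of the rules in the table above.
import json
import re

def check(rule: str, expected, response) -> bool:
    # String rules operate on the response text (JSON-encoded if needed)
    text = response if isinstance(response, str) else json.dumps(response)
    if rule == "contains":
        return expected in text
    if rule == "not_contains":
        return expected not in text
    if rule == "matches":
        return re.search(expected, text) is not None
    if rule == "equals":
        return response == expected
    if rule == "count_gte":
        return len(response) >= expected
    if rule == "count_eq":
        return len(response) == expected
    if rule == "type":
        return {"array": list, "string": str, "object": dict}.get(expected) is type(response)
    raise ValueError(f"unknown rule: {rule}")
```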
## Creating a Testing Agent

Testing agents inherit MCP tools from the session. Create an agent that:

- Reads YAML test specs
- Executes tool calls with params
- Validates responses against expectations
- Reports results

### Agent Template

**CRITICAL:** Do NOT specify a `tools` field if you need MCP access. When you specify ANY tools, the list becomes an allowlist and `"*"` is interpreted literally (not as a wildcard). Omit `tools` entirely to inherit ALL tools from the parent session.

```markdown
---
name: my-tester
description: |
  Tests [domain] functionality. Reads YAML test specs and validates responses.
  Use when: testing after changes, running regression tests.
# tools field OMITTED - inherits ALL tools from parent (including MCP)
model: sonnet
---

# [Domain] Tester

## How It Works

1. Find test specs: tests/*.yaml
2. Parse and execute each test
3. Validate responses
4. Report pass/fail summary

## Test Spec Location

tests/
├── feature-a.yaml
├── feature-b.yaml
└── results/
    └── YYYY-MM-DD-HHMMSS.md

## Execution

For each test:
1. Call the tool with params
2. Capture the response
3. Apply validation rules
4. Record PASS/FAIL

## Reporting

Save results to tests/results/YYYY-MM-DD-HHMMSS.md
```

See `templates/test-agent.md` for the complete template.
## Results Format

Test results are saved as markdown for git history:

```markdown
# Test Results: feature-name

Date: 2026-02-02 14:30
Commit: abc1234
Summary: 8/9 passed (89%)

## Results

- test_basic_search - PASSED (0.3s)
- test_with_filter - PASSED (0.4s)
- test_edge_case - FAILED

## Failed Test Details

### test_edge_case

- Expected: Contains "expected value"
- Actual: Response was empty
- Params: { action: search, query: "" }
```

Save to: `tests/results/YYYY-MM-DD-HHMMSS.md`
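A minimal sketch of a writer for this format. Only the `tests/results/YYYY-MM-DD-HHMMSS.md` naming convention comes from this skill; the `save_results` helper and the exact lines it emits are illustrative:

```python
# Illustrative results writer; follows the YYYY-MM-DD-HHMMSS.md naming
# convention above but is not the skill's actual reporting code.
from datetime import datetime
from pathlib import Path

def save_results(results: list[tuple[str, bool]],
                 out_dir: str = "tests/results") -> Path:
    passed = sum(ok for _, ok in results)
    pct = round(100 * passed / len(results))
    lines = [
        "# Test Results",
        f"Summary: {passed}/{len(results)} passed ({pct}%)",
        "",
    ]
    lines += [f"- {name} - {'PASSED' if ok else 'FAILED'}"
              for name, ok in results]
    # Timestamped filename keeps one file per run for git history
    path = Path(out_dir) / datetime.now().strftime("%Y-%m-%d-%H%M%S.md")
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(lines) + "\n")
    return path
```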
## Workflow

1. **Create test specs** (`tests/search.yaml`):

   ```yaml
   name: Search Tests
   defaults:
     tool: my_search_tool

   tests:
     - name: basic_search
       params: { query: "hello" }
       expect: { status: success, count_gte: 0 }

     - name: filtered_search
       params: { query: "hello", filter: "recent" }
       expect: { contains: "results" }
   ```

2. **Create a testing agent**

   Copy `templates/test-agent.md` and customise it for your domain.

3. **Run tests**

   "Run the search tests"
   "Test the API after my changes"
   "Run regression tests for gmail-mcp"

4. **Review results**

   Results are saved to `tests/results/`. Commit them for history:

   ```bash
   git add tests/results/
   git commit -m "Test results: 8/9 passed"
   ```
## Parallel Test Execution

Run multiple test agents simultaneously to speed up large test suites:

```
"Run these test suites in parallel:
- Agent 1: tests/auth/*.yaml
- Agent 2: tests/search/*.yaml
- Agent 3: tests/api/*.yaml"
```

Each agent:

- Has its own context (won't bloat the main conversation)
- Can run 10-50 tests independently
- Returns a summary when done
- Inherits MCP tools from the parent session

Why parallel agents?

- 50 tests in the main context = context exhaustion
- 50 tests across 5 agents = clean contexts + faster execution
- Each agent reports a pass/fail summary, not every test detail

Batching strategy:

- Group tests by feature area or MCP server
- 10-20 tests per agent is ideal
- Too few = the overhead of spawning isn't worth it
- Too many = the agent's context fills up
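The batching guideline above can be sketched in a few lines; the chunk size of 15 (within the 10-20 sweet spot) and the `batch_tests` name are illustrative:

```python
# Illustrative: split a suite into chunks of ~15 tests, one chunk per agent.

def batch_tests(tests: list, size: int = 15) -> list[list]:
    """Chunk a flat list of tests into per-agent batches."""
    return [tests[i:i + size] for i in range(0, len(tests), size)]

batches = batch_tests([f"test_{i}" for i in range(50)])
# 50 tests → 4 agents: three batches of 15 and one of 5
```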
## MCP Testing

For MCP servers, the testing agent inherits the configured MCPs:

```bash
# Configure the MCP first
claude mcp add --transport http gmail https://gmail.mcp.example.com/mcp
```

Then test: "Run tests for gmail MCP"

Example MCP test spec:

```yaml
name: Gmail Search Tests
defaults:
  tool: gmail_messages

tests:
  - name: search_from_person
    params: { action: search, searchQuery: "from John" }
    expect: { contains: "from:john" }

  - name: search_with_date
    params: { action: search, searchQuery: "emails from January 2026" }
    expect: { matches: "after:2026" }
```
## API Testing

For REST APIs, use the Bash tool:

```yaml
name: API Tests
defaults:
  timeout: 5000

tests:
  - name: health_check
    command: curl -s https://api.example.com/health
    expect: { contains: "ok" }

  - name: get_user
    command: curl -s https://api.example.com/users/1
    expect:
      json_path: "$.name"
      type: string
```
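One hypothetical way an agent could execute a `command:` test and apply a `contains` expectation; a local `echo` stands in for the real `curl` call so the sketch is self-contained:

```python
# Illustrative command-test runner; not this skill's actual execution code.
import subprocess

def run_command_test(command: str, expect_contains: str,
                     timeout_ms: int = 5000) -> bool:
    """Run a shell command and check the output, honouring the timeout."""
    result = subprocess.run(
        command, shell=True, capture_output=True,
        text=True, timeout=timeout_ms / 1000,
    )
    return expect_contains in result.stdout

# Stand-in for: curl -s https://api.example.com/health
ok = run_command_test("echo '{\"status\": \"ok\"}'", "ok")
```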
## Browser Testing

For browser automation, use Playwright tools:

```yaml
name: UI Tests

tests:
  - name: login_page_loads
    steps:
      - navigate: https://app.example.com/login
      - snapshot: true
    expect: { contains: "Sign In" }

  - name: form_submission
    steps:
      - navigate: https://app.example.com/form
      - type: { ref: "#email", text: "test@example.com" }
      - click: { ref: "button[type=submit]" }
    expect: { contains: "Success" }
```
## Tips

- **Start with smoke tests:** basic connectivity and auth
- **Test edge cases:** empty results, errors, special characters
- **Use descriptive names:** `search_with_date_filter`, not `test1`
- **Group related tests:** one file per feature area
- **Add after bugs:** every fixed bug gets a regression test
- **Commit results:** create a history of test runs
## What This Is NOT

- Not a Jest/Vitest replacement (use those for unit tests)
- Not enforcing TDD (use what works for you)
- Not a test runner library (the agent IS the runner)
- Not about mocking (we test real systems)
## When to Use

| Scenario | Use This | Use Traditional Testing |
|----------|----------|-------------------------|
| MCP server validation | Yes | No |
| API integration | Yes | Complement with unit tests |
| Browser workflows | Yes | Complement with component tests |
| Unit testing | No | Yes (Jest/Vitest) |
| Component testing | No | Yes (Testing Library) |
| Type checking | No | Yes (TypeScript) |
## Related Resources

- `templates/test-spec.yaml` - Generic test spec template
- `templates/test-agent.md` - Testing agent template
- `references/validation-rules.md` - Complete validation rule reference