Test-Driven Development
Enforces test-driven development using HelpMeTest end-to-end tests as guardrails.
Core Philosophy
Tests come first. ALL tests come first. Not one test - ALL tests covering happy paths, edge cases, and errors. Tests define what "done" means. Code exists to make tests pass. No feature is complete until all tests are green.
When to Use This Skill
Use for ANY code change that affects user-facing behavior:
- New features: Write ALL tests defining expected behavior, then implement
- Refactoring: Write tests capturing current behavior first, refactor, verify no regression
- Bug fixes: Write test reproducing the bug, fix it, verify test passes
- Enhancements: Write ALL tests for new behavior, implement, verify
Workflow
Step 1: Understand What Needs to Change
Ask the user:
- What's changing? (new feature / refactoring / bug fix / enhancement)
- What's the expected behavior? (what should work after the change)
- What URL/port is affected? (localhost:3000? production URL?)
- What hostname does the app use? (localhost? myapp.local? important for proxy)
Step 2: Propose Test-First Approach
Explain to the user:
I'll use Test-Driven Development with HelpMeTest:
1. First: Write ALL comprehensive end-to-end tests that define what "done" looks like
- Happy path scenarios (feature works as expected)
- Edge cases (boundary conditions, unusual inputs)
- Error scenarios (how it should handle failures)
- NOT just 1-2 tests - ALL tests for complete coverage
2. Create Feature artifact linking all test scenarios
3. These tests will FAIL initially - that's expected and good!
- Failing tests show what needs to be implemented
- They're your specification in executable form
4. Then: Implement code incrementally to make tests pass
- Make small changes
- Run tests after each logical step
- Use test failures to guide next steps
5. Done when: All tests pass + no new edge cases discovered
Do you want to proceed with test-first approach?
If user says no: Ask why, but gently encourage - "Tests written first catch issues earlier and serve as living documentation. Can I at least write a few core tests to guide implementation?"
If user agrees: Proceed to Step 3.
Step 3: Set Up Local Development (if testing localhost)
CRITICAL: If testing local development (localhost), set up proxy FIRST before writing tests.
Check if User Has Local Server Running
Ask: "Do you have a local server running? If so, what port?"
If no server exists yet, help them set one up:
# Simple HTTP server
cd /path/to/app
python3 -m http.server 3000
# Or node http-server, or their framework's dev server
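Before starting the proxy, it helps to confirm the local server actually answers. A minimal sketch, assuming port 3000 (adjust to your setup); note this checks the dev server directly, which is separate from verifying the tunnel afterwards:

```shell
# Confirm something is listening on the dev port before setting up the proxy
# (port 3000 is an assumption - use whatever port your server runs on)
PORT=3000
if curl -sf -o /dev/null "http://localhost:${PORT}/"; then
  echo "local server reachable on port ${PORT}"
else
  echo "nothing answering on port ${PORT} - start your dev server first" >&2
fi
```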
Start HelpMeTest Proxy
Use the helpmetest-proxy skill to set up tunnels:
The proxy skill will guide you through:
- Choosing the right strategy (single tunnel, separate tunnels, or production substitution)
- Setting up the tunnel command
- Verifying it works with curl
Quick reference:
# Most common: single tunnel
helpmetest proxy start localhost:3000
# Verify before proceeding (use HelpMeTest, not curl)
helpmetest_run_interactive_command({ command: "Go To http://localhost" })
# Should load your local app
Only write tests after proxy is confirmed working.
For detailed proxy setup, troubleshooting, and all strategies, use the helpmetest-proxy skill.
Step 4: Write ALL Comprehensive Tests First
CRITICAL: Write ALL tests before ANY implementation. Not 1 test. ALL tests.
Create Feature Artifact FIRST
Before writing tests, create the Feature artifact to organize scenarios:
Call how_to({ type: "context_discovery" }) to check for existing artifacts.
Create Feature artifact:
{
"id": "feature-[name]",
"type": "Feature",
"name": "Feature: [Name]",
"content": {
"goal": "What this feature accomplishes",
"non_goals": ["What this is NOT"],
"status": "untested",
"functional": [
{
"name": "Scenario name",
"given": "Precondition",
"when": "Action",
"then": "Expected outcome",
"tags": ["priority:critical"],
"test_ids": [] // Will populate as we create tests
}
],
"edge_cases": [],
"non_functional": [],
"bugs": [],
"persona_ids": []
}
}
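As a concrete illustration, a filled-in artifact for the wishlist example referenced at the end of this skill might look like this (the feature name and scenario text are illustrative, not prescribed):

```
{
  "id": "feature-wishlist",
  "type": "Feature",
  "name": "Feature: Wishlist",
  "content": {
    "goal": "Users can save products to a personal wishlist",
    "non_goals": ["Sharing wishlists with other users"],
    "status": "untested",
    "functional": [
      {
        "name": "User adds a product to the wishlist",
        "given": "A logged-in user viewing a product page",
        "when": "They click the wishlist button",
        "then": "The product appears in their wishlist and persists after reload",
        "tags": ["priority:critical"],
        "test_ids": []
      }
    ],
    "edge_cases": [],
    "non_functional": [],
    "bugs": [],
    "persona_ids": []
  }
}
```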
Write Tests for New Features
Write tests covering:
- Happy Path Scenarios (3-5 tests):
- User can successfully complete the main workflow
- Data persists correctly
- UI shows correct feedback
- Navigation works as expected
- Edge Cases (3-5 tests):
- Empty inputs
- Invalid formats
- Boundary values (too long, too short, negative numbers)
- Duplicate entries
- Wrong permissions/unauthorized access
- Error Scenarios (2-3 tests):
- API failures
- Network timeouts
- Missing required data
- Concurrent operations
Use helpmetest_upsert_test for EVERY test:
Robot Framework syntax patterns:
# Navigate to page
Go To http://localhost timeout=5000
# Get text from element
${text}= Get Text [data-testid="some-element"]
# Get attribute value
${value}= Get Property input[name="field"] value
# Click element
Click [data-testid="submit-button"]
# Fill input field
Fill Text input[name="username"] testuser
# Wait for API response
Wait For Response url=**/api/endpoint** status=200 timeout=5000
# Wait for element state
Wait For Elements State [data-testid="result"] visible timeout=5000
# Assertions
Should Be Equal ${actual} ${expected}
Should Be Equal As Numbers ${num1} ${num2}
Should Not Be Empty ${text}
# Type conversion
${number}= Convert To Integer ${text}
# Reload page
Reload
# Wait/pause
Sleep 0.5s
Common test structure:
# Given - Set up initial state
Go To http://localhost timeout=5000
${initial_state}= Get Text [data-testid="element"]
# When - Perform the action being tested
Click [data-testid="action-button"]
Wait For Response url=**/api/action** status=200
# Then - Verify the outcome
${result}= Get Text [data-testid="result"]
Should Be Equal ${result} expected_value
# Optional - Verify persistence
Reload
${persisted}= Get Text [data-testid="element"]
Should Be Equal ${persisted} ${result}
Test naming: Descriptive names like "User can submit form and data persists after reload"
Link tests to Feature artifact: Add each test ID to the corresponding scenario's test_ids array.
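This skill doesn't show the exact parameters of helpmetest_upsert_test, so treat the field names below (name, tags, script) as assumptions to check against the tool's schema. A call payload might be shaped roughly like:

```
helpmetest_upsert_test({
  // Field names are assumptions - verify against the actual tool schema
  "name": "User can submit form and data persists after reload",
  "tags": ["tdd:new-feature", "priority:critical", "feature:example"],
  "script": "Go To http://localhost timeout=5000\nFill Text input[name=\"field\"] test_value\nClick [data-testid=\"submit-button\"]"
})
```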
For Refactoring
CRITICAL: Tests must capture EXISTING behavior BEFORE refactoring.
- Navigate to the code being refactored
- Understand current behavior - what does it do now?
- Write tests that verify current behavior works
- Run tests to confirm they pass with current code
- Then refactor
- Run tests again to verify no regression
For Bug Fixes
Write a test that REPRODUCES the bug:
# Bug reproduction: Steps that trigger the specific bug
Go To http://localhost timeout=5000
# Set up the conditions that cause the bug
${initial_state}= Get Text [data-testid="element"]
# Perform the action that triggers the bug
Click [data-testid="trigger-button"]
# Expected behavior: what SHOULD happen
# Actual behavior (before fix): what currently happens (test will FAIL)
${actual}= Get Text [data-testid="result"]
Should Be Equal ${actual} expected_correct_value
The test will FAIL before the bug is fixed, then PASS after.
Step 5: Run All Tests (Expect Failures)
After writing ALL tests, run them using helpmetest_run_test:
helpmetest_run_test({ id: "test-id" })
Explain failures to user:
✅ Test Suite Created: [X] tests written
Expected Results (tests SHOULD fail now):
- [Test 1]: ❌ Will fail - feature not implemented yet
- [Test 2]: ❌ Will fail - API endpoint doesn't exist
- [Test 3]: ❌ Will fail - UI elements not created
...
This is GOOD! Failing tests show exactly what needs to be built.
They're your roadmap for implementation.
Ready to start implementing?
Step 6: Implement Incrementally
Process:
- Pick the highest priority failing test (priority:critical first)
- Explain what needs to be implemented to make it pass
- Make the minimal change to progress toward green
- Check syntax (if code language):
# JavaScript
node -c app.js
# Python
python -m py_compile script.py
# Others: use language's syntax checker
- Run that specific test
- If test passes: Move to next failing test
- If test still fails: Analyze failure, implement more, check syntax, re-run
When to run tests:
- ✅ After implementing a complete function/component
- ✅ After fixing a syntax error
- ✅ After adding a feature piece (e.g., API endpoint, UI handler)
- ❌ NOT after every single line (too frequent)
- ❌ NOT only at the end (defeats TDD purpose)
Balance: Run tests when you've completed something meaningful toward making a test pass.
Step 7: Handle Test Failures During Implementation
When a test fails during implementation:
1. Check for syntax errors FIRST:
# Run syntax checker for your language
node -c file.js # JavaScript
python -m py_compile file.py # Python
# etc.
If syntax error: fix it, re-run test.
2. If no syntax error, read the test failure:
- What step failed?
- What was expected vs actual?
- Any error messages?
3. Determine cause:
- Implementation incomplete? → Continue implementing
- Implementation bug? → Fix the code
- Test wrong? → Review test (rare - tests were written based on requirements)
4. Debug interactively if unclear:
# Run test steps manually to see what's happening
Go To http://localhost timeout=5000
# See what's actually on the page
Get Elements button # All buttons
Get Text .result # What result is shown?
# Try the action
Click button >> "Submit"
# Did it work? Check the response
5. Fix and re-run:
- Make the fix
- Check syntax again
- Run the failing test
- If passes, continue to next test
- If still fails, repeat debugging
Step 8: All Tests Green - But Are We Done?
When all tests pass, don't claim done yet. Review for missing edge cases:
Ask yourself:
- What happens if user does X twice?
- What if API is slow/times out?
- What if user has no permission?
- What about concurrent users?
- What about large data sets?
If you find gaps:
✅ All [X] tests passing!
But I found potential edge cases not covered:
1. What if user adds 1000 items? (performance)
2. What if two users edit same item? (race condition)
3. What if API is down? (error handling)
Should I write tests for these scenarios?
If user agrees: Write additional tests, they'll likely fail, implement fixes, re-run.
If no gaps found:
✅ All [X] tests passing!
✅ No obvious edge cases missing
✅ Feature appears complete
Ready to mark as done?
Step 9: Final Validation
Run ALL tests one more time to ensure nothing broke:
# Run all tests for this feature
helpmetest_run_test({ tags: ["feature:[feature-name]"] })
If any test fails:
- Identify which test broke and why
- Determine if it's a regression
- Fix the code or update the test (with user approval)
Step 10: Mark as Complete
Only when:
- ✅ All new tests pass
- ✅ No existing tests broken (or intentionally updated)
- ✅ No obvious edge cases missing
- ✅ User confirms implementation matches requirements
- ✅ Feature artifact updated with status: "working"
Summary to user:
✅ Implementation Complete!
Tests Written: [X] total
- [Y] happy path scenarios
- [Z] edge cases
- [W] error scenarios
All tests passing ✅
Changes made:
- [list files changed]
- [list key functionality added]
Test coverage ensures:
- Feature works as expected
- Edge cases handled
- No regressions introduced
Best Practices
Writing Good Tests
Tests should verify business outcomes, not implementation details:
❌ Bad test:
# Just checks element exists - doesn't verify functionality
Wait For Elements State [data-testid="submit-button"] visible
✅ Good test:
# Verifies the business outcome - data persists after submission
Fill Text input[name="field"] test_value
Click [data-testid="submit-button"]
Wait For Response url=**/api/save** status=200
Reload
Wait For Elements State input[name="field"] visible
${value}= Get Property input[name="field"] value
Should Be Equal ${value} test_value
Syntax Checking
Always check syntax before running tests if you're modifying code:
# JavaScript
node -c file.js && echo "✅ Syntax OK"
# Python
python -m py_compile file.py && echo "✅ Syntax OK"
# TypeScript
tsc --noEmit && echo "✅ Syntax OK"
Syntax errors waste time - catch them before running tests.
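One way to make the syntax check routine is a small pre-test loop over modified files. A sketch, assuming a git working tree containing JavaScript and/or Python files (extend the case arms for other languages):

```shell
# Syntax-check every modified JS/Python file before running the test suite
for f in $(git diff --name-only HEAD); do
  case "$f" in
    *.js) node -c "$f" || { echo "syntax error in $f" >&2; exit 1; } ;;
    *.py) python3 -m py_compile "$f" || { echo "syntax error in $f" >&2; exit 1; } ;;
  esac
done
echo "✅ Syntax OK"
```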
Proxy Troubleshooting
If tests can't reach your local server, use the helpmetest-proxy skill for complete troubleshooting:
- Verifying proxy is running
- Checking local server connectivity
- Fixing common connection issues
- Choosing the right proxy strategy (including setups where a frontend dev server already proxies the backend internally, e.g. Vite's server.proxy config)
Tag Schema
Use these tags for TDD tests:
- tdd:new-feature - Test for new functionality
- tdd:refactoring-guard - Test capturing existing behavior before refactoring
- tdd:bug-fix - Test reproducing a bug (should fail initially, pass after fix)
- tdd:edge-case - Additional edge case discovered after initial implementation
Also include standard tags:
- priority:critical|high|medium|low
- feature:[feature-name]
- bug:[bug-id] (if applicable)
Critical Rules
- Write ALL tests first - Not 1 test. ALL tests (happy path, edge cases, errors) before ANY implementation
- Create Feature artifact early - Before writing tests, create artifact to organize scenarios
- Set up proxy for localhost - Must be running before tests execute
- Check syntax frequently - Before running tests, after code changes
- Expect failures - Initial test failures are good, they define the work
- Run tests after logical steps - Not every line, not only at end
- Green isn't done - All tests passing means review for missing edge cases
- Tests are guardrails - If test fails after change, either fix code or (rarely) fix test
- No feature complete until all tests pass + no gaps found
Example: Complete TDD Flow
See references/examples/wishlist-example.md for a complete walkthrough of adding a wishlist feature using TDD — covering proxy setup, test creation, incremental implementation, and edge case discovery.
Version: 0.1