Empirical Validation

Core Principle

"The code looks correct" is NOT validation.

Every change must be verified with empirical evidence before being marked complete.

Validation Methods by Change Type

| Change Type | Required Validation | Tool |
|---|---|---|
| UI Changes | Screenshot showing expected visual state | browser_subagent |
| API Endpoints | Command showing correct response | run_command |
| Build/Config | Successful build or test output | run_command |
| Data Changes | Query showing expected data state | run_command |
| File Operations | File listing or content verification | run_command |

Validation Protocol

Before Marking Any Task "Done"

1. Identify Verification Criteria

   What should be true after this change?
   How can that be observed?

2. Execute Verification

   Run the appropriate command or action
   Capture the output/evidence

3. Document Evidence

   Add to .gsd/JOURNAL.md under the task
   Include actual output, not just "passed"

4. Confirm Against Criteria

   Does evidence match expected outcome?
   If not, task is NOT complete
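
The four steps above can be sketched as a small helper. This is only an illustration under stated assumptions, not part of the skill itself: the function name `verify_and_record`, the substring-based criterion, and the layout of the journal entry are all hypothetical; only the `.gsd/JOURNAL.md` path comes from the text above.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def verify_and_record(task, command, expected_substring,
                      journal_path=".gsd/JOURNAL.md"):
    """Run a verification command, journal its real output, return pass/fail.

    `command` is a list of arguments, e.g. ["npm", "run", "build"].
    The entry layout written to the journal is an assumed format.
    """
    # Step 2: execute the verification and capture stdout and stderr.
    result = subprocess.run(command, capture_output=True, text=True)
    evidence = (result.stdout + result.stderr).strip()

    # Step 4: the criterion here is a clean exit AND the expected text.
    passed = result.returncode == 0 and expected_substring in evidence

    # Step 3: append the actual output, not just "passed".
    journal = Path(journal_path)
    journal.parent.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with journal.open("a", encoding="utf-8") as fh:
        fh.write(f"\n{task} ({stamp})\n")
        fh.write(f"Command: {' '.join(command)}\n")
        fh.write(f"Exit code: {result.returncode}\n")
        fh.write(f"Output:\n{evidence}\n")
        fh.write(f"Verdict: {'criteria met' if passed else 'NOT complete'}\n")
    return passed
```

A call such as `verify_and_record("Fix build", ["npm", "run", "build"], "compiled")` would return `False` whenever the command exits non-zero, even if the expected text appears somewhere in the output.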

Examples

API Endpoint Verification

Good: Actual test showing response

curl -X POST http://localhost:3000/api/login -H 'Content-Type: application/json' -d '{"email":"test@test.com"}'

Output: {"success":true,"token":"..."}

Bad: Just saying "endpoint works"
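
When the same check needs to be repeatable, the response can be compared against the expected shape in code rather than by eye. The sketch below assumes the endpoint accepts JSON and that "correct" means HTTP 200 with `success` set to true and a non-empty `token`, as in the sample output above; the helper names `post_json` and `login_response_ok` are hypothetical.

```python
import json
import urllib.request


def post_json(url, payload):
    """POST a JSON payload and return (status_code, body_text).

    Note: urlopen raises urllib.error.HTTPError for 4xx/5xx responses,
    so a failed request surfaces as an exception, not a return value.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status, response.read().decode("utf-8")


def login_response_ok(status, body):
    """Return True only if the response matches the assumed expected shape."""
    if status != 200:
        return False
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return False
    return (
        isinstance(data, dict)
        and data.get("success") is True
        and bool(data.get("token"))
    )
```

The raw body returned by `post_json` is still what belongs in the journal; the boolean from `login_response_ok` only answers whether it met the criteria.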

UI Verification

Good: Take screenshot with browser tool

Navigate to /dashboard
Capture screenshot
Confirm: Header visible? Data loaded? Layout correct?

Bad: "The component should render correctly"

Build Verification

Good: Show build output

npm run build

Output: Successfully compiled...

Bad: "Build should work now"

Forbidden Phrases

Never use these as justification for completion:

"This should work"
"The code looks correct"
"I've made similar changes before"
"Based on my understanding"
"It follows the pattern"
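
A completion note can be screened for these phrases before a task is closed. This is a minimal sketch, assuming a simple case-insensitive substring match is good enough; the function name `find_forbidden_phrases` is hypothetical, and an empty result does not by itself prove that evidence was provided.

```python
# Phrases copied from the list above, lowercased for matching.
FORBIDDEN_PHRASES = (
    "this should work",
    "the code looks correct",
    "i've made similar changes before",
    "based on my understanding",
    "it follows the pattern",
)


def find_forbidden_phrases(note):
    """Return the forbidden phrases that appear in a completion note."""
    # Normalize case and curly apostrophes so "I’ve" matches "i've".
    normalized = note.lower().replace("\u2019", "'")
    return [phrase for phrase in FORBIDDEN_PHRASES if phrase in normalized]
```

For example, `find_forbidden_phrases("This should work now.")` returns `["this should work"]`, while a note that quotes actual command output returns an empty list.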

Integration

This skill integrates with:

/verify — Primary workflow using this skill
/execute — Must validate before marking tasks complete
Rule 4 in GEMINI.md — Empirical Validation enforcement

Failure Handling

If verification fails:

Do NOT mark the task complete
Document the failure in .gsd/STATE.md
Create a fix task if the cause is known
Trigger the Context Health Monitor if there are 3 or more failures
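
The failure path can be sketched in the same way. The text does not say how the Context Health Monitor is invoked or whether the 3 failures must be consecutive, so this sketch only appends to `.gsd/STATE.md` and returns a flag telling the caller when the threshold is reached; the function name `record_failure`, the caller-supplied counter, and the entry layout are assumptions.

```python
from pathlib import Path

FAILURE_THRESHOLD = 3  # "3+ failures" from the list above


def record_failure(task, details, previous_failures,
                   state_path=".gsd/STATE.md"):
    """Log a failed verification and report whether to escalate.

    Returns (failure_count, escalate). How failures are counted across
    tasks is left to the caller; this helper just adds one.
    """
    failures = previous_failures + 1

    # Document the failure, including the actual evidence in `details`.
    state = Path(state_path)
    state.parent.mkdir(parents=True, exist_ok=True)
    with state.open("a", encoding="utf-8") as fh:
        fh.write(f"\nVerification failed: {task} (failure {failures})\n")
        fh.write(f"{details}\n")

    # The caller triggers the Context Health Monitor when this is True.
    return failures, failures >= FAILURE_THRESHOLD
```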

