Empirical Validation
Core Principle
"The code looks correct" is NOT validation.
Every change must be verified with empirical evidence before being marked complete.
Validation Methods by Change Type
| Change Type | Required Validation | Tool |
| --- | --- | --- |
| UI Changes | Screenshot showing expected visual state | browser_subagent |
| API Endpoints | Command showing correct response | run_command |
| Build/Config | Successful build or test output | run_command |
| Data Changes | Query showing expected data state | run_command |
| File Operations | File listing or content verification | run_command |
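As an illustration of the File Operations row, the sketch below shows one way a content check could be run through run_command. `verify_file_contains` is a hypothetical helper invented for this example, not part of any tool named above; it prints the matching lines so the evidence is the real file content rather than a bare "passed".

```shell
# Hypothetical helper (not part of any named tool): print evidence that a
# file exists and contains an expected string; return non-zero otherwise.
verify_file_contains() {
  local path="$1" expected="$2"
  if [ ! -f "$path" ]; then
    echo "FAIL: $path does not exist"
    return 1
  fi
  # -n prints line numbers, -F treats the expected text as a literal string.
  if grep -n -F -- "$expected" "$path"; then
    echo "PASS: $path contains '$expected'"
  else
    echo "FAIL: $path does not contain '$expected'"
    return 1
  fi
}
```

A call such as `verify_file_contains config.yml "port: 3000"` (file name and value assumed) would print the matching line followed by PASS, or a FAIL line with a non-zero exit status.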
Validation Protocol
Before Marking Any Task "Done"
Identify Verification Criteria
- What should be true after this change?
- How can that be observed?
Execute Verification
- Run the appropriate command or action
- Capture the output/evidence
Document Evidence
- Add to .gsd/JOURNAL.md under the task
- Include actual output, not just "passed"
Confirm Against Criteria
- Does evidence match expected outcome?
- If not, task is NOT complete
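The execute and document steps above could be combined roughly as follows. This is a minimal sketch, not a prescribed implementation: `record_evidence` and the entry layout are assumptions made for illustration, and only the .gsd/JOURNAL.md path comes from the protocol itself.

```shell
# Hypothetical helper: run a verification command and append its real output
# to the journal. JOURNAL defaults to the path named in the protocol.
JOURNAL="${JOURNAL:-.gsd/JOURNAL.md}"

record_evidence() {
  local task="$1"; shift
  local output status
  # Capture stdout and stderr together so the journal shows everything.
  output="$("$@" 2>&1)"
  status=$?
  mkdir -p "$(dirname "$JOURNAL")"
  {
    printf 'Task: %s\n' "$task"
    printf 'Command: %s\n' "$*"
    printf 'Exit status: %s\n' "$status"
    printf 'Output:\n%s\n\n' "$output"
  } >> "$JOURNAL"
  # Propagate the command's status so the caller can confirm against criteria.
  return "$status"
}
```

For example, `record_evidence "build" npm run build` would append the full build output and exit status to the journal and return non-zero if the build failed, so the task cannot be silently marked complete.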
Examples
API Endpoint Verification
Good: Actual test showing response
curl -X POST http://localhost:3000/api/login -H 'Content-Type: application/json' -d '{"email":"test@test.com"}'
Output: {"success":true,"token":"..."}
Bad: Just saying "endpoint works"
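Capturing the response is only half of the check; it still has to be compared against the criteria. The sketch below shows one possible comparison for a body shaped like the output above. `response_reports_success` is a hypothetical name, and the plain `grep` pattern is a dependency-free approximation; a real JSON parser would be stricter.

```shell
# Hypothetical check: succeed only if a captured response body contains
# "success": true (whitespace around the colon is tolerated).
response_reports_success() {
  printf '%s' "$1" | grep -q -E '"success"[[:space:]]*:[[:space:]]*true'
}

# Assumed usage against the endpoint from the example (the Content-Type
# header is an assumption about what the API expects):
# body="$(curl -s -X POST http://localhost:3000/api/login \
#   -H 'Content-Type: application/json' -d '{"email":"test@test.com"}')"
# echo "$body"    # keep the actual output as evidence
# response_reports_success "$body" || echo "criteria not met"
```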
UI Verification
Good: Take screenshot with browser tool
- Navigate to /dashboard
- Capture screenshot
- Confirm: Header visible? Data loaded? Layout correct?
Bad: "The component should render correctly"
Build Verification
Good: Show build output
npm run build
Output: Successfully compiled...
Bad: "Build should work now"
Forbidden Phrases
Never use these as justification for completion:
- "This should work"
- "The code looks correct"
- "I've made similar changes before"
- "Based on my understanding"
- "It follows the pattern"
Integration
This skill integrates with:
- /verify: Primary workflow using this skill
- /execute: Must validate before marking tasks complete
- Rule 4 in GEMINI.md: Empirical Validation enforcement
Failure Handling
If verification fails:
- Do NOT mark task complete
- Document the failure in .gsd/STATE.md
- Create fix task if cause is known
- Trigger Context Health Monitor if 3+ failures
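The failure steps above might be sketched as follows. Several details are assumptions made for illustration: the `record_failure` name, the `VERIFICATION FAILED:` line format, and counting every failure recorded in the file rather than per task. How the Context Health Monitor is actually triggered is not specified here, so the sketch only prints a reminder.

```shell
# Hypothetical helper: log a failed verification and print a reminder once
# three or more failures are on record. STATE defaults to the path named above.
STATE="${STATE:-.gsd/STATE.md}"

record_failure() {
  local task="$1" reason="$2" count
  mkdir -p "$(dirname "$STATE")"
  printf 'VERIFICATION FAILED: %s: %s\n' "$task" "$reason" >> "$STATE"
  # Assumption: all failure lines in the file count toward the threshold.
  count="$(grep -c '^VERIFICATION FAILED:' "$STATE")"
  if [ "$count" -ge 3 ]; then
    echo "3+ failures recorded: trigger Context Health Monitor"
  fi
}
```

Calling `record_failure "login endpoint" "curl returned HTTP 500"` (task name and reason assumed) three times would print the reminder on the third call.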