testing-expected-results

Verify command behavior and side effects against expected outcomes, not just exit codes - catch the "exit 0 but actually broken" cases

Install skill "testing-expected-results" with this command: npx skills add wojons/skills/wojons-skills-testing-expected-results

Testing Expected Results

Run real commands and verify they produce the ACTUAL side effects and outputs you expect - not just "exit code 0." Catches the dangerous cases where commands "succeed" but don't do what they claim.

When to use me

Use this skill when:

  • A command returns 0 but you're not sure it actually worked
  • You need to verify side effects (files created, data changed, services running)
  • Exit code checking gives false confidence
  • "It ran without error" isn't enough proof
  • Commands have complex side effects across multiple systems
  • You're debugging "why did the deploy succeed but the app is down?"

What I do

1. Capture Pre-State

Before running the command, capture:

  • Filesystem state (files, directories, permissions)
  • Database state (records, schema)
  • Process state (running services)
  • Network state (ports, connections)
  • Environment variables
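
As a rough illustration of the filesystem part of this step, the sketch below records one line per file so that a later snapshot can be compared against it. It assumes a POSIX shell with `find`, `sort`, and `cksum`; `snapshot_dir` is a hypothetical helper name, not something `scripts/verify.sh` is known to provide.

```shell
# Hypothetical helper (not part of verify.sh): write one line per regular file
# under DIR to OUT, in the form "checksum size path", sorted by path.
# Keep OUT outside DIR so the snapshot does not include itself.
snapshot_dir() {
  dir=$1
  out=$2
  find "$dir" -type f -exec cksum {} + | sort -k 3 > "$out"
}
```

Calling `snapshot_dir /data /tmp/pre.snapshot` before the command and again with a different output file afterwards gives two listings that `diff` or `cmp` can compare. Permissions, database state, and process state would need their own capture steps.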

2. Run the Command

Execute the actual command with:

  • Timeout protection
  • Resource limits
  • Security sandboxing
  • Output capture (stdout, stderr)
  • Exit code capture
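
One possible shape for the timeout and capture parts of this step is sketched below. It assumes the `timeout` utility from GNU coreutils is installed; `run_captured` is a hypothetical name, and resource limits and sandboxing are deliberately left out.

```shell
# Hypothetical wrapper: run a command under a time limit and keep stdout,
# stderr, and the exit code in separate files so each can be checked later.
# Usage: run_captured SECONDS COMMAND [ARGS...]
# Prints the directory that holds the three result files.
run_captured() {
  limit=$1
  shift
  out_dir=$(mktemp -d)
  code=0
  # GNU timeout reports its own failure status when the limit is hit.
  timeout "$limit" "$@" > "$out_dir/stdout" 2> "$out_dir/stderr" || code=$?
  echo "$code" > "$out_dir/exit_code"
  echo "$out_dir"
}
```

For example, `results=$(run_captured 300 ./backup.sh --source=/data --dest=/backups)` leaves `$results/stdout`, `$results/stderr`, and `$results/exit_code` ready for the comparison steps.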

3. Capture Post-State

After the command completes, capture the same categories of state again so the two snapshots can be compared.
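
To make the pre/post comparison concrete, here is a small sketch that reports paths present only in the post-state. It assumes each snapshot is a plain list of paths, one per line; `new_paths` is a hypothetical helper, and the snapshot format `verify.sh` actually uses is not described in this document.

```shell
# Hypothetical helper: print paths that appear in POST but not in PRE.
# Both arguments are files containing one path per line, in any order.
new_paths() (
  pre_sorted=$(mktemp)
  post_sorted=$(mktemp)
  sort "$1" > "$pre_sorted"
  sort "$2" > "$post_sorted"
  # comm -13 hides lines unique to the first file and lines common to both.
  comm -13 "$pre_sorted" "$post_sorted"
  rm -f "$pre_sorted" "$post_sorted"
)
```

Swapping the two arguments lists paths that disappeared, which is the input a negative assertion such as "nothing under /important was deleted" needs.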

4. Smart Comparison

Compare actual against expected results using the matcher that fits the data:

  • Exact match - For deterministic output
  • Pattern match - For variable content (timestamps, UUIDs)
  • Range match - For numeric values (response time, file size)
  • Structure match - For JSON/XML (ignore key order)
  • Semantic match - For content meaning (not just bytes)
  • Existence check - For "should exist" / "should not exist"
  • Delta check - For "should have changed by X"
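
Three of the strategies above (pattern, range, and delta) are simple enough to sketch in plain shell. The function names and argument order here are illustrative assumptions, not the interface of `verify.sh`, and the numeric checks handle integers only.

```shell
# Hypothetical matchers; names and argument order are illustrative only.

# Range match: succeed when MIN <= VALUE <= MAX.
# Usage: match_range VALUE MIN MAX
match_range() {
  [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

# Pattern match: succeed when the whole VALUE matches an extended regex.
# Usage: match_pattern VALUE REGEX
match_pattern() {
  printf '%s\n' "$1" | grep -Eqx -- "$2"
}

# Delta check: succeed when (AFTER - BEFORE) equals the expected change.
# Usage: match_delta BEFORE AFTER EXPECTED_CHANGE
match_delta() {
  [ $(( $2 - $1 )) -eq "$3" ]
}
```

With the file counts from the sample report later in this document, `match_delta 1247 1248 1` succeeds because exactly one file was added.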

5. Side Effect Verification

Verify specific side effects:

  • Filesystem - File created/modified/deleted, permissions changed
  • Database - Records inserted/updated, schema migrated
  • Processes - Service started/stopped/restarted
  • Network - Port bound, connection made, API called
  • External - Cloud resources created, messages queued

6. Async/Delayed Effect Handling

For commands with eventual consistency:

  • Poll with configurable intervals
  • Wait for specific conditions
  • Timeout handling
  • Retry logic
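
The polling idea can be sketched as a loop that re-runs a check until it passes or a deadline expires. `poll_until` is a hypothetical helper; it assumes whole-second intervals and counts only the time spent sleeping, so the real deadline is approximate.

```shell
# Hypothetical helper: re-run a check until it succeeds or the timeout expires.
# Usage: poll_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as COMMAND succeeds, 1 if it never does within the timeout.
poll_until() {
  poll_timeout=$1
  poll_interval=$2
  shift 2
  poll_elapsed=0
  while ! "$@"; do
    if [ "$poll_elapsed" -ge "$poll_timeout" ]; then
      return 1
    fi
    sleep "$poll_interval"
    poll_elapsed=$(( poll_elapsed + poll_interval ))
  done
  return 0
}
```

For instance, `poll_until 120 5 test -f /var/run/my-service.pid` waits up to about two minutes for a pid file (the path is only an example) and checks every five seconds.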

Examples

# Verify a backup actually created a valid backup
bash scripts/verify.sh \
  --command "./backup.sh --source=/data --dest=/backups" \
  --expected "file_exists:/backups/backup-$(date +%Y%m%d).tar.gz" \
  --expected 'file_size:>100MB' \
  --expected 'file_integrity:sha256' \
  --negative 'file_modified:/data' \
  --timeout 300

# Verify a deployment actually started the service
bash scripts/verify.sh \
  --command "./deploy.sh --version=v2.0.0" \
  --expected 'process_running:my-service' \
  --expected 'port_listening:8080' \
  --expected 'http_healthy:http://localhost:8080/health' \
  --poll-interval 5 --timeout 120

# Verify a database migration actually changed the schema
bash scripts/verify.sh \
  --command "./migrate.sh up" \
  --expected 'db_table_exists:new_table' \
  --expected 'db_column_exists:new_table.new_column' \
  --expected 'db_constraint:unique_on_email' \
  --db-connection "postgresql://localhost/mydb"

# Verify an export actually produced correct data
bash scripts/verify.sh \
  --command "./export.sh --format=csv --output=/exports/users.csv" \
  --expected 'file_exists:/exports/users.csv' \
  --expected 'file_contains:"user_id,email,name"' \
  --expected 'line_count:>1000' \
  --expected 'csv_valid:yes' \
  --negative 'file_contains:ERROR'

# Verify negative side effects (what shouldn't happen)
bash scripts/verify.sh \
  --command "./cleanup.sh --days=30" \
  --expected 'file_deleted:/tmp/old_stuff' \
  --negative 'file_exists:/important/data' \
  --negative 'file_deleted:/critical/config'

Verification Types

Filesystem Effects

file_exists:
  path: /path/to/file
  optional:
    - min_size: 100MB      # File must be at least this big
    - max_size: 1GB        # File must be at most this big
    - permissions: 644     # Specific permissions
    - owner: appuser       # Specific owner
    - modified_after: now  # Modified after command started
    - content_type: text   # MIME type or magic number

file_contains:
  path: /path/to/file
  pattern: "string or regex"
  optional:
    - count: 1             # Must appear exactly N times
    - line_number: 5       # Must be on specific line

file_hash:
  path: /path/to/file
  algorithm: sha256        # sha256, md5, sha512
  expected: abc123...      # Hash value (optional - just check hash exists)

directory_structure:
  path: /path/to/dir
  expected: |
    dir/
    dir/file1.txt
    dir/subdir/
    dir/subdir/file2.txt
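
For a sense of what checks like `file_hash` and `min_size` involve, here is a minimal sketch. It assumes `sha256sum` from GNU coreutils and takes the size in bytes rather than units such as `100MB`; both function names are hypothetical.

```shell
# Hypothetical check: compare a file's SHA-256 digest with an expected value.
# Assumes `sha256sum` (GNU coreutils) is available.
# Usage: check_sha256 FILE EXPECTED_HEX_DIGEST
check_sha256() {
  actual=$(sha256sum "$1" | cut -d ' ' -f 1)
  [ "$actual" = "$2" ]
}

# Hypothetical check: the file must be at least MIN_BYTES long.
# Usage: check_min_bytes FILE MIN_BYTES
check_min_bytes() {
  size=$(( $(wc -c < "$1") ))
  [ "$size" -ge "$2" ]
}
```

A matching digest shows the bytes are the ones expected. It does not show that an archive can be restored, which is the gap the sample report later in this document illustrates.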

Database Effects

db_table_exists:
  name: users
  connection: ${DB_URL}

db_column_exists:
  table: users
  column: email
  type: varchar(255)
  nullable: false

db_row_count:
  table: users
  where: "created_at > NOW() - INTERVAL '1 day'"
  expected: 100
  tolerance: +/- 10        # Allow 90-110

db_query_result:
  query: "SELECT COUNT(*) FROM users WHERE active = true"
  expected: "> 1000"
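
The `tolerance` field on `db_row_count` amounts to an absolute-difference check. A minimal sketch, assuming integer counts and a hypothetical `within_tolerance` helper:

```shell
# Hypothetical check: succeed when ACTUAL is within +/- TOLERANCE of EXPECTED.
# Usage: within_tolerance ACTUAL EXPECTED TOLERANCE   (integers only)
within_tolerance() {
  diff=$(( $1 - $2 ))
  # Take the absolute value of the difference.
  [ "$diff" -lt 0 ] && diff=$(( -diff ))
  [ "$diff" -le "$3" ]
}
```

With `expected: 100` and a tolerance of 10, counts from 90 through 110 pass and anything outside that range fails.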

Process Effects

process_running:
  name: my-service
  optional:
    - user: appuser
    - cpu_percent: "< 50"
    - memory_mb: "< 1024"
    - uptime_seconds: "> 60"

port_listening:
  port: 8080
  protocol: tcp           # tcp, udp
  optional:
    - interface: 0.0.0.0   # Specific bind address
    - process_name: app   # Must be owned by this process

Network Effects

http_request:
  url: http://localhost:8080/health
  method: GET
  expected_status: 200
  optional:
    - timeout: 5
    - expected_body: '{"status": "healthy"}'
    - expected_headers: 'Content-Type: application/json'
    - retry: 3

tcp_connect:
  host: localhost
  port: 5432
  timeout: 5

Content Verification

csv_valid:
  file: /path/to/file.csv
  expected_columns: id,name,email
  row_count: "> 100"

json_valid:
  file: /path/to/file.json
  schema: /path/to/schema.json  # JSON Schema validation
  required_paths:
    - $.status
    - $.data.users[0].name
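
A check like `csv_valid` could start from something as small as the sketch below, which compares the header line and counts data rows. `check_csv` is a hypothetical helper; it assumes Unix line endings and no newlines inside quoted fields, so it is far from a full CSV validator.

```shell
# Hypothetical check: the first line must equal the expected header and the
# file must contain more than MIN_ROWS data rows.
# Usage: check_csv FILE EXPECTED_HEADER MIN_ROWS
check_csv() {
  header=$(head -n 1 "$1")
  [ "$header" = "$2" ] || return 1
  # Every line after the header counts as one data row.
  rows=$(( $(wc -l < "$1") - 1 ))
  [ "$rows" -gt "$3" ]
}
```

For the definition above, the equivalent call would be `check_csv /path/to/file.csv "id,name,email" 100`.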

Comparison Strategies

Handling Non-Determinism

Timestamps:

# Match any ISO8601 timestamp
--expected 'file_contains:{{TIMESTAMP}}'

# Match timestamp within range
--expected 'file_contains:{{TIMESTAMP_RANGE:2024-01-01,2024-12-31}}'

UUIDs:

# Match any UUID
--expected 'file_contains:{{UUID}}'

# Match a UUID and also validate its format
--expected 'file_contains:{{UUID_FORMAT}}'
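
Placeholders such as `{{TIMESTAMP}}` and `{{UUID}}` presumably stand for regular expressions. The patterns below are assumptions chosen for illustration, since the ones `verify.sh` uses are not shown in this document.

```shell
# Assumed regexes for the placeholders (extended regex syntax, for grep -E).
UUID_RE='[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}'
TIMESTAMP_RE='[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}(Z|[+-][0-9]{2}:[0-9]{2})'

# Hypothetical checks: succeed when the file contains at least one match.
file_has_uuid() {
  grep -Eq -- "$UUID_RE" "$1"
}

file_has_timestamp() {
  grep -Eq -- "$TIMESTAMP_RE" "$1"
}
```

These match on shape only. A range check like `{{TIMESTAMP_RANGE:...}}` would also have to parse the matched value and compare dates.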

Order-Independent:

# For JSON arrays, sets, etc.
--expected 'json_path:$.data.items contains [1,2,3] (any order)'

Partial Matching

# File must contain ALL these patterns
--expected 'file_contains_all:["success", "completed", "exit 0"]'

# File must contain AT LEAST ONE of these
--expected 'file_contains_any:["success", "done", "finished"]'

# File must contain pattern EXACTLY N times
--expected 'file_contains:"ERROR" count:0'  # No errors
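
The all/any/count variants map onto short `grep` loops. This sketch uses fixed-string matching and hypothetical function names; note that `count_matches` counts matching lines, not individual occurrences.

```shell
# Hypothetical helpers: fixed-string checks against a file.

# Succeed only if FILE contains every pattern.
# Usage: contains_all FILE PATTERN...
contains_all() {
  file=$1
  shift
  for pattern in "$@"; do
    grep -Fq -- "$pattern" "$file" || return 1
  done
  return 0
}

# Succeed if FILE contains at least one pattern.
# Usage: contains_any FILE PATTERN...
contains_any() {
  file=$1
  shift
  for pattern in "$@"; do
    grep -Fq -- "$pattern" "$file" && return 0
  done
  return 1
}

# Print the number of lines in FILE that contain PATTERN.
# Usage: count_matches FILE PATTERN
count_matches() {
  grep -Fc -- "$2" "$1" || true
}
```

The "no errors" case then reads `[ "$(count_matches output.log ERROR)" = "0" ]`, where `output.log` is a placeholder for the captured output.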

Security

Sandboxing:

# Run in container
bash scripts/verify.sh --sandbox container ...

# Run in a chroot (restricted view of the filesystem)
bash scripts/verify.sh --sandbox chroot --chroot-dir /tmp/sandbox ...

# Resource limits
bash scripts/verify.sh --max-memory 1GB --max-cpu 50% --timeout 300 ...

Secret Masking:

# Automatically mask common secret patterns in output
bash scripts/verify.sh --mask-secrets ...
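
What counts as a "common secret pattern" is not specified here, so the filter below is only a guess at the general idea: redact the value that follows a secret-looking key. The key names are assumptions and `mask_secrets` is a hypothetical function, unrelated to how `--mask-secrets` is implemented.

```shell
# Hypothetical filter: read text on stdin and replace the value after keys
# that look like secrets with "****". Key names are assumptions and are
# matched case-sensitively.
mask_secrets() {
  sed -E 's/((password|token|secret|api_key)[=:][[:space:]]*)[^[:space:]]+/\1****/g'
}
```

Piping captured output through it, as in `cat stdout.log | mask_secrets`, keeps the key visible for debugging while hiding the value. Secrets that appear without a recognizable key pass through unchanged.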

Output Format

Verification Report
===================
Command: ./backup.sh --source=/data --dest=/backups
Exit Code: 0
Duration: 45.2s

Pre-State Captured:
  Files: 1,247
  Database tables: 23
  Processes: 12

Post-State Captured:
  Files: 1,248 (+1)
  Database tables: 23 (unchanged)
  Processes: 12 (unchanged)

Expected Results Verification:
  ✅ file_exists:/backups/backup-20240308.tar.gz
     - Path exists: yes
     - Size: 1.2GB (expected: >100MB) ✅
     - Permissions: 644 ✅
     - Created: 2024-03-08T10:30:15Z (after command start) ✅
     - Hash (sha256): abc123... ✅

  ❌ file_integrity (custom check)
     - Can extract archive: yes
     - Can restore from backup: FAILED
     - Error: "table users has wrong schema version"

Negative Assertions:
  ✅ file_modified:/data - No changes detected
  ✅ file_deleted:/important - No deletions detected

Async Effects:
  ✅ service_health (after 30s polling)
     - Service responsive: yes
     - Health check passed: yes

Result: FAILED

Discrepancy Analysis:
  The backup file was created with correct size and permissions,
  but integrity check reveals it cannot be restored. The schema
  version mismatch suggests the backup captured incompatible data.

Recommendations:
  1. Run schema migration before backup
  2. Add schema version check to backup script
  3. Include test restore in backup verification

Commands Run:
  Pre-state capture: 0.5s
  Command execution: 42.1s
  Post-state capture: 0.4s
  Verification: 2.2s
  Total: 45.2s

Limitations

What we CAN'T verify:

  • In-memory state changes (caches, variables)
  • Browser/client-side state
  • Side effects in systems we can't access
  • Changes that happen after verification timeout
  • Non-deterministic behavior that changes between runs
  • Effects in distributed systems with eventual consistency (polling helps, but an effect that lands after the timeout is missed)

Known Issues:

  • Time-of-check-time-of-use (TOCTOU) race conditions
  • State changes between capture and verification
  • Verification is only as good as the expected results definition

Trust But Verify

This skill implements "trust but verify" - we trust the command ran, but we verify it did what it claimed. Always remember:

  • Exit code 0 doesn't mean success
  • Success doesn't mean correctness
  • Correctness doesn't mean safety
  • Safety doesn't mean completeness

Use multiple verification layers for critical operations.

Notes

  • Verification adds overhead. Use selectively for critical commands.
  • Define expected results exhaustively - partial verification gives false confidence.
  • Include negative assertions (what shouldn't happen) alongside positive ones.
  • For long-running commands, use async verification with polling.
  • Always include a timeout so verification cannot hang on a command that never finishes.
  • Use semantic matchers (structure, pattern) over exact string comparison when possible.
  • Document in your expected results WHY each check matters - future you will thank you.
