
Multi-Agent E2E Validation

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy the following and send it to your AI assistant:

Install skill "multi-agent-e2e-validation" with this command: npx skills add terrylica/cc-skills/terrylica-cc-skills-multi-agent-e2e-validation


Overview

Prescriptive workflow for spawning parallel validation agents to comprehensively test database refactors. In one QuestDB migration, this workflow surfaced five critical bugs (each a 100% system-failure condition) that would otherwise have shipped to production.

When to Use This Skill

Use this skill when:

  • Database refactors (e.g., v3.x file-based → v4.x QuestDB)

  • Schema migrations requiring validation

  • Bulk data ingestion pipeline testing

  • System migrations with multiple validation layers

  • Pre-release validation for database-centric systems

Key outcomes:

  • Parallel agent execution for comprehensive coverage

  • Structured validation reporting (VALIDATION_FINDINGS.md)

  • Bug discovery with severity classification (Critical/Medium/Low)

  • Release readiness assessment

Core Methodology

  1. Validation Architecture (3-Layer Model)

Layer 1: Environment Setup

  • Container orchestration (Colima/Docker)

  • Database deployment and schema application

  • Connectivity validation (ILP, PostgreSQL, HTTP ports)

  • Configuration file creation and validation

Layer 2: Data Flow Validation

  • Bulk ingestion testing (CloudFront → QuestDB)

  • Performance benchmarking against SLOs

  • Multi-month data ingestion

  • Deduplication testing (re-ingestion scenarios)

  • Type conversion validation (FLOAT→LONG casts)
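The type-conversion check in Layer 2 can be sketched as a pre-ingestion cast. The `coerce_row_types` helper and the column names below are illustrative, not part of a real API; the underlying issue is that QuestDB's ILP protocol infers column types from the values sent, so a LONG-typed schema column silently mismatches when the client sends a Python float.

```python
def coerce_row_types(row: dict, long_columns: set) -> dict:
    """Cast whole-number floats in LONG-typed columns to int before ILP send."""
    coerced = {}
    for col, value in row.items():
        if col in long_columns and isinstance(value, float):
            if not value.is_integer():
                raise ValueError(f"{col}={value} is not a whole number")
            coerced[col] = int(value)  # FLOAT -> LONG cast
        else:
            coerced[col] = value
    return coerced

row = {"close": 42000.5, "number_of_trades": 128.0}
clean = coerce_row_types(row, long_columns={"number_of_trades"})
# number_of_trades is now an int; close keeps its float value
```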

Layer 3: Query Interface Validation

  • High-level query methods (get_latest, get_range, execute_sql)

  • Edge cases (limit=1, cross-month boundaries)

  • Error handling (invalid symbols, dates, parameters)

  • Gap detection SQL compatibility
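A minimal Layer-3 harness might look like the following. `FakeClient` and the method names mirror the query interface described in this skill (`get_latest`, error handling for invalid symbols) but are assumptions, not a real library API.

```python
class FakeClient:
    """Stand-in for the real query interface (illustrative only)."""
    VALID_SYMBOLS = {"BTCUSDT"}

    def get_latest(self, symbol: str, limit: int):
        if symbol not in self.VALID_SYMBOLS:
            raise ValueError(f"unknown symbol: {symbol}")
        return [{"symbol": symbol}] * limit

def run_edge_case_tests(client) -> dict:
    results = {}
    # Edge case: limit=1 must return exactly one row
    results["limit_1"] = len(client.get_latest("BTCUSDT", limit=1)) == 1
    # Error handling: invalid symbols must raise, not return empty results
    try:
        client.get_latest("NOT_A_SYMBOL", limit=1)
        results["invalid_symbol_raises"] = False
    except ValueError:
        results["invalid_symbol_raises"] = True
    return results

results = run_edge_case_tests(FakeClient())
```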

  2. Agent Orchestration Pattern

Sequential vs Parallel Execution:

Agent 1 (Environment) → [SEQUENTIAL - prerequisite]
        ↓
Agent 2 (Bulk Loader) → [PARALLEL with Agent 3]
Agent 3 (Query Interface) → [PARALLEL with Agent 2]

Dependency Rule: Environment validation must pass before data flow/query validation

Dynamic Todo Management:

  • Start with high-level plan (ADR-defined phases)

  • Prune completed agents from todo list

  • Grow todo list when bugs discovered (e.g., Bug #5 found by Agent 3)

  • Update VALIDATION_FINDINGS.md incrementally

  3. Validation Script Structure

Each agent produces:

  • Test Script (e.g., test_bulk_loader.py)

      • 5+ test functions with clear pass/fail criteria

      • Structured output (test name, result, details)

      • Summary report at end

  • Artifacts (logs, config files, evidence)

  • Findings Report (bugs, severity, fix proposals)

Example Test Structure:

def test_feature(conn):
    """Test 1: Feature description"""
    print("=" * 80)
    print("TEST 1: Feature description")
    print("=" * 80)

    results = {}

    # Test 1a: Subtest name
    print("\n1a. Testing subtest:")
    result_1a = perform_test()
    print(f"   Result: {result_1a}")
    results["subtest_1a"] = result_1a == expected_1a

    # Summary
    print("\n" + "-" * 80)
    all_passed = all(results.values())
    print(f"Test 1 Results: {'✓ PASS' if all_passed else '✗ FAIL'}")
    for test_name, passed in results.items():
        print(f"  - {test_name}: {'✓' if passed else '✗'}")

    return {"success": all_passed, "details": results}

4. Bug Classification and Tracking

Severity Levels:

  • 🔴 Critical: 100% system failure (e.g., API mismatch, timestamp corruption)

  • 🟡 Medium: Degraded functionality (e.g., below SLO performance)

  • 🟢 Low: Minor issues, edge cases

Bug Report Format:

Bug N: Descriptive Name (SEVERITY - Status)

Location: file/path.py:line

Issue: One-sentence description

Impact: Quantified impact (e.g., "100% ingestion failure")

Root Cause: Technical explanation

Fix Applied: Code changes with before/after

Verification: Test results proving fix

Status: ✅ FIXED / ⚠️ PARTIAL / ❌ OPEN

  5. Release Readiness Decision Framework

Go/No-Go Criteria:

BLOCKER = Any Critical bug unfixed
SHIP    = All Critical bugs fixed + (Medium bugs acceptable OR fixed)
DEFER   = >3 Medium bugs unfixed OR any High-severity bug
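The criteria above can be expressed as a minimal decision function. Thresholds come straight from this document; treat this as a sketch, not policy code.

```python
def release_decision(critical_open: int, medium_open: int) -> str:
    """Apply the Go/No-Go criteria: open counts refer to unfixed bugs."""
    if critical_open > 0:
        return "BLOCKED"  # any unfixed Critical bug blocks release
    if medium_open > 3:
        return "DEFER"    # too many unfixed Medium bugs
    return "SHIP"

# Applied to the example decision: 5 Critical bugs all fixed, 1 Medium open
verdict = release_decision(critical_open=0, medium_open=1)  # "SHIP"
```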

Example Decision:

  • 5 Critical bugs found → all fixed ✅

  • 1 Medium bug (performance 55% below SLO) → acceptable ✅

  • Verdict: RELEASE READY

Workflow: Step-by-Step

Step 1: Create Validation Plan (ADR-Driven)

Input: ADR document (e.g., ADR-0002 QuestDB Refactor)
Output: Validation plan with 3-7 agents

Plan Structure:

Validation Agents

Agent 1: Environment Setup

  • Deploy QuestDB via Docker
  • Apply schema.sql
  • Validate connectivity (ILP, PG, HTTP)
  • Create .env configuration

Agent 2: Bulk Loader Validation

  • Test CloudFront → QuestDB ingestion
  • Benchmark performance (target: >100K rows/sec)
  • Validate deduplication (re-ingestion test)
  • Multi-month ingestion test
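The deduplication item above can be sketched as a re-ingestion test. Here sqlite3 stands in for QuestDB, and the primary-key UPSERT mimics what DEDUP ENABLE UPSERT KEYS provides; the table and column names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE klines (ts INTEGER PRIMARY KEY, close REAL)")

def ingest(batch):
    # Upsert on the timestamp key, mirroring DEDUP ENABLE UPSERT KEYS(ts)
    conn.executemany(
        "INSERT INTO klines VALUES (?, ?) "
        "ON CONFLICT(ts) DO UPDATE SET close = excluded.close",
        batch,
    )

rows = [(1_700_000_000, 42000.0), (1_700_000_060, 42010.0)]
ingest(rows)
ingest(rows)  # re-ingesting the same batch must not add rows
count = conn.execute("SELECT COUNT(*) FROM klines").fetchone()[0]  # 2
```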

Agent 3: Query Interface Validation

  • Test get_latest() with various limits
  • Test get_range() with date boundaries
  • Test execute_sql() with parameterized queries
  • Test detect_gaps() SQL compatibility
  • Test error handling (invalid inputs)

Step 2: Execute Agent 1 (Environment)

Directory Structure:

tmp/e2e-validation/
  agent-1-env/
    test_environment_setup.py
    questdb.log
    config.env
    schema-check.txt

Validation Checklist:

  • ✅ Container running

  • ✅ Ports accessible (9009 ILP, 8812 PG, 9000 HTTP)

  • ✅ Schema applied without errors

  • ✅ .env file created
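The port items in the checklist above can be verified with a plain TCP check. Ports are QuestDB's defaults; whether they are open depends on the local environment, so the result dict is reported rather than asserted here.

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

checks = {name: port_open("127.0.0.1", port)
          for name, port in [("ILP", 9009), ("PG", 8812), ("HTTP", 9000)]}
```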

Step 3: Execute Agents 2-3 in Parallel

Agent 2: Bulk Loader

tmp/e2e-validation/
  agent-2-bulk/
    test_bulk_loader.py
    ingestion_benchmark.txt
    deduplication_test.txt

Agent 3: Query Interface

tmp/e2e-validation/
  agent-3-query/
    test_query_interface.py
    gap_detection_test.txt

Execution:

Terminal 1

cd tmp/e2e-validation/agent-2-bulk
uv run python test_bulk_loader.py

Terminal 2

cd tmp/e2e-validation/agent-3-query
uv run python test_query_interface.py
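The two terminals above can also be driven from one process. This is a sketch: the directory/script pairs mirror the layout in this step, and `sys.executable` is used where the project would use `uv run python`.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def run_agent(workdir: str, script: str) -> int:
    """Run one agent script in its own directory; return the exit code."""
    return subprocess.run([sys.executable, script], cwd=workdir).returncode

def run_parallel(agents) -> list:
    """agents: list of (workdir, script) pairs. Returns exit codes in order."""
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        return list(pool.map(lambda pair: run_agent(*pair), agents))

# Example invocation (paths from Step 3 above):
# codes = run_parallel([
#     ("tmp/e2e-validation/agent-2-bulk", "test_bulk_loader.py"),
#     ("tmp/e2e-validation/agent-3-query", "test_query_interface.py"),
# ])
# all_passed = all(code == 0 for code in codes)
```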

Step 4: Document Findings in VALIDATION_FINDINGS.md

Template:

E2E Validation Findings Report

Validation ID: ADR-XXXX
Branch: feat/database-refactor
Date: YYYY-MM-DD
Target Release: vX.Y.Z
Status: [BLOCKED / READY / IN_PROGRESS]

Executive Summary

E2E validation discovered N critical bugs that would have caused [impact]:

| Finding | Severity | Status | Impact | Agent |
| --- | --- | --- | --- | --- |
| Bug 1 | Critical | Fixed | 100% failure | Agent 2 |

Recommendation: [RELEASE READY / BLOCKED / DEFER]

Agent 1: Environment Setup - [STATUS]

...

Agent 2: [Name] - [STATUS]

...

Step 5: Iterate on Fixes

For each bug:

  • Document in VALIDATION_FINDINGS.md with 🔴/🟡/🟢 severity

  • Apply fix to source code

  • Re-run failing test

  • Update bug status to ✅ FIXED

  • Commit with semantic message (e.g., fix: correct timestamp parsing in CSV ingestion)

Example Fix Commit:

git add src/gapless_crypto_clickhouse/collectors/questdb_bulk_loader.py
git commit -m "fix: prevent pandas from treating first CSV column as index

BREAKING CHANGE: All timestamps were defaulting to epoch 0 (1970-01) due to
pandas read_csv() auto-indexing. Added index_col=False to preserve first
column as data.

Fixes #ABC-123"
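The commit above can be reproduced minimally, assuming the source CSVs had trailing delimiters: each row then has one more field than the header, and pandas silently promotes the first column (the timestamp) to the index. Column names here are illustrative.

```python
import io

import pandas as pd

# Malformed CSV: trailing comma gives each row 3 fields against 2 headers
csv = "ts,close\n1700000000,42000.0,\n1700000060,42010.0,\n"

broken = pd.read_csv(io.StringIO(csv))                  # ts becomes the index
fixed = pd.read_csv(io.StringIO(csv), index_col=False)  # ts stays a column
```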

Step 6: Final Validation and Release Decision

Run all tests:

/usr/bin/env bash << 'SKILL_SCRIPT_EOF'
cd tmp/e2e-validation
for agent in agent-*/; do
  echo "=== Running $agent ==="
  cd "$agent"
  uv run python test_*.py
  cd ..
done
SKILL_SCRIPT_EOF

Update VALIDATION_FINDINGS.md status:

  • Count Critical bugs: X fixed, Y open

  • Count Medium bugs: X fixed, Y open

  • Apply decision framework

  • Update Status field to ✅ RELEASE READY or ❌ BLOCKED

Real-World Example: QuestDB Refactor Validation

Context: Migrating from file-based storage (v3.x) to QuestDB (v4.0.0)

Bugs Found:

  • 🔴 Sender API mismatch - Used non-existent Sender.from_uri() instead of Sender.from_conf()

  • 🔴 Type conversion - number_of_trades sent as FLOAT, schema expects LONG

  • 🔴 Timestamp parsing - pandas treating first column as index → epoch 0 timestamps

  • 🔴 Deduplication - WAL mode doesn't provide UPSERT semantics (needed DEDUP ENABLE UPSERT KEYS)

  • 🔴 SQL incompatibility - detect_gaps() used nested window functions (QuestDB unsupported)

Impact: Without this validation, v4.0.0 would have shipped with 100% data corruption and 100% ingestion failure

Outcome: All 5 bugs fixed, system validated, v4.0.0 released successfully

Common Pitfalls

  1. Skipping Environment Validation

❌ Bad: Assume Docker/database is working, jump to data ingestion tests
✅ Good: Agent 1 validates environment first, catches port conflicts and schema errors early

  2. Serial Agent Execution

❌ Bad: Run Agent 2, wait for completion, then run Agent 3
✅ Good: Run Agents 2 & 3 in parallel (no dependency between them)

  3. Manual Test Reporting

❌ Bad: Copy/paste test output into Slack/email
✅ Good: Structured VALIDATION_FINDINGS.md with severity, status, fix tracking

  4. Ignoring Medium Bugs

❌ Bad: "Performance is 55% below SLO, but we'll fix it later"
✅ Good: Document in VALIDATION_FINDINGS.md, make explicit go/no-go decision

  5. No Re-validation After Fixes

❌ Bad: Apply fix, assume it works, move on
✅ Good: Re-run failing test, update status in VALIDATION_FINDINGS.md

Resources

scripts/

Not applicable - validation scripts are project-specific (stored in tmp/e2e-validation/)

references/

  • example_validation_findings.md: complete VALIDATION_FINDINGS.md template

  • agent_test_template.py: template for creating validation test scripts

  • bug_severity_classification.md: detailed severity criteria and examples

assets/

Not applicable - validation artifacts are project-specific

Troubleshooting

| Issue | Cause | Solution |
| --- | --- | --- |
| Container not starting | Colima/Docker not running | Run colima start before Agent 1 |
| Port conflicts | Ports already in use | Stop conflicting containers or use different ports |
| Schema application fails | Invalid SQL syntax | Check schema.sql for database-specific compatibility |
| Agents 2/3 fail without Agent 1 | Environment not validated | Ensure Agent 1 completes before starting Agents 2/3 |
| Test script import errors | Missing dependencies | Run uv pip install in the agent directory |
| Bug status not updating | VALIDATION_FINDINGS.md stale | Manually refresh status after each fix |
| Parallel agent interference | Shared resources conflict | Ensure agents use isolated directories |
| Decision unclear | Severity mixed Critical/Medium | Apply Go/No-Go criteria strictly per documentation |

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

  • chezmoi-workflows (Automation; Repository Source, Needs Review): no summary provided by upstream source

  • doppler-workflows (Automation; Repository Source, Needs Review): no summary provided by upstream source

  • multi-agent-performance-profiling (Automation; Repository Source, Needs Review): no summary provided by upstream source

  • issues-workflow (Automation; Repository Source, Needs Review): no summary provided by upstream source