# QA & Testing Engine — Complete Software Quality System

Install skill "QA & Testing Engine" with this command: npx skills add 1kalin/afrexai-qa-testing-engine

The definitive testing methodology for AI agents. From test strategy to execution, coverage to reporting — everything you need to ship quality software.

## Phase 1: Test Strategy Design

Before writing a single test, design the strategy.

### Strategy Brief Template

```yaml
project:
  name: ""
  type: web-app | api | mobile | library | cli | data-pipeline
  languages: [typescript, python, go, java]
  frameworks: [react, express, django, spring]

risk_profile:
  data_sensitivity: low | medium | high | critical  # PII, financial, health
  user_impact: internal | b2b | b2c | life-safety
  deployment_frequency: daily | weekly | monthly
  regulatory: [none, SOC2, HIPAA, PCI-DSS, GDPR]

test_scope:
  in_scope: []     # Features, services, components
  out_of_scope: [] # Explicitly excluded (with reason)

environments:
  dev: { url: "", db: "local" }
  staging: { url: "", db: "seeded" }
  prod: { url: "", smoke_only: true }
```

### Test Type Decision Matrix

| Risk Profile | Unit | Integration | E2E | Performance | Security | Accessibility |
|---|---|---|---|---|---|---|
| Internal tool | ✅ Core | ✅ API | ⚠️ Happy path | ⚠️ Basic | | |
| B2B SaaS | ✅ Full | ✅ Full | ✅ Critical flows | ✅ Load | ✅ OWASP Top 10 | ✅ WCAG AA |
| B2C high-traffic | ✅ Full | ✅ Full | ✅ Full | ✅ Stress + soak | ✅ Full | ✅ WCAG AA |
| Financial/Health | ✅ Full + mutation | ✅ Full + contract | ✅ Full + chaos | ✅ Full suite | ✅ Pen test | ✅ WCAG AAA |

### Test Pyramid Architecture

```
        /    E2E    \        5-10%  — Critical user journeys only
       / Integration \       20-30% — API contracts, service boundaries
      /  Unit  Tests  \      60-70% — Business logic, pure functions
```

Anti-pattern: Ice cream cone — More E2E than unit tests. Slow, flaky, expensive. Fix by pushing test coverage DOWN the pyramid.

Anti-pattern: Hourglass — Lots of unit + E2E, no integration. Misses contract bugs between services.


## Phase 2: Unit Testing Mastery

### The AAA Pattern (Arrange-Act-Assert)

Every unit test follows this structure:

```typescript
describe('PricingCalculator', () => {
  // Group by behavior, not by method
  describe('when customer has volume discount', () => {
    it('applies tiered pricing above threshold', () => {
      // ARRANGE — Set up the scenario
      const calculator = new PricingCalculator();
      const customer = createCustomer({ tier: 'enterprise', units: 150 });

      // ACT — Execute the behavior under test
      const price = calculator.calculate(customer);

      // ASSERT — Verify the outcome (ONE logical assertion)
      expect(price).toEqual({
        subtotal: 12000,
        discount: 1800,  // 15% volume discount
        total: 10200,
      });
    });
  });
});
```

### Test Naming Convention

Format: `[unit] [scenario] [expected behavior]`

✅ Good:

  • PricingCalculator applies 15% discount when units exceed 100
  • UserService throws NotFoundError when user ID is invalid
  • parseDate returns null for malformed ISO strings

❌ Bad:

  • test1, should work, calculates price

### What to Unit Test (Priority Order)

  1. Business logic — Pricing, rules, calculations, state machines
  2. Data transformations — Parsers, formatters, serializers, mappers
  3. Edge cases — Boundaries, null/undefined, empty collections, overflow
  4. Error handling — Every catch block, every validation path
  5. Pure functions — Easiest to test, highest ROI

### What NOT to Unit Test

  • Framework internals (React rendering, Express routing)
  • Simple getters/setters with no logic
  • Third-party library behavior
  • Implementation details (private methods, internal state)

### Mocking Rules

| Dependency Type | Strategy | Example |
|---|---|---|
| Database | Mock the repository/DAO | `jest.mock('./userRepo')` |
| HTTP API | Mock the client or use MSW | `msw.http.get('/api/users', ...)` |
| File system | Mock `fs` or use temp dirs | `jest.mock('fs/promises')` |
| Time/Date | Fake timers | `jest.useFakeTimers()` |
| Randomness | Seed or mock | `jest.spyOn(Math, 'random')` |
| Environment | Override env vars | `process.env.NODE_ENV = 'test'` |

Rule: Mock at boundaries, not internals. If you're mocking a class you own, your design might need refactoring.
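
A minimal sketch of that rule in Jest, assuming a hypothetical `UserRepo` interface injected into a `UserService`: the test double replaces the repository boundary, not the internals of the class under test.

```typescript
// Hypothetical repository boundary and service (names are illustrative)
interface UserRepo {
  findById(id: string): Promise<{ id: string; email: string } | null>;
}

class UserService {
  constructor(private readonly repo: UserRepo) {}

  async getEmail(id: string): Promise<string> {
    const user = await this.repo.findById(id);
    if (!user) throw new Error(`User ${id} not found`);
    return user.email;
  }
}

describe('UserService', () => {
  it('returns the email when the user exists', async () => {
    // Mock at the boundary: stub the repo, leave the service real
    const repo: UserRepo = {
      findById: jest.fn().mockResolvedValue({ id: 'u1', email: 'a@b.com' }),
    };

    await expect(new UserService(repo).getEmail('u1')).resolves.toBe('a@b.com');
  });

  it('throws when the user is missing', async () => {
    const repo: UserRepo = { findById: jest.fn().mockResolvedValue(null) };

    await expect(new UserService(repo).getEmail('u2')).rejects.toThrow('not found');
  });
});
```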

### Coverage Targets

| Metric | Minimum | Good | Excellent |
|---|---|---|---|
| Line coverage | 70% | 85% | 95%+ |
| Branch coverage | 60% | 80% | 90%+ |
| Function coverage | 75% | 90% | 95%+ |
| Critical path coverage | 100% | 100% | 100% |

Warning: 100% coverage ≠ quality. Coverage measures what code ran, not what was verified. A test with no assertions has coverage but no value.


## Phase 3: Integration Testing

### API Testing Checklist

For every API endpoint, test:

```yaml
endpoint: POST /api/orders
tests:
  happy_path:
    - Valid request returns 201 with order ID
    - Response matches schema
    - Database record created correctly
    - Events/webhooks fired

  validation:
    - Missing required fields → 400 with field errors
    - Invalid data types → 400 with type errors
    - Business rule violations → 422 with explanation

  authentication:
    - No token → 401
    - Expired token → 401
    - Wrong role → 403
    - Valid token → proceeds

  edge_cases:
    - Duplicate request (idempotency) → same response
    - Concurrent requests → no race condition
    - Maximum payload size → 413 or graceful handling
    - Special characters in input → no injection

  error_handling:
    - Database down → 503 with retry hint
    - External service timeout → 504 or fallback
    - Rate limit exceeded → 429 with retry-after
```
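
A few of these checks in executable form, as a Supertest sketch. The `app` export and `testToken` fixture are assumptions for illustration, not part of the skill.

```typescript
import request from 'supertest';
import { app } from '../src/app'; // hypothetical Express app export

const testToken = 'seeded-test-token'; // hypothetical auth fixture

describe('POST /api/orders', () => {
  it('returns 201 with an order ID for a valid request', async () => {
    const res = await request(app)
      .post('/api/orders')
      .set('Authorization', `Bearer ${testToken}`)
      .send({ sku: 'SKU-1', quantity: 2 });

    expect(res.status).toBe(201);
    expect(res.body.order_id).toEqual(expect.any(String));
  });

  it('returns 400 with field errors when required fields are missing', async () => {
    const res = await request(app)
      .post('/api/orders')
      .set('Authorization', `Bearer ${testToken}`)
      .send({});

    expect(res.status).toBe(400);
    expect(res.body.errors).toBeDefined();
  });

  it('returns 401 when no token is provided', async () => {
    const res = await request(app).post('/api/orders').send({ sku: 'SKU-1' });

    expect(res.status).toBe(401);
  });
});
```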

### Contract Testing

When services communicate, test the contract:

```yaml
contract:
  consumer: order-service
  provider: payment-service

  interactions:
    - description: "Process payment"
      request:
        method: POST
        path: /payments
        body:
          amount: 99.99
          currency: USD
          order_id: "ord_123"
      response:
        status: 200
        body:
          payment_id: "pay_xxx"  # string, not null
          status: "completed"    # enum: completed|pending|failed

  breaking_changes:  # NEVER do these without versioning
    - Remove a field from response
    - Change a field's type
    - Add a required field to request
    - Change the URL path
    - Change error response format
```
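
The same interaction as a consumer-side test, sketched with pact-js's `PactV3` API (assuming pact-js v10+; the consumer call is inlined with `fetch` for brevity, where a real test would exercise the consumer's payment client):

```typescript
import { PactV3, MatchersV3 } from '@pact-foundation/pact';

const { string, regex } = MatchersV3;

const provider = new PactV3({
  consumer: 'order-service',
  provider: 'payment-service',
});

it('processes a payment', () => {
  provider
    .uponReceiving('a request to process a payment')
    .withRequest({
      method: 'POST',
      path: '/payments',
      body: { amount: 99.99, currency: 'USD', order_id: 'ord_123' },
    })
    .willRespondWith({
      status: 200,
      body: {
        payment_id: string('pay_xxx'),                          // string, never null
        status: regex('completed|pending|failed', 'completed'), // closed enum
      },
    });

  return provider.executeTest(async (mockServer) => {
    // Node 18+ global fetch; point the consumer at the Pact mock server
    const res = await fetch(`${mockServer.url}/payments`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ amount: 99.99, currency: 'USD', order_id: 'ord_123' }),
    });

    expect(res.status).toBe(200);
  });
});
```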

### Database Testing Rules

  1. Each test gets a clean state — Use transactions that roll back, or truncate between tests (see the sketch below)
  2. Use factories, not fixtures — `createUser({ role: 'admin' })` beats hardcoded SQL dumps
  3. Test migrations — Run migrate-up, migrate-down, migrate-up (roundtrip)
  4. Test constraints — Unique violations, FK cascades, NOT NULL
  5. Test queries — Especially complex JOINs, aggregations, window functions
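
Rule 1 in miniature: a rollback-isolation sketch using node-postgres, assuming a `users` table with a unique email constraint and a `TEST_DATABASE_URL` environment variable.

```typescript
import { Pool, PoolClient } from 'pg';

const pool = new Pool({ connectionString: process.env.TEST_DATABASE_URL });
let client: PoolClient;

beforeEach(async () => {
  client = await pool.connect();
  await client.query('BEGIN'); // every test runs inside its own transaction
});

afterEach(async () => {
  await client.query('ROLLBACK'); // discard all writes: clean state for the next test
  client.release();
});

afterAll(() => pool.end());

it('enforces the unique email constraint', async () => {
  await client.query("INSERT INTO users (email) VALUES ('a@b.com')");

  await expect(
    client.query("INSERT INTO users (email) VALUES ('a@b.com')")
  ).rejects.toThrow(/unique/i);
});
```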

## Phase 4: End-to-End Testing

### Critical User Journey Mapping

Identify and test the flows that generate revenue or block users:

```yaml
critical_journeys:
  - name: "Sign up → First value"
    steps:
      - Visit landing page
      - Click sign up
      - Fill registration form
      - Verify email
      - Complete onboarding
      - Perform first key action
    max_duration: 3 minutes

  - name: "Purchase flow"
    steps:
      - Browse products
      - Add to cart
      - Enter shipping
      - Enter payment
      - Confirm order
      - Receive confirmation email
    max_duration: 2 minutes

  - name: "Login → Core task → Logout"
    steps:
      - Login (password + SSO + MFA variants)
      - Navigate to core feature
      - Complete primary workflow
      - Verify result
      - Logout
    max_duration: 1 minute
```

### E2E Best Practices

  1. Test user behavior, not implementation — Click buttons by text/role, not by CSS class
  2. Use data-testid sparingly — Only when no accessible selector exists
  3. Wait for state, not time — `waitFor(element)`, not `sleep(3000)`
  4. Isolate test data — Each test creates its own users/data
  5. Run in CI with retries — 1 retry for flaky network, investigate if >5% flake rate

### Selector Priority (Best → Worst)

  1. `getByRole('button', { name: 'Submit' })` — Accessible, resilient
  2. `getByLabelText('Email')` — Form-specific, accessible
  3. `getByText('Welcome back')` — Content-based
  4. `getByTestId('submit-btn')` — Explicit test hook
  5. `querySelector('.btn-primary')` — ❌ Fragile, breaks on CSS changes
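
Both lists combined in a short Playwright sketch; the URL, labels, and page copy are placeholders.

```typescript
import { test, expect } from '@playwright/test';

test('user can place an order', async ({ page }) => {
  await page.goto('https://staging.example.com/checkout');

  // Select by accessible role/label, not CSS class
  await page.getByLabel('Email').fill(`qa+${Date.now()}@example.com`); // unique test data
  await page.getByRole('button', { name: 'Place order' }).click();

  // Wait for state, not time: this assertion retries until the
  // confirmation renders or the timeout elapses
  await expect(page.getByText('Order confirmed')).toBeVisible();
});
```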

### Flaky Test Triage

| Symptom | Likely Cause | Fix |
|---|---|---|
| Passes locally, fails in CI | Timing/race condition | Add explicit waits, check CI resource limits |
| Fails intermittently | Shared state between tests | Isolate test data, reset state |
| Fails after deploy | Environment difference | Check env vars, API versions, feature flags |
| Fails at specific time | Time-dependent logic | Mock dates/times, avoid time-sensitive assertions |
| Fails in parallel | Resource contention | Use unique ports/DBs per worker |

Rule: Quarantine flaky tests within 24 hours. A flaky test suite that everyone ignores is worse than no tests.


## Phase 5: Performance Testing

### Load Test Design

```yaml
performance_tests:
  smoke:
    vus: 5
    duration: 1m
    purpose: "Verify test works"

  load:
    vus: 100  # Expected concurrent users
    duration: 10m
    ramp_up: 2m
    purpose: "Normal traffic behavior"
    thresholds:
      p95_response: <500ms
      error_rate: <1%

  stress:
    vus: 300  # 3x expected load
    duration: 15m
    ramp_up: 5m
    purpose: "Find breaking point"

  soak:
    vus: 80
    duration: 2h
    purpose: "Memory leaks, connection exhaustion"

  spike:
    stages:
      - { vus: 50, duration: 2m }
      - { vus: 500, duration: 30s }  # Sudden spike
      - { vus: 50, duration: 2m }
    purpose: "Recovery behavior"
```

### Performance Budgets

| Metric | Web App | API | Background Job |
|---|---|---|---|
| Response time (p50) | <200ms | <100ms | N/A |
| Response time (p95) | <1s | <500ms | N/A |
| Response time (p99) | <3s | <1s | N/A |
| Throughput | >100 rps | >500 rps | >1000/min |
| Error rate | <0.1% | <0.1% | <0.5% |
| CPU usage | <70% | <70% | <90% |
| Memory growth | <5%/hr | <2%/hr | <10%/hr |

### Database Performance Testing

```yaml
db_performance:
  query_tests:
    - name: "Dashboard aggregate query"
      baseline: 50ms
      max_acceptable: 200ms
      with_1M_rows: measure
      with_10M_rows: measure

  index_verification:
    - Run EXPLAIN ANALYZE on all critical queries
    - Verify no sequential scans on tables >10K rows
    - Check index usage statistics weekly

  connection_pool:
    - Test at max connections
    - Verify graceful handling when pool exhausted
    - Monitor connection wait time
```

## Phase 6: Security Testing

### OWASP Top 10 Test Checklist

```yaml
security_tests:
  A01_broken_access_control:
    - [ ] Horizontal privilege escalation (access other user's data)
    - [ ] Vertical privilege escalation (access admin functions)
    - [ ] IDOR (Insecure Direct Object References)
    - [ ] Missing function-level access control
    - [ ] CORS misconfiguration

  A02_cryptographic_failures:
    - [ ] Sensitive data in transit (TLS 1.2+)
    - [ ] Sensitive data at rest (encryption)
    - [ ] Password hashing (bcrypt/argon2, not MD5/SHA)
    - [ ] No secrets in code/logs/URLs

  A03_injection:
    - [ ] SQL injection (parameterized queries)
    - [ ] NoSQL injection
    - [ ] Command injection (OS commands)
    - [ ] XSS (stored, reflected, DOM-based)
    - [ ] Template injection (SSTI)

  A04_insecure_design:
    - [ ] Rate limiting on auth endpoints
    - [ ] Account lockout after N failures
    - [ ] CAPTCHA on public forms
    - [ ] Business logic abuse scenarios

  A05_security_misconfiguration:
    - [ ] Default credentials removed
    - [ ] Error messages don't leak stack traces
    - [ ] Security headers set (CSP, HSTS, X-Frame-Options)
    - [ ] Directory listing disabled
    - [ ] Unnecessary HTTP methods disabled

  A07_auth_failures:
    - [ ] Brute force protection
    - [ ] Session fixation
    - [ ] Session timeout
    - [ ] JWT validation (signature, expiry, issuer)
    - [ ] MFA bypass attempts
```

### Input Validation Test Payloads

Test every user input with:

```yaml
injection_payloads:
  sql: ["' OR 1=1--", "'; DROP TABLE users;--", "1 UNION SELECT * FROM users"]
  xss: ["<script>alert(1)</script>", "<img onerror=alert(1) src=x>", "javascript:alert(1)"]
  path_traversal: ["../../etc/passwd", "..\\..\\windows\\system32", "%2e%2e%2f"]
  command: ["; ls -la", "| cat /etc/passwd", "$(whoami)", "`id`"]

boundary_values:
  strings: ["", " ", "a"*10000, null, undefined, "emoji: 🎯", "unicode: é à ü", "rtl: مرحبا"]
  numbers: [0, -1, 2147483647, -2147483648, NaN, Infinity, 0.1+0.2]
  arrays: [[], [null], Array(10000)]
  dates: ["1970-01-01", "2099-12-31", "invalid-date", "2024-02-29", "2023-02-29"]
```

## Phase 7: Test Automation Architecture

### Framework Selection Guide

| Need | JavaScript/TS | Python | Go | Java |
|---|---|---|---|---|
| Unit | Vitest / Jest | pytest | testing + testify | JUnit 5 |
| API | Supertest | httpx + pytest | net/http/httptest | RestAssured |
| E2E (browser) | Playwright | Playwright | chromedp | Selenium |
| Performance | k6 | Locust | vegeta | Gatling |
| Contract | Pact | Pact | Pact | Pact |
| Security | ZAP + custom | Bandit + custom | gosec | SpotBugs |

### CI Pipeline Test Stages

```yaml
pipeline:
  stage_1_fast:  # <2 min, blocks PR
    - Lint + type check
    - Unit tests
    - Security: dependency scan (npm audit / safety)

  stage_2_thorough:  # <10 min, blocks merge
    - Integration tests
    - Contract tests
    - Security: SAST scan
    - Coverage report + threshold check

  stage_3_confidence:  # <30 min, blocks deploy
    - E2E critical journeys
    - Visual regression (if applicable)
    - Security: container scan

  stage_4_post_deploy:  # After deploy to staging
    - Smoke tests against staging
    - Performance baseline check
    - Security: DAST scan (ZAP)

  stage_5_production:  # After prod deploy
    - Smoke tests (critical paths only)
    - Synthetic monitoring enabled
    - Canary metrics watching
```

### Test Data Management

```yaml
test_data_strategy:
  unit_tests:
    approach: factories  # Builder pattern, create exactly what you need
    example: "createUser({ role: 'admin', plan: 'enterprise' })"

  integration_tests:
    approach: seeded_database
    reset: per_test_suite  # Transaction rollback or truncate
    sensitive_data: anonymized  # Never use real PII

  e2e_tests:
    approach: api_setup  # Create data via API before test
    cleanup: after_each  # Delete created data
    isolation: unique_identifiers  # Timestamp or UUID in test data

  performance_tests:
    approach: representative_dataset
    volume: 10x_production  # Test with more data than prod
    generation: faker_libraries  # Realistic but synthetic
```
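
The factory approach in miniature: a builder-style `createUser` with safe defaults and unique identifiers (field names are illustrative).

```typescript
interface User {
  id: string;
  role: 'user' | 'admin';
  plan: 'free' | 'enterprise';
  email: string;
}

let seq = 0;

// Create exactly the user a test needs; everything else gets a safe default
function createUser(overrides: Partial<User> = {}): User {
  seq += 1;
  return {
    id: `user-${seq}`,
    role: 'user',
    plan: 'free',
    email: `user-${seq}@test.example`, // unique per call for isolation
    ...overrides,
  };
}

// Usage: only the fields under test are spelled out
const admin = createUser({ role: 'admin', plan: 'enterprise' });
```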

## Phase 8: Quality Metrics & Reporting

### Test Health Dashboard

```yaml
metrics:
  test_suite_health:
    total_tests: 0
    passing: 0
    failing: 0
    skipped: 0  # >5% skipped = tech debt alarm
    flaky: 0    # >2% flaky = quarantine immediately

  coverage:
    line: "0%"
    branch: "0%"
    critical_paths: "0%"  # Must be 100%

  execution:
    unit_duration: "0s"         # Target: <30s
    integration_duration: "0s"  # Target: <5m
    e2e_duration: "0s"          # Target: <15m
    total_ci_time: "0s"         # Target: <20m

  defect_metrics:
    bugs_found_in_test: 0
    bugs_escaped_to_prod: 0
    escape_rate: "0%"      # Target: <5%
    mttr: "0h"             # Mean time to resolve

  trends:  # Track weekly
    new_tests_added: 0
    tests_deleted: 0  # Healthy deletion = removing redundant tests
    coverage_delta: "+0%"
    flake_rate_delta: "+0%"
```

### Test Report Template

```markdown
# Test Report — [Feature/Sprint/Release]

## Summary
- **Status:** ✅ PASS / ⚠️ PASS WITH RISKS / ❌ FAIL
- **Tests Run:** X | **Passed:** X | **Failed:** X | **Skipped:** X
- **Coverage:** Line X% | Branch X% | Critical 100%
- **Duration:** Xm Xs

## Key Findings

### 🔴 Critical (Block Release)
1. [Finding] — [Impact] — [Fix recommendation]

### 🟡 High (Fix Before Next Release)
1. [Finding] — [Impact] — [Fix recommendation]

### 🟢 Medium/Low (Backlog)
1. [Finding] — [Impact]

## Risk Assessment
- **Untested areas:** [list]
- **Known flaky tests:** [list with ticket IDs]
- **Performance concerns:** [if any]

## Recommendation
[Ship / Ship with monitoring / Hold for fixes]
```

### Quality Score (0-100)

| Dimension | Weight | Scoring |
|---|---|---|
| Test coverage | 20% | <60%=0, 60-70%=5, 70-80%=10, 80-90%=15, 90%+=20 |
| Critical path coverage | 20% | <100%=0, 100%=20 |
| Defect escape rate | 15% | >10%=0, 5-10%=5, 2-5%=10, <2%=15 |
| Test suite speed | 10% | >30m=0, 20-30m=3, 10-20m=7, <10m=10 |
| Flake rate | 10% | >5%=0, 2-5%=3, 1-2%=7, <1%=10 |
| Security test coverage | 10% | None=0, Basic=3, OWASP Top 10=7, Full=10 |
| Documentation | 5% | None=0, Basic=2, Complete=5 |
| Automation ratio | 10% | <50%=0, 50-70%=3, 70-90%=7, 90%+=10 |

Scoring: 0-40 = 🔴 Critical | 41-60 = 🟡 Needs Work | 61-80 = 🟢 Good | 81-100 = 💎 Excellent


## Phase 9: Specialized Testing

### Accessibility Testing (WCAG 2.1)

```yaml
accessibility_checklist:
  level_a:  # Minimum compliance
    - [ ] All images have alt text
    - [ ] All form inputs have labels
    - [ ] Color is not the only visual indicator
    - [ ] Page has proper heading hierarchy (h1→h2→h3)
    - [ ] All functionality available via keyboard
    - [ ] Focus is visible and logical
    - [ ] No content flashes >3 times/second

  level_aa:  # Standard compliance (recommended)
    - [ ] Color contrast ratio ≥4.5:1 (normal text)
    - [ ] Color contrast ratio ≥3:1 (large text)
    - [ ] Text resizable to 200% without loss
    - [ ] Skip navigation links
    - [ ] Consistent navigation across pages
    - [ ] Error suggestions provided
    - [ ] ARIA landmarks for page regions

  tools:
    - axe-core (automated, catches ~30% of issues)
    - Lighthouse accessibility audit
    - Manual keyboard navigation test
    - Screen reader testing (VoiceOver/NVDA)
```
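
Automated scanning wired into E2E, sketched with @axe-core/playwright. Automation catches only a fraction of issues, so keyboard and screen-reader checks stay manual.

```typescript
import { test, expect } from '@playwright/test';
import AxeBuilder from '@axe-core/playwright';

test('home page has no detectable WCAG A/AA violations', async ({ page }) => {
  await page.goto('https://staging.example.com/'); // placeholder URL

  const results = await new AxeBuilder({ page })
    .withTags(['wcag2a', 'wcag2aa']) // limit the scan to WCAG 2.x A and AA rules
    .analyze();

  expect(results.violations).toEqual([]);
});
```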

### API Backward Compatibility Testing

```yaml
compatibility_tests:
  when_updating_api:
    - [ ] All existing fields still present in response
    - [ ] No field type changes (string→number)
    - [ ] New required request fields have defaults
    - [ ] Deprecated fields still work (with warning header)
    - [ ] Error format unchanged
    - [ ] Pagination behavior unchanged
    - [ ] Rate limits not reduced

  versioning_strategy:
    - URL versioning: /v1/users, /v2/users
    - Header versioning: Accept: application/vnd.api+json;version=2
    - Sunset header for deprecated versions
    - Minimum 6-month deprecation notice
```

### Chaos Engineering Principles

```yaml
chaos_tests:
  network:
    - Service dependency goes down → graceful degradation?
    - Network latency increases 10x → timeout handling?
    - DNS resolution fails → fallback behavior?

  infrastructure:
    - Database primary fails → replica promotion?
    - Cache (Redis) goes down → DB fallback works?
    - Disk fills up → alerting + graceful failure?

  application:
    - Memory pressure → OOM handling?
    - CPU saturation → request queuing?
    - Certificate expiry → monitoring alert?

  data:
    - Corrupt message in queue → dead letter + alert?
    - Schema migration fails mid-way → rollback works?
    - Clock skew between services → idempotency holds?
```

## Phase 10: Daily QA Workflow

### For New Features

  1. Review requirements — Identify test scenarios before code is written (shift-left)
  2. Write test cases — Cover happy path, edge cases, error cases, security
  3. Review PR tests — Are tests meaningful? Do they test behavior, not implementation?
  4. Run full suite — Unit + integration + E2E for affected areas
  5. Report findings — Use the test report template above

### For Bug Fixes

  1. Write failing test first — Reproduce the bug as a test (see the sketch below)
  2. Verify fix makes test pass — The test IS the proof
  3. Check for regression — Run related test suites
  4. Add to regression suite — Bug tests prevent re-introduction
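
Rule 1 in practice: the bug report becomes a test before any fix lands. The module and ticket ID below are illustrative.

```typescript
import { parseDate } from '../src/dates'; // hypothetical module under test

// Regression test for BUG-123: parseDate threw on malformed ISO strings.
// Written first, it fails on the unfixed code and then proves the fix.
it('parseDate returns null for malformed ISO strings', () => {
  expect(parseDate('2024-13-45T99:99')).toBeNull();
});
```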

### Weekly QA Review

```yaml
weekly_review:
  monday:
    - Review flaky test quarantine — fix or delete
    - Check coverage trends — declining = tech debt
    - Review escaped defects — update test strategy

  friday:
    - Update test health dashboard
    - Clean up obsolete tests
    - Document new testing patterns discovered
    - Plan next week's testing focus
```

## Natural Language Commands

  • "Create test strategy for [project/feature]" → Full strategy brief
  • "Write unit tests for [function/class]" → AAA pattern tests with edge cases
  • "Test this API endpoint: [method] [path]" → Full API test checklist
  • "Review these tests for quality" → Test code review with scoring
  • "Generate performance test plan" → k6/Locust test design
  • "Security test [feature/endpoint]" → OWASP-based test checklist
  • "Create test report for [release]" → Formatted test report
  • "What's our test health?" → Dashboard with metrics and recommendations
  • "Find gaps in our test coverage" → Analysis with prioritized recommendations
  • "Help debug this flaky test" → Root cause analysis with fix suggestions
  • "Set up CI test pipeline" → Stage-by-stage pipeline config
  • "Accessibility audit [page/component]" → WCAG checklist with findings
