qe-chaos-resilience

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "qe-chaos-resilience" with this command: npx skills add proffesor-for-testing/agentic-qe/proffesor-for-testing-agentic-qe-qe-chaos-resilience

QE Chaos Resilience

Purpose

Guide the use of v3's chaos engineering capabilities including controlled fault injection, load/stress testing, resilience validation, and disaster recovery testing.

Activation

  • When testing system resilience

  • When performing chaos experiments

  • When load/stress testing

  • When validating disaster recovery

  • When testing circuit breakers

Quick Start

Run chaos experiment

aqe chaos run --experiment network-latency --target api-service

Load test

aqe chaos load --scenario peak-traffic --duration 30m

Stress test to breaking point

aqe chaos stress --endpoint /api/users --max-users 10000

Test circuit breaker

aqe chaos circuit-breaker --service payment-service

Agent Workflow

// Chaos experiment
Task("Run chaos experiment", `
  Execute controlled chaos on api-service:
  - Inject 500ms network latency
  - Monitor service health metrics
  - Verify circuit breaker activation
  - Measure recovery time
  - Document findings
`, "qe-chaos-engineer")

// Load testing
Task("Performance load test", `
  Run load test simulating Black Friday traffic:
  - Ramp up to 10,000 concurrent users
  - Maintain load for 30 minutes
  - Monitor response times and error rates
  - Identify bottlenecks
  - Compare against SLAs
`, "qe-load-tester")

Chaos Experiments

  1. Fault Injection

await chaosEngineer.injectFault({
  target: 'api-service',
  fault: {
    type: 'latency',
    parameters: {
      delay: '500ms',
      jitter: '100ms',
      percentage: 50
    }
  },
  duration: '5m',
  monitoring: {
    metrics: ['response_time', 'error_rate', 'throughput'],
    alerts: true
  },
  rollback: {
    automatic: true,
    trigger: 'error_rate > 10%'
  }
});
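To make the `delay`, `jitter`, and `percentage` parameters concrete, here is a minimal sketch of how a latency fault could decide the delay for a single request. It is an illustration only: `parseDurationMs` and `injectedDelayMs` are hypothetical helpers, not part of the aqe API, and the interpretation (jitter as a uniform offset, percentage as the share of affected requests) is an assumption.

```typescript
// Hypothetical helpers for illustration; not part of the aqe API.
type LatencyParams = { delay: string; jitter: string; percentage: number };

// Parse strings such as '500ms', '30s', or '5m' into milliseconds.
function parseDurationMs(text: string): number {
  const match = /^(\d+(?:\.\d+)?)(ms|s|m|h)$/.exec(text.trim());
  if (!match) throw new Error(`Unrecognized duration: ${text}`);
  const unitMs: Record<string, number> = { ms: 1, s: 1000, m: 60000, h: 3600000 };
  return Number(match[1]) * unitMs[match[2]];
}

// Decide how much delay to add to one request.
// `random` is injectable so the behavior can be checked deterministically.
function injectedDelayMs(
  params: LatencyParams,
  random: () => number = Math.random,
): number {
  // Assumption: `percentage` is the share of requests that receive the fault.
  if (random() * 100 >= params.percentage) return 0;
  const base = parseDurationMs(params.delay);
  const jitter = parseDurationMs(params.jitter);
  // Assumption: jitter is a uniform offset in [-jitter, +jitter).
  const offset = (random() * 2 - 1) * jitter;
  return Math.max(0, base + offset);
}
```

Passing a fixed `random` function makes the selection and jitter reproducible, which is useful when the fault logic itself is under test.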

  2. Load Testing

await loadTester.execute({
  scenario: 'peak-traffic',
  profile: {
    rampUp: '5m',
    steadyState: '30m',
    rampDown: '5m'
  },
  users: {
    initial: 100,
    target: 5000,
    pattern: 'linear'
  },
  assertions: {
    p95_latency: '<500ms',
    error_rate: '<1%',
    throughput: '>1000rps'
  }
});
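The load profile above can be read as a piecewise function of elapsed time. The sketch below shows one plausible reading of a `linear` pattern. `LoadProfile` and `usersAt` are hypothetical names, and the linear ramp-down to zero is an assumption, since the source only specifies the ramp durations and the initial and target user counts.

```typescript
// Hypothetical sketch; the real load tester may shape traffic differently.
type LoadProfile = {
  rampUpMs: number;
  steadyStateMs: number;
  rampDownMs: number;
  initialUsers: number;
  targetUsers: number;
};

// Number of concurrent virtual users at a given elapsed time.
function usersAt(profile: LoadProfile, elapsedMs: number): number {
  const { rampUpMs, steadyStateMs, rampDownMs, initialUsers, targetUsers } = profile;
  if (elapsedMs < 0) return 0;
  if (elapsedMs < rampUpMs) {
    // Linear interpolation from initial to target during ramp-up.
    const fraction = elapsedMs / rampUpMs;
    return Math.round(initialUsers + (targetUsers - initialUsers) * fraction);
  }
  if (elapsedMs < rampUpMs + steadyStateMs) return targetUsers;
  const intoRampDown = elapsedMs - rampUpMs - steadyStateMs;
  if (intoRampDown >= rampDownMs) return 0;
  // Assumption: ramp-down falls linearly from target to zero.
  return Math.round(targetUsers * (1 - intoRampDown / rampDownMs));
}

// The profile from the example: 5m ramp-up, 30m steady state, 5m ramp-down.
const peakTraffic: LoadProfile = {
  rampUpMs: 5 * 60000,
  steadyStateMs: 30 * 60000,
  rampDownMs: 5 * 60000,
  initialUsers: 100,
  targetUsers: 5000,
};
```

Halfway through the ramp-up, `usersAt(peakTraffic, 150000)` gives 2550 users, the midpoint between 100 and 5000.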

  3. Stress Testing

await loadTester.stressTest({
  endpoint: '/api/checkout',
  strategy: 'step-increase',
  steps: [100, 500, 1000, 2000, 5000],
  stepDuration: '5m',
  findBreakingPoint: true,
  monitoring: {
    resourceUtilization: true,
    databaseConnections: true,
    memoryUsage: true
  }
});
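A `step-increase` strategy with `findBreakingPoint` can be thought of as a loop that raises the load one step at a time and stops at the first step that fails a health criterion. The sketch below assumes the criterion is a maximum error rate and uses a synchronous probe to keep the example short. `findBreakingPoint`, `StepResult`, and `runStep` are hypothetical and do not describe the actual implementation.

```typescript
// Hypothetical sketch of a step-increase search; not the aqe implementation.
type StepResult = { users: number; errorRate: number };

type BreakingPointReport = {
  breakingPoint: number | null; // first step that exceeded the threshold
  lastHealthy: number | null;   // highest step that stayed within it
  results: StepResult[];
};

function findBreakingPoint(
  steps: number[],
  runStep: (users: number) => StepResult, // assumed probe; real runs would be async
  maxErrorRate: number,
): BreakingPointReport {
  const results: StepResult[] = [];
  let lastHealthy: number | null = null;
  for (const users of steps) {
    const result = runStep(users);
    results.push(result);
    if (result.errorRate > maxErrorRate) {
      // Stop at the first unhealthy step instead of pushing further.
      return { breakingPoint: users, lastHealthy, results };
    }
    lastHealthy = users;
  }
  return { breakingPoint: null, lastHealthy, results };
}
```

Stopping at the first failing step keeps the test from loading a system that has already degraded, at the cost of a coarse estimate: the true limit lies somewhere between `lastHealthy` and `breakingPoint`.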

  4. Resilience Validation

await resilienceTester.validate({
  scenarios: [
    'database-failover',
    'cache-failure',
    'external-service-timeout',
    'pod-termination'
  ],
  expectations: {
    gracefulDegradation: true,
    automaticRecovery: true,
    dataIntegrity: true,
    recoveryTime: '<30s'
  }
});
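An expectation such as `recoveryTime: '<30s'` needs a measured recovery time to compare against. One simple way to derive it, assuming periodic health-check samples are available, is to take the interval between the first unhealthy sample and the next healthy one. `HealthSample` and `recoveryTimeMs` are hypothetical names used only for this sketch.

```typescript
// Hypothetical sketch; assumes samples are sorted by timestamp.
type HealthSample = { timestampMs: number; healthy: boolean };

// Returns 0 if the service never degraded, null if it never recovered
// within the sampled window, otherwise the recovery time in milliseconds.
function recoveryTimeMs(samples: HealthSample[]): number | null {
  const firstFailure = samples.findIndex((s) => !s.healthy);
  if (firstFailure === -1) return 0;
  const recovered = samples.slice(firstFailure).find((s) => s.healthy);
  if (!recovered) return null;
  return recovered.timestampMs - samples[firstFailure].timestampMs;
}

// Check a measured recovery against a limit such as 30 seconds.
function recoveredWithin(samples: HealthSample[], limitMs: number): boolean {
  const elapsed = recoveryTimeMs(samples);
  return elapsed !== null && elapsed < limitMs;
}
```

The precision of the result is bounded by the sampling interval, so a 30 second limit calls for health checks considerably more frequent than that.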

Fault Types

| Fault           | Description          | Use Case             |
|-----------------|----------------------|----------------------|
| Latency         | Add network delay    | Test timeouts        |
| Packet Loss     | Drop network packets | Test retry logic     |
| CPU Stress      | Consume CPU          | Test resource limits |
| Memory Pressure | Consume memory       | Test OOM handling    |
| Disk Full       | Fill disk space      | Test disk errors     |
| Process Kill    | Terminate process    | Test recovery        |

Chaos Report

interface ChaosReport {
  experiment: {
    name: string;
    target: string;
    fault: FaultConfig;
    duration: number;
  };
  results: {
    hypothesis: string;
    validated: boolean;
    metrics: {
      before: MetricSnapshot;
      during: MetricSnapshot;
      after: MetricSnapshot;
    };
    events: ChaosEvent[];
    recovery: {
      detected: boolean;
      time: number;
      automatic: boolean;
    };
  };
  findings: {
    severity: 'critical' | 'high' | 'medium' | 'low';
    description: string;
    recommendation: string;
  }[];
  artifacts: {
    logs: string;
    metrics: string;
    traces: string;
  };
}
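As an example of consuming the `findings` array, the sketch below orders findings from most to least severe and reports the worst level present. The `Finding` type mirrors the shape in the interface above. `sortBySeverity` and `worstSeverity` are hypothetical helpers, not functions the tool is known to provide.

```typescript
// Hypothetical helpers over the findings shape shown in ChaosReport.
type Severity = 'critical' | 'high' | 'medium' | 'low';
type Finding = { severity: Severity; description: string; recommendation: string };

// Lower rank means more severe.
const severityRank: Record<Severity, number> = { critical: 0, high: 1, medium: 2, low: 3 };

// Return a new array ordered from most to least severe (input is not mutated).
function sortBySeverity(findings: Finding[]): Finding[] {
  return [...findings].sort((a, b) => severityRank[a.severity] - severityRank[b.severity]);
}

// The most severe level present, or null when there are no findings.
function worstSeverity(findings: Finding[]): Severity | null {
  return findings.length === 0 ? null : sortBySeverity(findings)[0].severity;
}
```

A pipeline could, for instance, fail a build when `worstSeverity` returns `'critical'` and only warn otherwise.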

Safety Controls

safety:
  blast_radius:
    max_affected_pods: 1
    max_affected_percentage: 10

  abort_conditions:
    - error_rate > 50%
    - p99_latency > 10s
    - service_unavailable

  excluded_environments:
    - production-critical

  required_approvals:
    production: 2
    staging: 0
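To illustrate how these limits might be enforced, the sketch below computes how many pods an experiment may touch and checks a metrics snapshot against the three abort conditions. It assumes both blast-radius limits apply at once and the stricter one wins, which the source does not state. `allowedPodCount`, `abortReason`, and the `Snapshot` type are hypothetical.

```typescript
// Hypothetical sketch of safety checks; not the aqe implementation.
type BlastRadius = { maxAffectedPods: number; maxAffectedPercentage: number };

// Assumption: both limits apply and the stricter one wins.
function allowedPodCount(totalPods: number, limits: BlastRadius): number {
  const byPercentage = Math.floor((totalPods * limits.maxAffectedPercentage) / 100);
  return Math.max(0, Math.min(limits.maxAffectedPods, byPercentage));
}

type Snapshot = {
  errorRatePercent: number;
  p99LatencyMs: number;
  serviceAvailable: boolean;
};

// Returns the abort condition that fired, or null if the run may continue.
// Thresholds mirror the config: error_rate > 50%, p99_latency > 10s.
function abortReason(snapshot: Snapshot): string | null {
  if (!snapshot.serviceAvailable) return 'service_unavailable';
  if (snapshot.errorRatePercent > 50) return 'error_rate > 50%';
  if (snapshot.p99LatencyMs > 10000) return 'p99_latency > 10s';
  return null;
}
```

Under this reading, a deployment with fewer than 10 pods allows zero affected pods, because 10% rounds down to zero. Whether the real tool rounds up instead is worth confirming before relying on it.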

SLA Validation

await resilienceTester.validateSLA({
  slas: {
    availability: 99.9,
    p95_latency: 500,
    error_rate: 0.1
  },
  period: '30d',
  report: {
    breaches: true,
    trends: true,
    projections: true
  }
});
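It can help to translate an availability target into a downtime budget. Assuming `availability: 99.9` is a percentage over the 30 day period, the budget is 0.1% of 43,200 minutes, or about 43.2 minutes. The helpers below are a hypothetical sketch of that arithmetic and are not part of the aqe API.

```typescript
// Hypothetical helpers; assume availability is expressed as a percentage.

// Downtime allowed by an availability target over a period, in minutes.
function allowedDowntimeMinutes(availabilityPercent: number, periodDays: number): number {
  const periodMinutes = periodDays * 24 * 60;
  return periodMinutes * (1 - availabilityPercent / 100);
}

// Availability actually achieved, given observed downtime in the period.
function measuredAvailabilityPercent(downtimeMinutes: number, periodDays: number): number {
  const periodMinutes = periodDays * 24 * 60;
  return 100 * (1 - downtimeMinutes / periodMinutes);
}

// True when observed downtime exceeds the budget implied by the target.
function isAvailabilityBreached(
  downtimeMinutes: number,
  availabilityPercent: number,
  periodDays: number,
): boolean {
  return downtimeMinutes > allowedDowntimeMinutes(availabilityPercent, periodDays);
}
```

For example, 60 minutes of downtime in 30 days exceeds the 43.2 minute budget, so `isAvailabilityBreached(60, 99.9, 30)` is true, while 30 minutes stays within it.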

Coordination

Primary Agents: qe-chaos-engineer, qe-load-tester, qe-resilience-tester
Coordinator: qe-chaos-coordinator
Related Skills: qe-performance, qe-security-compliance
