Pentest Validation
<default_to_action>
When validating security findings:
- REQUIRE explicit authorization for target URL
- SCAN with qe-security-scanner (SAST + dependency + secrets)
- ANALYZE with qe-security-reviewer + qe-security-auditor (parallel)
- VALIDATE with qe-pentest-validator (graduated exploitation, parallel per vuln type)
- REPORT only confirmed findings with PoC evidence ("No Exploit, No Report")
- UPDATE exploit playbook with new patterns

Quality Gates:
- Authorization confirmed before ANY exploitation
- Target URL is staging/dev (NOT production)
- Budget cap enforced ($15 default)
- Time cap enforced (30 min default)
- All exploitation attempts logged
</default_to_action>
Quick Reference Card
The 4-Phase Pipeline

| Phase | Agent(s) | Purpose | Parallelism |
|---|---|---|---|
| Recon | qe-security-scanner | SAST, DAST, dependency scan, secrets | Internal parallel |
| Analysis | qe-security-reviewer + qe-security-auditor | Code review + compliance check | Both in parallel |
| Validation | qe-pentest-validator | Graduated exploit validation | Per-vuln-type parallel |
| Report | qe-quality-gate | "No Exploit, No Report" filter | Sequential |
Graduated Exploitation Tiers

| Tier | Handler | Cost | Latency | Use When |
|---|---|---|---|---|
| 1 | Agent Booster (WASM) | $0 | <1ms | Code pattern is conclusive (`eval`, `innerHTML`, hardcoded creds) |
| 2 | Haiku | $0.0002 | ~500ms | Need payload test against live target |
| 3 | Sonnet/Opus | $0.003-$0.015 | 2-5s | Full exploit chain with data proof |
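
The tier table reads as an escalation rule: stay at the cheapest tier that can settle a finding. A minimal sketch of that rule follows. The function name and the finding fields (`patternConclusive`, `needsDataProof`) are assumptions made for illustration, not part of the skill's API.

```javascript
// Hypothetical helper: pick the cheapest tier that can settle a finding.
// Field names are illustrative assumptions, not taken from the skill.
function selectTier(finding, maxTier) {
  // Tier 1: the source pattern alone is conclusive (eval, innerHTML, hardcoded creds).
  if (finding.patternConclusive) return 1;
  // Tier 3 is wanted only when a full exploit chain with data proof is needed.
  const wanted = finding.needsDataProof ? 3 : 2;
  // Never exceed the configured exploitation_tier.
  return Math.min(wanted, maxTier);
}
```

With `exploitation_tier: 2`, a finding that would need a full exploit chain is capped at Tier 2 rather than escalated.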
When to Use This Skill

| Scenario | Tier | Estimated Cost |
|---|---|---|
| PR security review (source only) | 1 | $0 |
| Pre-release validation (staging) | 1-2 | $1-5 |
| Full pentest validation | 1-3 | $5-15 |
| Compliance audit evidence | 1-3 | $5-15 |
Configuration

    pentest:
      target_url: https://staging.app.com  # REQUIRED for Tier 2-3
      source_repo: ./src                   # REQUIRED for Tier 1+
      exploitation_tier: 2                 # 1=pattern-only, 2=payload-test, 3=full-exploit
      vuln_types:                          # Which pipelines to run
        - injection                        # SQL, NoSQL, command injection
        - xss                              # Reflected, stored, DOM XSS
        - auth                             # Auth bypass, session, JWT
        - ssrf                             # URL scheme abuse, metadata
      max_cost_usd: 15                     # Budget cap per run
      timeout_minutes: 30                  # Time cap per run
      require_authorization: true          # MUST confirm target ownership
      no_production: true                  # Block production URLs
      production_patterns:                 # URL patterns to block
        - ".prod."
        - "api."
        - "www."
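
The REQUIRED comments in the config imply a few checks that can run before any agent starts. The sketch below assumes the `pentest` block has already been parsed into a plain object. `validatePentestConfig` is a hypothetical name, and the exact rules are inferred from the comments rather than taken from the skill's implementation.

```javascript
// Hypothetical validator for an already-parsed `pentest` config object.
// Rules are inferred from the config comments; treat them as assumptions.
function validatePentestConfig(cfg) {
  const errors = [];
  if (![1, 2, 3].includes(cfg.exploitation_tier)) {
    errors.push("exploitation_tier must be 1, 2, or 3");
  }
  // source_repo is marked REQUIRED for Tier 1+, i.e. for every run.
  if (!cfg.source_repo) errors.push("source_repo is required");
  // target_url is only needed once a live target is involved.
  if (cfg.exploitation_tier >= 2 && !cfg.target_url) {
    errors.push("target_url is required for Tier 2-3");
  }
  if (!(cfg.max_cost_usd > 0)) errors.push("max_cost_usd must be positive");
  if (!(cfg.timeout_minutes > 0)) errors.push("timeout_minutes must be positive");
  return errors;
}
```

An empty array means the run may proceed to the authorization step.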
Safeguards (Mandatory)
Authorization Gate
Every pentest validation run MUST:
- Display target URL and exploitation tier to user
- Require explicit confirmation: "I own/authorized testing of this target"
- Log authorization with timestamp
- Block if target URL matches production patterns
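
The gate above can be sketched as a single function that either authorizes a run or refuses it. Everything named here (`authorizeRun`, the return shape, the in-memory `auditLog` array) is a hypothetical illustration. It treats each production pattern as a plain substring of the hostname, which is one plausible reading of `production_patterns`, and it leaves displaying the target and tier to the caller.

```javascript
// Hypothetical authorization gate; names and return shape are assumptions.
const CONFIRMATION = "I own/authorized testing of this target";

function authorizeRun(targetUrl, tier, userInput, productionPatterns, auditLog) {
  const host = new URL(targetUrl).hostname;
  // Plain substring match of each production pattern against the hostname.
  const blockedBy = productionPatterns.find((p) => host.includes(p));
  let reason = "authorized";
  if (blockedBy !== undefined) {
    reason = `matches production pattern "${blockedBy}"`;
  } else if (userInput.trim() !== CONFIRMATION) {
    reason = "confirmation missing";
  }
  const allowed = reason === "authorized";
  // Log every decision with a timestamp, refusals included.
  auditLog.push({ targetUrl, tier, allowed, reason, at: new Date().toISOString() });
  return { allowed, reason };
}
```

Logging refusals as well as approvals is a design choice of this sketch; the skill itself only requires that authorizations be logged.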
What This Skill Does NOT Do
- Full autonomous reconnaissance (Nmap, Subfinder)
- Zero-day exploit development
- Attack targets without explicit authorization
- Test production systems
- Store actual exfiltrated data (only proof of access)
- Social engineering or phishing simulation
- Port scanning or service discovery
Validation Pipelines
Injection Pipeline

| Attack | Tier 1 (Pattern) | Tier 2 (Payload) | Tier 3 (Full) |
|---|---|---|---|
| SQL injection | String concat in query | `' OR '1'='1` response diff | `UNION SELECT` data extraction |
| NoSQL injection | `$where`, `$gt` in query | Operator injection test | Collection enumeration |
| Command injection | `exec()`, `system()` calls | Command delimiter test | Reverse shell proof |
| LDAP injection | String concat in filter | Wildcard injection | Directory enumeration |
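
A Tier 1 check for the SQL injection row can be as small as a line-level heuristic. The sketch below flags lines where a SQL keyword appears alongside string concatenation or template interpolation. It is an assumed illustration of "string concat in query", not the scanner's actual rule set, and it will miss multi-line queries and produce occasional false hits.

```javascript
// Heuristic only: a SQL keyword on the same line as "+" concatenation next to
// a quote, or a ${...} interpolation. Both regexes are illustrative assumptions.
const SQL_KEYWORD = /\b(SELECT|INSERT|UPDATE|DELETE)\b/i;
const CONCAT = /(["'`]\s*\+)|(\+\s*["'`])|(\$\{[^}]+\})/;

function findSqlConcat(source) {
  return source
    .split("\n")
    .map((text, i) => ({ line: i + 1, text: text.trim() }))
    .filter(({ text }) => SQL_KEYWORD.test(text) && CONCAT.test(text));
}
```

A parameterized call such as `db.query("SELECT * FROM users WHERE id = ?", [id])` passes this check because the query string is never concatenated.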
XSS Pipeline

| Attack | Tier 1 (Pattern) | Tier 2 (Payload) | Tier 3 (Full) |
|---|---|---|---|
| Reflected XSS | No output encoding | `<img onerror>` reflection | Browser JS execution via Playwright |
| Stored XSS | `innerHTML` assignment | Payload stored + retrieved | Cookie theft PoC |
| DOM XSS | `document.write(location)` | Fragment injection | DOM manipulation proof |
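
For the Tier 2 reflection test, the decisive question is whether the payload comes back raw or HTML-encoded. The sketch below covers only that comparison, on a response body already fetched from an authorized target. Function names and the three result labels are assumptions, and real applications may use other encodings (for example `&#x27;` for a single quote) that this simple encoder does not reproduce.

```javascript
// Encode the five characters most HTML encoders escape (an assumed baseline).
function htmlEncode(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical classifier for a response body that may echo the payload.
function classifyReflection(body, payload) {
  if (body.includes(payload)) return "reflected-raw"; // candidate for Tier 3 proof
  if (body.includes(htmlEncode(payload))) return "reflected-encoded"; // output encoding present
  return "not-reflected";
}
```

Only a raw reflection justifies escalating to browser execution; an encoded reflection suggests the output encoding is doing its job.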
Auth Pipeline

| Attack | Tier 1 (Pattern) | Tier 2 (Payload) | Tier 3 (Full) |
|---|---|---|---|
| JWT none | No algorithm validation | Modified JWT accepted | Admin access with forged token |
| Session fixation | No session rotation | Pre-set session reused | Cross-user session hijack |
| Credential stuffing | No rate limiting | 100 attempts unblocked | Valid credential discovery |
| IDOR | No authorization check | Access other user data | Full CRUD on foreign resources |
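
The JWT row depends on reading the algorithm a token declares in its header. The sketch below shows only that inspection step, using Node's built-in `Buffer` with the `base64url` encoding. Sending a token to the target and observing whether it is accepted is deliberately left out, and both function names are hypothetical.

```javascript
// Decode the JOSE header (first dot-separated segment) and return its "alg".
function jwtAlgorithm(token) {
  const [headerB64] = token.split(".");
  const json = Buffer.from(headerB64, "base64url").toString("utf8");
  return JSON.parse(json).alg;
}

// True when the header declares the unsecured "none" algorithm (any letter case).
function usesNoneAlgorithm(token) {
  return String(jwtAlgorithm(token)).toLowerCase() === "none";
}
```

The case-insensitive comparison matters because validators that reject only the lowercase spelling can sometimes be bypassed with variants such as `None`.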
SSRF Pipeline

| Attack | Tier 1 (Pattern) | Tier 2 (Payload) | Tier 3 (Full) |
|---|---|---|---|
| Internal URL | User-controlled URL fetch | `http://169.254.169.254` | Cloud metadata extraction |
| DNS rebinding | URL validation bypass | Rebind to internal IP | Internal service access |
| Protocol smuggling | URL scheme not restricted | `file:///etc/passwd` | File content in response |
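
The SSRF payloads in the table map onto a simple classification of the URL an application is asked to fetch. The sketch below uses the standard `URL` class. The category labels and the private-range regex are assumptions for illustration. A string-level check like this cannot catch DNS rebinding, because the hostname is resolved only after validation.

```javascript
// Hypothetical classifier for a user-supplied fetch target.
function classifyFetchTarget(rawUrl) {
  const url = new URL(rawUrl);
  // Protocol smuggling: anything other than http(s), e.g. file:///etc/passwd.
  if (url.protocol !== "http:" && url.protocol !== "https:") return "blocked-scheme";
  const host = url.hostname;
  // Link-local address used by common cloud metadata services.
  if (host === "169.254.169.254") return "cloud-metadata";
  // Loopback and RFC 1918 private IPv4 ranges (IPv6 is not covered here).
  const privateV4 = /^(10\.|127\.|192\.168\.|172\.(1[6-9]|2\d|3[01])\.)/;
  if (host === "localhost" || privateV4.test(host)) return "internal-address";
  return "external";
}
```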
Agent Coordination
Orchestration Pattern

    // Phase 1: Recon (parallel scans)
    // Assumes each Task resolves to an array of findings.
    const phase1Results = await Task("Security Scan", {
      target: "./src",
      layers: { sast: true, dast: true, dependencies: true, secrets: true }
    }, "qe-security-scanner");

    // Phase 2: Analysis (parallel review)
    const phase2Results = await Promise.all([
      Task("Code Security Review", {
        findings: phase1Results,
        depth: "comprehensive"
      }, "qe-security-reviewer"),
      Task("Compliance Audit", {
        findings: phase1Results,
        frameworks: ["owasp-top-10"]
      }, "qe-security-auditor")
    ]);

    // Phase 3: Validation (graduated exploitation)
    // Promise.all yields one result per agent, so flatten before merging.
    const phase3Results = await Task("Exploit Validation", {
      findings: [...phase1Results, ...phase2Results.flat()],
      target_url: "https://staging.app.com",
      exploitation_tier: 2,
      vuln_types: ["injection", "xss", "auth", "ssrf"],
      max_cost_usd: 15,
      timeout_minutes: 30
    }, "qe-pentest-validator");

    // Phase 4: Report ("No Exploit, No Report" gate)
    await Task("Security Quality Gate", {
      findings: phase3Results.confirmedFindings,
      gate: "no-exploit-no-report",
      require_poc: true
    }, "qe-quality-gate");
Finding Classification

| Status | Meaning | Action |
|---|---|---|
| `confirmed-exploitable` | Exploitation succeeded with PoC | Report with evidence |
| `likely-exploitable` | Partial exploitation, defenses detected | Report with caveats |
| `not-exploitable` | All exploitation attempts failed | Filter from report |
| `inconclusive` | WAF/defense blocked, unclear if vulnerable | Report for manual review |
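
The classification table amounts to a small decision function plus a report filter. In the sketch below the attempt fields (`outcome`, `hasPoc`, `defenseBlocked`) are hypothetical names. Downgrading a full exploit without a captured PoC to `likely-exploitable` is an assumption of this sketch, since the table does not cover that case.

```javascript
// Hypothetical mapping from an exploitation attempt to a finding status.
function classifyFinding(attempt) {
  if (attempt.outcome === "full" && attempt.hasPoc) return "confirmed-exploitable";
  // Assumption: success without a captured PoC is reported only with caveats.
  if (attempt.outcome === "full" || attempt.outcome === "partial") return "likely-exploitable";
  // Nothing worked, but a WAF or other defense interfered with the attempt.
  if (attempt.defenseBlocked) return "inconclusive";
  return "not-exploitable";
}

// Per the Action column, only not-exploitable findings are filtered out.
function filterForReport(findings) {
  return findings.filter((f) => f.status !== "not-exploitable");
}
```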
Exploit Playbook Memory
Namespace Structure

    aqe/pentest/
      playbook/
        exploit/{vuln_type}/{tech_stack}/{technique}
        bypass/{defense_type}/{technique}
        payload/{vuln_type}/{variant}
      results/
        validation-{timestamp}
      poc/
        {finding_id}-poc
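
Keys in this namespace can be built with small helpers so that every agent spells the paths the same way. The helper names and the example arguments are hypothetical; only the path layout comes from the structure above.

```javascript
// Hypothetical key builders mirroring the namespace layout.
const ROOT = "aqe/pentest";

const exploitKey = (vulnType, techStack, technique) =>
  `${ROOT}/playbook/exploit/${vulnType}/${techStack}/${technique}`;
const bypassKey = (defenseType, technique) =>
  `${ROOT}/playbook/bypass/${defenseType}/${technique}`;
const payloadKey = (vulnType, variant) =>
  `${ROOT}/playbook/payload/${vulnType}/${variant}`;
const resultKey = (timestamp) => `${ROOT}/results/validation-${timestamp}`;
const pocKey = (findingId) => `${ROOT}/poc/${findingId}-poc`;
```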
Learning Loop
- Before validation: Query playbook for known patterns matching findings
- During validation: Try known payloads first (higher success rate)
- After validation: Store new successful patterns with confidence scores
- Over time: Agent converges on most effective payloads per tech stack
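
The loop above can be illustrated with an in-memory ranking of playbook entries. The skill does not say how confidence scores are computed, so the smoothed success rate used here, `(successes + 1) / (attempts + 2)`, is purely an assumption, as are the entry fields and function names.

```javascript
// Assumed scoring: Laplace-smoothed success rate, so untried payloads start at 0.5.
function confidence(entry) {
  return (entry.successes + 1) / (entry.attempts + 2);
}

// "Try known payloads first": highest confidence at the front, input left unchanged.
function rankPayloads(entries) {
  return [...entries].sort((a, b) => confidence(b) - confidence(a));
}

// "Store new successful patterns": return an updated copy after one attempt.
function recordOutcome(entry, succeeded) {
  return {
    ...entry,
    attempts: entry.attempts + 1,
    successes: entry.successes + (succeeded ? 1 : 0),
  };
}
```

Smoothing keeps a payload that succeeded once in one attempt from outranking one that succeeded nine times in ten.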
Cost Optimization
Estimated Cost by Scenario

| Scenario | Tier Mix | Findings | Est. Cost | Est. Time |
|---|---|---|---|---|
| PR check (source only) | 100% Tier 1 | 5 | $0 | <5s |
| Sprint validation | 70% T1, 30% T2 | 15 | $2-5 | 5-10 min |
| Release validation | 40% T1, 40% T2, 20% T3 | 25 | $8-15 | 15-30 min |
| Full pentest | 20% T1, 30% T2, 50% T3 | 40 | $15-30 | 30-60 min |
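
These estimates sit against the default caps of $15 and 30 minutes. The full pentest row exceeds both, so such a run would presumably need the caps raised. A minimal sketch of a guard that enforces the caps follows. The class, its method names, and the injectable clock are assumptions, not the skill's implementation.

```javascript
// Hypothetical run guard; class and method names are illustrative assumptions.
class RunBudget {
  constructor(maxCostUsd = 15, timeoutMinutes = 30, now = Date.now) {
    this.maxCostUsd = maxCostUsd;
    this.now = now; // injectable clock, mainly for testing
    this.deadline = now() + timeoutMinutes * 60 * 1000;
    this.spentUsd = 0;
  }

  // Record a model call's cost, refusing any charge that would break the cap.
  charge(costUsd) {
    if (this.spentUsd + costUsd > this.maxCostUsd) {
      throw new Error(`budget cap of $${this.maxCostUsd} would be exceeded`);
    }
    this.spentUsd += costUsd;
  }

  // True once the time cap has passed.
  expired() {
    return this.now() >= this.deadline;
  }
}
```

Checking the cap before adding the charge means a refused call leaves the recorded spend unchanged.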
Cost vs Shannon Comparison

| Metric | Shannon | AQE Pentest Validation |
|---|---|---|
| Cost per run | ~$50 | $5-15 (graduated tiers) |
| Runtime | 60-90 min | 15-30 min (parallel pipelines) |
| False positive rate | Low (exploit-proven) | Low (same principle) |
| Learning | None (static prompts) | ReasoningBank playbook |
Success Metrics

| Metric | Target | Measurement |
|---|---|---|
| False positive reduction | 60% of findings eliminated | Pre/post validator comparison |
| Exploit confirmation rate | 80% of confirmed findings truly exploitable | Manual PoC verification |
| Cost per run | <$15 USD | Token tracking per pipeline |
| Time per run | <30 minutes | Execution time metrics |
| Playbook growth | 100+ patterns after 6 months | Memory namespace count |
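
The first two metrics are simple ratios. The sketch below shows one way to compute them, with hypothetical function names and made-up example counts in the usage note.

```javascript
// Share of scanner findings removed by the validator (pre/post comparison).
function falsePositiveReduction(findingsBefore, findingsAfter) {
  return (findingsBefore - findingsAfter) / findingsBefore;
}

// Share of confirmed findings that manual PoC verification also reproduced.
function confirmationRate(confirmedFindings, manuallyVerified) {
  return manuallyVerified / confirmedFindings;
}
```

For example, 50 scanner findings reduced to 20 after validation is a 60% reduction, and 8 of 10 confirmed findings reproduced by hand is an 80% confirmation rate.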
Related Skills
- security-testing - OWASP vulnerability scanning
- qe-security-compliance - SAST/DAST automation
- compliance-testing - Regulatory compliance
- api-testing-patterns - API security testing
- chaos-engineering-resilience - Security under chaos
Remember
"No Exploit, No Report." A vulnerability scanner that can't prove exploitation delivers uncertain value. This skill transforms security findings from theoretical risks into proven vulnerabilities with evidence. Every confirmed finding comes with a reproducible proof-of-concept. Every false positive is eliminated before it reaches the report.
Think proof, not prediction. Don't report what MIGHT be vulnerable. Prove what IS vulnerable.