pentest-validation

Pentest Validation

<default_to_action> When validating security findings:

REQUIRE explicit authorization for target URL
SCAN with qe-security-scanner (SAST + dependency + secrets)
ANALYZE with qe-security-reviewer + qe-security-auditor (parallel)
VALIDATE with qe-pentest-validator (graduated exploitation, parallel per vuln type)
REPORT only confirmed findings with PoC evidence ("No Exploit, No Report")
UPDATE exploit playbook with new patterns

Quality Gates:

Authorization confirmed before ANY exploitation
Target URL is staging/dev (NOT production)
Budget cap enforced ($15 default)
Time cap enforced (30 min default)
All exploitation attempts logged </default_to_action>

Quick Reference Card

The 4-Phase Pipeline

| Phase | Agent(s) | Purpose | Parallelism |
|---|---|---|---|
| Recon | qe-security-scanner | SAST, DAST, dependency scan, secrets | Internal parallel |
| Analysis | qe-security-reviewer + qe-security-auditor | Code review + compliance check | Both in parallel |
| Validation | qe-pentest-validator | Graduated exploit validation | Per-vuln-type parallel |
| Report | qe-quality-gate | "No Exploit, No Report" filter | Sequential |

Graduated Exploitation Tiers

| Tier | Handler | Cost | Latency | Use When |
|---|---|---|---|---|
| 1 | Agent Booster (WASM) | $0 | <1ms | Code pattern is conclusive (eval, innerHTML, hardcoded creds) |
| 2 | Haiku | $0.0002 | ~500ms | Need payload test against live target |
| 3 | Sonnet/Opus | $0.003-$0.015 | 2-5s | Full exploit chain with data proof |
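
The tier table can be read as a routing rule: use the cheapest tier that can settle a finding. The following is a minimal sketch of that rule, not the skill's actual router. The function name `selectTier`, the finding fields `pattern` and `needsDataProof`, and the pattern labels are all hypothetical.

```javascript
// Hypothetical sketch: route a finding to the cheapest tier that can settle it.
// The finding shape ({ pattern, needsDataProof }) is an assumption, not the skill's schema.
const CONCLUSIVE_PATTERNS = new Set(["eval", "innerHTML", "hardcoded-creds"]);

function selectTier(finding, maxTier) {
  // Tier 1: the code pattern alone is conclusive, so no live request is needed.
  if (CONCLUSIVE_PATTERNS.has(finding.pattern)) return 1;
  // Tier 3: a full exploit chain with data proof is wanted and the config allows it.
  if (finding.needsDataProof && maxTier >= 3) return 3;
  // Otherwise: payload test against the live target, capped by the configured tier.
  return Math.min(2, maxTier);
}
```

With `exploitation_tier: 2`, a finding that would otherwise need Tier 3 is capped at a payload test, which keeps a run inside the configured ceiling.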

When to Use This Skill

| Scenario | Tier | Estimated Cost |
|---|---|---|
| PR security review (source only) | 1 | $0 |
| Pre-release validation (staging) | 1-2 | $1-5 |
| Full pentest validation | 1-3 | $5-15 |
| Compliance audit evidence | 1-3 | $5-15 |

Configuration

pentest:
  target_url: https://staging.app.com  # REQUIRED for Tier 2-3
  source_repo: ./src                   # REQUIRED for Tier 1+
  exploitation_tier: 2                 # 1=pattern-only, 2=payload-test, 3=full-exploit
  vuln_types:                          # Which pipelines to run
    - injection                        # SQL, NoSQL, command injection
    - xss                              # Reflected, stored, DOM XSS
    - auth                             # Auth bypass, session, JWT
    - ssrf                             # URL scheme abuse, metadata
  max_cost_usd: 15                     # Budget cap per run
  timeout_minutes: 30                  # Time cap per run
  require_authorization: true          # MUST confirm target ownership
  no_production: true                  # Block production URLs
  production_patterns:                 # URL patterns to block
    - ".prod."
    - "api."
    - "www."
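
One way the `max_cost_usd` and `timeout_minutes` caps might be enforced is a small guard that every exploitation attempt charges against. This is a sketch under assumptions: `createRunGuard` and its `charge` method are hypothetical names, and the document does not specify how the real validator tracks spend.

```javascript
// Hypothetical guard enforcing max_cost_usd and timeout_minutes from the config.
// `now` is injectable so the time cap can be exercised without waiting.
function createRunGuard({ max_cost_usd = 15, timeout_minutes = 30 } = {}, now = Date.now) {
  const startedAt = now();
  let spentUsd = 0;
  return {
    // Record the cost of one attempt; throw if either cap would be exceeded.
    charge(costUsd) {
      if (now() - startedAt > timeout_minutes * 60_000) {
        throw new Error(`Time cap of ${timeout_minutes} min reached`);
      }
      if (spentUsd + costUsd > max_cost_usd) {
        throw new Error(`Budget cap of $${max_cost_usd} reached`);
      }
      spentUsd += costUsd;
      return spentUsd;
    },
    spent: () => spentUsd,
  };
}
```

Checking the cap before adding the cost means an attempt that would overshoot the budget is refused rather than billed.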

Safeguards (Mandatory)

Authorization Gate

Every pentest validation run MUST:

Display target URL and exploitation tier to user
Require explicit confirmation: "I own/authorized testing of this target"
Log authorization with timestamp
Block if target URL matches production patterns
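
The four gate requirements above could be sketched as a single function. Everything named here (`authorizeRun`, `REQUIRED_PHRASE`, the argument and log-entry fields) is hypothetical, and substring matching on the hostname is an assumption about how `production_patterns` is applied.

```javascript
// Hypothetical authorization gate; names and field shapes are illustrative only.
const REQUIRED_PHRASE = "I own/authorized testing of this target";

function authorizeRun({ targetUrl, tier, confirmation, productionPatterns }, log = []) {
  const host = new URL(targetUrl).hostname;
  // Block first: a production match must stop the run even if the user confirms.
  const hit = productionPatterns.find((pattern) => host.includes(pattern));
  if (hit) {
    throw new Error(`Blocked: ${host} matches production pattern "${hit}"`);
  }
  // The error message displays the target URL and tier the user must confirm.
  if (confirmation !== REQUIRED_PHRASE) {
    throw new Error(`Authorization required for ${targetUrl} at tier ${tier}`);
  }
  // Log the authorization with a timestamp.
  const entry = { targetUrl, tier, authorizedAt: new Date().toISOString() };
  log.push(entry);
  return entry;
}
```

Note that substring patterns such as "api." are deliberately broad: they would also block a host like `staging-api.app.com`, which errs on the side of refusing a run.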

What This Skill Does NOT Do

Full autonomous reconnaissance (Nmap, Subfinder)
Zero-day exploit development
Attack targets without explicit authorization
Test production systems
Store actual exfiltrated data (only proof of access)
Social engineering or phishing simulation
Port scanning or service discovery

Validation Pipelines

Injection Pipeline

| Attack | Tier 1 (Pattern) | Tier 2 (Payload) | Tier 3 (Full) |
|---|---|---|---|
| SQL injection | String concat in query | ' OR '1'='1 response diff | UNION SELECT data extraction |
| NoSQL injection | $where, $gt in query | Operator injection test | Collection enumeration |
| Command injection | exec(), system() calls | Command delimiter test | Reverse shell proof |
| LDAP injection | String concat in filter | Wildcard injection | Directory enumeration |

XSS Pipeline

| Attack | Tier 1 (Pattern) | Tier 2 (Payload) | Tier 3 (Full) |
|---|---|---|---|
| Reflected XSS | No output encoding | <img onerror> reflection | Browser JS execution via Playwright |
| Stored XSS | innerHTML assignment | Payload stored + retrieved | Cookie theft PoC |
| DOM XSS | document.write(location) | Fragment injection | DOM manipulation proof |
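
The Tier 1 column in these pipelines describes source-level patterns. As a rough illustration only, a line-by-line regex pass can flag a few of them. Real SAST relies on parsing and data-flow analysis, so this sketch will miss cases and raise false alarms. The names `TIER1_PATTERNS` and `scanSource` and the pattern ids are hypothetical.

```javascript
// Hypothetical Tier 1 pattern pass: flags lines that match known-risky constructs.
// This is a crude stand-in for the scanner agent, not its implementation.
const TIER1_PATTERNS = [
  { id: "xss-innerhtml-assignment", regex: /\.innerHTML\s*=(?!=)/ },
  { id: "xss-document-write-location", regex: /document\.write\s*\(\s*location/ },
  { id: "code-eval", regex: /\beval\s*\(/ },
];

function scanSource(source) {
  const hits = [];
  source.split("\n").forEach((text, index) => {
    for (const { id, regex } of TIER1_PATTERNS) {
      // Line numbers are 1-based to match editor conventions.
      if (regex.test(text)) hits.push({ id, line: index + 1 });
    }
  });
  return hits;
}
```

A hit from a pass like this is only a candidate: under "No Exploit, No Report" it still needs a conclusive pattern or a higher-tier test before it reaches the report.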

Auth Pipeline

| Attack | Tier 1 (Pattern) | Tier 2 (Payload) | Tier 3 (Full) |
|---|---|---|---|
| JWT none | No algorithm validation | Modified JWT accepted | Admin access with forged token |
| Session fixation | No session rotation | Pre-set session reused | Cross-user session hijack |
| Credential stuffing | No rate limiting | 100 attempts unblocked | Valid credential discovery |
| IDOR | No authorization check | Access other user data | Full CRUD on foreign resources |

SSRF Pipeline

| Attack | Tier 1 (Pattern) | Tier 2 (Payload) | Tier 3 (Full) |
|---|---|---|---|
| Internal URL | User-controlled URL fetch | http://169.254.169.254 | Cloud metadata extraction |
| DNS rebinding | URL validation bypass | Rebind to internal IP | Internal service access |
| Protocol smuggling | URL scheme not restricted | file:///etc/passwd | File content in response |

Agent Coordination

Orchestration Pattern

// Phase 1: Recon (parallel scans)
const phase1Results = await Task("Security Scan", {
  target: "./src",
  layers: { sast: true, dast: true, dependencies: true, secrets: true }
}, "qe-security-scanner");

// Phase 2: Analysis (parallel review)
const phase2Results = await Promise.all([
  Task("Code Security Review", {
    findings: phase1Results,
    depth: "comprehensive"
  }, "qe-security-reviewer"),
  Task("Compliance Audit", {
    findings: phase1Results,
    frameworks: ["owasp-top-10"]
  }, "qe-security-auditor")
]);

// Phase 3: Validation (graduated exploitation)
const phase3Results = await Task("Exploit Validation", {
  findings: [...phase1Results, ...phase2Results],
  target_url: "https://staging.app.com",
  exploitation_tier: 2,
  vuln_types: ["injection", "xss", "auth", "ssrf"],
  max_cost_usd: 15,
  timeout_minutes: 30
}, "qe-pentest-validator");

// Phase 4: Report ("No Exploit, No Report" gate)
await Task("Security Quality Gate", {
  findings: phase3Results.confirmedFindings,
  gate: "no-exploit-no-report",
  require_poc: true
}, "qe-quality-gate");

Finding Classification

| Status | Meaning | Action |
|---|---|---|
| confirmed-exploitable | Exploitation succeeded with PoC | Report with evidence |
| likely-exploitable | Partial exploitation, defenses detected | Report with caveats |
| not-exploitable | All exploitation attempts failed | Filter from report |
| inconclusive | WAF/defense blocked, unclear if vulnerable | Report for manual review |
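
The classification table maps directly onto a filter. The sketch below shows one possible shape for the "No Exploit, No Report" gate. The names `REPORT_ACTIONS` and `applyNoExploitNoReport`, the action labels, and the `poc` field are hypothetical. Downgrading a confirmed finding that lacks a PoC to manual review is an assumption, not documented behavior.

```javascript
// Hypothetical mapping from the four statuses to report actions.
const REPORT_ACTIONS = new Map([
  ["confirmed-exploitable", "report"],
  ["likely-exploitable", "report-with-caveats"],
  ["not-exploitable", "filter"],
  ["inconclusive", "manual-review"],
]);

function applyNoExploitNoReport(findings) {
  const report = [];
  const manualReview = [];
  for (const finding of findings) {
    const action = REPORT_ACTIONS.get(finding.status);
    if (action === undefined) throw new Error(`Unknown status: ${finding.status}`);
    if (action === "filter") continue; // not-exploitable never reaches the report
    // Assumption: a "confirmed" finding without PoC evidence goes to manual review.
    const missingPoc = finding.status === "confirmed-exploitable" && !finding.poc;
    if (action === "manual-review" || missingPoc) manualReview.push(finding);
    else report.push({ ...finding, action });
  }
  return { report, manualReview };
}
```

Throwing on an unknown status keeps a typo in an upstream agent from silently dropping a finding.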

Exploit Playbook Memory

Namespace Structure

aqe/pentest/
  playbook/
    exploit/{vuln_type}/{tech_stack}/{technique}
    bypass/{defense_type}/{technique}
    payload/{vuln_type}/{variant}
  results/
    validation-{timestamp}
  poc/
    {finding_id}-poc
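
To keep keys consistent with the namespace layout, a small builder can fill the placeholders and reject incomplete input. This is a sketch: `playbookKey` and `PLAYBOOK_SEGMENTS` are hypothetical names, and the example segment values are made up for illustration.

```javascript
// Hypothetical key builder for the playbook namespace; segment names mirror the placeholders.
const PLAYBOOK_SEGMENTS = new Map([
  ["exploit", ["vuln_type", "tech_stack", "technique"]],
  ["bypass", ["defense_type", "technique"]],
  ["payload", ["vuln_type", "variant"]],
]);

function playbookKey(kind, parts) {
  const names = PLAYBOOK_SEGMENTS.get(kind);
  if (!names) throw new Error(`Unknown playbook kind: ${kind}`);
  const segments = names.map((name) => {
    const value = parts[name];
    // Reject empty values and embedded slashes so a segment cannot change the path depth.
    if (typeof value !== "string" || value === "" || value.includes("/")) {
      throw new Error(`Missing or invalid segment: ${name}`);
    }
    return value;
  });
  return ["aqe/pentest/playbook", kind, ...segments].join("/");
}
```

For example, `playbookKey("payload", { vuln_type: "xss", variant: "img-onerror" })` yields `aqe/pentest/playbook/payload/xss/img-onerror`.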

Learning Loop

Before validation: Query playbook for known patterns matching findings
During validation: Try known payloads first (higher success rate)
After validation: Store new successful patterns with confidence scores
Over time: Agent converges on most effective payloads per tech stack
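
The loop above can be illustrated with an in-memory stand-in for the playbook. The real memory store and its API are not described in this document, so the `Playbook` class, its methods, and the choice of success ratio as the confidence score are all assumptions.

```javascript
// In-memory stand-in for the playbook; the actual store and scoring are not specified here.
class Playbook {
  constructor() {
    this.entries = new Map(); // key -> [{ payload, attempts, successes, confidence }]
  }

  // Before/during validation: known payloads for a key, highest confidence first.
  candidates(key) {
    return [...(this.entries.get(key) ?? [])].sort((a, b) => b.confidence - a.confidence);
  }

  // After validation: record the outcome and refresh confidence as successes / attempts.
  record(key, payload, succeeded) {
    const list = this.entries.get(key) ?? [];
    let entry = list.find((e) => e.payload === payload);
    if (!entry) {
      entry = { payload, attempts: 0, successes: 0, confidence: 0 };
      list.push(entry);
    }
    entry.attempts += 1;
    if (succeeded) entry.successes += 1;
    entry.confidence = entry.successes / entry.attempts;
    this.entries.set(key, list);
    return entry;
  }
}
```

Because `candidates` sorts by confidence, payloads that keep succeeding on a given tech stack rise to the front of the queue, which is the convergence the last step describes.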

Cost Optimization

Estimated Cost by Scenario

| Scenario | Tier Mix | Findings | Est. Cost | Est. Time |
|---|---|---|---|---|
| PR check (source only) | 100% Tier 1 | 5 | $0 | <5s |
| Sprint validation | 70% T1, 30% T2 | 15 | $2-5 | 5-10 min |
| Release validation | 40% T1, 40% T2, 20% T3 | 25 | $8-15 | 15-30 min |
| Full pentest | 20% T1, 30% T2, 50% T3 | 40 | $15-30 | 30-60 min |

Cost vs Shannon Comparison

| Metric | Shannon | AQE Pentest Validation |
|---|---|---|
| Cost per run | ~$50 | $5-15 (graduated tiers) |
| Runtime | 60-90 min | 15-30 min (parallel pipelines) |
| False positive rate | Low (exploit-proven) | Low (same principle) |
| Learning | None (static prompts) | ReasoningBank playbook |

Success Metrics

| Metric | Target | Measurement |
|---|---|---|
| False positive reduction | 60% of findings eliminated | Pre/post validator comparison |
| Exploit confirmation rate | 80% of confirmed findings truly exploitable | Manual PoC verification |
| Cost per run | <$15 USD | Token tracking per pipeline |
| Time per run | <30 minutes | Execution time metrics |
| Playbook growth | 100+ patterns after 6 months | Memory namespace count |

Related Skills

security-testing - OWASP vulnerability scanning
qe-security-compliance - SAST/DAST automation
compliance-testing - Regulatory compliance
api-testing-patterns - API security testing
chaos-engineering-resilience - Security under chaos

Remember

"No Exploit, No Report." A vulnerability scanner that can't prove exploitation delivers uncertain value. This skill transforms security findings from theoretical risks into proven vulnerabilities with evidence. Every confirmed finding comes with a reproducible proof-of-concept. Every false positive is eliminated before it reaches the report.

Think proof, not prediction. Don't report what MIGHT be vulnerable. Prove what IS vulnerable.
