multi-ai-verification

Multi-layer quality assurance with 5-layer verification pyramid (Rules → Functional → Visual → Integration → Quality Scoring). Independent verification with LLM-as-judge and Agent-as-a-Judge patterns. Score 0-100 with ≥90 threshold. Use when verifying code quality, security scanning, preventing test gaming, comprehensive QA, or ensuring production readiness through multi-layer validation.

Install skill "multi-ai-verification" with this command: npx skills add adaptationio/skrillz/adaptationio-skrillz-multi-ai-verification

Multi-AI Verification

Overview

multi-ai-verification provides comprehensive quality assurance through a 5-layer verification pyramid, from automated rules to LLM-as-judge evaluation.

Purpose: Multi-layer independent verification ensuring production-ready quality

Pattern: Task-based (5 independent verification operations, one per layer)

Key Innovation: 5-layer pyramid (95% automated at base → 0-20% at apex) with independent verification preventing bias and test gaming

Core Principles (validated by tri-AI research):

  1. Multi-Layer Defense - 5 layers catch different types of issues
  2. Independent Verification - Separate agent from implementation/testing
  3. Progressive Automation - Automate what can be automated (95% → 0-20%)
  4. Quality Scoring - Objective 0-100 scoring with ≥90 threshold
  5. Actionable Feedback - All feedback is specific and actionable (What/Where/Why/How/Priority)
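
The What/Where/Why/How/Priority structure can be checked mechanically. The following is a minimal sketch, assuming feedback items are plain objects; the field names and the `feedbackProblems` helper are hypothetical, and the priority levels are taken from the Critical/High/Medium/Low list used later in this document.

```javascript
// Hypothetical feedback shape: one string field per What/Where/Why/How/Priority.
const REQUIRED_FIELDS = ["what", "where", "why", "how", "priority"];
const PRIORITIES = ["Critical", "High", "Medium", "Low"];

// Returns the problems that keep a feedback item from being actionable.
function feedbackProblems(item) {
  const problems = [];
  for (const field of REQUIRED_FIELDS) {
    const value = item[field];
    if (typeof value !== "string" || value.trim() === "") {
      problems.push(`missing ${field}`);
    }
  }
  const priority = item.priority;
  if (typeof priority === "string" && priority.trim() !== "" && !PRIORITIES.includes(priority)) {
    problems.push(`unknown priority: ${priority}`);
  }
  return problems;
}

const bcryptFeedback = {
  what: "bcrypt using 10 rounds",
  where: "src/auth/hash.ts:15",
  why: "Below the recommended 12-14 rounds",
  how: "Change bcrypt.hash(password, 10) to bcrypt.hash(password, 12)",
  priority: "High",
};

console.log(feedbackProblems(bcryptFeedback)); // []
console.log(feedbackProblems({ what: "Code is messy" }));
// [ 'missing where', 'missing why', 'missing how', 'missing priority' ]
```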

Quality Gates: All 5 layers must pass for production approval


When to Use

Use multi-ai-verification when:

  • Final quality check before commit/deployment
  • Independent code review (preventing bias)
  • Security verification (OWASP, vulnerabilities)
  • Comprehensive QA (all layers)
  • Test quality verification (prevent gaming)
  • Production readiness validation

Prerequisites

Required

  • Code to verify (implementation complete)
  • Tests available (for functional verification)
  • Quality standards defined

Recommended

  • multi-ai-testing - For generating/running tests
  • multi-ai-implementation - For implementing fixes

Tools Available

  • Linters (ESLint, Pylint)
  • Type checkers (TypeScript, mypy)
  • Coverage tools (c8, pytest-cov)
  • Security scanners (Semgrep, Bandit)
  • Test frameworks (Jest, pytest)

The 5-Layer Verification Pyramid

         Layer 5: Quality Scoring
         (LLM-as-Judge, 0-20% automated)
              /\
             /  \
        Layer 4: Integration
        (E2E, System, 20-30% automated)
          /      \
         /        \
    Layer 3: Visual
    (UI, Screenshots, 30-50% automated)
      /          \
     /            \
Layer 2: Functional
(Tests, Coverage, 60-80% automated)
  /              \
 /                \
Layer 1: Rules-Based
(Linting, Types, Schema, 95% automated)

Principle: Fail fast at automated layers (cheap, fast) before expensive LLM-as-judge evaluation
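
The fail-fast principle can be sketched as a loop that stops at the first failing layer, so the expensive upper layers never run. This is an illustration only: `runPyramid` and the stubbed layer objects are hypothetical, and real layers would be asynchronous (shelling out to linters, test runners, or judge agents).

```javascript
// Each layer is { name, run }, where run() returns { pass: boolean }.
// Synchronous stand-ins are used here for brevity.
function runPyramid(layers) {
  const results = [];
  for (const layer of layers) {
    const { pass } = layer.run();
    results.push({ name: layer.name, pass });
    if (!pass) {
      // Stop here: later (more expensive) layers are skipped.
      return { approved: false, failedAt: layer.name, results };
    }
  }
  return { approved: true, failedAt: null, results };
}

// Stub layers that record which ones actually executed.
const executed = [];
const stubLayer = (name, pass) => ({
  name,
  run: () => {
    executed.push(name);
    return { pass };
  },
});

const outcome = runPyramid([
  stubLayer("Layer 1: Rules-Based", true),
  stubLayer("Layer 2: Functional", false),
  stubLayer("Layer 3: Visual", true),
  stubLayer("Layer 4: Integration", true),
  stubLayer("Layer 5: Quality Scoring", true),
]);

console.log(outcome.failedAt); // Layer 2: Functional
console.log(executed.length); // 2
```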


Verification Operations

Operation 1: Rules-Based Verification (Layer 1)

Purpose: Automated validation of code structure, formatting, types

Automation: 95% automated

Speed: Seconds (fast feedback)

Confidence: High (deterministic)

Process:

  1. Schema Validation (if applicable):

    # Validate JSON/YAML against schemas (quote globs so ajv-cli expands them)
    ajv validate -s plan.schema.json -d plan.json
    ajv validate -s task.schema.json -d "tasks/*.json"
    
  2. Linting:

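    # JavaScript/TypeScript (quoted so ESLint, not the shell, expands the glob)
    npx eslint "src/**/*.{ts,tsx,js,jsx}"
    
    # Python (recurse into src/ instead of relying on shell globbing)
    pylint --recursive=y src/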
    
    # Expected: Zero linting errors
    
  3. Type Checking:

    # TypeScript
    npx tsc --noEmit
    
    # Python
    mypy src/
    
    # Expected: Zero type errors
    
  4. Format Validation:

    # Check formatting (quoted so Prettier expands the glob itself)
    npx prettier --check "src/**/*.{ts,tsx}"
    
    # Or auto-fix
    npx prettier --write "src/**/*.{ts,tsx}"
    
  5. Security Scanning (SAST):

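    # Static security analysis
    # (Semgrep is distributed via pip, Homebrew, or Docker rather than npm)
    semgrep --config=auto src/
    
    # Or for Python
    bandit -r src/
    
    # Check for:
    # - Hardcoded secrets
    # - SQL injection risks
    # - XSS vulnerabilities
    # - Insecure dependencies (covered by npm audit in Layer 4)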
    
  6. Generate Layer 1 Report:

    # Layer 1: Rules-Based Verification
    
    ## Schema Validation
    ✅ plan.json validates
    ✅ All task files validate
    
    ## Linting
    ✅ 0 linting errors
    ⚠️ 3 warnings (non-blocking)
    
    ## Type Checking
    ✅ 0 type errors
    
    ## Formatting
    ✅ All files formatted correctly
    
    ## Security Scan (SAST)
    ✅ No critical vulnerabilities
    ⚠️ 1 medium: Weak password hashing rounds (bcrypt)
    
    **Layer 1 Status**: ✅ PASS (0 critical issues)
    **Issues to Address**: 1 medium security issue
    

Outputs:

  • Lint report (errors/warnings)
  • Type check results
  • Schema validation results
  • Security scan findings
  • Layer 1 status (PASS/FAIL)

Validation:

  • All automated checks run
  • Results documented
  • Critical issues = 0 for PASS
  • Actionable feedback for warnings

Time Estimate: 15-30 minutes (mostly automated)

Gate 1: ✅ PASS if no critical issues (warnings acceptable)
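
As an illustration of the Gate 1 rule (zero critical issues, warnings allowed), the sketch below classifies a flat list of findings. The `{ tool, severity, message }` shape and the `evaluateGate1` name are assumptions; real linter and scanner output would need to be normalized into this form first.

```javascript
// Hypothetical normalized finding: { tool, severity, message }.
function evaluateGate1(findings) {
  const critical = findings.filter((f) => f.severity === "critical");
  return {
    status: critical.length === 0 ? "PASS" : "FAIL",
    criticalCount: critical.length,
    // Non-critical findings do not block the gate but are still reported.
    toAddress: findings
      .filter((f) => f.severity !== "critical")
      .map((f) => `${f.tool}: ${f.message}`),
  };
}

// Mirrors the sample Layer 1 report above.
const sampleFindings = [
  { tool: "eslint", severity: "warning", message: "3 warnings" },
  { tool: "semgrep", severity: "medium", message: "Weak password hashing rounds (bcrypt)" },
];

console.log(evaluateGate1(sampleFindings).status); // PASS
```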


Operation 2: Functional Verification (Layer 2)

Purpose: Validate functionality through test execution and coverage

Automation: 60-80% automated

Speed: Minutes (medium feedback)

Confidence: High (measurable outcomes)

Process:

  1. Execute Complete Test Suite:

    # Run all tests with coverage
    npm test -- --coverage --verbose
    
    # Capture results
    # - Tests passed/failed
    # - Coverage metrics
    # - Execution time
    
  2. Validate Example Code (from documentation):

    # Extract examples from SKILL.md
    # Execute each example automatically
    # Verify outputs match expected
    
    # Target: ≥90% examples work
    
  3. Check Coverage:

    # Coverage Report
    
    **Line Coverage**: 87% ✅ (gate: ≥80%)
    **Branch Coverage**: 82% ✅
    **Function Coverage**: 92% ✅
    **Path Coverage**: 74% ⚠️ (below 80%, not gated)
    
    **Gate Status**: PASS ✅ (line coverage ≥80%)
    
    **Uncovered Code**:
    - src/admin/legacy.ts: 23% (low priority)
    - src/utils/deprecated.ts: 15% (deprecated, ok)
    
  4. Regression Testing (for updates):

    # Compare before/after
    git diff main...feature --stat
    
    # Run all tests
    npm test
    
    # Verify: No new failures (regression prevention)
    
  5. Performance Validation:

    # Run performance tests
    npm run test:performance
    
    # Check response times
    # Verify: Within acceptable ranges
    
  6. Generate Layer 2 Report:

    # Layer 2: Functional Verification
    
    ## Test Execution
    ✅ 245/245 tests passing (100%)
    ⏱️ Execution time: 8.3 seconds
    
    ## Coverage
    ✅ Line: 87% (gate: ≥80%)
    ✅ Branch: 82%
    ✅ Function: 92%
    
    ## Example Validation
    ✅ 18/20 examples work (90%)
    ❌ 2 examples fail (outdated)
    
    ## Regression
    ✅ All existing tests still pass
    
    ## Performance
    ✅ All endpoints <200ms
    
    **Layer 2 Status**: ✅ PASS
    **Issues**: 2 outdated examples (update docs)
    

Outputs:

  • Test execution results
  • Coverage report
  • Example validation results
  • Regression check
  • Performance metrics
  • Layer 2 status

Validation:

  • All tests executed
  • Coverage meets gate (≥80%)
  • Examples validated (≥90%)
  • No regressions
  • Performance acceptable

Time Estimate: 30-60 minutes

Gate 2: ✅ PASS if tests pass + coverage ≥80%
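
A possible way to automate the Gate 2 decision, assuming coverage is exported with Istanbul's `json-summary` reporter (used by Jest and nyc), where percentages live under `total.<metric>.pct`. The `evaluateGate2` helper and the `testResults` shape are hypothetical.

```javascript
// testResults: hypothetical { passed, failed } counts from the test runner.
// coverageSummary: parsed coverage-summary.json (Istanbul json-summary format).
function evaluateGate2(testResults, coverageSummary, minLinePct = 80) {
  const linePct = coverageSummary.total.lines.pct;
  const reasons = [];
  if (testResults.failed > 0) {
    reasons.push(`${testResults.failed} failing test(s)`);
  }
  if (linePct < minLinePct) {
    reasons.push(`line coverage ${linePct}% < ${minLinePct}%`);
  }
  return { status: reasons.length === 0 ? "PASS" : "FAIL", reasons };
}

// Numbers taken from the sample Layer 2 report above.
const sampleCoverage = {
  total: {
    lines: { pct: 87 },
    branches: { pct: 82 },
    functions: { pct: 92 },
  },
};

console.log(evaluateGate2({ passed: 245, failed: 0 }, sampleCoverage));
// { status: 'PASS', reasons: [] }
```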


Operation 3: Visual Verification (Layer 3)

Purpose: Validate UI appearance, layout, accessibility (for UI features)

Automation: 30-50% automated

Speed: Minutes-Hours

Confidence: Medium (subjective elements)

Process:

  1. Screenshot Generation:

    # Generate screenshots of UI
    # (enable capture with `use: { screenshot: 'on' }` in playwright.config.ts)
    npx playwright test
    
    # Or manually:
    # Open application
    # Capture screenshots of key views
    
  2. Visual Comparison (if previous version exists):

    # Compare against baseline
    npx playwright test --update-snapshots=missing
    
    # Or use Percy/Chromatic for visual regression
    # (`percy upload` sends a directory of images; requires PERCY_TOKEN)
    npx percy upload screenshots/
    
  3. Layout Validation:

    # Visual Checklist
    
    ## Layout
    - [ ] Components positioned correctly
    - [ ] Spacing/margins match mockup
    - [ ] Alignment proper
    - [ ] No overlapping elements
    
    ## Styling
    - [ ] Colors match design system
    - [ ] Typography correct (fonts, sizes)
    - [ ] Icons/images display properly
    
    ## Responsiveness
    - [ ] Mobile view (320px-480px)
    - [ ] Tablet view (768px-1024px)
    - [ ] Desktop view (>1024px)
    
  4. Accessibility Testing:

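    # Automated accessibility scan (axe-core is a library; its CLI is @axe-core/cli
    # and it scans a running page, not source files)
    npx @axe-core/cli http://localhost:3000
    
    # Check WCAG compliance
    npx pa11y http://localhost:3000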
    
    # Manual checks:
    # - Keyboard navigation
    # - Screen reader compatibility
    # - Color contrast ratios
    
  5. Generate Layer 3 Report:

    # Layer 3: Visual Verification
    
    ## Screenshot Comparison
    ✅ Login page matches mockup
    ✅ Dashboard layout correct
    ⚠️ Profile page: Avatar alignment off by 5px
    
    ## Responsiveness
    ✅ Mobile: All components visible
    ✅ Tablet: Layout adapts correctly
    ✅ Desktop: Full functionality
    
    ## Accessibility
    ✅ WCAG 2.1 AA compliance
    ✅ Keyboard navigation works
    ⚠️ 2 color contrast warnings (non-critical)
    
    **Layer 3 Status**: ✅ PASS (minor issues acceptable)
    **Issues**: Avatar alignment (cosmetic), contrast warnings
    

Outputs:

  • Screenshots of UI
  • Visual comparison results
  • Responsiveness validation
  • Accessibility report
  • Layer 3 status

Validation:

  • Screenshots captured
  • Visual comparison done (if applicable)
  • Layout validated
  • Responsiveness tested
  • Accessibility checked
  • No critical visual issues

Time Estimate: 30-90 minutes (skip if no UI)

Gate 3: ✅ PASS if no critical visual/a11y issues
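
The color contrast check listed under manual accessibility checks follows a fixed formula, so it can be scripted. The sketch below implements the WCAG 2.x relative luminance and contrast ratio definitions and applies the 4.5:1 AA threshold for normal-size text; the function names are illustrative.

```javascript
// Convert one 8-bit sRGB channel to its linear value (WCAG 2.x definition).
function linearize(channel) {
  const c = channel / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance of an [r, g, b] color with 0-255 channels.
function relativeLuminance([r, g, b]) {
  return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b);
}

// Contrast ratio between two colors, from 1 (identical) to 21 (black on white).
function contrastRatio(foreground, background) {
  const l1 = relativeLuminance(foreground);
  const l2 = relativeLuminance(background);
  const lighter = Math.max(l1, l2);
  const darker = Math.min(l1, l2);
  return (lighter + 0.05) / (darker + 0.05);
}

// WCAG AA requires at least 4.5:1 for normal-size text.
const meetsAA = (foreground, background) => contrastRatio(foreground, background) >= 4.5;

const white = [255, 255, 255];
console.log(contrastRatio([0, 0, 0], white).toFixed(1)); // 21.0
console.log(meetsAA([118, 118, 118], white)); // true  (#767676 on white)
console.log(meetsAA([119, 119, 119], white)); // false (#777777 on white)
```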


Operation 4: Integration Verification (Layer 4)

Purpose: Validate system-level integration, data flow, API compatibility

Automation: 20-30% automated

Speed: Hours (complex)

Confidence: Medium-High

Process:

  1. Component Integration Tests:

    # Run integration test suite
    npm test -- tests/integration/
    
    # Verify components work together
    # - Database ← → API
    # - API ← → Frontend
    # - Frontend ← → User
    
  2. Data Flow Validation:

    # Data Flow Verification
    
    **Flow 1: User Registration**
    Frontend form → API endpoint → Validation → Database → Email service
    ✅ Data flows correctly
    ✅ No data loss
    ✅ Transactions atomic
    
    **Flow 2: Authentication**
    Login request → API → Database lookup → Token generation → Response
    ✅ Token generated correctly
    ✅ Session stored
    ✅ Response includes token
    
  3. API Integration Tests:

    # Test all API endpoints
    npm run test:api
    
    # Verify:
    # - All endpoints respond
    # - Status codes correct
    # - Response formats match spec
    # - Error handling works
    
  4. End-to-End Workflow Tests:

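    // Complete user journeys
    // Assumes an axios-style `api` client plus `userData`, `credentials`, and
    // `confirmLink` fixtures defined in the test setup.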
    test('Complete registration and login flow', async () => {
      // 1. Register new user
      const registerResponse = await api.post('/register', userData);
      expect(registerResponse.status).toBe(201);
    
      // 2. Confirm email
      const confirmResponse = await api.get(confirmLink);
      expect(confirmResponse.status).toBe(200);
    
      // 3. Login
      const loginResponse = await api.post('/login', credentials);
      expect(loginResponse.status).toBe(200);
      expect(loginResponse.data.token).toBeDefined();
    
      // 4. Access protected resource
      const profileResponse = await api.get('/profile', {
        headers: { Authorization: `Bearer ${loginResponse.data.token}` }
      });
      expect(profileResponse.status).toBe(200);
    });
    
  5. Dependency Compatibility:

    # Check dependencies for known vulnerabilities
    npm audit
    
    # List outdated packages (major-version bumps may contain breaking changes)
    npm outdated
    
    # Verify integration with services
    # - Database connection
    # - Redis/cache
    # - External APIs
    
  6. Generate Layer 4 Report:

    # Layer 4: Integration Verification
    
    ## Component Integration
    ✅ 12/12 integration tests passing
    ✅ All components integrate correctly
    
    ## Data Flow
    ✅ All 5 data flows validated
    ✅ No data loss or corruption
    
    ## API Integration
    ✅ All 15 endpoints functional
    ✅ Response formats correct
    ✅ Error handling works
    
    ## E2E Workflows
    ✅ 8/8 user journeys complete successfully
    ✅ No workflow breaks
    
    ## Dependencies
    ✅ 0 critical vulnerabilities
    ⚠️ 2 moderate (non-blocking)
    
    **Layer 4 Status**: ✅ PASS
    

Outputs:

  • Integration test results
  • Data flow validation
  • API compatibility report
  • E2E workflow results
  • Dependency audit
  • Layer 4 status

Validation:

  • Integration tests pass
  • Data flows validated
  • APIs integrate correctly
  • E2E workflows function
  • Dependencies secure

Time Estimate: 45-90 minutes

Gate 4: ✅ PASS if all integration tests pass and there are no critical dependency vulnerabilities
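
One way to evaluate this gate from tool output, sketched under two assumptions: integration results are summarized as hypothetical `{ passed, failed }` counts, and the dependency audit comes from `npm audit --json`, whose `metadata.vulnerabilities` object holds per-severity counts. The `evaluateGate4` name is illustrative.

```javascript
// integration: hypothetical { passed, failed } summary of integration/E2E tests.
// auditReport: parsed output of `npm audit --json`.
function evaluateGate4(integration, auditReport) {
  const { critical = 0, moderate = 0 } = auditReport.metadata.vulnerabilities;
  const blockers = [];
  if (integration.failed > 0) {
    blockers.push(`${integration.failed} integration test(s) failing`);
  }
  if (critical > 0) {
    blockers.push(`${critical} critical dependency vulnerabilities`);
  }
  return {
    status: blockers.length === 0 ? "PASS" : "FAIL",
    blockers,
    nonBlocking: { moderate },
  };
}

// Mirrors the sample Layer 4 report: 12/12 tests, 0 critical, 2 moderate.
const sampleAudit = {
  metadata: {
    vulnerabilities: { info: 0, low: 0, moderate: 2, high: 0, critical: 0, total: 2 },
  },
};

console.log(evaluateGate4({ passed: 12, failed: 0 }, sampleAudit).status); // PASS
```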


Operation 5: Quality Scoring (Layer 5)

Purpose: Holistic quality assessment using LLM-as-judge and Agent-as-a-Judge patterns

Automation: 0-20% automated

Speed: Hours (expensive)

Confidence: Medium (requires judgment)

Process:

  1. Spawn Independent Quality Assessor (Agent-as-a-Judge):

    Key: Use different model family if possible (prevent self-preference bias)

    const qualityAssessment = await task({
      description: "Assess code quality holistically",
      prompt: `Evaluate code quality in src/ and tests/.
    
      DO NOT read implementation conversation history.
    
      You have access to tools:
      - Read files
      - Execute tests
      - Run linters
      - Query database (if needed)
    
      Assess 5 dimensions (score each /20):
    
      1. CORRECTNESS (/20):
         - Logic correctness
         - Edge case handling
         - Error handling completeness
         - Security considerations
    
      2. FUNCTIONALITY (/20):
         - Meets all requirements
         - User workflows work
         - Performance acceptable
         - No regressions
    
      3. QUALITY (/20):
         - Code maintainability
         - Best practices followed
         - Anti-patterns avoided
         - Documentation complete
    
      4. INTEGRATION (/20):
         - Components integrate smoothly
         - API contracts correct
         - Data flow works
         - Backward compatible
    
      5. SECURITY (/20):
         - No vulnerabilities
         - Input validation
         - Authentication/authorization
         - Data protection
    
      TOTAL: /100 (sum of 5 dimensions)
    
      For each dimension, provide:
      - Score (/20)
      - Strengths (what's good)
      - Weaknesses (what needs improvement)
      - Evidence (file:line references)
      - Recommendations (specific, actionable)
    
      Write comprehensive report to: quality-assessment.md`
    });
    
  2. Multi-Agent Ensemble (for critical features):

    3-5 Agent Voting Committee:

    // Spawn 3 independent quality assessors
    const [judge1, judge2, judge3] = await Promise.all([
      task({description: "Quality Judge 1", prompt: assessmentPrompt}),
      task({description: "Quality Judge 2", prompt: assessmentPrompt}),
      task({description: "Quality Judge 3", prompt: assessmentPrompt})
    ]);
    
    // Aggregate scores
    const scores = {
      correctness: median([judge1.correctness, judge2.correctness, judge3.correctness]),
      functionality: median([...]),
      quality: median([...]),
      integration: median([...]),
      security: median([...])
    };
    
    const totalScore = sum(Object.values(scores)); // Total /100
    
    // Check disagreement (range between the highest and lowest total)
    const totalScores = [judge1.total, judge2.total, judge3.total];
    const spread = Math.max(...totalScores) - Math.min(...totalScores);
    
    if (spread > 15) {
      // High disagreement → spawn 2 more judges (total 5)
      // Use 5-agent ensemble for final score
    }
    
    // Final score: sum of per-dimension medians across 3 or 5 judges
    
  3. Calibration Against Rubric:

    # Scoring Calibration
    
    ## Correctness: 18/20 (Excellent)
    **20**: Zero errors, all edge cases handled perfectly
    **18**: Minor edge case missing, otherwise excellent ✅ (achieved)
    **15**: 1-2 significant edge cases missing
    **10**: Some logic errors present
    **0**: Major functionality broken
    
    **Evidence**: All tests pass, edge cases covered except timezone DST edge case (minor)
    
    ## Functionality: 19/20 (Excellent)
    [Similar rubric with evidence]
    
    ## Quality: 17/20 (Good)
    [Similar rubric with evidence]
    
    ## Integration: 18/20 (Excellent)
    [Similar rubric with evidence]
    
    ## Security: 16/20 (Good)
    [Similar rubric with evidence]
    
    **Total**: 88/100 ⚠️ (Below ≥90 gate)
    
  4. Gap Analysis (if <90):

    # Quality Gap Analysis
    
    **Current Score**: 88/100
    **Target**: ≥90/100
    **Gap**: 2 points
    
    ## Critical Gaps (Blocking Approval)
    None
    
    ## High Priority (Should Fix for ≥90)
    1. **Security: Weak bcrypt rounds**
       - **What**: bcrypt using 10 rounds (outdated)
       - **Where**: src/auth/hash.ts:15
       - **Why**: Current standard is 12-14 rounds
       - **How**: Change `bcrypt.hash(password, 10)` to `bcrypt.hash(password, 12)`
       - **Priority**: High
       - **Impact**: +2 points → 90/100
    
    ## Medium Priority
    1. **Quality: Missing JSDoc for 3 functions**
       - Impact: +1 point → 91/100
    
    **Recommendation**: Fix high priority issue to reach ≥90 threshold
    **Estimated Effort**: 15 minutes
    
  5. Generate Comprehensive Quality Report:

    # Layer 5: Quality Scoring Report
    
    ## Executive Summary
    **Total Score**: 88/100 ⚠️ (Below ≥90 gate)
    **Status**: NEEDS MINOR REVISION
    
    ## Dimension Scores
    - Correctness: 18/20 ⭐⭐⭐⭐⭐
    - Functionality: 19/20 ⭐⭐⭐⭐⭐
    - Quality: 17/20 ⭐⭐⭐⭐
    - Integration: 18/20 ⭐⭐⭐⭐⭐
    - Security: 16/20 ⭐⭐⭐⭐
    
    ## Strengths
    1. Comprehensive test coverage (87%)
    2. All functionality working correctly
    3. Clean integration with all components
    4. Good error handling
    
    ## Weaknesses
    1. Bcrypt rounds below current standard (security)
    2. Missing documentation for helper functions (quality)
    3. One timezone edge case not handled (correctness)
    
    ## Recommendations (Prioritized)
    
    ### Priority 1 (High - Needed for ≥90)
    1. Increase bcrypt rounds: 10 → 12
       - File: src/auth/hash.ts:15
       - Effort: 5 min
       - Impact: +2 points
    
    ### Priority 2 (Medium - Nice to Have)
    1. Add JSDoc to helper functions
       - Files: src/utils/validation.ts
       - Effort: 30 min
       - Impact: +1 point
    
    2. Handle timezone DST edge case
       - File: src/auth/tokens.ts:78
       - Effort: 20 min
       - Impact: +1 point
    
    **Next Steps**: Apply Priority 1 fix, re-verify to reach ≥90
    

Outputs:

  • Quality score (0-100) with dimension breakdown
  • Calibrated against rubric
  • Gap analysis
  • Prioritized recommendations (Critical/High/Medium/Low)
  • Evidence-based feedback (file:line references)
  • Action plan to reach ≥90
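
The action plan can be derived from the gap analysis by applying fixes in priority order until the projected score reaches the target. A minimal sketch, assuming each recommendation carries an estimated point impact as in the sample report; `planToReachTarget` and the fix objects are hypothetical.

```javascript
const PRIORITY_ORDER = { Critical: 0, High: 1, Medium: 2, Low: 3 };

// fixes: hypothetical [{ title, priority, impact }] where impact is in points.
function planToReachTarget(currentScore, fixes, target = 90) {
  // Copy before sorting so the caller's array is left untouched.
  const ordered = [...fixes].sort(
    (a, b) => PRIORITY_ORDER[a.priority] - PRIORITY_ORDER[b.priority]
  );
  const plan = [];
  let projected = currentScore;
  for (const fix of ordered) {
    if (projected >= target) break;
    plan.push(fix.title);
    projected += fix.impact;
  }
  return {
    gap: Math.max(0, target - currentScore),
    plan,
    projected,
    reachesTarget: projected >= target,
  };
}

// Recommendations from the sample report above.
const sampleFixes = [
  { title: "Add JSDoc to helper functions", priority: "Medium", impact: 1 },
  { title: "Increase bcrypt rounds: 10 → 12", priority: "High", impact: 2 },
  { title: "Handle timezone DST edge case", priority: "Medium", impact: 1 },
];

console.log(planToReachTarget(88, sampleFixes));
// { gap: 2, plan: [ 'Increase bcrypt rounds: 10 → 12' ], projected: 90, reachesTarget: true }
```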

Validation:

  • All 5 dimensions scored
  • Scores calibrated against rubric
  • Evidence provided for each score
  • Gap analysis if <90
  • Recommendations actionable
  • Ensemble used for critical features (optional)
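
The ensemble aggregation from step 2 can be written out in full as follows. This is a sketch under the assumption that each judge returns one numeric score per dimension; `aggregateJudges` is a hypothetical name, and the disagreement check uses the same 15-point spread threshold.

```javascript
const DIMENSIONS = ["correctness", "functionality", "quality", "integration", "security"];

// Median of a non-empty numeric array (averages the two middle values if even).
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// judges: array of objects with one /20 score per dimension.
function aggregateJudges(judges, maxSpread = 15) {
  const scores = {};
  for (const dim of DIMENSIONS) {
    scores[dim] = median(judges.map((judge) => judge[dim]));
  }
  const total = DIMENSIONS.reduce((sum, dim) => sum + scores[dim], 0);
  // Spread = highest minus lowest per-judge total; large values signal disagreement.
  const totals = judges.map((judge) => DIMENSIONS.reduce((sum, dim) => sum + judge[dim], 0));
  const spread = Math.max(...totals) - Math.min(...totals);
  return { scores, total, spread, needsMoreJudges: spread > maxSpread };
}

const sampleJudges = [
  { correctness: 18, functionality: 19, quality: 17, integration: 18, security: 16 },
  { correctness: 17, functionality: 18, quality: 18, integration: 19, security: 15 },
  { correctness: 19, functionality: 19, quality: 16, integration: 17, security: 17 },
];

const ensemble = aggregateJudges(sampleJudges);
console.log(ensemble.total, ensemble.spread, ensemble.needsMoreJudges); // 88 1 false
```

Taking the median per dimension rather than the median of totals keeps a single outlier judge from shifting any one dimension score.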

Time Estimate: 60-120 minutes (ensemble adds 30-60 min)

Gate 5: ✅ PASS if total score ≥90/100


Quality Gates Summary

All 5 Gates Must Pass for production approval:

Gate 1: Rules Pass ✅
   ↓ (Linting, types, schema, security)

Gate 2: Tests Pass ✅
   ↓ (All tests, coverage ≥80%)

Gate 3: Visual OK ✅
   ↓ (UI validated, a11y checked)

Gate 4: Integration OK ✅
   ↓ (E2E works, APIs integrate)

Gate 5: Quality ≥90 ✅
   ↓ (LLM-as-judge score ≥90/100)

✅ PRODUCTION APPROVED

If Any Gate Fails:

Failed Gate → Gap Analysis → Apply Fixes → Re-Verify → Repeat Until Pass

Appendix A: Independence Protocol

How Verification Independence is Maintained

Verification Agent Spawning:

// After implementation and testing complete
const verification = await task({
  description: "Independent quality verification",
  prompt: `Verify code quality independently.

  DO NOT read prior conversation history.

  Review:
  - Code: src/**/*.ts
  - Tests: tests/**/*.test.ts
  - Specs: specs/requirements.md

  Verify against specifications ONLY (not implementation decisions).

  Use tools:
  - Read files to inspect code
  - Run tests to verify functionality
  - Execute linters for quality checks

  Score quality (0-100) with evidence.
  Write report to: independent-verification.md`
});

Bias Prevention Checklist:

  • Specifications written BEFORE implementation
  • Verification agent prompt has no implementation context
  • Agent evaluates against specs, not what code does
  • Fresh context (via Task tool)
  • Different model family used (if possible)

Validation of Independence:

## Independence Audit

**Expected Behavior**:
- ✅ Verifier finds 1-3 issues (healthy skepticism)
- ✅ Verifier references specifications
- ✅ Verifier uses tools to verify claims

**Warning Signs**:
- ⚠️ Verifier finds 0 issues (possible rubber stamp)
- ⚠️ Verifier doesn't use tools
- ⚠️ Verifier parrots implementation justifications

**If Warning**: Re-verify with stronger independence prompt
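
Parts of this audit can be automated if the verifier's run is summarized numerically. The sketch below assumes a hypothetical summary with `issuesFound`, `toolCalls`, and `specReferences` counts. Detecting a verifier that parrots implementation justifications still needs human or LLM review and is not covered here.

```javascript
// report: hypothetical summary of one verifier run.
function auditIndependence(report) {
  const warnings = [];
  if (report.issuesFound === 0) {
    warnings.push("0 issues found (possible rubber stamp)");
  }
  if (report.toolCalls === 0) {
    warnings.push("verifier did not use tools");
  }
  if (report.specReferences === 0) {
    warnings.push("verifier did not reference specifications");
  }
  return {
    independent: warnings.length === 0,
    warnings,
    action: warnings.length === 0 ? "accept" : "re-verify with stronger independence prompt",
  };
}

console.log(auditIndependence({ issuesFound: 2, toolCalls: 14, specReferences: 5 }).action);
// accept
console.log(auditIndependence({ issuesFound: 0, toolCalls: 0, specReferences: 3 }).warnings.length);
// 2
```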

Appendix B: Operational Scoring Rubrics

Complete Rubrics for All 5 Dimensions

Correctness (/20)

  • 20 (Perfect): Zero logic errors, all edge cases handled, security perfect
  • 18 (Excellent): 1 minor edge case missing, otherwise flawless
  • 15 (Good): 2-3 edge cases missing, no critical errors
  • 12 (Acceptable): Some edge cases missing, 1 minor logic issue
  • 10 (Needs Work): Multiple edge cases missing or 1 significant logic error
  • 5 (Poor): Major logic errors present
  • 0 (Broken): Critical functionality broken

Functionality (/20)

  • 20: All requirements met, exceeds expectations
  • 18: All requirements met, well implemented
  • 15: All requirements met, basic implementation
  • 12: 1 requirement partially missing
  • 10: 2+ requirements partially missing
  • 5: Several requirements not met
  • 0: Core functionality missing

Quality (/20)

  • 20: Exceptional code quality, best practices exemplified
  • 18: High quality, follows best practices
  • 15: Good quality, minor style issues
  • 12: Acceptable quality, several style issues
  • 10: Below standard, needs refactoring
  • 5: Poor quality, significant issues
  • 0: Unmaintainable code

Integration (/20)

  • 20: Perfect integration, all touch points verified
  • 18: Excellent integration, minor docs needed
  • 15: Good integration, all major points work
  • 12: Acceptable, 1-2 integration issues
  • 10: Integration issues present
  • 5: Multiple integration problems
  • 0: Does not integrate

Security (/20)

  • 20: Passes all security scans, OWASP compliant, hardened
  • 18: Passes scans, 1 minor non-critical issue
  • 15: Passes, 2-3 minor issues
  • 12: 1 medium security issue
  • 10: Multiple medium issues
  • 5: 1 critical issue present
  • 0: Multiple critical vulnerabilities


Appendix C: Technical Foundation

Verification Tools

Linting:

  • ESLint (JavaScript/TypeScript)
  • Pylint/Ruff (Python)

Type Checking:

  • TypeScript compiler (tsc)
  • mypy (Python)

Security (SAST):

  • Semgrep (multi-language)
  • Bandit (Python)
  • npm audit (JavaScript)

Visual Testing:

  • Playwright (screenshot, visual regression)
  • Percy/Chromatic (visual diff)
  • axe-core (accessibility)

Coverage:

  • c8/nyc (JavaScript)
  • pytest-cov (Python)

Cost Controls

Budget Caps:

  • LLM-as-judge: $50/month
  • Ensemble verification: $20/month
  • Total verification: $70/month

Optimization:

  • Cache quality scores for 24h (same code → same score)
  • Skip Layer 5 for changes <50 lines
  • Use ensemble (3-5 agents) only for critical features
  • Use cheaper models for pre-filtering (Haiku for Layer 1-2)
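
The caching and skip rules above might be combined as follows. This is a sketch, not the skill's implementation: `ScoreCache`, `planLayer5`, and the idea of keying the cache by a caller-supplied code hash are assumptions, and the clock is injectable so the 24h expiry can be exercised without waiting.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// In-memory score cache with a 24h time-to-live, keyed by a hash of the code.
class ScoreCache {
  constructor(now = () => Date.now()) {
    this.now = now;
    this.entries = new Map();
  }
  set(codeHash, score) {
    this.entries.set(codeHash, { score, storedAt: this.now() });
  }
  get(codeHash) {
    const entry = this.entries.get(codeHash);
    if (!entry || this.now() - entry.storedAt >= DAY_MS) return undefined;
    return entry.score;
  }
}

// Applies the rules as listed: skip under 50 changed lines,
// use a 3-judge ensemble only for critical features.
function planLayer5({ linesChanged, critical }) {
  if (linesChanged < 50) return { run: false, judges: 0 };
  return { run: true, judges: critical ? 3 : 1 };
}

let fakeTime = 0;
const cache = new ScoreCache(() => fakeTime);
cache.set("hash-of-src", 92);
fakeTime = DAY_MS - 1;
console.log(cache.get("hash-of-src")); // 92
fakeTime = DAY_MS;
console.log(cache.get("hash-of-src")); // undefined (expired)
console.log(planLayer5({ linesChanged: 200, critical: true })); // { run: true, judges: 3 }
```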

Quick Reference

The 5 Layers

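| Layer | Purpose | Automation | Time | Tools |
|-------|---------|------------|------|-------|
| 1 | Rules-based | 95% | 15-30m | Linters, types, SAST |
| 2 | Functional | 60-80% | 30-60m | Test execution, coverage |
| 3 | Visual | 30-50% | 30-90m | Screenshots, a11y |
| 4 | Integration | 20-30% | 45-90m | E2E, API tests |
| 5 | Quality Scoring | 0-20% | 60-120m | LLM-as-judge, ensemble |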

Total: 3-6.5 hours for complete 5-layer verification (sum of the per-layer estimates, excluding the optional ensemble)

Quality Thresholds

  • ≥90: ✅ Excellent (production-ready)
  • 80-89: ⚠️ Good (needs minor improvements)
  • 70-79: ❌ Acceptable (needs work before production)
  • <70: ❌ Poor (significant rework required)
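
These bands translate directly into a lookup. A minimal sketch with a hypothetical `classifyScore` helper:

```javascript
// Maps a 0-100 total score to the bands listed above.
function classifyScore(total) {
  if (typeof total !== "number" || Number.isNaN(total) || total < 0 || total > 100) {
    throw new RangeError(`score must be between 0 and 100, got ${total}`);
  }
  if (total >= 90) return { label: "Excellent", productionReady: true };
  if (total >= 80) return { label: "Good", productionReady: false };
  if (total >= 70) return { label: "Acceptable", productionReady: false };
  return { label: "Poor", productionReady: false };
}

console.log(classifyScore(88)); // { label: 'Good', productionReady: false }
console.log(classifyScore(90)); // { label: 'Excellent', productionReady: true }
```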

Gates

All 5 Must Pass:

  1. Rules pass (no critical lint/type/security issues)
  2. Tests pass + coverage ≥80%
  3. Visual OK (no critical UI issues)
  4. Integration OK (E2E works)
  5. Quality ≥90/100

multi-ai-verification provides comprehensive, multi-layer quality assurance with independent LLM-as-judge evaluation, ensuring production-ready code through systematic verification from automated rules to holistic quality assessment.

For rubrics, see Appendix B. For independence protocol, see Appendix A.
