multi-ai-code-review

Multi-perspective code review using Claude, Gemini, and Codex as specialized agents. 5-dimensional analysis (security, performance, maintainability, correctness, style) with LLM-as-judge consensus, quality scoring, and CI/CD integration. Use when reviewing PRs, auditing code quality, preparing production releases, or establishing code review workflows.


Install with: `npx skills add adaptationio/skrillz/adaptationio-skrillz-multi-ai-code-review`

Multi-AI Code Review

Overview

multi-ai-code-review provides comprehensive code review using multiple AI models as specialized agents, each analyzing code from a different perspective. Based on 2024-2025 best practices for AI-assisted code review.

Purpose: Multi-perspective code quality assessment using AI ensemble with human oversight

Pattern: Task-based (5 independent review dimensions + orchestration)

Key Principles (validated by tri-AI research):

  1. Multi-Agent Architecture - Specialized agents for each review dimension
  2. LLM-as-Judge Consensus - Flag issues only when 2+ models agree
  3. Progressive Severity - Critical → High → Medium → Low prioritization
  4. Human-in-Loop - AI suggests, human decides
  5. Quality Gates - Block merges for critical unresolved issues
  6. Actionable Feedback - Every comment has What/Where/Why/How

Quality Targets:

  • False Positive Rate: <15%
  • Fix Acceptance Rate: >40%
  • Review Turnaround: <5 minutes
  • Bug Catch Rate: >30% pre-production
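
These targets can be monitored with a small helper. The sketch below is illustrative only; the finding fields (`valid` for a true positive, `fix_accepted` for an applied suggestion) are assumptions, not part of the skill:

```python
# Sketch: track review quality against the stated targets.
# Field names ("valid", "fix_accepted") are illustrative assumptions.
def review_metrics(findings):
    total = len(findings)
    if total == 0:
        return {}
    false_pos = sum(1 for f in findings if not f["valid"])
    accepted = sum(1 for f in findings if f.get("fix_accepted"))
    return {
        "false_positive_rate": false_pos / total,  # target: < 0.15
        "fix_acceptance_rate": accepted / total,   # target: > 0.40
    }

metrics = review_metrics([
    {"valid": True, "fix_accepted": True},
    {"valid": True, "fix_accepted": False},
    {"valid": False},
])
print(metrics["false_positive_rate"])  # 1 of 3 findings was a false positive
```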

When to Use

Use multi-ai-code-review when:

  • Reviewing pull requests (any size)
  • Auditing code quality before release
  • Establishing consistent code review standards
  • Security auditing code changes
  • Performance profiling changes
  • Technical debt assessment
  • Onboarding reviews (mentorship mode)

When NOT to Use:

  • Trivial changes (typos, comments only)
  • Automated dependency updates (use dependabot labels)
  • Generated code (migrations, scaffolds)

Prerequisites

Required

  • Code to review (diff, file, or directory)
  • At least one AI available (Claude required, Gemini/Codex optional)

Recommended

  • Gemini CLI for web research and fast analysis
  • Codex CLI for deep code reasoning
  • Git repository context

Integration

  • GitHub Actions (optional, for CI/CD)
  • Pre-commit hooks (optional, for local checks)

Review Dimensions

5-Dimensional Analysis

| Dimension | Agent | Focus | Weight |
|-----------|-------|-------|--------|
| Security | Security Specialist | OWASP Top 10, secrets, injection | 25% |
| Performance | Performance Engineer | Complexity, memory, latency | 20% |
| Maintainability | Architect | Patterns, modularity, DRY | 25% |
| Correctness | QA Engineer | Logic, edge cases, tests | 20% |
| Style | Nitpicker | Naming, formatting, conventions | 10% |

Severity Levels

| Level | Action | Examples |
|-------|--------|----------|
| Critical | Block merge | SQL injection, exposed secrets, data loss |
| High | Require fix | Race conditions, missing auth, memory leaks |
| Medium | Suggest fix | Code duplication, missing tests, complexity |
| Low | Optional | Style issues, naming, minor refactors |

Operations

Operation 1: Quick Security Scan

Time: 2-5 minutes | Automation: 80% | Purpose: Fast security-focused review

Process:

  1. Scan for Critical Issues:
Review this code for security vulnerabilities:
- SQL injection
- XSS vulnerabilities
- Hardcoded secrets/API keys
- Authentication bypasses
- Authorization flaws
- Input validation gaps
- Insecure dependencies

Code:
[PASTE CODE OR DIFF]

For each issue found, provide:
- Severity (Critical/High/Medium)
- Location (file:line)
- Description (what's wrong)
- Fix (specific code change)
  2. Validate with Gemini (optional):
gemini -p "Verify these security findings. Are any false positives?
[PASTE CLAUDE FINDINGS]

Code context:
[PASTE RELEVANT CODE]"
  3. Output: Security report with consensus findings

Operation 2: Comprehensive PR Review

Time: 10-30 minutes | Automation: 60% | Purpose: Full multi-dimensional review

Process:

Step 1: Gather Context

# Get PR diff
git diff main...HEAD > /tmp/pr_diff.txt

# Identify affected areas
grep -E '^(\+\+\+|---)' /tmp/pr_diff.txt | head -20

Step 2: Run Parallel Agent Reviews

Use Task tool to launch parallel agents:

Launch 3 parallel review agents:

Agent 1 (Security):
"Review this diff for security issues. Focus on:
- OWASP Top 10 vulnerabilities
- Authentication/authorization
- Input validation
- Secrets exposure
Diff: [DIFF]"

Agent 2 (Maintainability):
"Review this diff for maintainability. Focus on:
- Design patterns used correctly
- Code duplication (DRY)
- Modularity and cohesion
- Documentation quality
Diff: [DIFF]"

Agent 3 (Correctness):
"Review this diff for correctness. Focus on:
- Logic errors
- Edge cases not handled
- Test coverage gaps
- Error handling
Diff: [DIFF]"

Step 3: Orchestrate & Deduplicate

Synthesize findings from all agents:
[PASTE ALL AGENT OUTPUTS]

Tasks:
1. Remove duplicate findings
2. Rank by severity (Critical > High > Medium > Low)
3. Group by file
4. Generate summary table
5. Create final report with consensus issues only

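The synthesis step can be sketched in code. The finding shape (`file`, `line`, `severity`, `message`) is an assumption for illustration:

```python
# Sketch of Step 3: deduplicate agent findings, rank by severity, group by file.
SEVERITY_ORDER = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}

def synthesize(findings):
    # 1. Remove duplicates: several agents may flag the same file/line/severity.
    unique = {(f["file"], f["line"], f["severity"]): f for f in findings}.values()
    # 2. Rank Critical > High > Medium > Low, then by location.
    ranked = sorted(unique, key=lambda f: (SEVERITY_ORDER[f["severity"]], f["file"], f["line"]))
    # 3. Group by file for the summary table.
    by_file = {}
    for f in ranked:
        by_file.setdefault(f["file"], []).append(f)
    return ranked, by_file

ranked, by_file = synthesize([
    {"file": "auth.py", "line": 45, "severity": "Critical", "message": "SQL injection"},
    {"file": "auth.py", "line": 45, "severity": "Critical", "message": "SQL injection"},  # duplicate
    {"file": "api.py", "line": 12, "severity": "Medium", "message": "duplicated logic"},
])
print(len(ranked))  # duplicate removed -> 2 findings
```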
Step 4: Generate Report

Output format:

## PR Review Summary

| File | Risk | Issues | Critical | High | Medium |
|------|------|--------|----------|------|--------|
| auth.py | High | 3 | 1 | 2 | 0 |
| api.py | Medium | 2 | 0 | 1 | 1 |

### Critical Issues (Block Merge)
1. **[auth.py:45]** SQL Injection vulnerability
   - Why: User input directly in query
   - Fix: Use parameterized queries

### High Issues (Require Fix)
...

### Consensus Score: 73/100
- Security: 65/100
- Performance: 80/100
- Maintainability: 70/100
- Correctness: 75/100
- Style: 85/100

Operation 3: LLM-as-Judge Tribunal

Time: 5-15 minutes | Automation: 70% | Purpose: High-confidence findings through consensus

Process:

  1. Run Code Through Multiple Models:

Claude Analysis:

Analyze this code for issues. Rate severity 1-10 for each:
[CODE]

Gemini Analysis (via CLI):

gemini -p "Analyze this code for issues. Rate severity 1-10 for each:
[CODE]"

Codex Analysis (via CLI):

codex "Analyze this code for issues. Rate severity 1-10 for each:
[CODE]"
  2. Calculate Consensus:
Given these analyses from 3 AI models:

Claude: [FINDINGS]
Gemini: [FINDINGS]
Codex: [FINDINGS]

Identify issues where at least 2 models agree:
1. List consensus findings
2. Average severity scores
3. Note any disagreements
4. Final verdict for each issue
  3. Output: High-confidence issue list (≥67% agreement)
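
The 2-of-3 rule can be sketched as follows. Keying findings by `(file, line)` and the 1-10 severity scale follow the prompts above; the exact field names are assumptions:

```python
# Sketch: keep only issues flagged by at least 2 of 3 models (>=67% agreement).
from collections import defaultdict

def consensus(model_findings, min_agree=2):
    """model_findings: {"claude": [...], "gemini": [...], "codex": [...]},
    each finding a dict with "file", "line", and "severity" (1-10)."""
    votes = defaultdict(list)
    for model, findings in model_findings.items():
        for f in findings:
            votes[(f["file"], f["line"])].append((model, f["severity"]))
    agreed = [
        {"file": file, "line": line,
         "models": sorted(m for m, _ in v),
         "avg_severity": sum(s for _, s in v) / len(v)}
        for (file, line), v in votes.items() if len(v) >= min_agree
    ]
    return sorted(agreed, key=lambda a: -a["avg_severity"])

issues = consensus({
    "claude": [{"file": "auth.py", "line": 67, "severity": 9}],
    "gemini": [{"file": "auth.py", "line": 67, "severity": 8}],
    "codex":  [{"file": "api.py", "line": 3, "severity": 4}],  # single vote: dropped
})
print(issues)  # one consensus issue at auth.py:67, average severity 8.5
```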

Operation 4: Mentorship Review

Time: 15-30 minutes | Automation: 40% | Purpose: Educational code review for learning

Process:

Review this code in mentorship mode. For a developer learning [LANGUAGE/FRAMEWORK]:

Code: [CODE]

For each finding:
1. **What's the issue** (be encouraging, not critical)
2. **Why it matters** (explain the underlying concept)
3. **How to improve** (show before/after with explanation)
4. **Learn more** (link to relevant documentation)

Also highlight:
- What was done well
- Good patterns to continue using
- Growth opportunities

Tone: Supportive and educational, never condescending.

Operation 5: Pre-Release Audit

Time: 30-60 minutes | Automation: 50% | Purpose: Comprehensive review before production

Process:

  1. Full Codebase Scan:
# Identify all changes since last release
git diff v1.0.0...HEAD --stat
git log v1.0.0...HEAD --oneline
  2. Security Deep Dive:
  • Run all security checks
  • Verify no new vulnerabilities
  • Check dependency updates
  • Audit secrets management
  3. Performance Review:
  • Identify potential bottlenecks
  • Review database queries
  • Check for N+1 problems
  • Validate caching strategies
  4. Test Coverage:
  • Verify test coverage targets
  • Check critical path coverage
  • Validate edge case tests
  5. Generate Release Report:
## Pre-Release Audit: v1.1.0

### Security Clearance: PASS ✓
- No critical vulnerabilities
- All high issues resolved
- Secrets audit: Clean

### Performance Assessment: PASS ✓
- No new N+1 queries
- Response time within SLA
- Memory usage stable

### Test Coverage: 82% (target: 80%)
- Critical paths: 95%
- Edge cases: 78%

### Release Recommendation: APPROVED

Multi-AI Coordination

Agent Assignment Strategy

| Task | Primary | Verification | Speed |
|------|---------|--------------|-------|
| Security scan | Claude | Gemini | Fast |
| Architecture review | Claude | Codex | Medium |
| Logic validation | Codex | Claude | Medium |
| Style checking | Gemini | Claude | Fast |
| Performance analysis | Claude | Codex | Medium |
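
The assignment strategy amounts to a lookup table; a minimal sketch (task keys are illustrative identifiers, not a defined API):

```python
# Sketch: route a review task to its primary and verification model.
ASSIGNMENTS = {
    "security_scan":        ("claude", "gemini"),
    "architecture_review":  ("claude", "codex"),
    "logic_validation":     ("codex", "claude"),
    "style_checking":       ("gemini", "claude"),
    "performance_analysis": ("claude", "codex"),
}

def route(task):
    primary, verifier = ASSIGNMENTS[task]
    return primary, verifier

print(route("security_scan"))  # ('claude', 'gemini')
```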

Coordination Commands

Launch Multi-Agent Review:

# Using Task tool for parallel execution
# Each agent reviews independently, orchestrator synthesizes

Gemini Quick Check:

gemini -p "Quick security scan of this code: [CODE]"

Codex Deep Analysis:

codex "Analyze this code architecture and suggest improvements: [CODE]"

CI/CD Integration

GitHub Actions Workflow

# .github/workflows/ai-review.yml
name: Multi-AI Code Review
on: [pull_request]

jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Get PR Diff
        run: |
          git diff origin/main...HEAD > pr_diff.txt

      - name: Claude Review
        uses: anthropics/claude-code-action@v1
        with:
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
          model: "claude-sonnet-4-5-20250929"
          review_level: "detailed"

      - name: Post Summary
        uses: actions/github-script@v7
        with:
          script: |
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: `## AI Review Summary\n${process.env.REVIEW_SUMMARY}`
            })

Quality Gate Configuration

# Block merge for critical issues
quality_gates:
  critical_issues: 0      # Must be zero
  high_issues: 3          # Max allowed
  coverage_minimum: 80    # Percent
  score_minimum: 70       # Out of 100
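
A CI step could enforce these gates with logic like the sketch below; the review-summary field names are assumptions for illustration:

```python
# Sketch: fail the build when the quality gates above are violated.
GATES = {"critical_issues": 0, "high_issues": 3,
         "coverage_minimum": 80, "score_minimum": 70}

def check_gates(summary, gates=GATES):
    failures = []
    if summary["critical_issues"] > gates["critical_issues"]:
        failures.append("critical issues present")
    if summary["high_issues"] > gates["high_issues"]:
        failures.append("too many high-severity issues")
    if summary["coverage"] < gates["coverage_minimum"]:
        failures.append("coverage below minimum")
    if summary["score"] < gates["score_minimum"]:
        failures.append("quality score below minimum")
    return failures  # empty list means the merge may proceed

print(check_gates({"critical_issues": 1, "high_issues": 2,
                   "coverage": 85, "score": 72}))  # blocked: critical issue
```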

Quality Scoring

Scoring Formula

Overall = (Security × 0.25) + (Performance × 0.20) +
          (Maintainability × 0.25) + (Correctness × 0.20) +
          (Style × 0.10)

Grade Mapping

| Score | Grade | Status |
|-------|-------|--------|
| ≥90 | A | Excellent - Ship it |
| 80-89 | B | Good - Minor fixes |
| 70-79 | C | Acceptable - Address issues |
| 60-69 | D | Needs work - Significant fixes |
| <60 | F | Failing - Major revision needed |
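
The formula and grade mapping translate directly to code; a minimal sketch:

```python
# Sketch: weighted overall score and letter-grade mapping from the tables above.
WEIGHTS = {"security": 0.25, "performance": 0.20,
           "maintainability": 0.25, "correctness": 0.20, "style": 0.10}

def overall(scores):
    return sum(scores[dim] * w for dim, w in WEIGHTS.items())

def grade(score):
    for cutoff, letter in [(90, "A"), (80, "B"), (70, "C"), (60, "D")]:
        if score >= cutoff:
            return letter
    return "F"

score = overall({"security": 90, "performance": 80,
                 "maintainability": 70, "correctness": 60, "style": 100})
print(score, grade(score))  # 78.0 C
```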

Anti-Patterns to Detect

  1. Hardcoded Secrets - API keys, passwords in code
  2. SQL Injection - Unparameterized queries
  3. XSS Vulnerabilities - Unsanitized output
  4. Race Conditions - Unprotected shared state
  5. Memory Leaks - Unclosed resources
  6. N+1 Queries - Loop database calls
  7. Dead Code - Unreachable branches
  8. God Objects - Classes doing too much
  9. Copy-Paste Code - Duplicated logic
  10. Missing Error Handling - Unhandled exceptions
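
Several of these can be pre-filtered mechanically before the AI pass. A minimal sketch for anti-pattern 1 (hardcoded secrets), with deliberately non-exhaustive, illustrative patterns:

```python
# Sketch: regex pre-filter for hardcoded secrets (anti-pattern 1).
# Patterns are illustrative examples, not a complete secret-detection ruleset.
import re

SECRET_PATTERNS = [
    # key/secret/password/token assigned a quoted literal of 8+ characters
    re.compile(r"(?i)(api[_-]?key|secret|password|token)[\"']?\s*[:=]\s*[\"'][^\"']{8,}[\"']"),
    re.compile(r"AKIA[0-9A-Z]{16}"),  # AWS access key ID shape
]

def scan_secrets(source):
    hits = []
    for lineno, line in enumerate(source.splitlines(), 1):
        if any(p.search(line) for p in SECRET_PATTERNS):
            hits.append((lineno, line.strip()))
    return hits

print(scan_secrets('password = "hunter2hunter2"\nname = "ok"'))  # flags line 1 only
```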

Example Review Session

User: Review this PR for my authentication module

Claude: I'll perform a comprehensive multi-dimensional review.

[Launches parallel agents for security, maintainability, correctness]

## PR Review: Authentication Module

### Critical Issues (1)
1. **[auth.py:67]** Password stored in plaintext
   - Severity: Critical
   - Consensus: 3/3 models agree
   - Fix: Use bcrypt hashing
   ```python
   # Before
   user.password = request.password

   # After
   import bcrypt
   user.password = bcrypt.hashpw(request.password.encode(), bcrypt.gensalt())
   ```

High Issues (2)

  1. [auth.py:45] No rate limiting on login endpoint
  2. [auth.py:89] JWT secret hardcoded

Quality Score: 59/100 (Grade: F)

  • Security: 35/100 (Critical issues)
  • Performance: 70/100
  • Maintainability: 65/100
  • Correctness: 60/100
  • Style: 80/100

Recommendation: BLOCK MERGE

Resolve critical security issues before merging.


---

## Related Skills

- **multi-ai-testing**: Generate tests for reviewed code
- **multi-ai-verification**: Validate fixes
- **multi-ai-implementation**: Implement suggested fixes
- **codex-review**: Codex-specific review patterns
- **review-multi**: Skill-specific reviews

---

## References

- `references/security-checklist.md` - OWASP Top 10 checklist
- `references/performance-patterns.md` - Performance anti-patterns
- `references/ci-cd-integration.md` - Full CI/CD setup guide
