multi-ai-code-review

Multi-perspective code review using Claude, Gemini, and Codex as specialized agents. 5-dimensional analysis (security, performance, maintainability, correctness, style) with LLM-as-judge consensus, quality scoring, and CI/CD integration. Use when reviewing PRs, auditing code quality, preparing production releases, or establishing code review workflows.


Install with: `npx skills add adaptationio/skrillz/adaptationio-skrillz-multi-ai-code-review`

Multi-AI Code Review

Overview

multi-ai-code-review provides comprehensive code review using multiple AI models as specialized agents, each analyzing code from a different perspective. Based on 2024-2025 best practices for AI-assisted code review.

Purpose: Multi-perspective code quality assessment using AI ensemble with human oversight

Pattern: Task-based (5 independent review dimensions + orchestration)

Key Principles (validated by tri-AI research):

  1. Multi-Agent Architecture - Specialized agents for each review dimension
  2. LLM-as-Judge Consensus - Flag issues only when 2+ models agree
  3. Progressive Severity - Critical → High → Medium → Low prioritization
  4. Human-in-Loop - AI suggests, human decides
  5. Quality Gates - Block merges for critical unresolved issues
  6. Actionable Feedback - Every comment has What/Where/Why/How

Quality Targets:

  • False Positive Rate: <15%
  • Fix Acceptance Rate: >40%
  • Review Turnaround: <5 minutes
  • Bug Catch Rate: >30% pre-production
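
These targets can be monitored with a small helper. The sketch below is illustrative only; the finding fields (`valid` for a true positive, `fix_accepted` for an applied suggestion) are assumptions, not part of the skill:

```python
# Sketch: track review quality against the stated targets.
# Field names ("valid", "fix_accepted") are illustrative assumptions.
def review_metrics(findings):
    total = len(findings)
    if total == 0:
        return {}
    false_pos = sum(1 for f in findings if not f["valid"])
    accepted = sum(1 for f in findings if f.get("fix_accepted"))
    return {
        "false_positive_rate": false_pos / total,  # target: < 0.15
        "fix_acceptance_rate": accepted / total,   # target: > 0.40
    }

metrics = review_metrics([
    {"valid": True, "fix_accepted": True},
    {"valid": True, "fix_accepted": False},
    {"valid": False},
])
print(metrics["false_positive_rate"])  # 1 of 3 findings was a false positive
```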

When to Use

Use multi-ai-code-review when:

  • Reviewing pull requests (any size)
  • Auditing code quality before release
  • Establishing consistent code review standards
  • Security auditing code changes
  • Performance profiling changes
  • Technical debt assessment
  • Onboarding reviews (mentorship mode)

When NOT to Use:

  • Trivial changes (typos, comments only)
  • Automated dependency updates (use dependabot labels)
  • Generated code (migrations, scaffolds)

Prerequisites

Required

  • Code to review (diff, file, or directory)
  • At least one AI available (Claude required, Gemini/Codex optional)

Recommended

  • Gemini CLI for web research and fast analysis
  • Codex CLI for deep code reasoning
  • Git repository context

Integration

  • GitHub Actions (optional, for CI/CD)
  • Pre-commit hooks (optional, for local checks)

Review Dimensions

5-Dimensional Analysis

| Dimension | Agent | Focus | Weight |
|-----------|-------|-------|--------|
| Security | Security Specialist | OWASP Top 10, secrets, injection | 25% |
| Performance | Performance Engineer | Complexity, memory, latency | 20% |
| Maintainability | Architect | Patterns, modularity, DRY | 25% |
| Correctness | QA Engineer | Logic, edge cases, tests | 20% |
| Style | Nitpicker | Naming, formatting, conventions | 10% |

Severity Levels

| Level | Action | Examples |
|-------|--------|----------|
| Critical | Block merge | SQL injection, exposed secrets, data loss |
| High | Require fix | Race conditions, missing auth, memory leaks |
| Medium | Suggest fix | Code duplication, missing tests, complexity |
| Low | Optional | Style issues, naming, minor refactors |

Operations

Operation 1: Quick Security Scan

Time: 2-5 minutes | Automation: 80% | Purpose: Fast security-focused review

Process:

  1. Scan for Critical Issues:
Review this code for security vulnerabilities:
- SQL injection
- XSS vulnerabilities
- Hardcoded secrets/API keys
- Authentication bypasses
- Authorization flaws
- Input validation gaps
- Insecure dependencies

Code:
[PASTE CODE OR DIFF]

For each issue found, provide:
- Severity (Critical/High/Medium)
- Location (file:line)
- Description (what's wrong)
- Fix (specific code change)
  2. Validate with Gemini (optional):
gemini -p "Verify these security findings. Are any false positives?
[PASTE CLAUDE FINDINGS]

Code context:
[PASTE RELEVANT CODE]"
  3. Output: Security report with consensus findings

Operation 2: Comprehensive PR Review

Time: 10-30 minutes | Automation: 60% | Purpose: Full multi-dimensional review

Process:

Step 1: Gather Context

# Get PR diff
git diff main...HEAD > /tmp/pr_diff.txt

# Identify affected areas
grep -E '^(\+\+\+|---)' /tmp/pr_diff.txt | head -20

Step 2: Run Parallel Agent Reviews

Use Task tool to launch parallel agents:

Launch 3 parallel review agents:

Agent 1 (Security):
"Review this diff for security issues. Focus on:
- OWASP Top 10 vulnerabilities
- Authentication/authorization
- Input validation
- Secrets exposure
Diff: [DIFF]"

Agent 2 (Maintainability):
"Review this diff for maintainability. Focus on:
- Design patterns used correctly
- Code duplication (DRY)
- Modularity and cohesion
- Documentation quality
Diff: [DIFF]"

Agent 3 (Correctness):
"Review this diff for correctness. Focus on:
- Logic errors
- Edge cases not handled
- Test coverage gaps
- Error handling
Diff: [DIFF]"

Step 3: Orchestrate & Deduplicate

Synthesize findings from all agents:
[PASTE ALL AGENT OUTPUTS]

Tasks:
1. Remove duplicate findings
2. Rank by severity (Critical > High > Medium > Low)
3. Group by file
4. Generate summary table
5. Create final report with consensus issues only

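The synthesis step can be sketched in code. The finding shape (`file`, `line`, `severity`, `message`) is an assumption for illustration:

```python
# Sketch of Step 3: deduplicate agent findings, rank by severity, group by file.
SEVERITY_ORDER = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}

def synthesize(findings):
    # 1. Remove duplicates: several agents may flag the same file/line/severity.
    unique = {(f["file"], f["line"], f["severity"]): f for f in findings}.values()
    # 2. Rank Critical > High > Medium > Low, then by location.
    ranked = sorted(unique, key=lambda f: (SEVERITY_ORDER[f["severity"]], f["file"], f["line"]))
    # 3. Group by file for the summary table.
    by_file = {}
    for f in ranked:
        by_file.setdefault(f["file"], []).append(f)
    return ranked, by_file

ranked, by_file = synthesize([
    {"file": "auth.py", "line": 45, "severity": "Critical", "message": "SQL injection"},
    {"file": "auth.py", "line": 45, "severity": "Critical", "message": "SQL injection"},  # duplicate
    {"file": "api.py", "line": 12, "severity": "Medium", "message": "duplicated logic"},
])
print(len(ranked))  # duplicate removed -> 2 findings
```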
Step 4: Generate Report

Output format:

## PR Review Summary

| File | Risk | Issues | Critical | High | Medium |
|------|------|--------|----------|------|--------|
| auth.py | High | 3 | 1 | 2 | 0 |
| api.py | Medium | 2 | 0 | 1 | 1 |

### Critical Issues (Block Merge)
1. **[auth.py:45]** SQL Injection vulnerability
   - Why: User input directly in query
   - Fix: Use parameterized queries

### High Issues (Require Fix)
...

### Consensus Score: 73/100
- Security: 65/100
- Performance: 80/100
- Maintainability: 70/100
- Correctness: 75/100
- Style: 85/100

Operation 3: LLM-as-Judge Tribunal

Time: 5-15 minutes | Automation: 70% | Purpose: High-confidence findings through consensus

Process:

  1. Run Code Through Multiple Models:

Claude Analysis:

Analyze this code for issues. Rate severity 1-10 for each:
[CODE]

Gemini Analysis (via CLI):

gemini -p "Analyze this code for issues. Rate severity 1-10 for each:
[CODE]"

Codex Analysis (via CLI):

codex "Analyze this code for issues. Rate severity 1-10 for each:
[CODE]"
  2. Calculate Consensus:
Given these analyses from 3 AI models:

Claude: [FINDINGS]
Gemini: [FINDINGS]
Codex: [FINDINGS]

Identify issues where at least 2 models agree:
1. List consensus findings
2. Average severity scores
3. Note any disagreements
4. Final verdict for each issue
  3. Output: High-confidence issue list (≥67% agreement)
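
The 2-of-3 rule can be sketched as follows. Keying findings by `(file, line)` and the 1-10 severity scale follow the prompts above; the exact field names are assumptions:

```python
# Sketch: keep only issues flagged by at least 2 of 3 models (>=67% agreement).
from collections import defaultdict

def consensus(model_findings, min_agree=2):
    """model_findings: {"claude": [...], "gemini": [...], "codex": [...]},
    each finding a dict with "file", "line", and "severity" (1-10)."""
    votes = defaultdict(list)
    for model, findings in model_findings.items():
        for f in findings:
            votes[(f["file"], f["line"])].append((model, f["severity"]))
    agreed = [
        {"file": file, "line": line,
         "models": sorted(m for m, _ in v),
         "avg_severity": sum(s for _, s in v) / len(v)}
        for (file, line), v in votes.items() if len(v) >= min_agree
    ]
    return sorted(agreed, key=lambda a: -a["avg_severity"])

issues = consensus({
    "claude": [{"file": "auth.py", "line": 67, "severity": 9}],
    "gemini": [{"file": "auth.py", "line": 67, "severity": 8}],
    "codex":  [{"file": "api.py", "line": 3, "severity": 4}],  # single vote: dropped
})
print(issues)  # one consensus issue at auth.py:67, average severity 8.5
```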

Operation 4: Mentorship Review

Time: 15-30 minutes | Automation: 40% | Purpose: Educational code review for learning

Process:

Review this code in mentorship mode. For a developer learning [LANGUAGE/FRAMEWORK]:

Code: [CODE]

For each finding:
1. **What's the issue** (be encouraging, not critical)
2. **Why it matters** (explain the underlying concept)
3. **How to improve** (show before/after with explanation)
4. **Learn more** (link to relevant documentation)

Also highlight:
- What was done well
- Good patterns to continue using
- Growth opportunities

Tone: Supportive and educational, never condescending.

Operation 5: Pre-Release Audit

Time: 30-60 minutes | Automation: 50% | Purpose: Comprehensive review before production

Process:

  1. Full Codebase Scan:
# Identify all changes since last release
git diff v1.0.0...HEAD --stat
git log v1.0.0...HEAD --oneline
  2. Security Deep Dive:
  • Run all security checks
  • Verify no new vulnerabilities
  • Check dependency updates
  • Audit secrets management
  3. Performance Review:
  • Identify potential bottlenecks
  • Review database queries
  • Check for N+1 problems
  • Validate caching strategies
  4. Test Coverage:
  • Verify test coverage targets
  • Check critical path coverage
  • Validate edge case tests
  5. Generate Release Report:
## Pre-Release Audit: v1.1.0

### Security Clearance: PASS ✓
- No critical vulnerabilities
- All high issues resolved
- Secrets audit: Clean

### Performance Assessment: PASS ✓
- No new N+1 queries
- Response time within SLA
- Memory usage stable

### Test Coverage: 82% (target: 80%)
- Critical paths: 95%
- Edge cases: 78%

### Release Recommendation: APPROVED

Multi-AI Coordination

Agent Assignment Strategy

| Task | Primary | Verification | Speed |
|------|---------|--------------|-------|
| Security scan | Claude | Gemini | Fast |
| Architecture review | Claude | Codex | Medium |
| Logic validation | Codex | Claude | Medium |
| Style checking | Gemini | Claude | Fast |
| Performance analysis | Claude | Codex | Medium |
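
The assignment strategy amounts to a lookup table; a minimal sketch (task keys are illustrative identifiers, not a defined API):

```python
# Sketch: route a review task to its primary and verification model.
ASSIGNMENTS = {
    "security_scan":        ("claude", "gemini"),
    "architecture_review":  ("claude", "codex"),
    "logic_validation":     ("codex", "claude"),
    "style_checking":       ("gemini", "claude"),
    "performance_analysis": ("claude", "codex"),
}

def route(task):
    primary, verifier = ASSIGNMENTS[task]
    return primary, verifier

print(route("security_scan"))  # ('claude', 'gemini')
```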

Coordination Commands

Launch Multi-Agent Review:

# Using Task tool for parallel execution
# Each agent reviews independently, orchestrator synthesizes

Gemini Quick Check:

gemini -p "Quick security scan of this code: [CODE]"

Codex Deep Analysis:

codex "Analyze this code architecture and suggest improvements: [CODE]"

CI/CD Integration

GitHub Actions Workflow

# .github/workflows/ai-review.yml
name: Multi-AI Code Review
on: [pull_request]

jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Get PR Diff
        run: |
          git diff origin/main...HEAD > pr_diff.txt

      - name: Claude Review
        uses: anthropics/claude-code-action@v1
        with:
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
          model: "claude-sonnet-4-5-20250929"
          review_level: "detailed"

      - name: Post Summary
        uses: actions/github-script@v7
        with:
          script: |
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: `## AI Review Summary\n${process.env.REVIEW_SUMMARY}`
            })

Quality Gate Configuration

# Block merge for critical issues
quality_gates:
  critical_issues: 0      # Must be zero
  high_issues: 3          # Max allowed
  coverage_minimum: 80    # Percent
  score_minimum: 70       # Out of 100
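
A CI step could enforce these gates with logic like the sketch below; the review-summary field names are assumptions for illustration:

```python
# Sketch: fail the build when the quality gates above are violated.
GATES = {"critical_issues": 0, "high_issues": 3,
         "coverage_minimum": 80, "score_minimum": 70}

def check_gates(summary, gates=GATES):
    failures = []
    if summary["critical_issues"] > gates["critical_issues"]:
        failures.append("critical issues present")
    if summary["high_issues"] > gates["high_issues"]:
        failures.append("too many high-severity issues")
    if summary["coverage"] < gates["coverage_minimum"]:
        failures.append("coverage below minimum")
    if summary["score"] < gates["score_minimum"]:
        failures.append("quality score below minimum")
    return failures  # empty list means the merge may proceed

print(check_gates({"critical_issues": 1, "high_issues": 2,
                   "coverage": 85, "score": 72}))  # blocked: critical issue
```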

Quality Scoring

Scoring Formula

Overall = (Security × 0.25) + (Performance × 0.20) +
          (Maintainability × 0.25) + (Correctness × 0.20) +
          (Style × 0.10)

Grade Mapping

| Score | Grade | Status |
|-------|-------|--------|
| ≥90 | A | Excellent - Ship it |
| 80-89 | B | Good - Minor fixes |
| 70-79 | C | Acceptable - Address issues |
| 60-69 | D | Needs work - Significant fixes |
| <60 | F | Failing - Major revision needed |
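
The formula and grade mapping translate directly to code; a minimal sketch:

```python
# Sketch: weighted overall score and letter-grade mapping from the tables above.
WEIGHTS = {"security": 0.25, "performance": 0.20,
           "maintainability": 0.25, "correctness": 0.20, "style": 0.10}

def overall(scores):
    return sum(scores[dim] * w for dim, w in WEIGHTS.items())

def grade(score):
    for cutoff, letter in [(90, "A"), (80, "B"), (70, "C"), (60, "D")]:
        if score >= cutoff:
            return letter
    return "F"

score = overall({"security": 90, "performance": 80,
                 "maintainability": 70, "correctness": 60, "style": 100})
print(score, grade(score))  # 78.0 C
```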

Anti-Patterns to Detect

  1. Hardcoded Secrets - API keys, passwords in code
  2. SQL Injection - Unparameterized queries
  3. XSS Vulnerabilities - Unsanitized output
  4. Race Conditions - Unprotected shared state
  5. Memory Leaks - Unclosed resources
  6. N+1 Queries - Loop database calls
  7. Dead Code - Unreachable branches
  8. God Objects - Classes doing too much
  9. Copy-Paste Code - Duplicated logic
  10. Missing Error Handling - Unhandled exceptions
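
Several of these can be pre-filtered mechanically before the AI pass. A minimal sketch for anti-pattern 1 (hardcoded secrets), with deliberately non-exhaustive, illustrative patterns:

```python
# Sketch: regex pre-filter for hardcoded secrets (anti-pattern 1).
# Patterns are illustrative examples, not a complete secret-detection ruleset.
import re

SECRET_PATTERNS = [
    # key/secret/password/token assigned a quoted literal of 8+ characters
    re.compile(r"(?i)(api[_-]?key|secret|password|token)[\"']?\s*[:=]\s*[\"'][^\"']{8,}[\"']"),
    re.compile(r"AKIA[0-9A-Z]{16}"),  # AWS access key ID shape
]

def scan_secrets(source):
    hits = []
    for lineno, line in enumerate(source.splitlines(), 1):
        if any(p.search(line) for p in SECRET_PATTERNS):
            hits.append((lineno, line.strip()))
    return hits

print(scan_secrets('password = "hunter2hunter2"\nname = "ok"'))  # flags line 1 only
```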

Example Review Session

User: Review this PR for my authentication module

Claude: I'll perform a comprehensive multi-dimensional review.

[Launches parallel agents for security, maintainability, correctness]

## PR Review: Authentication Module

### Critical Issues (1)
1. **[auth.py:67]** Password stored in plaintext
   - Severity: Critical
   - Consensus: 3/3 models agree
   - Fix: Use bcrypt hashing
   ```python
   # Before
   user.password = request.password

   # After
   import bcrypt
   user.password = bcrypt.hashpw(request.password.encode(), bcrypt.gensalt())
   ```

High Issues (2)

  1. [auth.py:45] No rate limiting on login endpoint
  2. [auth.py:89] JWT secret hardcoded

Quality Score: 59/100 (Grade: F)

  • Security: 35/100 (Critical issues)
  • Performance: 70/100
  • Maintainability: 65/100
  • Correctness: 60/100
  • Style: 80/100

Recommendation: BLOCK MERGE

Resolve critical security issues before merging.


---

## Related Skills

- **multi-ai-testing**: Generate tests for reviewed code
- **multi-ai-verification**: Validate fixes
- **multi-ai-implementation**: Implement suggested fixes
- **codex-review**: Codex-specific review patterns
- **review-multi**: Skill-specific reviews

---

## References

- `references/security-checklist.md` - OWASP Top 10 checklist
- `references/performance-patterns.md` - Performance anti-patterns
- `references/ci-cd-integration.md` - Full CI/CD setup guide
