# Code Review Best Practices

## Core Philosophy

Review at the PR level, not file-by-file. Focus on egregious structural issues, not minutiae. Look for cross-file patterns that indicate architectural problems.
## Quick Start

1. **ALWAYS filter out lock files first** (often 10k+ lines of noise):

   ```bash
   git diff main...HEAD -- . ':!uv.lock' ':!poetry.lock' > /tmp/pr-diff.txt
   wc -lc /tmp/pr-diff.txt  # Most diffs fit in 256KB after this
   ```

2. **Check size** - if under 256KB, analyze the whole thing.

3. **If still too big, filter more:**

   ```bash
   git diff main...HEAD -- . ':!uv.lock' ':!docs/*' > /tmp/code-diff.txt
   ```
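The filter-then-check loop above can be sketched as a small shell helper. `check_size` and the sample file are illustrative, not scripts that exist in this repo; only the 256KB threshold comes from the text.

```shell
# Hypothetical helper: gate the review on diff size. 262144 bytes = 256KB.
MAX_BYTES=262144

check_size() {
    size=$(wc -c < "$1" | tr -d ' ')
    if [ "$size" -le "$MAX_BYTES" ]; then
        echo "fits ($size bytes) - analyze the whole diff"
    else
        echo "too big ($size bytes) - filter more, then re-check"
    fi
}

# Stand-in for real `git diff ... > /tmp/pr-diff.txt` output:
printf 'fake diff content\n' > /tmp/pr-diff.txt
check_size /tmp/pr-diff.txt
```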
## What to Look For - Egregious Issues Only

### Quick Checklist

- **Code duplication** - Same logic in 2+ places?
- **Repeated patterns** - Same code structure 3+ times?
- **God functions** - Functions over 100 lines?
- **Weak types** - `Any`, `object`, raw `dict`/`list`, missing annotations?
- **Type-deficient patterns** - `hasattr`/`getattr` instead of proper types?
- **Meaningless tests** - Tests that just verify assignment?
- **Dead code** - Unused functions, commented-out code?
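To make the "meaningless tests" item concrete, here is an illustrative sketch; the `Order` class and test names are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Order:
    quantity: int
    unit_price: float

    def total(self) -> float:
        return self.quantity * self.unit_price

# Meaningless: only re-verifies the assignment on the line above.
def test_order_quantity_assignment():
    order = Order(quantity=3, unit_price=2.5)
    assert order.quantity == 3  # catches nothing a type checker wouldn't

# Meaningful: exercises logic that could actually regress.
def test_order_total():
    order = Order(quantity=3, unit_price=2.5)
    assert order.total() == 7.5

test_order_quantity_assignment()
test_order_total()
```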
### What NOT to Flag

- Variable naming (unless truly confusing)
- Line length
- Comment style
- Whitespace
- Import order

These are auto-fixable or minor. Focus on structural problems.
## Critical Rules Quick Reference

**Flag: Weak types**

```python
def process(data: Any) -> dict:               # Bad: Any, raw dict
def process(data: UserData) -> UserResponse:  # Good: specific types
```
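A runnable sketch of the "specific types" side, assuming `UserData`/`UserResponse` stand in for whatever the function really consumes and produces:

```python
from dataclasses import dataclass

@dataclass
class UserData:          # hypothetical input type
    name: str
    email: str

@dataclass
class UserResponse:      # hypothetical output type
    greeting: str

def process(data: UserData) -> UserResponse:
    # Concrete types let the type checker verify every attribute access.
    return UserResponse(greeting=f"Hello, {data.name}")

print(process(UserData(name="Ada", email="ada@example.com")).greeting)
```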
**Flag: `hasattr`/`getattr` (type deficiency marker)**

```python
if hasattr(obj, "name"):    # Bad: why don't you know the type?
if isinstance(obj, Named):  # Good: use a protocol or union type
```
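One way the `isinstance` fix can look in practice, assuming a `Named` protocol is the right shape for the codebase (a sketch, not the project's actual types):

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class Named(Protocol):
    name: str

class User:
    def __init__(self, name: str) -> None:
        self.name = name

def greet(obj: object) -> str:
    if isinstance(obj, Named):  # replaces hasattr(obj, "name")
        return f"Hello, {obj.name}"
    return "Hello, stranger"

print(greet(User("Ada")))  # Hello, Ada
print(greet(42))           # Hello, stranger
```

Unlike `hasattr`, the protocol also gives static type checkers something to narrow on.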
**Flag: God functions**

```python
def process_order(order_data):  # Bad: 200 lines doing everything
    ...

# Good: extract responsibilities
validated = _validate(data)
transformed = _transform(validated)
```
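A minimal sketch of the extraction; `_validate`/`_transform` and the dict shape are hypothetical stand-ins for the real responsibilities:

```python
def _validate(order_data: dict) -> dict:
    # One responsibility: reject malformed input.
    if "id" not in order_data:
        raise ValueError("order must have an id")
    return order_data

def _transform(order_data: dict) -> dict:
    # Another responsibility: shape the validated data.
    return {**order_data, "status": "processed"}

def process_order(order_data: dict) -> dict:
    # The former god function is now a short pipeline of testable steps.
    return _transform(_validate(order_data))

print(process_order({"id": 1}))
```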
**Flag: Duplicate logic across files**

`service_a.py` and `service_b.py` have the same 20-line validation - extract it to a shared utility.
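The extraction might look like this; the module and function names and the specific checks are invented for illustration:

```python
# shared/validators.py (hypothetical module both services would import)
def validate_payload(payload: dict) -> list[str]:
    """The validation logic both services previously duplicated."""
    errors: list[str] = []
    if not payload.get("id"):
        errors.append("missing id")
    if not isinstance(payload.get("amount"), (int, float)):
        errors.append("amount must be numeric")
    return errors

# service_a.py and service_b.py now both call the shared helper:
print(validate_payload({"id": "o-1", "amount": 9.99}))  # []
```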
**Flag: Related classes without shared interface**

```python
class CacheDirectory:
    def list_entries(self): ...     # Names differ -
class StatusDirectory:
    def get_all_entries(self): ...  # should share an ABC
```
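A possible shape for the shared interface; the `Directory` base class and the uniform `entries()` name are assumptions, not the actual codebase:

```python
from abc import ABC, abstractmethod

class Directory(ABC):
    @abstractmethod
    def entries(self) -> list[str]:
        """One uniform name instead of list_entries/get_all_entries."""

class CacheDirectory(Directory):
    def entries(self) -> list[str]:
        return ["cache-a", "cache-b"]

class StatusDirectory(Directory):
    def entries(self) -> list[str]:
        return ["status-a"]

def count_entries(dirs: list[Directory]) -> int:
    # Callers can now treat every directory type uniformly.
    return sum(len(d.entries()) for d in dirs)

print(count_entries([CacheDirectory(), StatusDirectory()]))  # 3
```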
## Handling Large Diffs

Keep maximum context to spot cross-file patterns.

### Step 1: Remove Lock Files (Always)

```bash
git diff main...HEAD -- . ':!uv.lock' > /tmp/diff.txt
wc -lc /tmp/diff.txt  # Under 256KB? STOP - analyze the whole thing
```
### Step 2: Remove Docs (If Still Too Big)

```bash
git diff main...HEAD -- . ':!uv.lock' ':!docs/*' > /tmp/code-diff.txt
```
### Step 3: Remove Tests (Last Resort)

```bash
git diff main...HEAD -- . ':!uv.lock' ':!docs/*' ':!tests/*' > /tmp/prod.txt
```
### Step 4: Chunk with LARGE Chunks (Rare)

- Only if still over 256KB after all filtering
- Use 15k-line chunks to preserve cross-file patterns
- Small chunks (2k lines) defeat the purpose!
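If chunking really is unavoidable, `split` can produce the large chunks. The chunk prefix is arbitrary, `seq` stands in here for a real filtered diff, and the `-d` (numeric suffixes) flag assumes GNU coreutils:

```shell
seq 1 30000 > /tmp/prod.txt                       # stand-in for the Step 3 output
split -l 15000 -d /tmp/prod.txt /tmp/diff-chunk-  # cut into 15k-line chunks
wc -l /tmp/diff-chunk-*
```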
## LLM-Based Full Diff Review

Why use external LLM tools? Claude Code has a 25K-token context limit per tool call. For reviewing entire PR diffs (50K-200K+ tokens), use models with larger context windows.

Benefits:

- Cross-file pattern detection in one pass
- Different models catch different issues
- Faster than file-by-file review
### Workflow

1. Extract changes (handles lock-file exclusion automatically):

   ```bash
   ./scripts/extract-changes.sh             # PR: current branch vs main
   ./scripts/extract-changes.sh abc123      # Specific commit
   ./scripts/extract-changes.sh auth/login  # Path pattern (fuzzy match)
   ```

2. Run a code review with a large-context model:

   ```bash
   ./scripts/llm-review.sh -m gpt-4o
   ./scripts/llm-review.sh -m claude-3-5-sonnet-latest
   ```

3. Run specialized reviews:

   ```bash
   ./scripts/llm-review-tests.sh -m gpt-4o  # Test quality
   ./scripts/llm-review-types.sh -m gpt-4o  # Type hints
   ```
### Setup

Requires Simon Willison's `llm` tool:

```bash
pip install llm
llm keys set openai        # For GPT-4
pip install llm-claude-3   # For Claude
llm keys set claude
```

See `references/llm-tooling.md` for the full setup and usage guide.
### Using LLM Findings

LLM review provides hints, not final answers:

- Identify areas to investigate deeper
- Cross-check with different models
- Use Claude Code to implement actual fixes
## Reference Files

For detailed patterns and examples:

- `references/what-to-flag.md` - Duplication, weak types, god functions
- `references/code-bloat-patterns.md` - AI-generated bloat patterns
- `references/llm-tooling.md` - LLM tool setup and usage for full-diff review

Remember: The ENTIRE POINT of full diff review is cross-file patterns. Don't chunk unless you absolutely must!
## Human-in-the-Loop After Code Review

CRITICAL: After completing a code review, ALWAYS:

1. **Present findings** - Output the full review report with all sections
2. **List suggested actions** - Number each potential fix
3. **Ask for approval** - Use AskUserQuestion before creating TODOs or executing
4. **Wait for explicit approval** - The user may reject, ask for clarification, or approve selectively
Why this matters:

- LLM reviews often suggest unnecessary changes
- Some suggestions may be wrong or over-engineered
- The user knows context the LLM doesn't
- Executing without approval wastes time on rejected work
Example flow:

```
[Code review findings output]

## Suggested Actions

1. Add format-duration validator to Schema
2. Add tests for format-duration validation
3. Move constant to config class

Which items should I proceed with?
```