# Python Refactor

## Purpose

Transform complex, hard-to-understand Python code into clear, well-documented, maintainable code. This skill guides systematic refactoring that prioritizes human comprehension without sacrificing correctness or reasonable performance.
## When to Invoke

Invoke this skill when:

- User explicitly requests "human", "readable", "maintainable", "clean", or "refactor" code improvements
- Code review processes flag comprehension or maintainability issues
- Working with legacy code that needs modernization
- Preparing code for team onboarding or educational contexts
- Code complexity metrics exceed reasonable thresholds
- Functions or modules are difficult to understand or modify
- RED FLAG indicators: file >500 lines with scattered functions and global state, multiple `global` statements, no clear module/class organization, configuration mixed with business logic
Do NOT invoke this skill when:

- Code is performance-critical and profiling shows optimization is needed first
- Code is scheduled for deletion or replacement
- External dependencies require upstream contributions instead
- User explicitly requests performance optimization over readability
## Core Principles

Follow these principles in priority order:

1. Prefer structured OOP for complex code - Code with shared state, multiple concerns, or scattered global functions should be restructured into well-organized classes and modules. Script-like code with global state and tangled dependencies benefits most from OOP. However, simple modules with pure functions, CLI tools using click/argparse, and functional data pipelines don't need to be forced into classes.
2. Clarity over cleverness - Explicit, obvious code beats implicit, clever code.
3. Preserve correctness - All tests must pass; behavior must remain identical.
4. Single Responsibility - Each class and function should do one thing well (SOLID principles).
5. Self-documenting structure - Code structure tells what; comments explain why.
6. Progressive disclosure - Reveal complexity in layers, not all at once.
7. Reasonable performance - Never sacrifice >2x performance without explicit approval.
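As a minimal sketch of the structured-OOP, single-responsibility, and dependency-injection principles above (all class and method names here are hypothetical, not part of the skill's API):

```python
from dataclasses import dataclass
from typing import Protocol


class Notifier(Protocol):
    """Abstraction: ReportService depends on an interface, not a concrete sender."""

    def send(self, message: str) -> None: ...


@dataclass
class Report:
    title: str
    total: float


class ReportService:
    """Single responsibility: build report summaries. Delivery is injected."""

    def __init__(self, notifier: Notifier) -> None:
        self._notifier = notifier

    def publish(self, report: Report) -> str:
        summary = f"{report.title}: {report.total:.2f}"
        self._notifier.send(summary)
        return summary


class ListNotifier:
    """Test double that captures messages instead of emailing them."""

    def __init__(self) -> None:
        self.sent: list[str] = []

    def send(self, message: str) -> None:
        self.sent.append(message)
```

Because the dependency is injected, the service can be exercised with a test double and no real I/O, which is exactly what "preserve correctness" relies on.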
## Key Constraints

ALWAYS observe these constraints:

- SAFETY BY DESIGN - Use mandatory migration checklists for destructive changes: create the new structure, search all usages, migrate them all, verify, and only then remove the old code. NEVER remove code before 100% migration is verified.
- STATIC ANALYSIS FIRST - Run `flake8 --select=F821,E0602` before tests to catch NameErrors immediately.
- PRESERVE BEHAVIOR - All existing tests must pass after refactoring.
- NO PERFORMANCE REGRESSION - Never degrade performance >2x without explicit user approval.
- NO API CHANGES - Public APIs remain unchanged unless explicitly requested and documented.
- NO OVER-ENGINEERING - Simple code stays simple; don't add unnecessary abstraction.
- NO MAGIC - No framework magic, no metaprogramming unless absolutely necessary.
- VALIDATE CONTINUOUSLY - Run static analysis + tests after each logical change.
## Regression Prevention (MANDATORY)

Refactoring must NEVER introduce technical, logical, or functional regressions. Read and apply references/REGRESSION_PREVENTION.md before any refactoring session.

Before each refactoring session:

- Test suite passes at 100%
- Coverage >= 80% on target code (if not, write tests FIRST)
- Golden outputs captured for critical edge cases
- Static analysis baseline saved

After each micro-change (not at the end, EVERY SINGLE ONE):

- `flake8 --select=F821,E999` -> 0 errors
- `pytest -x` -> all passing
- Spot check 1 edge case for unchanged behavior

If ANY check fails: STOP -> REVERT -> ANALYZE -> FIX APPROACH -> RETRY.

ANY REGRESSION = TOTAL FAILURE OF THE REFACTORING.
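Capturing golden outputs before a session can be sketched like this (the function, file name, and cases are hypothetical examples, not part of this skill's scripts):

```python
import json
from pathlib import Path


def legacy_price(quantity: int, unit_cost: float) -> float:
    """Function about to be refactored; its behavior must not change."""
    discount = 0.1 if quantity >= 10 else 0.0
    return round(quantity * unit_cost * (1 - discount), 2)


def capture_golden(path: Path, cases: list[tuple[int, float]]) -> None:
    """Record current outputs for critical edge cases before any change."""
    golden = {f"{q},{c}": legacy_price(q, c) for q, c in cases}
    path.write_text(json.dumps(golden, indent=2))


def check_golden(path: Path) -> bool:
    """After each micro-change, verify outputs still match the baseline."""
    golden = json.loads(path.read_text())
    return all(
        legacy_price(int(k.split(",")[0]), float(k.split(",")[1])) == v
        for k, v in golden.items()
    )
```

`capture_golden` runs once before refactoring; `check_golden` runs after every micro-change, alongside static analysis and the test suite.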
## Refactoring Workflow

Execute refactoring in four phases with validation at each step.

### Phase 1: Analysis

Before making any changes, analyze the code comprehensively:

1. Read the entire codebase section being refactored to understand context.
2. Identify readability issues using the anti-patterns reference (see references/anti-patterns.md):
   - Check for script-like/procedural code (global state, scattered functions, no clear structure)
   - Check for God Objects/Classes (classes doing too much)
   - Check for complex nested conditionals, long functions, magic numbers, cryptic names, etc.
3. Assess architecture (see references/oop_principles.md):
   - Is code organized in proper classes and modules?
   - Is there global state that should be encapsulated?
   - Are responsibilities properly separated?
   - Are SOLID principles followed?
   - Is dependency injection used instead of hard-coded dependencies?
4. Measure current metrics using scripts/measure_complexity.py or scripts/analyze_multi_metrics.py.
5. Run linting analysis (see Tooling Recommendations below for which tool to use).
6. Check test coverage - identify gaps that need filling before refactoring.
7. Document findings using the analysis template (see assets/templates/analysis_template.md).

Output: Prioritized list of issues by impact and risk.
### Phase 2: Planning

Plan the refactoring approach systematically with safety-by-design:

1. Identify changes by type:
   - Non-destructive: renames, documentation, type hints -> low risk
   - Destructive: removing globals, deleting functions, replacing APIs -> high risk
2. For DESTRUCTIVE changes, create a migration plan (MANDATORY):
   - Search for ALL usages of each element to be removed
   - Document every found usage with file, line number, and usage type
   - If you cannot create a complete migration plan, you CANNOT proceed with the destructive change
3. Risk assessment for each proposed change (Low/Medium/High).
4. Dependency identification - what else depends on this code?
5. Test strategy - what tests are needed? What might break?
6. Change ordering - sequence changes from safest to riskiest.
7. Expected outcomes - document which metrics should improve and by how much.

Output: Refactoring plan with sequenced changes, migration plans for destructive changes, test strategy, and rollback plan.
### Phase 3: Execution

Apply refactoring patterns using the safety-by-design workflow.

For NON-DESTRUCTIVE changes (safe to do anytime):

- Rename variables/functions for clarity
- Extract magic numbers/strings to named constants
- Add/improve documentation and type hints
- Add guard clauses to reduce nesting
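Guard clauses and extracted constants, applied together, can be sketched like this (the function and field names are illustrative only):

```python
MAX_LOGIN_ATTEMPTS = 3  # named constant replacing a magic number in the conditional


def can_attempt_login(user: dict, attempts: int) -> bool:
    """Guard clauses replace the original nested if/else pyramid."""
    if user is None:
        return False
    if not user.get("is_active", False):
        return False
    if attempts >= MAX_LOGIN_ATTEMPTS:
        return False
    return True
```

Each early return handles one failure case, so the happy path reads as a straight line at nesting depth one.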
For DESTRUCTIVE changes (removing/replacing code), follow this strict protocol:

1. CREATE the new structure (no removal yet) - write new classes/functions, add tests
2. SEARCH comprehensively for ALL usages of the element being removed
3. CREATE a migration checklist documenting every found usage
4. MIGRATE one usage at a time, checking off the list and running static analysis + tests after each
5. VERIFY complete migration - re-run the original searches; they should find zero old references
6. REMOVE the old code only after 100% migration is verified
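Step 1 of that protocol, sketched for a global being encapsulated: the new structure is written next to the old code, which stays in place until the checklist reaches 100% (all names here are hypothetical):

```python
from __future__ import annotations

# Old code, still present until every usage on the migration checklist is done:
_cache: dict[str, str] = {}


def old_get(key: str) -> str | None:
    return _cache.get(key)


# New structure created first (protocol step 1); usages migrate to it one by one.
class SettingsCache:
    """Encapsulates what used to be the module-level _cache global."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def get(self, key: str) -> str | None:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value
```

Only after searches for `_cache` and `old_get` come back empty is the old block deleted.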
Execution rules:

- NEVER skip the migration checklist for destructive changes
- Run static analysis BEFORE tests - catch NameErrors immediately
- One pattern at a time - never mix multiple refactoring patterns in one change
- Atomic commits - each migration step gets its own commit
- Stop on ANY error - static analysis errors OR test failures require immediate fix/revert

Recommended refactoring order:

1. Transform script-like code to proper architecture (if code has global state and scattered functions); see references/examples/script_to_oop_transformation.md
2. Rename variables/functions for clarity
3. Extract magic numbers/strings to named constants (as class constants or enums)
4. Add/improve documentation and type hints
5. Extract methods to reduce function length
6. Simplify conditionals with guard clauses
7. Reduce nesting depth
8. Final review: ensure separation of concerns is clean

Output: Refactored code passing all tests with clear commit history.
### Phase 4: Validation

Validate improvements objectively:

1. Run static analysis FIRST (catch errors before tests):

   ```bash
   flake8 <file> --select=F821,E0602  # Undefined names/variables
   flake8 <file> --select=F401        # Unused imports
   flake8 <file>                      # Full quality check
   ```

   MANDATORY: zero F821 and E0602 errors required.
2. Run the full test suite - 100% pass rate required.
3. Validate architecture improvements:
   - Confirm global state has been eliminated or properly encapsulated
   - Verify code is organized in proper modules/classes
   - Check that responsibilities are properly separated
   - Validate against SOLID principles (see references/oop_principles.md)
4. Compare before/after metrics using scripts/measure_complexity.py or scripts/analyze_multi_metrics.py.
5. Performance regression check - run scripts/benchmark_changes.py for hot paths.
6. Generate a summary report using the format from assets/templates/summary_template.md.

Flag for human review if:

- Performance degraded >10%
- Public API signatures changed
- Test coverage decreased
- Significant architectural changes were made

Output: Comprehensive validation report with test results, metrics comparison, performance benchmarks, and quality summary.
## Refactoring Patterns

Apply these patterns systematically. See references/patterns.md for the full catalog with examples.

### Key Patterns (summary)

- Guard Clauses - Replace nested conditionals with early returns. See references/patterns.md
- Extract Method - Split large functions into focused units; resets the nesting counter (most powerful lever for cognitive complexity)
- Dictionary Dispatch - Eliminate if-elif chains with lookup tables
- Match Statement (Python 3.10+) - A match counts as +1 cognitive complexity in total, not +1 per branch
- Named Boolean Conditions - Extract complex boolean expressions into named variables
- Encapsulate Global State - Move globals into classes with proper encapsulation
- Group Related Functions - Organize scattered functions into classes by responsibility
- Create Domain Models - Replace primitive dicts with dataclasses and enums
- Apply Dependency Injection - Replace hard-coded dependencies with injected ones

See references/cognitive_complexity_guide.md for cognitive complexity calculation rules and reduction patterns.
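Dictionary Dispatch from the list above can be sketched like this (the event types and handler functions are hypothetical):

```python
from collections.abc import Callable


def _handle_created(payload: dict) -> str:
    return f"created {payload['id']}"


def _handle_deleted(payload: dict) -> str:
    return f"deleted {payload['id']}"


# The lookup table replaces an if-elif chain over event types;
# adding a new event type is one new entry, not a new branch.
_HANDLERS: "dict[str, Callable[[dict], str]]" = {
    "created": _handle_created,
    "deleted": _handle_deleted,
}


def dispatch(event_type: str, payload: dict) -> str:
    handler = _HANDLERS.get(event_type)
    if handler is None:
        raise ValueError(f"unknown event type: {event_type}")
    return handler(payload)
```

Unlike an if-elif chain, the unknown-type case is handled in exactly one place.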
## Naming Conventions

- Variables: descriptive names; booleans as `is_active`/`has_permission`/`can_edit`; collections as plurals
- Functions: verb + object (`calculate_total`, `validate_email`); boolean queries as `is_valid()`/`has_items()`
- Constants: `UPPERCASE_WITH_UNDERSCORES`; replace magic numbers/strings
- Classes: PascalCase nouns (`UserAccount`, `PaymentProcessor`)
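Applied together, these conventions look roughly like this (a hypothetical example, not code from the skill):

```python
MAX_RETRIES = 5  # constant: UPPERCASE, replaces a magic number


class PaymentProcessor:  # class: PascalCase noun
    def calculate_total(self, prices: list) -> float:  # verb + object; plural collection
        return float(sum(prices))

    def has_items(self, prices: list) -> bool:  # boolean query reads as yes/no
        return len(prices) > 0


is_active = True  # boolean variable named as a yes/no question
```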
## Documentation Patterns

- Function docstrings - document purpose, args, returns, raises (Google style preferred)
- Module documentation - purpose and key dependencies
- Inline comments - only for non-obvious "why"
- Type hints - all public APIs and complex internals
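A Google-style docstring with type hints, as a sketch (the validation function itself is a deliberately simplified, hypothetical example):

```python
def validate_email(address: str) -> bool:
    """Check whether an address looks like a plausible email.

    Args:
        address: Raw address string from user input.

    Returns:
        True if the address contains exactly one "@" separating
        non-empty local and domain parts, False otherwise.

    Raises:
        TypeError: If address is not a string.
    """
    if not isinstance(address, str):
        raise TypeError("address must be a string")
    local, sep, domain = address.partition("@")
    return bool(sep) and bool(local) and bool(domain) and "@" not in domain
```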
## OOP Transformation Patterns

For transforming script-like code to structured OOP, see references/examples/script_to_oop_transformation.md for a complete guide and references/oop_principles.md for SOLID principles.
## Anti-Patterns to Fix

See references/anti-patterns.md for the full catalog. Priority order:

- Critical: script-like/procedural code with global state, God Object/God Class
- High: complex nested conditionals (>3 levels), long functions (>30 lines), magic numbers, cryptic names, missing type hints, missing docstrings
- Medium: duplicate code, primitive obsession, long parameter lists (>5)
- Low: inconsistent naming, redundant comments, unused imports
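Primitive obsession, for instance, is fixed by introducing a small domain model (the order domain here is a hypothetical illustration):

```python
from dataclasses import dataclass
from enum import Enum


class OrderStatus(Enum):
    PENDING = "pending"
    SHIPPED = "shipped"


@dataclass(frozen=True)
class Order:
    """Replaces raw {'id': ..., 'status': ...} dicts passed around before."""

    order_id: int
    status: OrderStatus

    def is_shipped(self) -> bool:
        return self.status is OrderStatus.SHIPPED
```

The enum makes invalid statuses unrepresentable, and behavior that used to inspect the dict now lives on the model itself.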
## Tooling Recommendations

Primary stack: Ruff + Complexipy (recommended for new projects).

```bash
pip install ruff complexipy radon wily

ruff check src/                              # Fast linting (Rust, replaces flake8+plugins)
complexipy src/ --max-complexity-allowed 15  # Cognitive complexity (Rust)
radon mi src/ -s                             # Maintainability Index
```

See references/cognitive_complexity_guide.md for complete configuration (pyproject.toml, pre-commit hooks, GitHub Actions, CLI usage).

Alternative: Flake8 (for projects already using it). The scripts/analyze_with_flake8.py and scripts/compare_flake8_reports.py scripts use flake8; see references/flake8_plugins_guide.md for the curated plugin list.
## Multi-Metric Analysis

Use scripts/analyze_multi_metrics.py to combine cognitive complexity (complexipy), cyclomatic complexity (radon), and maintainability index in a single report.

| Metric | Tool | Use |
| --- | --- | --- |
| Cognitive complexity | complexipy | Human comprehension |
| Cyclomatic complexity | ruff (C901), radon | Test planning |
| Maintainability Index | radon | Overall code health |
## Metric Targets

- Cyclomatic complexity: <10 per function (warning at 15, error at 20)
- Cognitive complexity: <15 per function (SonarQube default, warning at 20)
- Function length: <30 lines (warning at 50)
- Nesting depth: <=3 levels
- Docstring coverage: >80% for public functions
- Type hint coverage: >90% for public APIs
## Historical Tracking with Wily

Monitor trends over time, not just thresholds. See references/cognitive_complexity_guide.md for setup and CI integration.
## Common Refactoring Mistakes

See references/REGRESSION_PREVENTION.md for the full guide. Key traps:

- Incomplete migration - removing old code before ALL usages are migrated (causes NameErrors)
- Partial pattern application - applying a refactoring to some functions but not others
- Breaking public APIs - changing function signatures used by external code
- Assuming tests cover everything - tests pass but runtime errors occur (run static analysis!)
## Output Format

Structure refactoring output using the template from assets/templates/summary_template.md. Include:

- Changes made, with rationale and risk level
- Before/after metrics comparison table
- Test results and performance impact
- Risk assessment and human review recommendation
## Related Tools - When to Use What

- humanize (agent, humanize plugin) - Multi-language cosmetic cleanup: renames local variables, improves comments, simplifies structure. Lowest regression risk. Use for: "make this readable", "clean up naming".
- python-refactor (this skill) - Python-only deep restructuring: OOP transformation, SOLID principles, complexity metrics, migration checklists, benchmark validation. Use for: "refactor this module", "reduce complexity", "transform to OOP".

Escalation path: humanize -> python-refactor (from safest to most thorough).
## Integration with Same-Package Skills

- python-tdd - Set up tests before refactoring, validate coverage after
- python-performance-optimization - Deep profiling before/after refactoring
- python-packaging - If refactoring a library, handle pyproject.toml and distribution
- uv-package-manager - Use `uv run ruff` and `uv run complexipy` for tool execution
- async-python-patterns - Reference async patterns when refactoring async code
## Edge Cases and Limitations

When NOT to refactor: performance-critical optimized code (profile first), code scheduled for deletion, external dependencies (contribute upstream), and stable legacy code nobody needs to modify.

Limitations: refactoring cannot improve algorithmic complexity (that is an algorithm change, not a refactoring), cannot add domain knowledge that is absent from the code and comments, and cannot guarantee correctness without tests. Code style preferences vary; adjust based on team conventions.
## Examples

See references/examples/ for before/after examples:

- script_to_oop_transformation.md - Complete transformation from script-like code to clean OOP architecture
- python_complexity_reduction.md - Nested conditionals and long functions
- typescript_naming_improvements.md - Variable and function naming patterns (cross-language reference)
## Success Criteria

Refactoring is successful when:

- ZERO regressions - all existing tests pass, behavior unchanged
- Golden master match - identical output for documented critical cases
- Complexity metrics improved (documented in the summary)
- No performance regression >10% (or explicit approval obtained)
- Documentation coverage improved
- Code is easier for humans to understand
- No new security vulnerabilities introduced
- Changes are atomic and well-documented in git history
- Wily trend - complexity not increased compared to the previous commit
- Static analysis shows improvement