# Python Refactor

## Purpose

Transform complex, hard-to-understand Python code into clear, well-documented, maintainable code. This skill guides systematic refactoring that prioritizes human comprehension without sacrificing correctness or reasonable performance.
## When to Invoke

Invoke this skill when:

- User explicitly requests "human", "readable", "maintainable", "clean", or "refactor" code improvements
- Code review processes flag comprehension or maintainability issues
- Working with legacy code that needs modernization
- Preparing code for team onboarding or educational contexts
- Code complexity metrics exceed reasonable thresholds
- Functions or modules are difficult to understand or modify
- RED FLAG indicators: file >500 lines with scattered functions and global state, multiple `global` statements, no clear module/class organization, configuration mixed with business logic
Do NOT invoke this skill when:

- Code is performance-critical and profiling shows optimization is needed first
- Code is scheduled for deletion or replacement
- External dependencies require upstream contributions instead
- User explicitly requests performance optimization over readability
## Core Principles

Follow these principles in priority order:

1. Prefer structured OOP for complex code - Code with shared state, multiple concerns, or scattered global functions should be restructured into well-organized classes and modules. Script-like code with global state and tangled dependencies benefits most from OOP. However, simple modules with pure functions, CLI tools using click/argparse, and functional data pipelines don't need to be forced into classes.
2. Clarity over cleverness - Explicit, obvious code beats implicit, clever code.
3. Preserve correctness - All tests must pass; behavior must remain identical.
4. Single Responsibility - Each class and function should do one thing well (SOLID principles).
5. Self-documenting structure - Code structure tells what; comments explain why.
6. Progressive disclosure - Reveal complexity in layers, not all at once.
7. Reasonable performance - Never sacrifice >2x performance without explicit approval.
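As a minimal sketch of the structured-OOP, single-responsibility, and dependency-injection principles above (all class and method names here are hypothetical, not part of the skill's API):

```python
from dataclasses import dataclass
from typing import Protocol


class Notifier(Protocol):
    """Abstraction: ReportService depends on an interface, not a concrete sender."""

    def send(self, message: str) -> None: ...


@dataclass
class Report:
    title: str
    total: float


class ReportService:
    """Single responsibility: build report summaries. Delivery is injected."""

    def __init__(self, notifier: Notifier) -> None:
        self._notifier = notifier

    def publish(self, report: Report) -> str:
        summary = f"{report.title}: {report.total:.2f}"
        self._notifier.send(summary)
        return summary


class ListNotifier:
    """Test double that captures messages instead of emailing them."""

    def __init__(self) -> None:
        self.sent: list[str] = []

    def send(self, message: str) -> None:
        self.sent.append(message)
```

Because the dependency is injected, the service can be exercised with a test double and no real I/O, which is exactly what "preserve correctness" relies on.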
## Key Constraints

ALWAYS observe these constraints:

- SAFETY BY DESIGN - Use mandatory migration checklists for destructive changes: create the new structure, search all usages, migrate them all, verify, and only then remove the old code. NEVER remove code before 100% migration is verified.
- STATIC ANALYSIS FIRST - Run `flake8 --select=F821,E0602` before tests to catch NameErrors immediately.
- PRESERVE BEHAVIOR - All existing tests must pass after refactoring.
- NO PERFORMANCE REGRESSION - Never degrade performance >2x without explicit user approval.
- NO API CHANGES - Public APIs remain unchanged unless explicitly requested and documented.
- NO OVER-ENGINEERING - Simple code stays simple; don't add unnecessary abstraction.
- NO MAGIC - No framework magic, no metaprogramming unless absolutely necessary.
- VALIDATE CONTINUOUSLY - Run static analysis + tests after each logical change.
## Regression Prevention (MANDATORY)

Refactoring must NEVER introduce technical, logical, or functional regressions. Read and apply references/REGRESSION_PREVENTION.md before any refactoring session.

Before each refactoring session:

- Test suite passes at 100%
- Coverage >= 80% on target code (if not, write tests FIRST)
- Golden outputs captured for critical edge cases
- Static analysis baseline saved

After each micro-change (not at the end, EVERY SINGLE ONE):

- `flake8 --select=F821,E999` -> 0 errors
- `pytest -x` -> all passing
- Spot check 1 edge case for unchanged behavior

If ANY check fails: STOP -> REVERT -> ANALYZE -> FIX APPROACH -> RETRY.

ANY REGRESSION = TOTAL FAILURE OF THE REFACTORING.
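Capturing golden outputs before a session can be sketched like this (the function, file name, and cases are hypothetical examples, not part of this skill's scripts):

```python
import json
from pathlib import Path


def legacy_price(quantity: int, unit_cost: float) -> float:
    """Function about to be refactored; its behavior must not change."""
    discount = 0.1 if quantity >= 10 else 0.0
    return round(quantity * unit_cost * (1 - discount), 2)


def capture_golden(path: Path, cases: list[tuple[int, float]]) -> None:
    """Record current outputs for critical edge cases before any change."""
    golden = {f"{q},{c}": legacy_price(q, c) for q, c in cases}
    path.write_text(json.dumps(golden, indent=2))


def check_golden(path: Path) -> bool:
    """After each micro-change, verify outputs still match the baseline."""
    golden = json.loads(path.read_text())
    return all(
        legacy_price(int(k.split(",")[0]), float(k.split(",")[1])) == v
        for k, v in golden.items()
    )
```

`capture_golden` runs once before refactoring; `check_golden` runs after every micro-change, alongside static analysis and the test suite.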
## Refactoring Workflow

Execute refactoring in four phases with validation at each step.

### Phase 1: Analysis

Before making any changes, analyze the code comprehensively:

1. Read the entire codebase section being refactored to understand context.
2. Identify readability issues using the anti-patterns reference (see references/anti-patterns.md):
   - Check for script-like/procedural code (global state, scattered functions, no clear structure)
   - Check for God Objects/Classes (classes doing too much)
   - Check for complex nested conditionals, long functions, magic numbers, cryptic names, etc.
3. Assess architecture (see references/oop_principles.md):
   - Is code organized in proper classes and modules?
   - Is there global state that should be encapsulated?
   - Are responsibilities properly separated?
   - Are SOLID principles followed?
   - Is dependency injection used instead of hard-coded dependencies?
4. Measure current metrics using scripts/measure_complexity.py or scripts/analyze_multi_metrics.py.
5. Run linting analysis (see Tooling Recommendations below for which tool to use).
6. Check test coverage - identify gaps that need filling before refactoring.
7. Document findings using the analysis template (see assets/templates/analysis_template.md).

Output: Prioritized list of issues by impact and risk.
### Phase 2: Planning

Plan the refactoring approach systematically with safety-by-design:

1. Identify changes by type:
   - Non-destructive: renames, documentation, type hints -> low risk
   - Destructive: removing globals, deleting functions, replacing APIs -> high risk
2. For DESTRUCTIVE changes, create a migration plan (MANDATORY):
   - Search for ALL usages of each element to be removed
   - Document every found usage with file, line number, and usage type
   - If you cannot create a complete migration plan, you CANNOT proceed with the destructive change
3. Risk assessment for each proposed change (Low/Medium/High).
4. Dependency identification - what else depends on this code?
5. Test strategy - what tests are needed? What might break?
6. Change ordering - sequence changes from safest to riskiest.
7. Expected outcomes - document which metrics should improve and by how much.

Output: Refactoring plan with sequenced changes, migration plans for destructive changes, test strategy, and rollback plan.
### Phase 3: Execution

Apply refactoring patterns using the safety-by-design workflow.

For NON-DESTRUCTIVE changes (safe to do anytime):

- Rename variables/functions for clarity
- Extract magic numbers/strings to named constants
- Add/improve documentation and type hints
- Add guard clauses to reduce nesting
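Guard clauses and extracted constants, applied together, can be sketched like this (the function and field names are illustrative only):

```python
MAX_LOGIN_ATTEMPTS = 3  # named constant replacing a magic number in the conditional


def can_attempt_login(user: dict, attempts: int) -> bool:
    """Guard clauses replace the original nested if/else pyramid."""
    if user is None:
        return False
    if not user.get("is_active", False):
        return False
    if attempts >= MAX_LOGIN_ATTEMPTS:
        return False
    return True
```

Each early return handles one failure case, so the happy path reads as a straight line at nesting depth one.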
For DESTRUCTIVE changes (removing/replacing code), follow this strict protocol:

1. CREATE the new structure (no removal yet) - write new classes/functions, add tests
2. SEARCH comprehensively for ALL usages of the element being removed
3. CREATE a migration checklist documenting every found usage
4. MIGRATE one usage at a time, checking off the list and running static analysis + tests after each
5. VERIFY complete migration - re-run the original searches; they should find zero old references
6. REMOVE the old code only after 100% migration is verified
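Step 1 of that protocol, sketched for a global being encapsulated: the new structure is written next to the old code, which stays in place until the checklist reaches 100% (all names here are hypothetical):

```python
from __future__ import annotations

# Old code, still present until every usage on the migration checklist is done:
_cache: dict[str, str] = {}


def old_get(key: str) -> str | None:
    return _cache.get(key)


# New structure created first (protocol step 1); usages migrate to it one by one.
class SettingsCache:
    """Encapsulates what used to be the module-level _cache global."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def get(self, key: str) -> str | None:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value
```

Only after searches for `_cache` and `old_get` come back empty is the old block deleted.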
Execution rules:

- NEVER skip the migration checklist for destructive changes
- Run static analysis BEFORE tests - catch NameErrors immediately
- One pattern at a time - never mix multiple refactoring patterns in one change
- Atomic commits - each migration step gets its own commit
- Stop on ANY error - static analysis errors OR test failures require immediate fix/revert

Recommended refactoring order:

1. Transform script-like code to proper architecture (if code has global state and scattered functions); see references/examples/script_to_oop_transformation.md
2. Rename variables/functions for clarity
3. Extract magic numbers/strings to named constants (as class constants or enums)
4. Add/improve documentation and type hints
5. Extract methods to reduce function length
6. Simplify conditionals with guard clauses
7. Reduce nesting depth
8. Final review: ensure separation of concerns is clean

Output: Refactored code passing all tests with clear commit history.
### Phase 4: Validation

Validate improvements objectively:

1. Run static analysis FIRST (catch errors before tests):

   ```bash
   flake8 <file> --select=F821,E0602  # Undefined names/variables
   flake8 <file> --select=F401        # Unused imports
   flake8 <file>                      # Full quality check
   ```

   MANDATORY: zero F821 and E0602 errors required.
2. Run the full test suite - 100% pass rate required.
3. Validate architecture improvements:
   - Confirm global state has been eliminated or properly encapsulated
   - Verify code is organized in proper modules/classes
   - Check that responsibilities are properly separated
   - Validate against SOLID principles (see references/oop_principles.md)
4. Compare before/after metrics using scripts/measure_complexity.py or scripts/analyze_multi_metrics.py.
5. Performance regression check - run scripts/benchmark_changes.py for hot paths.
6. Generate a summary report using the format from assets/templates/summary_template.md.

Flag for human review if:

- Performance degraded >10%
- Public API signatures changed
- Test coverage decreased
- Significant architectural changes were made

Output: Comprehensive validation report with test results, metrics comparison, performance benchmarks, and quality summary.
## Refactoring Patterns

Apply these patterns systematically. See references/patterns.md for the full catalog with examples.

### Key Patterns (summary)

- Guard Clauses - Replace nested conditionals with early returns. See references/patterns.md
- Extract Method - Split large functions into focused units; resets the nesting counter (most powerful lever for cognitive complexity)
- Dictionary Dispatch - Eliminate if-elif chains with lookup tables
- Match Statement (Python 3.10+) - A match counts as +1 cognitive complexity in total, not +1 per branch
- Named Boolean Conditions - Extract complex boolean expressions into named variables
- Encapsulate Global State - Move globals into classes with proper encapsulation
- Group Related Functions - Organize scattered functions into classes by responsibility
- Create Domain Models - Replace primitive dicts with dataclasses and enums
- Apply Dependency Injection - Replace hard-coded dependencies with injected ones

See references/cognitive_complexity_guide.md for cognitive complexity calculation rules and reduction patterns.
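Dictionary Dispatch from the list above can be sketched like this (the event types and handler functions are hypothetical):

```python
from collections.abc import Callable


def _handle_created(payload: dict) -> str:
    return f"created {payload['id']}"


def _handle_deleted(payload: dict) -> str:
    return f"deleted {payload['id']}"


# The lookup table replaces an if-elif chain over event types;
# adding a new event type is one new entry, not a new branch.
_HANDLERS: "dict[str, Callable[[dict], str]]" = {
    "created": _handle_created,
    "deleted": _handle_deleted,
}


def dispatch(event_type: str, payload: dict) -> str:
    handler = _HANDLERS.get(event_type)
    if handler is None:
        raise ValueError(f"unknown event type: {event_type}")
    return handler(payload)
```

Unlike an if-elif chain, the unknown-type case is handled in exactly one place.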
## Naming Conventions

- Variables: descriptive names; booleans as `is_active`/`has_permission`/`can_edit`; collections as plurals
- Functions: verb + object (`calculate_total`, `validate_email`); boolean queries as `is_valid()`/`has_items()`
- Constants: `UPPERCASE_WITH_UNDERSCORES`; replace magic numbers/strings
- Classes: PascalCase nouns (`UserAccount`, `PaymentProcessor`)
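Applied together, these conventions look roughly like this (a hypothetical example, not code from the skill):

```python
MAX_RETRIES = 5  # constant: UPPERCASE, replaces a magic number


class PaymentProcessor:  # class: PascalCase noun
    def calculate_total(self, prices: list) -> float:  # verb + object; plural collection
        return float(sum(prices))

    def has_items(self, prices: list) -> bool:  # boolean query reads as yes/no
        return len(prices) > 0


is_active = True  # boolean variable named as a yes/no question
```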
## Documentation Patterns

- Function docstrings - document purpose, args, returns, raises (Google style preferred)
- Module documentation - purpose and key dependencies
- Inline comments - only for non-obvious "why"
- Type hints - all public APIs and complex internals
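A Google-style docstring with type hints, as a sketch (the validation function itself is a deliberately simplified, hypothetical example):

```python
def validate_email(address: str) -> bool:
    """Check whether an address looks like a plausible email.

    Args:
        address: Raw address string from user input.

    Returns:
        True if the address contains exactly one "@" separating
        non-empty local and domain parts, False otherwise.

    Raises:
        TypeError: If address is not a string.
    """
    if not isinstance(address, str):
        raise TypeError("address must be a string")
    local, sep, domain = address.partition("@")
    return bool(sep) and bool(local) and bool(domain) and "@" not in domain
```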
## OOP Transformation Patterns

For transforming script-like code to structured OOP, see references/examples/script_to_oop_transformation.md for a complete guide and references/oop_principles.md for SOLID principles.
## Anti-Patterns to Fix

See references/anti-patterns.md for the full catalog. Priority order:

- Critical: script-like/procedural code with global state, God Object/God Class
- High: complex nested conditionals (>3 levels), long functions (>30 lines), magic numbers, cryptic names, missing type hints, missing docstrings
- Medium: duplicate code, primitive obsession, long parameter lists (>5)
- Low: inconsistent naming, redundant comments, unused imports
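Primitive obsession, for instance, is fixed by introducing a small domain model (the order domain here is a hypothetical illustration):

```python
from dataclasses import dataclass
from enum import Enum


class OrderStatus(Enum):
    PENDING = "pending"
    SHIPPED = "shipped"


@dataclass(frozen=True)
class Order:
    """Replaces raw {'id': ..., 'status': ...} dicts passed around before."""

    order_id: int
    status: OrderStatus

    def is_shipped(self) -> bool:
        return self.status is OrderStatus.SHIPPED
```

The enum makes invalid statuses unrepresentable, and behavior that used to inspect the dict now lives on the model itself.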
## Tooling Recommendations

Primary stack: Ruff + Complexipy (recommended for new projects).

```bash
pip install ruff complexipy radon wily

ruff check src/                              # Fast linting (Rust, replaces flake8+plugins)
complexipy src/ --max-complexity-allowed 15  # Cognitive complexity (Rust)
radon mi src/ -s                             # Maintainability Index
```

See references/cognitive_complexity_guide.md for complete configuration (pyproject.toml, pre-commit hooks, GitHub Actions, CLI usage).

Alternative: Flake8 (for projects already using it). The scripts/analyze_with_flake8.py and scripts/compare_flake8_reports.py scripts use flake8; see references/flake8_plugins_guide.md for the curated plugin list.
## Multi-Metric Analysis

Use scripts/analyze_multi_metrics.py to combine cognitive complexity (complexipy), cyclomatic complexity (radon), and maintainability index in a single report.

| Metric | Tool | Use |
| --- | --- | --- |
| Cognitive complexity | complexipy | Human comprehension |
| Cyclomatic complexity | ruff (C901), radon | Test planning |
| Maintainability Index | radon | Overall code health |
## Metric Targets

- Cyclomatic complexity: <10 per function (warning at 15, error at 20)
- Cognitive complexity: <15 per function (SonarQube default, warning at 20)
- Function length: <30 lines (warning at 50)
- Nesting depth: <=3 levels
- Docstring coverage: >80% for public functions
- Type hint coverage: >90% for public APIs
## Historical Tracking with Wily

Monitor trends over time, not just thresholds. See references/cognitive_complexity_guide.md for setup and CI integration.
## Common Refactoring Mistakes

See references/REGRESSION_PREVENTION.md for the full guide. Key traps:

- Incomplete migration - removing old code before ALL usages are migrated (causes NameErrors)
- Partial pattern application - applying a refactoring to some functions but not others
- Breaking public APIs - changing function signatures used by external code
- Assuming tests cover everything - tests pass but runtime errors occur (run static analysis!)
## Output Format

Structure refactoring output using the template from assets/templates/summary_template.md. Include:

- Changes made, with rationale and risk level
- Before/after metrics comparison table
- Test results and performance impact
- Risk assessment and human review recommendation
## Related Tools - When to Use What

- humanize (agent, humanize plugin) - Multi-language cosmetic cleanup: renames local variables, improves comments, simplifies structure. Lowest regression risk. Use for: "make this readable", "clean up naming".
- python-refactor (this skill) - Python-only deep restructuring: OOP transformation, SOLID principles, complexity metrics, migration checklists, benchmark validation. Use for: "refactor this module", "reduce complexity", "transform to OOP".

Escalation path: humanize -> python-refactor (from safest to most thorough).
## Integration with Same-Package Skills

- python-tdd - Set up tests before refactoring, validate coverage after
- python-performance-optimization - Deep profiling before/after refactoring
- python-packaging - If refactoring a library, handle pyproject.toml and distribution
- uv-package-manager - Use `uv run ruff` and `uv run complexipy` for tool execution
- async-python-patterns - Reference async patterns when refactoring async code
## Edge Cases and Limitations

When NOT to refactor: performance-critical optimized code (profile first), code scheduled for deletion, external dependencies (contribute upstream), and stable legacy code nobody needs to modify.

Limitations: refactoring cannot improve algorithmic complexity (that is an algorithm change, not a refactoring), cannot add domain knowledge that is absent from the code and comments, and cannot guarantee correctness without tests. Code style preferences vary; adjust based on team conventions.
## Examples

See references/examples/ for before/after examples:

- script_to_oop_transformation.md - Complete transformation from script-like code to clean OOP architecture
- python_complexity_reduction.md - Nested conditionals and long functions
- typescript_naming_improvements.md - Variable and function naming patterns (cross-language reference)
## Success Criteria

Refactoring is successful when:

- ZERO regressions - all existing tests pass, behavior unchanged
- Golden master match - identical output for documented critical cases
- Complexity metrics improved (documented in the summary)
- No performance regression >10% (or explicit approval obtained)
- Documentation coverage improved
- Code is easier for humans to understand
- No new security vulnerabilities introduced
- Changes are atomic and well-documented in git history
- Wily trend - complexity not increased compared to the previous commit
- Static analysis shows improvement