Portfolio Optimization
Overview
This skill provides guidance for implementing high-performance portfolio optimization algorithms using Python C extensions. It covers the workflow for creating C extensions that interface with NumPy arrays, proper verification strategies, and common pitfalls to avoid when optimizing numerical computations.
When to Apply This Skill
Apply this skill when:
-
Implementing portfolio risk calculations (variance, volatility, Sharpe ratio)
-
Optimizing matrix-vector operations for large asset portfolios
-
Creating C extensions for Python numerical code
-
Performance requirements specify speedup ratios (e.g., >= 1.2x)
-
Working with covariance matrices and portfolio weights
Recommended Workflow
Phase 1: Codebase Understanding
Before writing any code:
-
Read all relevant source files completely - Understand the baseline implementation, data structures, and expected interfaces
-
Identify the mathematical operations - Common operations include:
-
Matrix-vector multiplication (covariance matrix times weights)
-
Dot products (weights times returns)
-
Square root operations (for volatility from variance)
-
Understand the test suite - Know what correctness tolerances are expected (e.g., 1e-10) and what performance benchmarks must be met
-
Document the input/output contracts - Array shapes, data types (typically float64), and return value specifications
Phase 2: Implementation Planning
Consider these factors before implementation:
Why C provides speedup:
-
Eliminates Python interpreter overhead
-
Enables direct memory access without bounds checking
-
Allows compiler optimizations (vectorization, loop unrolling)
-
Reduces temporary array allocations
Design decisions to make:
-
Whether to use NumPy C API for zero-copy array access
-
Memory layout assumptions (C-contiguous vs Fortran-contiguous)
-
Error handling strategy for type mismatches and dimension errors
Potential algorithmic optimizations:
-
Cache-friendly memory access patterns (row-major iteration for C arrays)
-
SIMD vectorization opportunities
-
Minimizing Python-to-C data conversion overhead
Phase 3: C Extension Implementation
When implementing the C extension:
Include proper headers:
-
Python.h (must be first)
-
numpy/arrayobject.h for NumPy array access
Initialize NumPy in the module init function:
-
Call import_array() to initialize NumPy C API
Use NumPy C API for array access:
-
PyArray_DATA() for getting data pointer
-
PyArray_DIM() for dimensions
-
PyArray_STRIDE() for memory strides
-
Check PyArray_IS_C_CONTIGUOUS() for memory layout
Implement robust error handling:
-
Validate array dimensions match expected shapes
-
Check data types (expect NPY_FLOAT64 for double precision)
-
Handle non-contiguous arrays (either reject or handle strides)
-
Set appropriate Python exceptions on error
Phase 4: Python Wrapper Implementation
Create a Python module that:
-
Imports the C extension module
-
Provides a clean interface matching the baseline API
-
Handles any necessary array preparation (ensuring contiguity)
-
Documents the interface clearly
Phase 5: Verification Strategy
Critical: Verify every change completely
After editing files, re-read them - Confirm edits were applied correctly, especially for multi-line changes
Test incrementally:
-
Build the C extension first and verify it compiles
-
Test individual functions before running full benchmarks
-
Use small test cases for correctness verification before scaling up
Correctness verification:
-
Compare outputs against baseline implementation
-
Use appropriate numerical tolerances (typically 1e-10 for double precision)
-
Test with known inputs where expected outputs can be calculated manually
Performance verification:
-
Run benchmarks with representative data sizes
-
Verify speedup meets requirements across different portfolio sizes
-
Test edge cases: small portfolios (n=1, n=10), large portfolios (n=5000+)
Edge Cases to Handle
Ensure the implementation addresses:
-
Empty portfolios (n=0) - Return appropriate default or error
-
Single-asset portfolios (n=1) - Degenerate case for covariance
-
Dimension mismatches - Weights vector length vs covariance matrix dimensions
-
Invalid inputs:
-
Non-square covariance matrices
-
NaN or infinity values in inputs
-
Negative variance (mathematically invalid)
-
Memory considerations:
-
Non-contiguous NumPy arrays
-
Memory allocation failures in C code
-
Large portfolios that may stress memory
Common Pitfalls to Avoid
Code Completeness
-
Never truncate code in edit operations - always provide complete implementations
-
Verify file contents after editing to confirm changes applied correctly
-
Document all design choices explicitly
Testing Approach
-
Avoid going directly from implementation to full benchmark testing
-
Test each function individually before integration testing
-
Do not rely solely on "tests pass" for validation - understand why they pass
C Extension Specific
-
Always check NumPy array types before accessing data
-
Handle reference counting properly to avoid memory leaks
-
Initialize NumPy API with import_array() in module init
-
Use PyErr_SetString() to set exceptions on errors
Performance Validation
-
Verify speedup is consistent across different input sizes
-
Profile if further optimizations might be needed
-
Consider the overhead of Python-to-C transitions for small inputs
Build and Test Commands
Typical workflow commands:
Build the C extension
python setup.py build_ext --inplace
Run correctness tests
python -c "from portfolio_optimized import *; # test calls"
Run benchmark
python benchmark.py
Run full test suite
pytest test_portfolio.py -v
Verification Checklist
Before considering the task complete:
-
All source files read and understood
-
C extension compiles without warnings
-
Individual functions tested for correctness
-
Numerical results match baseline within tolerance
-
Performance meets speedup requirements
-
Edge cases explicitly tested or handled
-
Error handling implemented for invalid inputs
-
File contents verified after all edits
-
No memory leaks in C code (proper reference counting)