XLSX Processing Skill
Overview
Comprehensive Excel manipulation using pandas for data analysis and openpyxl for formulas, formatting, and Excel-specific features.
Quick Start
import pandas as pd from openpyxl import Workbook
Read with pandas
df = pd.read_excel("data.xlsx") print(df.head())
Create with openpyxl
wb = Workbook() ws = wb.active ws["A1"] = "Hello" ws["B1"] = "World" wb.save("output.xlsx")
When to Use
-
Reading and analyzing Excel data with pandas
-
Creating formatted spreadsheets programmatically
-
Building financial models with formulas
-
Generating reports with charts and graphs
-
Automating data entry and updates
-
Converting between Excel and other formats
-
Batch processing multiple spreadsheets
-
Creating templates for repeated use
Requirements for All Output
-
Zero formula errors mandatory
-
Preserve existing templates when updating
-
Always use Excel formulas instead of calculating values in Python and hardcoding them
-
Follow industry-standard color coding for financial models
Financial Model Color Standards
-
Blue: Input cells (hardcoded values)
-
Black: Formula cells (calculated values)
-
Green: Links to other worksheets
-
Red: Links to external files
Number Formatting Rules
-
Years displayed as text (no commas)
-
Currency with appropriate units (K, M, B)
-
Zeros displayed as "-"
Reading Excel Files
With Pandas
import pandas as pd
Read entire file
df = pd.read_excel("data.xlsx")
Read specific sheet
df = pd.read_excel("data.xlsx", sheet_name="Sheet2")
Read with options
df = pd.read_excel( "data.xlsx", sheet_name=0, header=0, usecols="A:D", skiprows=2 )
Read all sheets
dfs = pd.read_excel("data.xlsx", sheet_name=None) for sheet_name, df in dfs.items(): print(f"Sheet: {sheet_name}") print(df.head())
With Openpyxl
from openpyxl import load_workbook
wb = load_workbook("data.xlsx") ws = wb.active
Read cell values
for row in ws.iter_rows(min_row=1, max_row=10, values_only=True): print(row)
Read specific cell
value = ws["A1"].value
Creating Excel Files
With Pandas
import pandas as pd
df = pd.DataFrame({ 'Name': ['Alice', 'Bob', 'Charlie'], 'Sales': [1000, 1500, 1200], 'Region': ['East', 'West', 'North'] })
Simple export
df.to_excel("output.xlsx", index=False)
Multiple sheets
with pd.ExcelWriter("output.xlsx") as writer: df.to_excel(writer, sheet_name="Sales", index=False) df.to_excel(writer, sheet_name="Backup", index=False)
With Openpyxl (Formulas)
from openpyxl import Workbook from openpyxl.styles import Font, PatternFill, Alignment from openpyxl.utils import get_column_letter
wb = Workbook() ws = wb.active ws.title = "Financial Model"
Headers
headers = ["Item", "Q1", "Q2", "Q3", "Q4", "Total"] for col, header in enumerate(headers, 1): cell = ws.cell(row=1, column=col, value=header) cell.font = Font(bold=True) cell.alignment = Alignment(horizontal="center")
Data with formulas
data = [ ("Revenue", 1000, 1200, 1100, 1400), ("Expenses", 800, 900, 850, 1000), ]
for row, (item, *values) in enumerate(data, 2): ws.cell(row=row, column=1, value=item) for col, value in enumerate(values, 2): ws.cell(row=row, column=col, value=value) # Total formula ws.cell(row=row, column=6, value=f"=SUM(B{row}:E{row})")
Profit row with formula
ws.cell(row=4, column=1, value="Profit") for col in range(2, 7): col_letter = get_column_letter(col) ws.cell(row=4, column=col, value=f"={col_letter}2-{col_letter}3")
wb.save("financial_model.xlsx")
Formatting
Cell Styles
from openpyxl import Workbook from openpyxl.styles import Font, PatternFill, Border, Side, Alignment from openpyxl.styles.numbers import FORMAT_CURRENCY_USD_SIMPLE
wb = Workbook() ws = wb.active
Font styling
ws["A1"] = "Styled Cell" ws["A1"].font = Font( name="Arial", size=14, bold=True, italic=False, color="2E74B5" )
Fill color
ws["A2"] = "Blue Background" ws["A2"].fill = PatternFill( start_color="0000FF", end_color="0000FF", fill_type="solid" )
Number format
ws["A3"] = 1234567.89 ws["A3"].number_format = FORMAT_CURRENCY_USD_SIMPLE
Alignment
ws["A4"] = "Centered" ws["A4"].alignment = Alignment( horizontal="center", vertical="center" )
Borders
thin_border = Border( left=Side(style="thin"), right=Side(style="thin"), top=Side(style="thin"), bottom=Side(style="thin") ) ws["A5"].border = thin_border
wb.save("formatted.xlsx")
Column Width and Row Height
from openpyxl import Workbook
wb = Workbook() ws = wb.active
Set column width
ws.column_dimensions["A"].width = 20 ws.column_dimensions["B"].width = 15
Set row height
ws.row_dimensions[1].height = 30
Auto-fit (approximate)
for col in ws.columns: max_length = 0 column = col[0].column_letter for cell in col: if cell.value: max_length = max(max_length, len(str(cell.value))) ws.column_dimensions[column].width = max_length + 2
wb.save("sized.xlsx")
Charts
from openpyxl import Workbook from openpyxl.chart import BarChart, Reference
wb = Workbook() ws = wb.active
Data
data = [ ["Month", "Sales"], ["Jan", 100], ["Feb", 120], ["Mar", 140], ["Apr", 110], ] for row in data: ws.append(row)
Create chart
chart = BarChart() chart.type = "col" chart.title = "Monthly Sales" chart.y_axis.title = "Sales ($)" chart.x_axis.title = "Month"
data = Reference(ws, min_col=2, min_row=1, max_row=5) cats = Reference(ws, min_col=1, min_row=2, max_row=5) chart.add_data(data, titles_from_data=True) chart.set_categories(cats) chart.shape = 4 # Rectangle
ws.add_chart(chart, "D2") wb.save("with_chart.xlsx")
Formula Verification
Check for Errors
from openpyxl import load_workbook
wb = load_workbook("model.xlsx", data_only=False) ws = wb.active
errors = ["#REF!", "#DIV/0!", "#VALUE!", "#N/A", "#NAME?", "#NULL!", "#NUM!"]
for row in ws.iter_rows(): for cell in row: if cell.value and isinstance(cell.value, str): for error in errors: if error in str(cell.value): print(f"Error {error} in cell {cell.coordinate}")
Validate Formulas
Before saving, always:
-
Check cell references are correct
-
Avoid off-by-one errors
-
Test edge cases (empty cells, zeros)
-
Verify formula logic
Execution Checklist
-
Choose appropriate tool (pandas vs openpyxl)
-
Verify input file exists and is valid
-
Test formulas with sample data
-
Apply consistent formatting
-
Validate all formulas produce correct results
-
Check for Excel errors (#REF!, #DIV/0!, etc.)
-
Verify charts display correctly
-
Test in Excel/LibreOffice before delivery
Error Handling
Common Errors
Error: InvalidFileException
-
Cause: File is not a valid .xlsx (possibly .xls)
-
Solution: Convert to .xlsx or use xlrd for .xls files
Error: Circular reference
-
Cause: Formula references itself
-
Solution: Review formula logic and break the cycle
Error: #REF! in formulas
-
Cause: Cell reference is invalid (deleted row/column)
-
Solution: Use named ranges or validate references
Error: Memory issues with large files
-
Cause: Loading entire file into memory
-
Solution: Use read_only=True or write_only=True mode
Metrics
Metric Typical Value
Read speed (pandas) ~50,000 rows/second
Write speed (pandas) ~30,000 rows/second
Formula cells ~10,000 cells/second
Chart creation ~5 charts/second
Memory usage ~100MB per 100K rows
Workflow
-
Choose appropriate tool (pandas or openpyxl)
-
Create or load workbook
-
Modify as needed
-
Save file
-
For formula-based files: Run formula recalculation
-
Verify and fix errors
Quick Reference
Task Tool
Data analysis pandas
Simple read/write pandas
Formulas openpyxl
Formatting openpyxl
Charts openpyxl
Pivot tables Use Excel or xlwings
Dependencies
pip install pandas openpyxl xlrd
Optional:
-
xlwings (Windows/Mac Excel automation)
-
xlsxwriter (alternative writer)
Version History
-
1.1.0 (2026-01-02): Added Quick Start, When to Use, Execution Checklist, Error Handling, Metrics sections; updated frontmatter with version, category, related_skills
-
1.0.0 (2024-10-15): Initial release with pandas, openpyxl, financial model standards