Python Panel Data
Purpose
This skill helps economists run panel data models in Python using pandas, statsmodels, and linearmodels, with correct fixed effects, clustering, and diagnostics.
When to Use
- Estimating fixed effects or random effects models
- Running difference-in-differences on panel data
- Creating regression tables and plots in Python
Instructions
Follow these steps to complete the task:
Step 1: Understand the Context
Before generating any code, ask the user:
- What is the unit of observation and panel identifiers?
- Which outcomes and regressors are required?
- What fixed effects or time effects are needed?
- How should standard errors be clustered?
Step 2: Generate the Output
Based on the context, generate Python code that:
- Loads and cleans the data with pandas
- Sets a MultiIndex for panel structure
- Fits the model using linearmodels.PanelOLS or RandomEffects
- Outputs results in a readable table and optional LaTeX
Step 3: Verify and Explain
After generating output:
- Interpret key coefficients
- Note assumptions (strict exogeneity, parallel trends, etc.)
- Suggest robustness checks (alternative clustering, placebo tests)
Example Prompts
- "Run a two-way fixed effects model with firm and year effects"
- "Estimate a DiD using state and year fixed effects"
- "Export panel regression results to LaTeX"
Example Output
# ============================================
# Panel Data Analysis in Python
# ============================================
import pandas as pd
from linearmodels.panel import PanelOLS

# Load data
df = pd.read_csv("panel_data.csv")

# Set panel index
df = df.set_index(["firm_id", "year"])

# Create treatment indicator
df["treat_post"] = df["treated"] * df["post"]

# Two-way fixed effects model
model = PanelOLS.from_formula(
    "outcome ~ 1 + treat_post + EntityEffects + TimeEffects",
    data=df,
)
results = model.fit(cov_type="clustered", cluster_entity=True)

print(results.summary)
Requirements
Software
- Python 3.10+
Packages
- pandas
- linearmodels
- statsmodels
Install with:
pip install pandas linearmodels statsmodels
Best Practices
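- Always verify the panel identifiers and check whether the panel is balanced or unbalanced
- Cluster standard errors at the level where errors are plausibly correlated, typically the level at which treatment is assigned
- Check for missing data before estimation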
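The checks above can be sketched as a small pre-estimation report built with pandas alone. The helper name describe_panel and the toy data are hypothetical, introduced only for illustration.

```python
import pandas as pd


def describe_panel(df, entity, time):
    """Summarise panel structure before estimation (illustrative helper)."""
    # Duplicate (entity, time) pairs mean the identifiers do not define a panel.
    n_duplicates = int(df.duplicated(subset=[entity, time]).sum())
    # A balanced panel observes every entity in every period.
    periods_per_entity = df.groupby(entity)[time].nunique()
    n_periods = int(df[time].nunique())
    balanced = bool((periods_per_entity == n_periods).all()) and n_duplicates == 0
    # Count missing values per column, keeping only columns that have any.
    missing = df.isna().sum()
    return {
        "n_entities": int(df[entity].nunique()),
        "n_periods": n_periods,
        "n_duplicates": n_duplicates,
        "balanced": balanced,
        "missing_by_column": missing[missing > 0].to_dict(),
    }


# Toy data: firm 2 is not observed in 2021 and one outcome value is missing.
toy = pd.DataFrame(
    {
        "firm_id": [1, 1, 1, 2, 2],
        "year": [2019, 2020, 2021, 2019, 2020],
        "outcome": [1.0, 1.5, None, 0.8, 0.9],
    }
)
report = describe_panel(toy, "firm_id", "year")
print(report)
```

Running a report like this before set_index surfaces duplicate identifiers early. Duplicates are a common cause of failures when a panel model is built from the MultiIndex.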
Common Pitfalls
- Failing to set a proper panel index
- Using pooled OLS when fixed effects are required
- Misinterpreting coefficients without accounting for fixed effects
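To see why pooled OLS can mislead when fixed effects are required, it may help to compare the pooled slope with the within (entity-demeaned) slope on simulated data where an unobserved firm effect is correlated with the regressor. Everything in this sketch is an assumption made for illustration: the true slope of 2.0, the firm-effect loading of 3.0, and the variable names.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_firms, n_years = 200, 6
firm = np.repeat(np.arange(n_firms), n_years)

# Unobserved firm effect, constant within each firm.
alpha = rng.normal(size=n_firms)[firm]
# The regressor is correlated with the firm effect by construction.
x = alpha + rng.normal(size=firm.size)
# Assumed data-generating process: true slope on x is 2.0.
y = 2.0 * x + 3.0 * alpha + rng.normal(size=firm.size)
df = pd.DataFrame({"firm_id": firm, "x": x, "y": y})


def ols_slope(x_values, y_values):
    """Slope from a bivariate OLS regression with an intercept."""
    x_c = x_values - x_values.mean()
    y_c = y_values - y_values.mean()
    return float((x_c @ y_c) / (x_c @ x_c))


# Pooled OLS ignores the firm effect, so the slope absorbs part of it.
pooled = ols_slope(df["x"].to_numpy(), df["y"].to_numpy())

# Within transformation: subtract firm means, which removes the firm effect.
firm_means = df.groupby("firm_id")[["x", "y"]].transform("mean")
demeaned = df[["x", "y"]] - firm_means
within = ols_slope(demeaned["x"].to_numpy(), demeaned["y"].to_numpy())

print(f"pooled slope: {pooled:.3f}")
print(f"within slope: {within:.3f}")
```

The within slope uses only variation over time inside each firm, which is also how a fixed effects coefficient should be interpreted: as the association between changes in x and changes in y for the same entity, not as a comparison across entities.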
References
- linearmodels documentation
- statsmodels documentation
- Wooldridge (2010), Econometric Analysis of Cross Section and Panel Data
Changelog
v1.0.0
- Initial release