Pandera Validation
Audience: Data engineers validating pandas DataFrames.
Goal: Provide pandera patterns for schema validation and type checking.
Scripts
Import the schema helpers from scripts/schemas.py:
from scripts.schemas import (
    create_user_schema,
    create_nullable_schema,
    create_date_range_schema,
    UserSchema,
    validate_with_errors,
    infer_and_export_schema,
)
Usage Examples
Basic Schema Validation
from scripts.schemas import create_user_schema
schema = create_user_schema()
validated_df = schema.validate(df)
Collect All Errors
from scripts.schemas import create_user_schema, validate_with_errors
schema = create_user_schema()
validated_df, errors = validate_with_errors(df, schema)

if errors:
    for err in errors:
        print(f"{err['column']}: {err['check']} - {err['failure_case']}")
Class-Based Schema
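import pandas as pd
import pandera as pa
from scripts.schemas import UserSchema

# Validate a DataFrame directly against the class-based schema
UserSchema.validate(df)

# Use as a function type hint; @pa.check_types enforces the annotation at call time
@pa.check_types
def process_users(df: pa.typing.DataFrame[UserSchema]) -> pd.DataFrame:
    return df.query("status == 'active'")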
Infer Schema from DataFrame
from scripts.schemas import infer_and_export_schema
schema_export = infer_and_export_schema(df)
print(schema_export['python_code'])  # Python schema definition
print(schema_export['yaml'])         # YAML schema
Built-in Checks Reference
| Check Type     | Example                             | Description   |
|----------------|-------------------------------------|---------------|
| Numeric        | Check.gt(0), Check.in_range(0, 100) | Comparisons   |
| String         | Check.str_matches(r'pattern')       | Regex match   |
| Set membership | Check.isin(['A', 'B'])              | Allowed values|
| Uniqueness     | unique=True on Column               | No duplicates |
| Nullable       | nullable=True on Column             | Allow nulls   |
Decorator-Based Validation
import pandas as pd
import pandera as pa

# Validate the returned DataFrame
@pa.check_output(schema)
def load_data(path: str) -> pd.DataFrame:
    return pd.read_csv(path)

# Validate the argument named "df"
@pa.check_input(schema, "df")
def process_data(df: pd.DataFrame) -> pd.DataFrame:
    return df.assign(processed=True)

# Validate both the input and the output
@pa.check_io(df=input_schema, out=output_schema)
def transform_data(df: pd.DataFrame) -> pd.DataFrame:
    return df.transform(...)
When to Use Pandera
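| Use Case                  | Pandera | Alternative        |
|---------------------------|---------|--------------------|
| DataFrame validation      | ✓       |                    |
| Type hints for DataFrames | ✓       |                    |
| ETL pipeline checks       | ✓       | Great Expectations |
| Record-level validation   |         | Pydantic           |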
Dependencies
pandera>=0.18
pandas