Great Expectations
Audience: Data engineers building validated data pipelines.
Goal: Provide GX patterns for expectation-based validation and monitoring.
Scripts
Import the GX helper functions from scripts/expectations.py:
from scripts.expectations import (
    get_pandas_context,
    add_dataframe_asset,
    create_basic_suite,
    run_validation,
)
Usage Examples
Quick Setup
from scripts.expectations import get_pandas_context, add_dataframe_asset
# df is an existing pandas DataFrame
context, datasource = get_pandas_context("my_datasource")
batch_request = add_dataframe_asset(datasource, "users", df)
Create Expectation Suite
from scripts.expectations import create_basic_suite
columns_config = {
    'user_id': {'not_null': True, 'unique': True, 'type': 'int'},
    'age': {'min': 0, 'max': 150},
    'status': {'values': ['active', 'inactive', 'pending']},
    # The dot before the top-level domain is escaped so it matches a literal "."
    'email': {'regex': r'^[\w.-]+@[\w.-]+\.\w+$'},
}
suite = create_basic_suite(context, "user_suite", columns_config)
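The body of create_basic_suite is not shown here, so the following is only a sketch of how a columns_config mapping could be translated into expectation specs. The function name config_to_expectations and the "type"/"kwargs" output shape are hypothetical; the snake_case expectation names and keyword arguments (min_value, max_value, value_set, regex, type_) are the standard GX ones that correspond to the classes in the reference table.

```python
from typing import Any, Dict, List


def config_to_expectations(
    columns_config: Dict[str, Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Translate a columns_config mapping into a flat list of expectation specs.

    Hypothetical helper: it only builds plain dicts and never calls GX itself.
    """
    specs: List[Dict[str, Any]] = []

    def add(expectation_type: str, **kwargs: Any) -> None:
        specs.append({"type": expectation_type, "kwargs": kwargs})

    for column, rules in columns_config.items():
        if rules.get("not_null"):
            add("expect_column_values_to_not_be_null", column=column)
        if rules.get("unique"):
            add("expect_column_values_to_be_unique", column=column)
        if "type" in rules:
            add("expect_column_values_to_be_of_type", column=column, type_=rules["type"])
        if "min" in rules or "max" in rules:
            add(
                "expect_column_values_to_be_between",
                column=column,
                min_value=rules.get("min"),
                max_value=rules.get("max"),
            )
        if "values" in rules:
            add("expect_column_values_to_be_in_set", column=column, value_set=list(rules["values"]))
        if "regex" in rules:
            add("expect_column_values_to_match_regex", column=column, regex=rules["regex"])
    return specs


# Print the specs one per line so a config can be reviewed before a suite is built.
for spec in config_to_expectations({"age": {"min": 0, "max": 150}}):
    print(spec["type"], spec["kwargs"])
```

Keeping the translation separate from the GX calls makes the config easy to unit test without a data context.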
Run Validation
from scripts.expectations import run_validation
results = run_validation(
    context,
    checkpoint_name="user_checkpoint",
    batch_request=batch_request,
    suite_name="user_suite",
)

if results['success']:
    print("All expectations passed!")
else:
    for failure in results['failures']:
        print(f"Failed: {failure['expectation']} on {failure['column']}")
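In a pipeline, printing failures is usually not enough; the run should stop. The sketch below gates a pipeline step on the result. It assumes the result shape used above (a 'success' flag and a 'failures' list whose items carry 'expectation' and 'column'); ValidationFailed, summarize_failures, and require_success are hypothetical names, not part of GX or of scripts/expectations.py.

```python
from typing import Any, Dict, List


class ValidationFailed(Exception):
    """Raised when a validation result reports failed expectations."""


def summarize_failures(results: Dict[str, Any]) -> List[str]:
    """Return one readable line per failed expectation."""
    return [
        f"{failure.get('expectation', '<unknown>')} on {failure.get('column', '<table>')}"
        for failure in results.get("failures", [])
    ]


def require_success(results: Dict[str, Any]) -> None:
    """Raise ValidationFailed unless the result reports success."""
    if results.get("success"):
        return
    lines = summarize_failures(results)
    raise ValidationFailed("; ".join(lines) or "validation failed without failure details")
```

Calling require_success(results) right after run_validation lets an orchestrator mark the task as failed instead of passing bad data downstream.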
Common Expectations Reference
| Category | Expectation | Description |
| --- | --- | --- |
| Table | ExpectTableRowCountToBeBetween | Row count range |
| Existence | ExpectColumnToExist | Column must exist |
| Nulls | ExpectColumnValuesToNotBeNull | No null values |
| Range | ExpectColumnValuesToBeBetween | Value bounds |
| Set | ExpectColumnValuesToBeInSet | Allowed values |
| Pattern | ExpectColumnValuesToMatchRegex | Regex match |
| Unique | ExpectColumnValuesToBeUnique | No duplicates |
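To make the column-level checks concrete, here is a plain-Python sketch of what each one asserts over a list of values. This is not GX code, and the function names are made up for illustration. It assumes GX's usual defaults: range bounds are inclusive, null values are skipped by every check except the null check itself, and the regex check searches within the value rather than requiring a full match. Only None is treated as null here, whereas GX on pandas also treats NaN as null.

```python
import re
from typing import Any, Iterable, List, Optional


def _non_null(values: Iterable[Any]) -> List[Any]:
    """Drop None values, mirroring how column checks skip nulls."""
    return [v for v in values if v is not None]


def not_null(values: Iterable[Any]) -> bool:
    """Nulls: every value is present."""
    return all(v is not None for v in values)


def between(values: Iterable[Any], min_value: Optional[Any] = None,
            max_value: Optional[Any] = None) -> bool:
    """Range: every non-null value lies within the inclusive bounds."""
    return all(
        (min_value is None or v >= min_value) and (max_value is None or v <= max_value)
        for v in _non_null(values)
    )


def in_set(values: Iterable[Any], value_set: Iterable[Any]) -> bool:
    """Set: every non-null value is one of the allowed values."""
    allowed = set(value_set)
    return all(v in allowed for v in _non_null(values))


def matches_regex(values: Iterable[Any], regex: str) -> bool:
    """Pattern: the regex is found in every non-null value."""
    pattern = re.compile(regex)
    return all(pattern.search(str(v)) is not None for v in _non_null(values))


def unique(values: Iterable[Any]) -> bool:
    """Unique: no non-null value appears more than once."""
    present = _non_null(values)
    return len(present) == len(set(present))
```

Because nulls are skipped, a column usually needs the null check in addition to a range or set check if missing values should also count as failures.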
Data Docs
# Build and open HTML reports
context.build_data_docs()
context.open_data_docs()
Directory Structure
great_expectations/
├── great_expectations.yml    # Config
├── expectations/             # Expectation suites (JSON)
├── checkpoints/              # Checkpoint definitions
├── plugins/                  # Custom expectations
└── uncommitted/
    ├── data_docs/            # Generated HTML docs
    └── validations/          # Validation results
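A quick way to confirm that a project directory matches this layout is to check each expected path. The sketch below uses only pathlib; EXPECTED_PATHS and missing_gx_paths are hypothetical names, and the path list simply mirrors the tree above. The entries under uncommitted/ are typically generated on first use, so a fresh project may legitimately report them as missing.

```python
from pathlib import Path
from typing import List, Union

# Relative paths taken from the directory tree above.
EXPECTED_PATHS = (
    "great_expectations.yml",
    "expectations",
    "checkpoints",
    "plugins",
    "uncommitted/data_docs",
    "uncommitted/validations",
)


def missing_gx_paths(project_root: Union[str, Path]) -> List[str]:
    """Return the expected relative paths that do not exist under project_root."""
    root = Path(project_root)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]
```

For example, missing_gx_paths("great_expectations") returns an empty list when every path in the tree is present.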
When to Use Great Expectations
| Use Case | GX | Alternative |
| --- | --- | --- |
| Pipeline monitoring | ✓ | |
| Data warehouse validation | ✓ | |
| Automated data docs | ✓ | |
| Simple DataFrame checks | | Pandera |
| Record-level API validation | | Pydantic |
Dependencies
great_expectations>=0.18
pandas