data-validation

Audience: Data engineers building validation pipelines.

Safety Notice

This listing is imported from the skills.sh public index metadata. Review the upstream SKILL.md and repository scripts before running anything.


Install the skill with:

    npx skills add majesticlabs-dev/majestic-marketplace/majesticlabs-dev-majestic-marketplace-data-validation

Data Validation

Goal: Provide validation patterns for custom business rules.

Framework-specific skills:

  • pydantic-validation: record-level validation with Pydantic

  • pandera-validation: DataFrame schema validation

  • great-expectations: pipeline expectations and monitoring

Scripts

Execute validation functions from scripts/validators.py:

    from scripts.validators import (
        ValidationResult,
        DataValidator,
        validate_no_duplicates,
        validate_referential_integrity,
        validate_date_range,
        validate_value_in_set,
        run_validation_pipeline,
        validate_with_schema_version,
        assert_schema_match,
        assert_no_nulls,
        assert_unique,
        assert_values_in_set,
    )

Framework Selection

Use case                     Framework
API request/response         Pydantic
Record-by-record ETL         Pydantic
DataFrame validation         Pandera
Type hints for DataFrames    Pandera
Pipeline monitoring          Great Expectations
Data warehouse checks        Great Expectations
Custom business rules        Custom functions (this skill)
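The last row covers checks written as plain functions, which is the style this skill provides. A minimal sketch of that style, assuming a ValidationResult with the passed / message / failed_rows fields used in the examples below; the rule and function name here are illustrative and are not part of scripts/validators.py:

```python
from dataclasses import dataclass
from typing import Optional

import pandas as pd


@dataclass
class ValidationResult:
    passed: bool
    message: str
    failed_rows: Optional[pd.DataFrame] = None


def validate_positive_amounts(df: pd.DataFrame, col: str = "amount") -> ValidationResult:
    # Illustrative business rule: every value in `col` must be strictly positive.
    bad = df[df[col] <= 0]
    if bad.empty:
        return ValidationResult(True, f"all {col} values positive")
    return ValidationResult(False, f"{len(bad)} rows with non-positive {col}", bad)


df = pd.DataFrame({"amount": [10, -5, 3]})
result = validate_positive_amounts(df)
```

Any rule that filters a DataFrame down to its failing rows fits this shape and can then be plugged into the pipeline machinery shown below.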

Usage Examples

Basic Validation

    from scripts.validators import validate_no_duplicates, validate_referential_integrity

    # Check duplicates
    result = validate_no_duplicates(df, cols=['id'])
    if not result.passed:
        print(f"Error: {result.message}")
        print(result.failed_rows)

    # Check referential integrity
    result = validate_referential_integrity(df, 'user_id', users_df, 'id')
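The real implementations live in scripts/validators.py; as a rough illustrative sketch, a referential-integrity check of this shape typically reduces to a pandas isin filter. The function below is hypothetical and returns a plain dict rather than the upstream ValidationResult:

```python
import pandas as pd


def check_referential_integrity(df, fk_col, ref_df, ref_col):
    # Every value in df[fk_col] must exist in ref_df[ref_col].
    orphans = df[~df[fk_col].isin(ref_df[ref_col])]
    passed = orphans.empty
    message = "ok" if passed else f"{len(orphans)} rows reference missing {ref_col} values"
    return {"passed": passed, "message": message, "failed_rows": orphans}


users_df = pd.DataFrame({"id": [1, 2]})
df = pd.DataFrame({"user_id": [1, 2, 3]})
result = check_referential_integrity(df, "user_id", users_df, "id")
```

Returning the orphan rows themselves, rather than just a boolean, is what makes the failed_rows-style reporting in the snippet above possible.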

Validation Pipeline

    from scripts.validators import DataValidator, validate_no_duplicates, validate_date_range

    validator = DataValidator()
    validator.add_check(lambda df: validate_no_duplicates(df, ['id']))
    validator.add_check(lambda df: validate_date_range(df, 'created_at', '2020-01-01', '2025-12-31'))

    results = validator.validate(df)
    if not results['passed']:
        for check in results['checks']:
            if not check['passed']:
                print(f"Failed: {check['message']}")
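A validator of this shape is easy to picture: register callables, run them all, aggregate the outcomes. The following is an illustrative stand-in, not the upstream DataValidator, assuming each check returns an object with passed and message attributes:

```python
import pandas as pd
from types import SimpleNamespace


class SimpleValidator:
    """Illustrative stand-in for DataValidator: run checks, collect results."""

    def __init__(self):
        self.checks = []

    def add_check(self, fn):
        self.checks.append(fn)

    def validate(self, df):
        outcomes = [fn(df) for fn in self.checks]
        return {
            "passed": all(o.passed for o in outcomes),
            "checks": [{"passed": o.passed, "message": o.message} for o in outcomes],
        }


# A toy check: the `id` column must contain no nulls.
def no_nulls(df):
    has_nulls = bool(df["id"].isna().any())
    return SimpleNamespace(passed=not has_nulls,
                           message="id contains nulls" if has_nulls else "ok")


v = SimpleValidator()
v.add_check(no_nulls)
results = v.validate(pd.DataFrame({"id": [1, None]}))
```

Because checks are plain callables, lambdas that bind extra arguments (as in the example above) slot in without any registration machinery.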

Config-Driven Pipeline

    from scripts.validators import run_validation_pipeline

    config = {
        'unique_columns': ['id'],
        'date_ranges': {
            'created_at': ('2020-01-01', '2025-12-31'),
            'updated_at': ('2020-01-01', '2025-12-31'),
        }
    }

    clean_df, results = run_validation_pipeline(df, config)
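How run_validation_pipeline interprets this config is defined upstream; as a hedged sketch, such a function usually walks the config keys and dispatches to the matching check. The helper name and return shape below are assumptions, not the upstream API:

```python
import pandas as pd


def run_pipeline(df, config):
    # Illustrative config dispatcher: returns the DataFrame plus a list of check results.
    results = []
    cols = config.get("unique_columns")
    if cols:
        results.append({"check": f"unique:{cols}",
                        "passed": not df.duplicated(subset=cols).any()})
    for col, (lo, hi) in config.get("date_ranges", {}).items():
        s = pd.to_datetime(df[col])
        results.append({"check": f"date_range:{col}",
                        "passed": bool(s.between(lo, hi).all())})
    return df, results


config = {"unique_columns": ["id"],
          "date_ranges": {"created_at": ("2020-01-01", "2025-12-31")}}
df = pd.DataFrame({"id": [1, 2], "created_at": ["2021-06-01", "2030-01-01"]})
clean_df, results = run_pipeline(df, config)
```

Keeping the checks config-driven means the same pipeline can be reused across tables by swapping a dict, with no code changes.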

Test Assertions

    from scripts.validators import assert_schema_match, assert_no_nulls, assert_unique

    # In pytest
    def test_data_quality():
        assert_schema_match(df, {'id': 'int64', 'email': 'object'})
        assert_no_nulls(df, ['id', 'email'])
        assert_unique(df, ['id'])
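Assertion helpers of this kind usually just raise AssertionError with a diagnostic message, so pytest reports them like any other failed assertion. A minimal illustrative sketch of the no-nulls and uniqueness variants (these are not the upstream implementations):

```python
import pandas as pd


def check_no_nulls(df, cols):
    # Raise AssertionError if any listed column contains nulls.
    for col in cols:
        n = int(df[col].isna().sum())
        assert n == 0, f"{col} has {n} null values"


def check_unique(df, cols):
    # Raise AssertionError if rows are duplicated on `cols`.
    dups = int(df.duplicated(subset=cols).sum())
    assert dups == 0, f"{dups} duplicate rows on {cols}"


df = pd.DataFrame({"id": [1, 2, 2], "email": ["a@x.io", None, "c@x.io"]})
check_no_nulls(df, ["id"])  # passes silently
```

Including the offending count in the assertion message keeps test failures actionable without extra logging.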

Dependencies

pandas

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals. Each skill below is in the Coding category, comes from a repository source, carries a "Needs Review" trust label, and has no summary from the upstream source:

  • google-ads-strategy

  • viral-content

  • market-research

  • free-tool-arsenal