data-quality-checker

Implement comprehensive data quality checks and validation.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "data-quality-checker" with this command: npx skills add armanzeroeight/fastagent-plugins/armanzeroeight-fastagent-plugins-data-quality-checker

Quick Start

Use Great Expectations for validation, implement schema checks, monitor data quality metrics, and set up alerts for failures.

Instructions

Great Expectations Setup

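# Written against the Validator-based API of Great Expectations 0.x releases;
# check the documentation for the version you have installed.
import great_expectations as gx

context = gx.get_context()

# Create expectation suite
suite = context.add_expectation_suite("data_quality_suite")

# Add expectations
# Note: batch_request is not defined in this snippet; build it from your own datasource first.
validator = context.get_validator(
    batch_request=batch_request,
    expectation_suite_name="data_quality_suite"
)

# Schema validation
# Note: the range check below uses an "age" column that this list omits; align the two for your table.
validator.expect_table_columns_to_match_ordered_list(
    column_list=["id", "name", "email", "created_at"]
)

# Null checks
validator.expect_column_values_to_not_be_null("email")

# Value ranges
validator.expect_column_values_to_be_between("age", min_value=0, max_value=120)

# Uniqueness
validator.expect_column_values_to_be_unique("email")

# Run validation
results = validator.validate()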
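To make concrete what the four expectations above assert, here is a dependency-free sketch of the same checks over a list of dict rows. It is not how Great Expectations implements them. The `check_rows` function, the `EXPECTED_COLUMNS` list (which assumes an `age` column is part of the schema), and the sample rows are all hypothetical.

```python
# Assumed schema for illustration; includes "age" so the range check has a column to read.
EXPECTED_COLUMNS = ["id", "name", "email", "age", "created_at"]


def check_rows(rows):
    """Return {check_name: passed} for a list of dict rows."""
    emails = [row.get("email") for row in rows]
    non_null_emails = [e for e in emails if e is not None]
    ages = [row.get("age") for row in rows if row.get("age") is not None]
    return {
        # Schema: every row has exactly the expected keys, in order
        "columns_match": all(list(row) == EXPECTED_COLUMNS for row in rows),
        # Null check: no email is missing
        "email_not_null": len(non_null_emails) == len(emails),
        # Range check: every non-null age lies in [0, 120]
        "age_in_range": all(0 <= a <= 120 for a in ages),
        # Uniqueness: no non-null email appears twice
        "email_unique": len(set(non_null_emails)) == len(non_null_emails),
    }


rows = [
    {"id": 1, "name": "Ada", "email": "ada@example.com", "age": 36, "created_at": "2024-01-01"},
    {"id": 2, "name": "Bob", "email": None, "age": 150, "created_at": "2024-01-02"},
    {"id": 3, "name": "Cy", "email": "ada@example.com", "age": 41, "created_at": "2024-01-03"},
]
report = check_rows(rows)
print(report)
```

In this sample only the schema check passes: one email is missing, one age is out of range, and one email is repeated.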

Custom Validation Rules

from datetime import datetime

def validate_data_quality(df):
    """Return a list of human-readable data quality issues found in df."""
    issues = []

    # Check for nulls
    null_counts = df.isnull().sum()
    if null_counts.any():
        issues.append(f"Null values found: {null_counts[null_counts > 0].to_dict()}")

    # Check for duplicates
    duplicates = df.duplicated().sum()
    if duplicates > 0:
        issues.append(f"Found {duplicates} duplicate rows")

    # Check data freshness (assumes created_at is a timezone-naive datetime column)
    max_date = df['created_at'].max()
    if (datetime.now() - max_date).days > 1:
        issues.append("Data is stale")

    return issues
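One detail of the freshness check is easy to miss: `.days` counts whole days only, so data that is 47 hours old still has `.days == 1` and passes the `> 1` test. The sketch below contrasts that with comparing the age against a `timedelta` directly. The `is_stale` helper and its `max_age` parameter are hypothetical and not part of the original skill.

```python
from datetime import datetime, timedelta


def is_stale(newest, now, max_age=timedelta(days=1)):
    """Hypothetical helper: True when the newest record is older than max_age."""
    return (now - newest) > max_age


now = datetime(2024, 1, 3, 23, 0)
newest = datetime(2024, 1, 2, 0, 0)  # 47 hours before `now`

age = now - newest
days_check = age.days > 1            # whole days only, so this compares 1 > 1
delta_check = is_stale(newest, now)  # compares 47 hours against 24 hours
print(age.days, days_check, delta_check)
```

Which behavior is right depends on the freshness rule you want to enforce. If "stale" means "older than 24 hours", the direct `timedelta` comparison is the closer match.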

Data Quality Metrics

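from datetime import datetime

def calculate_quality_metrics(df):
    return {
        # Share of non-null cells
        'completeness': 1 - (df.isnull().sum().sum() / df.size),
        # Share of rows left after dropping exact duplicates
        'uniqueness': df.drop_duplicates().shape[0] / df.shape[0],
        # Share of rows whose email contains '@' (nulls count as invalid)
        'validity': df['email'].str.contains('@', na=False).sum() / len(df),
        # Age of the newest record, in whole days
        'timeliness': (datetime.now() - df['created_at'].max()).days,
    }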
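As a worked example, the sketch below runs the same four metrics on a small hand-built frame. It assumes pandas is installed. The function is restated here with a hypothetical `now` parameter, which is not in the original, so that the timeliness value is reproducible. The sample data is invented for illustration.

```python
from datetime import datetime

import pandas as pd


def calculate_quality_metrics(df, now=None):
    """Same four metrics as above; `now` is a hypothetical parameter for reproducibility."""
    now = now or datetime.now()
    return {
        "completeness": 1 - (df.isnull().sum().sum() / df.size),
        "uniqueness": df.drop_duplicates().shape[0] / df.shape[0],
        "validity": df["email"].str.contains("@", na=False).sum() / len(df),
        "timeliness": (now - df["created_at"].max()).days,
    }


# Four rows, three columns: one null email and one exact duplicate row.
df = pd.DataFrame({
    "id": [1, 2, 2, 4],
    "email": ["a@example.com", "b@example.com", "b@example.com", None],
    "created_at": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-02", "2024-01-03"]),
})

metrics = calculate_quality_metrics(df, now=datetime(2024, 1, 5))
print(metrics)
```

With one null among 12 cells, completeness is 11/12. Three of the four rows are distinct and three emails contain `@`, so uniqueness and validity are both 0.75. The newest record is two days older than the supplied `now`, so timeliness is 2.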

Best Practices

  • Validate data at ingestion
  • Monitor quality metrics
  • Set up alerts for failures
  • Document quality rules
  • Run regular quality audits
  • Track quality trends over time
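The "set up alerts for failures" practice can be sketched as a threshold check over a metrics dictionary shaped like the one `calculate_quality_metrics` returns. Everything below is an assumption for illustration: the threshold values, the `find_violations` and `alert_on_failures` names, and the use of the standard `logging` module as the alert channel.

```python
import logging

logger = logging.getLogger("data_quality")

# Hypothetical thresholds: minimum acceptable value per ratio metric
MIN_THRESHOLDS = {"completeness": 0.95, "uniqueness": 0.99, "validity": 0.90}
MAX_AGE_DAYS = 1  # hypothetical limit for the timeliness metric


def find_violations(metrics):
    """Return a message for each metric outside its threshold."""
    violations = []
    for name, minimum in MIN_THRESHOLDS.items():
        value = metrics.get(name)
        if value is not None and value < minimum:
            violations.append(f"{name}={value:.2f} below {minimum:.2f}")
    age = metrics.get("timeliness")
    if age is not None and age > MAX_AGE_DAYS:
        violations.append(f"timeliness={age}d exceeds {MAX_AGE_DAYS}d")
    return violations


def alert_on_failures(metrics):
    """Log one warning per violation and return the list of messages."""
    violations = find_violations(metrics)
    for message in violations:
        logger.warning("Data quality alert: %s", message)
    return violations


sample = {"completeness": 0.92, "uniqueness": 1.0, "validity": 0.75, "timeliness": 2}
alerts = alert_on_failures(sample)
```

Keeping the thresholds in one dictionary also gives a natural place to document the quality rules. In a real pipeline the logger would typically be wired to whatever notification channel the team already uses.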
