# Data Review Skill

A multi-agent data platform review system that audits pipelines, warehouses, and analytics infrastructure, then produces a prioritized health report with actionable recommendations.
## Prerequisites

None required. Works with any data platform that provides:

- Pipeline configuration files (Airflow DAGs, dbt models, etc.)
- Query logs or analytics code
- Infrastructure-as-code (Terraform, CloudFormation)
- Data quality metrics or monitoring dashboards
## Inputs

The user provides:

- Platform description (required) — Overview of the data platform architecture
  - OR: Path to a configuration directory (e.g., `dbt/`, `airflow/dags/`, `terraform/`)
- Key data flows (optional) — Critical pipelines to prioritize, e.g. "user events → warehouse → BI dashboards"
- Known pain points (optional) — Current issues, slow queries, quality problems
- Compliance requirements (optional) — GDPR, HIPAA, SOC2, etc.
- Context file (optional) — Markdown file with platform details, constraints, and team size

If the user doesn't provide optional inputs, use reasonable defaults and note any assumptions.
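For example, a context file might look like this (every detail below is hypothetical, shown only to illustrate the expected shape):

```markdown
# Platform Context

- Team: 3 data engineers, 1 analyst
- Stack: Snowflake + dbt + Airflow
- Constraints: no new vendors this quarter; prefer low-maintenance fixes
- Pain points: nightly load DAG occasionally misses its 6am SLA
- Compliance: GDPR (EU customer data)
```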
## Agent Roster

Each agent has a specialized domain. All agents read the same platform artifacts in parallel.

| # | Agent | Focus | Reference |
|---|-------|-------|-----------|
| 1 | Data Engineer | Pipeline reliability, orchestration, dependencies, error handling, monitoring | `agents/data-engineer.md` |
| 2 | Data Scientist | Analytics quality, model pipelines, feature engineering, reproducibility | `agents/data-scientist.md` |
| 3 | Performance Analyst | Query optimization, indexing, partitioning, compute costs, bottlenecks | `agents/performance-analyst.md` |
| 4 | Security Auditor | Data governance, access controls, PII handling, compliance, lineage | `agents/security-auditor.md` |
| 5 | Synthesizer | Reads all agent reports and produces a prioritized action plan with trade-offs | Built-in coordinator role |
## Workflow

### Phase 1: Discovery

- Confirm the platform description or config path
- Identify the platform type (dbt, Airflow, Databricks, Snowflake, custom, etc.)
- Scan for key artifacts:
  - Pipeline definitions (DAGs, models, workflows)
  - Query files (SQL, notebooks)
  - Infrastructure code (Terraform, YAML configs)
  - Data quality tests or schema definitions
  - Monitoring/alerting configurations
- Build a platform inventory listing:
  - Pipeline count and types
  - Data sources and destinations
  - Compute/storage components
  - Orchestration tools
- Save discovery results to `workspace/discovery.md`

This discovery output is shared with all review agents as context.
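A minimal sketch of what `workspace/discovery.md` might contain. The section headings mirror the inventory list above; every count and tool name is illustrative:

```markdown
# Platform Discovery

## Platform Type
Airflow (orchestration) + dbt (transformations) + Snowflake (warehouse)

## Pipelines
- 12 Airflow DAGs (hourly + nightly ingestion)
- 48 dbt models (staging → marts)

## Sources and Destinations
- Sources: Postgres replica, Segment event stream, 2 third-party APIs
- Destinations: Snowflake warehouse, Looker dashboards

## Compute / Storage
- Snowflake: ETL_WH and BI_WH virtual warehouses
- S3 landing bucket for raw files

## Orchestration
- Airflow 2.x; email alerts only, no data quality tests found
```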
### Phase 2: Parallel Review (Sub-agents)

Spawn agents 1-4 in parallel using the Task tool. Each agent receives:

- The platform description and discovery file
- Their specific agent instructions (from `agents/*.md`)
- The review checklists and scoring rubric (`REFERENCE.md`)
- An output file path for their findings

Each agent:

- Reads relevant platform artifacts (configs, code, schemas)
- Applies their domain-specific audit checklist
- Scores each dimension (1-5 scale)
- Documents findings with file/line references
- Writes prioritized recommendations to their output file

Agent output files:

- `workspace/agents/data-engineer.md`
- `workspace/agents/data-scientist.md`
- `workspace/agents/performance-analyst.md`
- `workspace/agents/security-auditor.md`
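Each report follows the same shape: dimension scores, findings with file/line references, and prioritized recommendations. A hypothetical excerpt from `workspace/agents/data-engineer.md` (paths and findings are invented for illustration):

```markdown
# Data Engineer Review

## Scores
| Dimension            | Score (1-5) |
|----------------------|-------------|
| Pipeline reliability | 3           |
| Error handling       | 2           |
| Monitoring           | 2           |

## Findings
- [High] `dags/user_events.py:42`: no retries configured on the warehouse load task
- [Medium] `dbt/models/marts/orders.sql`: no uniqueness test on the primary key

## Recommendations (prioritized)
1. Add retries and failure alerting to the user_events DAG
2. Add dbt `unique` / `not_null` tests to mart primary keys
```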
### Phase 3: Architecture Debate

After all agents complete their audits, spawn a debate session:

- Create a debate prompt with all agent findings
- Agents discuss conflicting recommendations (e.g., performance vs. cost)
- Identify trade-offs and prioritization criteria
- Build consensus on critical vs. nice-to-have improvements
- Save the debate transcript to `workspace/debate.md`
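A hedged sketch of a debate prompt; the wording is illustrative, and the essential ingredients are the combined findings plus an explicit ask for trade-offs and priorities:

```markdown
You are moderating a review debate between four specialist reviewers
(data engineering, data science, performance, security). Their findings
are appended below.

1. Identify recommendations that conflict (e.g., clustering keys that
   speed up BI queries vs. the compute cost of maintaining them).
2. For each conflict, state the trade-off and a prioritization criterion.
3. Classify every recommendation as critical or nice-to-have, with reasoning.

<agent findings>
```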
### Phase 4: Synthesis

The coordinator (you) acts as the Synthesizer:

- Read all 4 agent reports and the debate transcript
- Deduplicate overlapping findings
- Categorize by severity (Critical / High / Medium / Low)
- Rank by impact vs. effort for small teams
- Produce the final health report
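One way to operationalize the impact-vs-effort ranking is a 2x2 matrix. The top row maps onto the "Quick wins" and "Strategic improvements" sections of the final report; the bottom-row labels are suggested defaults, not part of the report template:

| | Low effort | High effort |
|---|---|---|
| **High impact** | Quick wins | Strategic improvements |
| **Low impact** | Do opportunistically | Defer |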
### Phase 5: Output

Generate the final deliverable using the report template in `REFERENCE.md`. Save it to `workspace/data-platform-health-report.md` and present it to the user.
## Output Structure

```
workspace/
├── discovery.md                      # Platform inventory from Phase 1
├── agents/
│   ├── data-engineer.md
│   ├── data-scientist.md
│   ├── performance-analyst.md
│   └── security-auditor.md
├── debate.md                         # Architecture trade-offs discussion
└── data-platform-health-report.md    # Final synthesized report
```
## Coordinator Responsibilities

- Run the discovery phase to build the platform inventory
- Spawn review agents in parallel with the Task tool
- Ensure each agent has access to the discovery file and reference materials
- Collect all agent reports
- Facilitate the architecture debate (spawn debate agents or synthesize manually)
- Produce the final health report with scoring and prioritized recommendations
- Present the report to the user with an executive summary
## Customization

The user can customize the review by:

- Skipping agents: "Skip data science review, focus on infrastructure and performance"
- Focusing on specific pipelines: "Only review the user_events ETL and downstream models"
- Prioritizing dimensions: "I care most about compliance, less about performance"
- Adding comparisons: "Compare our approach to industry best practices for event streaming"
- Specifying constraints: "We're a 2-person team, recommend low-maintenance solutions only"
- Setting compliance scope: "Audit for GDPR compliance specifically"

Adapt the agent roster and instructions accordingly.
## Scoring System

Each agent rates their domain on a 1-5 scale:

- 5 (Excellent): Industry best practices, fully automated, no issues
- 4 (Good): Minor improvements possible, well-maintained
- 3 (Adequate): Functional but needs attention, some technical debt
- 2 (Poor): Significant issues, requires immediate action
- 1 (Critical): Broken or severely compromised, blocking business value

The final report includes:

- Overall platform health score (average across domains)
- Per-domain scores with justification
- Critical findings (score ≤ 2)
- Quick wins (high impact, low effort)
- Strategic improvements (high impact, high effort)
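A worked example of the score roll-up, with hypothetical domain scores:

```
Data Engineering: 3
Data Science:     4
Performance:      3
Security:         2   <- critical finding (score ≤ 2)

Overall platform health = (3 + 4 + 3 + 2) / 4 = 3.0 (Adequate)
```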
## Common Review Scenarios

### Scenario 1: New Team Inheriting a Data Platform

```
/data-review "Inherited a Snowflake + dbt + Airflow stack. Need to understand health and risks."
```

### Scenario 2: Pre-Migration Assessment

```
/data-review "Planning to migrate from on-prem Postgres to BigQuery. Audit current state."
```

### Scenario 3: Performance Investigation

```
/data-review "Dashboard queries taking 2+ minutes. Focus on query optimization and indexing."
```

### Scenario 4: Compliance Audit

```
/data-review "Need HIPAA compliance audit of our analytics platform. Check PII handling and access controls."
```

### Scenario 5: Cost Optimization

```
/data-review "Warehouse costs doubled this quarter. Identify waste and optimization opportunities."
```
## Agent Invocation Pattern

```
# Example internal workflow

# 1. Spawn review agents in parallel
use Task tool to spawn:
  - data-engineer       with discovery.md + REFERENCE.md → workspace/agents/data-engineer.md
  - data-scientist      with discovery.md + REFERENCE.md → workspace/agents/data-scientist.md
  - performance-analyst with discovery.md + REFERENCE.md → workspace/agents/performance-analyst.md
  - security-auditor    with discovery.md + REFERENCE.md → workspace/agents/security-auditor.md

# 2. Wait for all agents to complete

# 3. Spawn debate session (optional)
use Task tool to spawn debate with all agent findings

# 4. Synthesize final report
read all outputs + debate transcript
apply report template from REFERENCE.md
generate data-platform-health-report.md
```
## References

- `REFERENCE.md` — Audit checklists, scoring rubric, common issues and fixes
- `agents/data-engineer.md` — Pipeline reliability and orchestration focus
- `agents/data-scientist.md` — Analytics quality and reproducibility focus
- `agents/performance-analyst.md` — Query optimization and cost efficiency focus
- `agents/security-auditor.md` — Governance, compliance, and access control focus