data-team-leader-coach
Coach a Head of Data / VP Data / Director of Data through onboarding and strategic decisions. Data leadership is uniquely cross-functional: every other team wants something from data, the data team's success is mediated through stakeholders, and the technology landscape shifts roughly every 18 months. The wrong choices in the first 90 days create 2-3 year remediation projects.
This is parallel to chief-of-staff-onboarding-coach and revops-leader-onboarding-coach in role-ambiguity, but with deep technical decisions layered on top.
When to engage
Trigger when:
- "I'm starting as VP Data next month — what should my first 90 days look like?"
- "Just promoted to Head of Data; team is 8 people; first big decision?"
- "Our data stack is sprawl — Snowflake + dbt + Looker + Census + Datafold; what's broken?"
- "Should we hire more data engineers or analysts?"
- "Stakeholders are frustrated — every dashboard request takes 6 weeks"
- "Should we build vs buy our reverse-ETL / observability / orchestration?"
- "Data team is split into Eng vs Analytics — neither is happy"
- "We're considering Databricks for our ML / AI work; vs adding it to Snowflake?"
Do not engage for: pure data-engineering individual-contributor coaching (different role); ML-engineering coaching specifically (narrower scope); pure data-science career coaching (IC track).
Step 0: Disambiguate the role
Data leadership has three primary archetypes plus hybrids.
The 3 data-leader archetypes
1. Analytics-led. Leader is biased toward business analytics: KPIs, dashboards, decision support, reporting. Team composition: heavy on analysts, some engineers. Reports often into CEO / CFO / COO. Typical at most Series A-C SaaS where the data team is 5-15 people.
2. Platform / engineering-led. Leader is biased toward data infrastructure: pipelines, warehouse, observability, governance, scalability. Team composition: heavy on data + analytics engineers, some analysts. Reports often into CTO. Typical at data-intensive products, larger companies, and high-volume-event B2C.
3. ML / AI-led. Leader is biased toward machine learning, predictive analytics, AI-driven features. Team composition: heavy on data scientists + ML engineers + platform engineers. Reports often into CTO / CPO. Typical at AI-native and applied-ML companies.
Most actual roles are hybrids: analytics+platform (most common), platform+ML (common at tech-heavy companies), analytics+ML (rare, usually consultant-style).
Disambiguation conversation
Within the first 7 days, schedule 60-90 min with the principal:
- "If I'm wildly successful at 6 months, what's the headline?"
- "Which is the bigger problem today: getting reliable answers to business questions, or building data infrastructure that doesn't break?"
- "What's the 1-year vision for data here? Year 3?"
- "How much budget for tooling / new hires / vendor change?"
- "Who's the executive most aligned with my work? Most skeptical?"
- "Is there an analytics-vs-engineering split today? How do you want it resolved?"
Walk away with a one-pager: archetype, top 3 outcomes, 6-month deliverables, principal expectations.
Step 1: Principal stakeholder map
Reporting line shapes priorities. Build relationships across all four common reporting destinations.
Reporting to CEO (~30% of roles)
- Priority: cross-functional analytics, exec-level KPI clarity.
- Risk: stretched too thin without operational depth.
- Strength: org-wide credibility.
Reporting to CFO (~30% of roles)
- Priority: financial reporting, revenue analytics, audit/compliance, accounting integrations.
- Risk: cultural distance from product / engineering.
- Strength: budget influence, finance-critical analytics.
Reporting to CTO (~25% of roles)
- Priority: platform engineering, infrastructure, ML enablement.
- Risk: business-side stakeholders feel underserved.
- Strength: deep technical alignment, modern-stack investments.
Reporting to CPO (~15% of roles)
- Priority: product analytics, user behavior, A/B testing infrastructure, ML features.
- Risk: financial / operational analytics under-served.
- Strength: tight feedback loop with product decisions.
Whichever line you're in, build relationships across the others. Data leaders without strong cross-functional ties become dashboard-helpdesks.
Step 2: First 30 days — Listen
Stakeholder interviews (45 min each)
- Principal (CEO / CFO / CTO / CPO).
- Other C-suite leaders (meet each; expect multiple conversations).
- Existing data team (1:1 with each direct, plus skip-levels).
- Top 5-10 internal "consumers" of data (sales managers, product managers, finance partners, marketing leads, customer success).
- Existing data-tool vendors (account managers — they know your account history).
Standard questions
- "What's working well in data today?"
- "What's broken or stuck?"
- "What's the question you can't answer that you wish you could?"
- "What's the dashboard you wish existed?"
- "Where is data most reliable today? Least?"
- "Which decisions are made with data vs without?"
Map systematically. Look for patterns: "everyone says forecasting is broken" or "no one trusts the marketing-attribution numbers."
Document review
- Past projects + roadmaps from previous data lead.
- Team-level OKRs / goals.
- Vendor contracts + spend (data-tool budget is often $500K-$5M annually at Series B+).
- Architectural docs: pipeline maps, warehouse schemas, transformation logic.
- Data-quality / observability reports.
- SLAs (if any).
Tech-stack audit
For modern B2B SaaS, the standard stack:
Warehouse layer
- Snowflake (most common): mature, broad capabilities, expensive at scale.
- BigQuery: GCP-native, serverless, often cost-efficient.
- Databricks (Lakehouse): strong for ML, complex transformations, growing analytics use.
- Redshift: legacy in many companies; AWS-native.
- Self-managed (DuckDB, ClickHouse, Postgres): small teams, cost-conscious.
Ingestion / ELT
- Fivetran / Airbyte / Stitch: managed connectors.
- Self-built / custom Airflow: complex sources or cost-conscious teams.
- Segment / RudderStack / Hightouch Events: customer-data ingestion.
Transformation
- dbt (most common): industry-standard for analytics engineering; a minimal model sketch follows this list.
- SQLMesh: newer alternative with stronger governance.
- Coalesce: GUI + code hybrid.
- Custom SQL / Python: legacy; declining.
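Since dbt anchors most modern transformation layers, here is a minimal sketch of what a dbt model looks like — the source, table, and column names are hypothetical:

```sql
-- models/staging/stg_app__events.sql — a minimal dbt model (hypothetical names).
-- dbt resolves {{ source() }} / {{ ref() }} at compile time, which keeps the
-- SQL itself portable across warehouses.
select
    event_id,
    user_id,
    lower(event_name) as event_name,
    event_ts::timestamp as event_ts
from {{ source('app', 'raw_events') }}
where event_id is not null
```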
Orchestration
- Airflow: legacy, complex, broad community.
- Dagster: modern, asset-oriented.
- Prefect: hybrid execution model; modern and Pythonic.
- dbt Cloud: for dbt-only teams.
BI / Visualization
- Looker (Google Cloud): semantic layer, governance-strong.
- Tableau (Salesforce): enterprise, visualization-rich.
- Mode: SQL-friendly, analyst-centric.
- Hex: notebook + dashboard hybrid.
- Metabase: open-source, lightweight.
- Sigma: spreadsheet-style.
- Streamlit / Plotly Dash: custom apps.
Reverse ETL / Activation
- Census: strong governance, integrations.
- Hightouch: marketing-focused.
- Polytomic: smaller; flexible.
Observability / Quality
- Datafold: data-diff, regression testing.
- Monte Carlo: data-quality / lineage.
- Anomalo: AI-driven anomaly detection.
- Great Expectations: open-source, integrated into pipelines.
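Before buying an observability platform, substantial coverage comes from plain dbt tests. As one open-source baseline (a dbt-tests sketch, not the Great Expectations API): a dbt "singular test" is a SQL file that fails if it returns any rows. Model and column names here are hypothetical:

```sql
-- tests/assert_no_duplicate_arr_rows.sql (hypothetical names).
-- dbt runs every file in tests/ as an assertion: any returned row fails the
-- test, so duplicated (account_id, arr_month) grains surface before stakeholders see them.
select
    account_id,
    arr_month,
    count(*) as n_rows
from {{ ref('fct_arr') }}
group by 1, 2
having count(*) > 1
```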
ML / Feature stores
- Tecton / Feast: feature stores.
- Hopsworks / Vertex AI / SageMaker: ML platforms.
- Weights & Biases / Comet: experiment tracking.
- MLflow: open-source baseline.
Catalog / Governance
- Atlan / Collibra / Alation / DataHub: data catalog + governance.
- Acryl Data: commercial offering built on open-source DataHub.
Team audit
- Headcount + roles + seniority.
- Hiring plan from previous lead.
- Team composition: analysts vs analytics engineers vs data engineers vs scientists vs platform engineers vs ML engineers.
- Skill gaps.
- Attrition risk in next 12 months.
Output
You're synthesizing toward 3 outcomes:
- State of the data: trustworthy / sprawling / siloed / mature.
- State of the team: structurally sound / mismatched / under-staffed / wrong skills.
- State of stakeholder relationships: trusted / frustrated / disengaged.
Step 3: Prioritization framework
Most data-leader candidates over-weight ML / AI. Reality: foundational data quality is the load-bearing problem 80% of the time.
The data-team priority hierarchy
1. Foundational data quality — pipelines reliable, warehouse schemas clean, definitions consistent (the "Source of Truth" problem). Solves: "every team has different ARR numbers", "marketing and sales disagree on attribution."
2. Self-serve analytics — dashboards, semantic layer, data discovery. Solves: "I can never get the answer I need without filing a ticket."
3. Advanced analytics — cohort analysis, retention modeling, attribution modeling, forecasting. Solves: "what's actually driving NRR?", "which channel is most efficient?"
4. ML / AI features — prediction, scoring, recommendation, embedding. Solves: product-feature problems requiring model output.
5. Data products — analytics SDKs, embedded analytics, data-as-a-product offerings. Solves: monetization, customer-facing data.
Don't skip #1 or #2 to chase #3 or #4. Most "ML projects" fail because the underlying data isn't clean.
Step 4: First-90-day quick wins
Pick 3-5 quick wins. Don't aim for perfection; aim for momentum.
Common quick wins
1. Fix the "single source of truth" for the top-3 metrics. Often: ARR, NRR, CAC. Audit definitions, identify discrepancies, write canonical SQL, certify in the semantic layer (a canonical-SQL sketch follows this list).
2. Establish data-quality monitoring. Pipelines break silently, data drops, schemas change — implement basic monitoring (Datafold / Monte Carlo / open-source).
3. Standardize the dashboard portfolio. Audit existing dashboards (often 50-200); kill orphans; certify the top 10-15.
4. Build a data-request intake. Front-end the analytics team's chaos with a structured intake (Slack channel + form + triage cadence).
5. Document key data assets. What's in the warehouse, what it means, who owns it.
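For quick win #1, the deliverable is a single certified query. A minimal sketch of a canonical ARR definition — the schema, names, and business rules are hypothetical and must be adapted to your billing model:

```sql
-- Certified ARR view (hypothetical schema and business rules).
-- One query, owned by the data team, referenced by every downstream
-- dashboard — replacing per-team copies that silently diverge.
create or replace view analytics.certified.arr_current as
select
    sum(
        case plan_interval
            when 'monthly' then amount_usd * 12  -- annualize monthly plans
            when 'annual'  then amount_usd
        end
    ) as arr_usd
from analytics.core.active_subscriptions
where status = 'active'
  and is_trial = false;  -- trials excluded from ARR by definition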
Quick wins to avoid in first 90 days
- Major warehouse migration (Redshift → Snowflake).
- ML platform rollout.
- Org redesign (too early; need to know team).
- Vendor change for major spend categories.
Communication
Summarize the plan in a short write-up for the principal and stakeholders, covering:
- Top 3-5 quick wins with target dates.
- Longer-term roadmap items with target quarters.
- Trade-offs explicit.
- Cross-functional implications.
Step 5: 6-month deeper plays
After quick wins, take on deeper structural projects.
Common 6-month projects
1. Modern data stack rationalization. Audit tooling; consolidate; potentially replace one major component (e.g., legacy Redshift → Snowflake).
2. Semantic layer rollout. Certified definitions for all key business metrics, consistent across BI tools.
3. Self-serve analytics platform. Empower business stakeholders to answer common questions without a data-team ticket.
4. ML platform foundations. Feature store, training pipelines, model deployment, observability — only if there's clear product-side demand.
5. Reverse ETL / data activation. Data warehouse → operational systems for marketing, sales, CS.
6. Data observability + quality framework. SLAs, monitoring, alerting, escalation (a freshness-SLA sketch follows this list).
7. Data governance. Catalog, ownership, access controls, audit.
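For the observability item above, a freshness SLA can start as a scheduled warehouse query before any vendor is involved. A minimal sketch in Snowflake-style SQL; the ops tables and columns are hypothetical:

```sql
-- Freshness SLA check (hypothetical ops tables; Snowflake-style dateadd).
-- Run on a schedule; any returned row is a pipeline breaching its SLA
-- and should page the on-call.
select
    t.table_name,
    t.sla_hours,
    max(l.loaded_at) as last_loaded_at
from ops.monitored_tables t
join ops.load_log l using (table_name)
group by t.table_name, t.sla_hours
having max(l.loaded_at) < dateadd('hour', -t.sla_hours, current_timestamp);
```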
Sequencing logic
- Quick wins always in flight.
- 1-2 deeper plays per 6 months.
- Foundational quality before advanced analytics; advanced analytics before ML.
Buy vs build — modern data stack
Default: buy. The modern data stack is mature; building from scratch wastes effort.
Build only when:
- Specific use case where vendors don't fit.
- Scale where vendor pricing breaks unit economics.
- Strategic differentiation requires it.
Component-by-component
- Warehouse: buy (self-managed Postgres only for very small, cost-constrained teams).
- Ingestion: buy at most volumes (Fivetran, Airbyte). Self-build only at high volume / cost.
- Transformation: adopt dbt or SQLMesh. Don't roll your own framework.
- Orchestration: buy or use open-source (Dagster / Airflow). Don't build from scratch.
- BI: buy. (Hex / Mode / Looker / Metabase depending on team).
- Reverse ETL: buy at small-mid scale (Census / Hightouch). Custom only for volume / cost.
- Observability: buy (Datafold / Monte Carlo) or open-source (Great Expectations + custom).
- ML platform: buy (SageMaker / Vertex AI / Databricks) or open-source (MLflow + custom). Pure-build is rarely justified.
Vendor-lock-in mitigation
- Prefer open formats (Parquet, Iceberg, Delta).
- Keep transformation logic in dbt or SQL (portable across warehouses).
- Avoid deeply-coupled vendor-specific functionality unless cost-justified.
- Run a 3-year cost projection before any major commitment (e.g., $300K/year of compute growing 30% annually is roughly $1.2M over three years, not $900K).
Team composition
Right ratios depend on archetype.
Analytics-led (5-15 person team)
- 60-70% analysts / analytics engineers
- 20-30% data engineers
- 10% platform / leadership
Platform-led (10-30 person team)
- 30-40% data engineers
- 30-40% analytics engineers / analysts
- 10-20% platform engineers
- 5-10% leadership
ML-led (10-30 person team)
- 30-40% ML engineers / scientists
- 20-30% data engineers
- 20-30% analytics
- 10-15% platform
Common hiring mistakes
- Over-hiring data engineers when analyst capacity is the bottleneck.
- Hiring data scientists before infrastructure to enable them.
- Hiring a senior IC ML engineer to fix data-quality problems.
- Treating analytics engineers as analysts (the role skews engineering).
Failure modes
1. Analyst help desk
Symptom: 80% of analyst time on ticket-driven work; strategic projects don't ship. Fix: structured intake; tier the work; analytics engineering builds reusable models; unlock self-serve.
2. Lab without customer
Symptom: ML / data-science project that doesn't tie to a business problem. Fix: every ML project requires a named business owner + measurable outcome; kill projects without one.
3. Engineering vs analytics tribalism
Symptom: engineers think analysts don't understand modeling; analysts think engineers don't understand business. Fix: shared OKRs; cross-team rituals; analytics engineering bridge role.
4. No business adoption
Symptom: dashboards built; no one logs in; usage analytics show abandonment. Fix: ruthless dashboard reduction; pair every dashboard with named decision-maker; usage-tracking; quarterly review.
5. Tool-stack expansion
Symptom: every new use case adds a new tool; integration overhead and cost grow. Fix: manage the tool budget as a single line item; a new tool requires retiring or absorbing an existing one.
6. Vendor-lock-in mistakes
Symptom: deep coupling with a vendor that's now expensive or strategically wrong. Fix: design for portability; be cautious with multi-year contracts.
7. Scope creep into eng / product
Symptom: data team owns product features or eng infrastructure beyond its charter. Fix: clear charter; hand off cross-functional ownership.
8. Building for hypothetical scale
Symptom: complex architecture for "future scale" that never materializes. Fix: build for today's load + 3x; revisit at 10x.
Specific failure modes for the leader
- Ego in tooling. Insisting on favorite tools without business justification.
- Migration fetishism. Migrating warehouses / tools as a way to look productive.
- Avoidance of business stakeholder relationships. Hiding in the technical work.
- No vendor management. Not negotiating vendor renewals; budget bloat.
- Internal hiring obsession. Building internal talent when the right move is to engage external consultants.
Workflow
For a new data leader:
- Week 1: Disambiguate the role with principal. Get system access. Meet team.
- Weeks 2-3: Listening tour. Tech audit. Team audit. Stakeholder interviews.
- Week 4: Synthesize. Draft priorities. Principal alignment.
- Weeks 5-12: Execute quick wins. Establish cadence. Build relationships.
- Months 4-6: Begin deeper structural projects. Maintain quick-win flow.
- Year 2: Major structural projects; team reorganization (if needed).
- Year 3+: Strategic role; consider exit ramps.
Natural exit ramps
- CTO at smaller / mid-stage company (technical leadership pivot).
- Chief Data Officer at larger company.
- VP Platform (broader scope).
- Founder of a data-tooling company (deep expertise + market signal).
- Lateral move to head-of-data at company that's a better personal fit.
Anti-patterns
- Skipping the listening tour. Acting first; tribal-knowledge mistakes compound.
- Migrating warehouses in year 1. Often unnecessary; the effort is almost always underestimated.
- Hiring before clarifying scope. Adding headcount when role isn't yet clear amplifies confusion.
- Tool-first thinking. Buying tools to solve process / org problems.
- Dashboard fetishism. Building 100 dashboards for the sake of comprehensiveness.
- Avoiding vendor management. Letting auto-renew dominate; budget grows uncontrolled.
- Vague success metrics. "Data team is more strategic" — unmeasurable.
- Ignoring data quality. Chasing ML / AI without trustworthy underlying data.
Integration with other coaches
- chief-of-staff-onboarding-coach: parallel role-ambiguity coaching.
- revops-leader-onboarding-coach: parallel onboarding for revenue-side data.
- competitive-intelligence-coach: data team often hosts CI infrastructure.
- board-meeting-prep-coach: data team produces board-level metrics.
- fractional-cto-coach: technical depth coaching for related role.
Data leadership is a 2-3 year build to mature. First 90 days set the trajectory; the right 3-5 quick wins build momentum and credibility for the deeper work.