
Lakehouse pipeline design (Databricks)



Use this skill when someone asks for a pipeline design, DLT design, ETL plan, CDC ingestion, or a review of an existing pipeline.

Deliverables

When activated, produce at least:

  • A filled design doc based on assets/pipeline-design-doc.md

  • A short, actionable implementation checklist (you can reuse references/pipeline-checklist.md)

Optionally (only if asked): a code skeleton (PySpark / SQL / DLT) that matches the design.

Minimal inputs (ask only what’s missing)

Ask at most 3 questions total; prefer sensible defaults over asking.

  • Source type: files / DB / API / Kafka / etc.

  • Mode: batch / streaming / CDC

  • Target: tables (catalog.schema.*) and consumers (dashboards, ML, downstream jobs)

  • Volume + SLA: rows/day, latency/freshness SLO, cost constraints

  • Governance: PII? UC catalogs/schemas, access groups

Design guidance (what to include)

  • Architecture: bronze → silver → gold; DLT vs Jobs; where to enforce quality

  • Incremental strategy: watermarking, MERGE for CDC, idempotency

  • Delta table design: partitioning, ZORDER, OPTIMIZE/VACUUM policy

  • Quality checks: schema validation, null/unique, freshness, anomaly checks

  • Observability: metrics, logs, expectations failures, alerts, runbooks

  • Backfills: replay strategy, how to reprocess safely, versioning

  • Security: UC permissions, row/column filtering if needed, secrets management

  • Operational: retries, SLAs, escalation, deployment strategy
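The medallion flow, quality enforcement, and CDC points above can be sketched in DLT SQL. This is a hedged skeleton, not a complete pipeline: every table, column, and path name is a placeholder, and the exact DLT syntax should be checked against your runtime version.

```sql
-- Bronze: raw ingest with Auto Loader (path and format are placeholders)
CREATE OR REFRESH STREAMING TABLE bronze_orders
AS SELECT * FROM STREAM read_files('/Volumes/main/raw/orders', format => 'json');

-- Silver: enforce quality with an expectation; violating rows are dropped
CREATE OR REFRESH STREAMING TABLE silver_orders (
  CONSTRAINT valid_order_id EXPECT (order_id IS NOT NULL) ON VIOLATION DROP ROW
)
AS SELECT order_id, customer_id, amount
FROM STREAM(LIVE.bronze_orders);

-- CDC into silver: keyed, sequenced upserts so replays stay idempotent
CREATE OR REFRESH STREAMING TABLE silver_customers;
APPLY CHANGES INTO LIVE.silver_customers
FROM STREAM(LIVE.bronze_customers_cdc)
KEYS (customer_id)
APPLY AS DELETE WHEN op = 'DELETE'
SEQUENCE BY event_ts
STORED AS SCD TYPE 1;
```

Outside DLT, the same CDC upsert is usually a Delta `MERGE INTO` keyed on the same columns; the `SEQUENCE BY` (or ordering) column is what makes replays and backfills idempotent.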

Output rules

  • Put concrete decisions in a “Decisions” section and unknowns in “Open questions”.

  • If details are missing, keep placeholders like {{...}} and add an “Info needed” section.

  • Keep the doc concise; link to references/pipeline-checklist.md when you need long checklists.

Examples

User: “Design a DLT pipeline that ingests Salesforce accounts daily and publishes a gold table for dashboards.”

Output: Design doc + checklist + optional DLT skeleton.
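For this example, the optional gold-layer skeleton could look like the following sketch. The silver table and its columns are assumptions for illustration, not taken from the prompt:

```sql
-- Gold: account summary for dashboards (all names are placeholders)
CREATE OR REFRESH MATERIALIZED VIEW gold_account_summary
AS SELECT
  region,
  count(*)            AS account_count,
  sum(annual_revenue) AS total_annual_revenue
FROM LIVE.silver_accounts
GROUP BY region;
```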

User: “Review our existing silver-to-gold job for performance and reliability.”

Output: Review-style design doc: risks, improvements, and prioritized actions.

Edge cases

  • Streaming sources: include checkpointing, schema evolution handling, and late data policy.

  • Regulated data: include classification, retention, and UC policy controls.

  • Multi-tenant tables: call out tenant key, partitioning, and access controls.
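The streaming edge case above can be hedged into an Auto Loader sketch; the path and the `schemaEvolutionMode` option are illustrative assumptions, and late-data handling would typically be bounded with a watermark in the downstream aggregate:

```sql
-- Streaming ingest with schema evolution; DLT manages the checkpoint itself
CREATE OR REFRESH STREAMING TABLE bronze_events
AS SELECT *, _metadata.file_modification_time AS ingest_ts
FROM STREAM read_files(
  '/Volumes/main/raw/events',
  format => 'json',
  schemaEvolutionMode => 'addNewColumns'
);
```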
