
Databricks Local Dev Loop

Overview

Set up a fast, reproducible local development workflow for Databricks.

Prerequisites

  • Completed databricks-install-auth setup

  • Python 3.10+ with pip (Databricks Connect 14.x requires the Python version matching the cluster's DBR, e.g. 3.10 for DBR 14.3)

  • VS Code or PyCharm IDE

  • Access to a running cluster

Instructions

Step 1: Project Structure

my-databricks-project/
├── src/
│   ├── __init__.py
│   ├── pipelines/
│   │   ├── __init__.py
│   │   ├── bronze.py          # Raw data ingestion
│   │   ├── silver.py          # Data cleansing
│   │   └── gold.py            # Business aggregations
│   └── utils/
│       ├── __init__.py
│       └── helpers.py
├── tests/
│   ├── __init__.py
│   ├── unit/
│   │   └── test_helpers.py
│   └── integration/
│       └── test_pipelines.py
├── notebooks/                 # Databricks notebooks
│   └── exploration.py
├── resources/                 # Asset Bundle configs
│   └── jobs.yml
├── databricks.yml             # Asset Bundle project config
├── .env.local                 # Local secrets (git-ignored)
├── .env.example               # Template for team
├── pyproject.toml
└── requirements.txt
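The bronze/silver/gold split above implies that each pipeline module exposes small, pure functions over DataFrames (bronze.py appears later under Examples). As a rough sketch of what src/pipelines/silver.py could contain, with the event_id column being an illustrative assumption rather than part of the upstream skill:

# src/pipelines/silver.py -- illustrative sketch only
from pyspark.sql import DataFrame
from pyspark.sql import functions as F

def cleanse_events(raw: DataFrame) -> DataFrame:
    """Drop duplicates and null keys from the bronze events table."""
    return (
        raw.dropDuplicates(["event_id"])
           .filter(F.col("event_id").isNotNull())
    )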

Step 2: Install Development Tools

set -euo pipefail

# Install the Databricks SDK and CLI
pip install databricks-sdk databricks-cli

# Install dbx for deployment
pip install dbx

# Install Databricks Connect v2 (runs Spark code from the local IDE against a remote cluster);
# pin the version to the cluster's DBR, e.g. 14.3
pip install databricks-connect==14.3.*

# Install testing tools
pip install pytest pytest-cov
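After installation, a quick way to confirm the SDK can reach the workspace is a short script like the following sketch; the check_sdk.py filename is illustrative, and it assumes the credentials from databricks-install-auth are already configured:

# check_sdk.py -- minimal sanity check (illustrative filename)
from databricks.sdk import WorkspaceClient

# Picks up DATABRICKS_HOST / DATABRICKS_TOKEN or a ~/.databrickscfg profile
w = WorkspaceClient()
print(w.current_user.me().user_name)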

Step 3: Configure Databricks Connect

# Configure authentication for Databricks Connect
# (Connect v2 reads the Databricks CLI profile rather than a separate config)
databricks configure

# Or set environment variables (example values shown)
export DATABRICKS_HOST="https://adb-1234567890.1.azuredatabricks.net"
export DATABRICKS_TOKEN="dapi..."
export DATABRICKS_CLUSTER_ID="1234-567890-abcde123"
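With the variables set, a minimal smoke test confirms that Databricks Connect actually reaches the cluster (assuming the cluster referenced by DATABRICKS_CLUSTER_ID is running); the filename is illustrative:

# smoke_test.py -- Databricks Connect check (illustrative filename)
from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.getOrCreate()
# Executes on the remote cluster, not on the local machine
print(spark.range(5).collect())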

Step 4: Create databricks.yml (Asset Bundle)

databricks.yml

bundle:
  name: my-databricks-project

workspace:
  host: ${DATABRICKS_HOST}

variables:
  catalog:
    description: Unity Catalog name
    default: main
  schema:
    description: Schema name
    default: default

targets:
  dev:
    default: true
    mode: development
    workspace:
      root_path: /Users/${workspace.current_user.userName}/.bundle/${bundle.name}/dev

  staging:
    mode: development
    workspace:
      root_path: /Shared/.bundle/${bundle.name}/staging

  prod:
    mode: production
    workspace:
      root_path: /Shared/.bundle/${bundle.name}/prod

Step 5: Local Testing Setup

tests/conftest.py

import pytest
from pyspark.sql import SparkSession


@pytest.fixture(scope="session")
def spark():
    """Create a local SparkSession for unit tests."""
    return (
        SparkSession.builder
        .master("local[*]")
        .appName("unit-tests")
        .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
        .config("spark.sql.catalog.spark_catalog",
                "org.apache.spark.sql.delta.catalog.DeltaCatalog")
        .getOrCreate()
    )


@pytest.fixture(scope="session")
def dbx_spark():
    """Connect to a Databricks cluster for integration tests."""
    from databricks.connect import DatabricksSession
    return DatabricksSession.builder.getOrCreate()
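The project tree in Step 1 lists tests/unit/test_helpers.py; a minimal sketch of a unit test that exercises the local spark fixture above could look like this (the data and assertion are illustrative assumptions):

# tests/unit/test_helpers.py -- illustrative sketch
def test_drop_duplicates(spark):
    # Uses the session-scoped local SparkSession fixture from conftest.py
    df = spark.createDataFrame([(1, "a"), (1, "a"), (2, "b")], ["id", "value"])
    assert df.dropDuplicates().count() == 2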

Step 6: VS Code Configuration

// .vscode/settings.json
{
  "python.defaultInterpreterPath": "${workspaceFolder}/.venv/bin/python",
  "python.testing.pytestEnabled": true,
  "python.testing.pytestArgs": ["tests"],
  "python.linting.enabled": true,
  "python.linting.pylintEnabled": true,
  "editor.formatOnSave": true,
  "[python]": {
    "editor.defaultFormatter": "ms-python.black-formatter"
  },
  "databricks.python.envFile": "${workspaceFolder}/.env.local"
}

// .vscode/launch.json
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Python: Current File (Databricks Connect)",
      "type": "python",
      "request": "launch",
      "program": "${file}",
      "console": "integratedTerminal",
      "env": {
        "DATABRICKS_HOST": "${env:DATABRICKS_HOST}",
        "DATABRICKS_TOKEN": "${env:DATABRICKS_TOKEN}",
        "DATABRICKS_CLUSTER_ID": "${env:DATABRICKS_CLUSTER_ID}"
      }
    }
  ]
}

Output

  • Working local development environment

  • Databricks Connect configured for remote execution

  • Unit and integration test setup

  • VS Code/PyCharm integration ready

Error Handling

Error                        | Cause                    | Solution
---------------------------- | ------------------------ | ------------------------------------------------
Cluster not running          | Auto-terminated          | Start the cluster first
Version mismatch             | DBR vs Connect version   | Match the databricks-connect version to the DBR
Module not found             | Missing local install    | Run pip install -e .
Connection timeout           | Network/firewall         | Check VPN and firewall rules
SparkSession already exists  | Multiple sessions        | Use the getOrCreate() pattern (sketch below)
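The last row refers to the getOrCreate() pattern: reuse a single session per process instead of constructing new ones in every module. One possible shape for such a helper (the get_spark name and location are assumptions, not part of the upstream skill):

# src/utils/helpers.py -- one possible session helper
from databricks.connect import DatabricksSession

def get_spark():
    # Returns the active session if one exists, otherwise creates it,
    # avoiding "SparkSession already exists" errors
    return DatabricksSession.builder.getOrCreate()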

Examples

Run Tests Locally

# Unit tests (local Spark)
pytest tests/unit/ -v

# Integration tests (Databricks Connect)
pytest tests/integration/ -v --tb=short

# With coverage
pytest tests/ --cov=src --cov-report=html

Deploy with Asset Bundles

# Validate the bundle
databricks bundle validate

# Deploy to dev
databricks bundle deploy -t dev

# Run a job defined in the bundle
databricks bundle run -t dev my-job

Interactive Development

src/pipelines/bronze.py

from pyspark.sql import SparkSession, DataFrame

def ingest_raw_data(spark: SparkSession, source_path: str) -> DataFrame:
    """Ingest raw data from the source path."""
    return spark.read.format("json").load(source_path)


if __name__ == "__main__":
    # Works locally with Databricks Connect
    from databricks.connect import DatabricksSession

    spark = DatabricksSession.builder.getOrCreate()
    df = ingest_raw_data(spark, "/mnt/raw/events")
    df.show()

Hot Reload with dbx

# Watch for changes and sync with dbx
dbx sync --watch

# Or use Asset Bundles
databricks bundle sync -t dev --watch

Resources

  • Databricks Connect

  • Asset Bundles

  • VS Code Extension

  • Testing Notebooks

Next Steps

See databricks-sdk-patterns for production-ready code patterns.
