databricks-python-sdk

Databricks Development Guide

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.


Install skill "databricks-python-sdk" with this command: npx skills add databricks-solutions/ai-dev-kit/databricks-solutions-ai-dev-kit-databricks-python-sdk


This skill provides guidance for the Databricks SDK for Python, Databricks Connect, the Databricks CLI, and the REST API.

SDK Documentation: https://databricks-sdk-py.readthedocs.io/en/latest/
GitHub Repository: https://github.com/databricks/databricks-sdk-py

Environment Setup

  • Use the existing virtual environment at .venv, or create one with uv

  • For Spark operations: uv pip install databricks-connect

  • For SDK operations: uv pip install databricks-sdk

  • Databricks CLI version should be 0.278.0 or higher

Configuration

  • Default profile name: DEFAULT

  • Config file: ~/.databrickscfg

  • Environment variables: DATABRICKS_HOST, DATABRICKS_TOKEN
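WorkspaceClient() resolves these sources for you. To show how the two sources might combine, here is a minimal stdlib sketch that assumes the environment variables take precedence over the profile in ~/.databrickscfg (our reading of the SDK's unified auth; confirm against the authentication docs). `load_profile` and `resolve_host_and_token` are hypothetical helpers, not SDK functions; pass the contents of ~/.databrickscfg as `cfg_text`.

```python
import configparser
import os
from typing import Dict, Mapping, Optional, Tuple


def load_profile(cfg_text: str, profile: str = "DEFAULT") -> Dict[str, str]:
    """Parse ~/.databrickscfg-style INI text and return one profile's keys."""
    # Make [DEFAULT] an ordinary section; configparser would otherwise treat it
    # as a set of defaults merged into every other profile.
    parser = configparser.ConfigParser(default_section="__no_defaults__", interpolation=None)
    parser.read_string(cfg_text)
    if not parser.has_section(profile):
        raise KeyError(f"profile {profile!r} not found")
    return dict(parser[profile])


def resolve_host_and_token(
    cfg_text: str,
    profile: str = "DEFAULT",
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[Optional[str], Optional[str]]:
    """Hypothetical helper: DATABRICKS_HOST/DATABRICKS_TOKEN first, then the profile."""
    env = os.environ if env is None else env
    host = env.get("DATABRICKS_HOST")
    token = env.get("DATABRICKS_TOKEN")
    if not (host and token):
        # Fill in whatever the environment did not provide from the profile.
        section = load_profile(cfg_text, profile)
        host = host or section.get("host")
        token = token or section.get("token")
    return host, token
```

Keeping `env` as a parameter makes the precedence rule easy to test without touching the real process environment.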

Databricks Connect (Spark Operations)

Use databricks-connect for running Spark code locally against a Databricks cluster.

from databricks.connect import DatabricksSession

Auto-detects 'DEFAULT' profile from ~/.databrickscfg

spark = DatabricksSession.builder.getOrCreate()

With explicit profile

spark = DatabricksSession.builder.profile("MY_PROFILE").getOrCreate()

Use spark as normal

df = spark.sql("SELECT * FROM catalog.schema.table")
df.show()

IMPORTANT: Do NOT set .master("local[*]"). This will cause issues with Databricks Connect, which runs Spark code on the remote Databricks cluster rather than on a local Spark master.

Direct REST API Access

For operations not yet in the SDK, or that are overly complex through it, call the REST API directly:

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

Direct API call using authenticated client

response = w.api_client.do( method="GET", path="/api/2.0/clusters/list" )

POST with body

response = w.api_client.do( method="POST", path="/api/2.0/jobs/run-now", body={"job_id": 123} )

When to use: Prefer SDK methods when available. Use api_client.do for:

  • New API endpoints not yet in SDK

  • Complex operations where SDK abstraction is problematic

  • Debugging/testing raw API responses
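For debugging it can help to see what a raw call looks like on the wire. The sketch below builds, but does not send, a personal-access-token request with the standard library, assuming PAT auth via a Bearer header and a JSON body; `build_rest_request` is a hypothetical helper. api_client.do handles this for you, along with the other auth types and retries, so treat this only as an illustration.

```python
import json
import urllib.request
from typing import Optional


def build_rest_request(host: str, token: str, method: str, path: str,
                       body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a PAT-authenticated Databricks REST request."""
    url = host.rstrip("/") + path
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    # Personal access tokens are sent as a Bearer token.
    req.add_header("Authorization", f"Bearer {token}")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req
```

Sending it would be `urllib.request.urlopen(req)`; in practice prefer `w.api_client.do(...)`, which reuses the client's resolved credentials.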

Databricks CLI

Check version (should be >= 0.278.0)

databricks --version

Use specific profile

databricks --profile MY_PROFILE clusters list

Common commands

databricks clusters list
databricks jobs list
databricks workspace ls /Users/me
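The ">= 0.278.0" requirement can be checked from Python. This sketch assumes `databricks --version` prints something like "Databricks CLI v0.278.0" (an assumption about the output format); the helper names are hypothetical.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

MIN_CLI_VERSION = (0, 278, 0)


def parse_cli_version(text: str) -> Optional[Tuple[int, ...]]:
    """Extract the first X.Y.Z version number from `databricks --version` output."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    return tuple(int(part) for part in match.groups()) if match else None


def cli_meets_minimum(text: str, minimum: Tuple[int, ...] = MIN_CLI_VERSION) -> bool:
    """True if the parsed version is at least `minimum` (compared as integers)."""
    version = parse_cli_version(text)
    return version is not None and version >= minimum


def installed_cli_version_text() -> Optional[str]:
    """Run `databricks --version` if the CLI is on PATH and return its output."""
    if shutil.which("databricks") is None:
        return None
    result = subprocess.run(["databricks", "--version"], capture_output=True, text=True, check=False)
    return result.stdout.strip() or result.stderr.strip()
```

Versions are compared as integer tuples because plain string comparison would rank "0.9.0" above "0.278.0".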

SDK Documentation Architecture

The SDK documentation follows a predictable URL pattern:

Base: https://databricks-sdk-py.readthedocs.io/en/latest/

Workspace APIs: /workspace/{category}/{service}.html
Account APIs: /account/{category}/{service}.html
Authentication: /authentication.html
DBUtils: /dbutils.html
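Because the pattern is predictable, a doc link can be built mechanically. A small sketch; `sdk_doc_url` is a hypothetical helper covering only the workspace/account pattern above.

```python
BASE_DOC_URL = "https://databricks-sdk-py.readthedocs.io/en/latest/"


def sdk_doc_url(area: str, category: str, service: str) -> str:
    """Build an SDK reference URL following /{area}/{category}/{service}.html."""
    if area not in ("workspace", "account"):
        raise ValueError(f"area must be 'workspace' or 'account', got {area!r}")
    return f"{BASE_DOC_URL}{area}/{category}/{service}.html"
```

For example, `sdk_doc_url("workspace", "compute", "clusters")` gives the Clusters API page listed under Quick Reference Links.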

Workspace API Categories

  • compute: clusters, cluster_policies, command_execution, instance_pools, libraries

  • catalog: catalogs, schemas, tables, volumes, functions, storage_credentials, external_locations

  • jobs: jobs

  • sql: warehouses, statement_execution, queries, alerts, dashboards

  • serving: serving_endpoints

  • vectorsearch: vector_search_indexes, vector_search_endpoints

  • pipelines: pipelines

  • workspace: repos, secrets, workspace, git_credentials

  • files: files, dbfs

  • ml: experiments, model_registry

Authentication

Doc: https://databricks-sdk-py.readthedocs.io/en/latest/authentication.html

Environment Variables

DATABRICKS_HOST=https://your-workspace.cloud.databricks.com
DATABRICKS_TOKEN=dapi...  # Personal Access Token

Code Patterns

Auto-detect credentials from environment

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

Explicit token auth

w = WorkspaceClient(
    host="https://your-workspace.cloud.databricks.com",
    token="dapi...",
)

Azure Service Principal

w = WorkspaceClient(
    host="https://adb-xxx.azuredatabricks.net",
    azure_workspace_resource_id="/subscriptions/.../resourceGroups/.../providers/Microsoft.Databricks/workspaces/...",
    azure_tenant_id="tenant-id",
    azure_client_id="client-id",
    azure_client_secret="secret",
)

Use a named profile from ~/.databrickscfg

w = WorkspaceClient(profile="MY_PROFILE")

Core API Reference

Clusters API

Doc: https://databricks-sdk-py.readthedocs.io/en/latest/workspace/compute/clusters.html

List all clusters

for cluster in w.clusters.list(): print(f"{cluster.cluster_name}: {cluster.state}")

Get cluster details

cluster = w.clusters.get(cluster_id="0123-456789-abcdef")

Create a cluster (returns Wait object)

wait = w.clusters.create(
    cluster_name="my-cluster",
    spark_version=w.clusters.select_spark_version(latest=True),
    node_type_id=w.clusters.select_node_type(local_disk=True),
    num_workers=2,
)
cluster = wait.result()  # Wait for cluster to be running

Or use create_and_wait for a blocking call

from datetime import timedelta

cluster = w.clusters.create_and_wait(
    cluster_name="my-cluster",
    spark_version="14.3.x-scala2.12",
    node_type_id="i3.xlarge",  # AWS node type; use your cloud's equivalent
    num_workers=2,
    timeout=timedelta(minutes=30),
)

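Start/terminate/delete (the Clusters API has no stop method; delete terminates the cluster)

w.clusters.start(cluster_id="...").result()  # blocks until the cluster is RUNNING
w.clusters.delete(cluster_id="...")  # terminates (stops) the cluster; it can be started again
w.clusters.permanent_delete(cluster_id="...")  # removes the cluster permanently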

Jobs API

Doc: https://databricks-sdk-py.readthedocs.io/en/latest/workspace/jobs/jobs.html

from databricks.sdk.service.jobs import Task, NotebookTask

List jobs

for job in w.jobs.list(): print(f"{job.job_id}: {job.settings.name}")

Create a job

created = w.jobs.create( name="my-job", tasks=[ Task( task_key="main", notebook_task=NotebookTask(notebook_path="/Users/me/notebook"), existing_cluster_id="0123-456789-abcdef" ) ] )

Run a job now

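run = w.jobs.run_now_and_wait(job_id=created.job_id)
print(f"Run completed: {run.state.result_state}")

Get run output (output is per task run, so for multi-task jobs pass each task's run_id rather than the parent run_id)

for task in run.tasks:
    output = w.jobs.get_run_output(run_id=task.run_id)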

SQL Statement Execution

Doc: https://databricks-sdk-py.readthedocs.io/en/latest/workspace/sql/statement_execution.html

Execute SQL query

response = w.statement_execution.execute_statement( warehouse_id="abc123", statement="SELECT * FROM catalog.schema.table LIMIT 10", wait_timeout="30s" )

Check status and get results

from databricks.sdk.service.sql import StatementState

if response.status.state == StatementState.SUCCEEDED:
    for row in response.result.data_array or []:
        print(row)

For large results, fetch chunks

chunk = w.statement_execution.get_statement_result_chunk_n( statement_id=response.statement_id, chunk_index=0 )
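Rows in data_array are positional lists, and large results arrive in chunks. The sketch below follows each chunk's next_chunk_index until it is absent and then pairs rows with column names. It uses plain dicts as hypothetical stand-ins for the SDK's result objects (which, as we understand it, expose data_array and next_chunk_index as attributes), so `iter_all_rows` and `rows_to_dicts` are illustrations rather than SDK functions.

```python
from typing import Callable, Dict, Iterator, List, Optional

Row = List[Optional[str]]


def iter_all_rows(first_chunk: dict, fetch_chunk: Callable[[int], dict]) -> Iterator[Row]:
    """Yield rows from the first chunk, then follow next_chunk_index until it is absent."""
    chunk: Optional[dict] = first_chunk
    while chunk is not None:
        yield from (chunk.get("data_array") or [])
        next_index = chunk.get("next_chunk_index")
        chunk = fetch_chunk(next_index) if next_index is not None else None


def rows_to_dicts(column_names: List[str], rows: Optional[List[Row]]) -> List[Dict[str, Optional[str]]]:
    """Zip each positional row with the column names (values stay as returned)."""
    return [dict(zip(column_names, row)) for row in (rows or [])]
```

With the SDK, `fetch_chunk` would wrap `w.statement_execution.get_statement_result_chunk_n(statement_id=response.statement_id, chunk_index=i)` and the column names would come from `response.manifest.schema.columns`.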

SQL Warehouses

Doc: https://databricks-sdk-py.readthedocs.io/en/latest/workspace/sql/warehouses.html

List warehouses

for wh in w.warehouses.list(): print(f"{wh.name}: {wh.state}")

Get warehouse

warehouse = w.warehouses.get(id="abc123")

Create warehouse

created = w.warehouses.create_and_wait( name="my-warehouse", cluster_size="Small", max_num_clusters=1, auto_stop_mins=15 )

Start/stop

w.warehouses.start(id="abc123").result()
w.warehouses.stop(id="abc123").result()

Unity Catalog - Tables

Doc: https://databricks-sdk-py.readthedocs.io/en/latest/workspace/catalog/tables.html

List tables in a schema

for table in w.tables.list(catalog_name="main", schema_name="default"): print(f"{table.full_name}: {table.table_type}")

Get table info

table = w.tables.get(full_name="main.default.my_table")
print(f"Columns: {[c.name for c in table.columns]}")

Check if table exists (the call returns a TableExistsResponse, so read its table_exists flag)

exists = w.tables.exists(full_name="main.default.my_table").table_exists
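Some table calls take a three-part full_name while list calls take catalog_name and schema_name separately. A hypothetical helper for moving between the two forms, assuming plain names without backtick-quoted dots:

```python
from typing import NamedTuple


class TableName(NamedTuple):
    catalog: str
    schema: str
    table: str

    @property
    def full_name(self) -> str:
        return f"{self.catalog}.{self.schema}.{self.table}"


def parse_full_name(full_name: str) -> TableName:
    """Split a plain catalog.schema.table name; quoted names with dots are not handled."""
    parts = full_name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected catalog.schema.table, got {full_name!r}")
    return TableName(*parts)
```

For example, `name = parse_full_name("main.default.my_table")` lets you call `w.tables.list(catalog_name=name.catalog, schema_name=name.schema)`.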

Unity Catalog - Catalogs & Schemas

Doc (Catalogs): https://databricks-sdk-py.readthedocs.io/en/latest/workspace/catalog/catalogs.html
Doc (Schemas): https://databricks-sdk-py.readthedocs.io/en/latest/workspace/catalog/schemas.html

List catalogs

for catalog in w.catalogs.list(): print(catalog.name)

Create catalog

w.catalogs.create(name="my_catalog", comment="Description")

List schemas

for schema in w.schemas.list(catalog_name="main"): print(schema.name)

Create schema

w.schemas.create(name="my_schema", catalog_name="main")

Volumes

Doc: https://databricks-sdk-py.readthedocs.io/en/latest/workspace/catalog/volumes.html

from databricks.sdk.service.catalog import VolumeType

List volumes

for vol in w.volumes.list(catalog_name="main", schema_name="default"): print(f"{vol.full_name}: {vol.volume_type}")

Create managed volume

w.volumes.create( catalog_name="main", schema_name="default", name="my_volume", volume_type=VolumeType.MANAGED )

Read volume info

vol = w.volumes.read(name="main.default.my_volume")

Files API

Doc: https://databricks-sdk-py.readthedocs.io/en/latest/workspace/files/files.html

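Upload file to volume (open the local file in a with block so it gets closed)

with open("local_file.csv", "rb") as f:
    w.files.upload(file_path="/Volumes/main/default/my_volume/data.csv", contents=f)

Download file (the response exposes the file body as a binary stream in .contents)

response = w.files.download(file_path="/Volumes/main/default/my_volume/data.csv")
content = response.contents.read()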

List directory contents

for entry in w.files.list_directory_contents("/Volumes/main/default/my_volume/"): print(f"{entry.name}: {entry.is_directory}")

Upload/download with progress (parallel)

w.files.upload_from( file_path="/Volumes/main/default/my_volume/large.parquet", source_path="/local/path/large.parquet", use_parallel=True )

w.files.download_to( file_path="/Volumes/main/default/my_volume/large.parquet", destination="/local/output/", use_parallel=True )
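The Files API examples above address Unity Catalog volumes with paths of the form /Volumes/<catalog>/<schema>/<volume>/<path>. A hypothetical helper like `volume_path` keeps those paths consistent; it is a sketch built on PurePosixPath, not part of the SDK.

```python
from pathlib import PurePosixPath


def volume_path(catalog: str, schema: str, volume: str, *parts: str) -> str:
    """Hypothetical helper: build /Volumes/<catalog>/<schema>/<volume>/<parts...>."""
    for name in (catalog, schema, volume):
        if not name or "/" in name:
            raise ValueError(f"invalid catalog/schema/volume name: {name!r}")
    for part in parts:
        # An absolute part would make PurePosixPath discard everything before it.
        if part.startswith("/"):
            raise ValueError(f"relative path expected, got {part!r}")
    return str(PurePosixPath("/Volumes", catalog, schema, volume, *parts))
```

For example, `w.files.upload(file_path=volume_path("main", "default", "my_volume", "data.csv"), contents=f)`.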

Serving Endpoints (Model Serving)

Doc: https://databricks-sdk-py.readthedocs.io/en/latest/workspace/serving/serving_endpoints.html

List endpoints

for ep in w.serving_endpoints.list(): print(f"{ep.name}: {ep.state}")

Get endpoint

endpoint = w.serving_endpoints.get(name="my-endpoint")

Query endpoint

response = w.serving_endpoints.query( name="my-endpoint", inputs={"prompt": "Hello, world!"} )

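For chat/completions endpoints (messages are ChatMessage objects, not plain dicts)

from databricks.sdk.service.serving import ChatMessage, ChatMessageRole

response = w.serving_endpoints.query(
    name="my-chat-endpoint",
    messages=[ChatMessage(role=ChatMessageRole.USER, content="Hello!")],
)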

Get OpenAI-compatible client

openai_client = w.serving_endpoints.get_open_ai_client()

Vector Search

Doc (Indexes): https://databricks-sdk-py.readthedocs.io/en/latest/workspace/vectorsearch/vector_search_indexes.html
Doc (Endpoints): https://databricks-sdk-py.readthedocs.io/en/latest/workspace/vectorsearch/vector_search_endpoints.html

List vector search indexes

for idx in w.vector_search_indexes.list_indexes(endpoint_name="my-vs-endpoint"): print(idx.name)

Query index

results = w.vector_search_indexes.query_index(
    index_name="main.default.my_index",
    columns=["id", "text", "embedding"],
    query_text="search query",
    num_results=10,
)
for doc in results.result.data_array:
    print(doc)

Pipelines (Delta Live Tables)

Doc: https://databricks-sdk-py.readthedocs.io/en/latest/workspace/pipelines/pipelines.html

List pipelines

for pipeline in w.pipelines.list_pipelines(): print(f"{pipeline.name}: {pipeline.state}")

Get pipeline

pipeline = w.pipelines.get(pipeline_id="abc123")

Start pipeline update

w.pipelines.start_update(pipeline_id="abc123")

Stop pipeline

w.pipelines.stop_and_wait(pipeline_id="abc123")

Secrets

Doc: https://databricks-sdk-py.readthedocs.io/en/latest/workspace/workspace/secrets.html

List secret scopes

for scope in w.secrets.list_scopes(): print(scope.name)

Create scope

w.secrets.create_scope(scope="my-scope")

Put secret

w.secrets.put_secret(scope="my-scope", key="api-key", string_value="secret123")

Get secret (GetSecretResponse.value holds the secret base64-encoded)

import base64

secret = w.secrets.get_secret(scope="my-scope", key="api-key")
value = base64.b64decode(secret.value).decode("utf-8")

List secrets in scope (metadata only, not values)

for s in w.secrets.list_secrets(scope="my-scope"): print(s.key)

DBUtils

Doc: https://databricks-sdk-py.readthedocs.io/en/latest/dbutils.html

Access dbutils through WorkspaceClient

dbutils = w.dbutils

File system operations

files = dbutils.fs.ls("/")
dbutils.fs.cp("dbfs:/source", "dbfs:/dest")
dbutils.fs.rm("dbfs:/path", recurse=True)

Secrets (same as w.secrets but dbutils interface)

value = dbutils.secrets.get(scope="my-scope", key="my-key")

Common Patterns

CRITICAL: Async Applications (FastAPI, etc.)

The Databricks SDK is fully synchronous. All calls block the thread. In async applications (FastAPI, asyncio), you MUST wrap SDK calls with asyncio.to_thread() to avoid blocking the event loop.

import asyncio

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

WRONG - blocks the event loop

async def get_clusters_bad(): return list(w.clusters.list()) # BLOCKS!

CORRECT - runs in thread pool

async def get_clusters_good(): return await asyncio.to_thread(lambda: list(w.clusters.list()))

CORRECT - for simple calls

async def get_cluster(cluster_id: str): return await asyncio.to_thread(w.clusters.get, cluster_id)

CORRECT - FastAPI endpoint

from fastapi import FastAPI

app = FastAPI()

@app.get("/clusters")
async def list_clusters():
    clusters = await asyncio.to_thread(lambda: list(w.clusters.list()))
    return [{"id": c.cluster_id, "name": c.cluster_name} for c in clusters]

@app.post("/query")
async def run_query(sql: str, warehouse_id: str):
    # Wrap the blocking SDK call
    response = await asyncio.to_thread(
        w.statement_execution.execute_statement,
        statement=sql,
        warehouse_id=warehouse_id,
        wait_timeout="30s",
    )
    return response.result.data_array

Note: Reading WorkspaceClient().config.host is NOT a network call; it only reads the resolved configuration, so plain property access like this does not need to be wrapped.
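To see why the wrapping matters, here is a self-contained sketch that replaces the SDK with a hypothetical `blocking_sdk_call` stand-in, where time.sleep simulates the HTTP round trip. Three 0.2 s calls wrapped in asyncio.to_thread and awaited together should finish in roughly 0.2 s rather than 0.6 s, because each runs on a worker thread while the event loop stays free. asyncio.to_thread needs Python 3.9 or later.

```python
import asyncio
import time
from typing import List, Sequence, Tuple


def blocking_sdk_call(delay: float) -> str:
    """Stand-in for a synchronous SDK call such as w.clusters.get(...)."""
    time.sleep(delay)  # blocks the calling thread, like a real HTTP request
    return f"done after {delay}s"


async def fetch_concurrently(delays: Sequence[float]) -> List[str]:
    # Each call runs in the default thread pool, so the event loop stays free
    # and the calls overlap instead of running back to back.
    return await asyncio.gather(*(asyncio.to_thread(blocking_sdk_call, d) for d in delays))


def timed_run(delays: Sequence[float]) -> Tuple[List[str], float]:
    start = time.perf_counter()
    results = asyncio.run(fetch_concurrently(delays))
    return results, time.perf_counter() - start
```

Calling `blocking_sdk_call` directly inside an `async def` instead would serialize the three calls and stall every other request on the loop for the full duration.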

Wait for Long-Running Operations

from datetime import timedelta

Pattern 1: Use *_and_wait methods

cluster = w.clusters.create_and_wait( cluster_name="test", spark_version="14.3.x-scala2.12", node_type_id="i3.xlarge", num_workers=2, timeout=timedelta(minutes=30) )

Pattern 2: Use Wait object

wait = w.clusters.create(...)
cluster = wait.result()  # Blocks until ready

Pattern 3: Manual polling with callback

def progress(cluster): print(f"State: {cluster.state}")

cluster = w.clusters.wait_get_cluster_running( cluster_id="...", timeout=timedelta(minutes=30), callback=progress )
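The *_and_wait methods and wait_get_* helpers already poll for you. For a resource without a built-in waiter, a generic loop like the one below is one option; `poll_until` is a hypothetical helper, and its fixed polling interval is a simplification (a real waiter may back off between polls).

```python
import time
from datetime import timedelta
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_until(
    fetch: Callable[[], T],
    is_done: Callable[[T], bool],
    timeout: timedelta = timedelta(minutes=20),
    interval_seconds: float = 5.0,
    callback: Optional[Callable[[T], None]] = None,
) -> T:
    """Call fetch() until is_done(result) is true or the timeout expires."""
    deadline = time.monotonic() + timeout.total_seconds()
    while True:
        current = fetch()
        if callback is not None:
            callback(current)  # e.g. progress reporting, like the callback above
        if is_done(current):
            return current
        if time.monotonic() >= deadline:
            raise TimeoutError(f"not done after {timeout}")
        time.sleep(interval_seconds)
```

With the SDK, `fetch` might be `lambda: w.clusters.get(cluster_id="...")` and `is_done` might be `lambda c: c.state == State.RUNNING`, with State imported from databricks.sdk.service.compute.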

Pagination

All list methods return iterators that handle pagination automatically

for job in w.jobs.list():  # fetches further pages as you iterate
    print(job.settings.name)

For manual control: limit sets the page size, not a cap on the total number of results, so the iterator still walks every page. To stop after N items, slice the iterator:

from itertools import islice

for job in islice(w.jobs.list(limit=10), 10):
    print(job)
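A simplified sketch of the kind of loop the SDK's list iterators run, assuming a page-token protocol like the Jobs list endpoint's next_page_token. `paginate` and its `fetch_page` callback are hypothetical; with the SDK you never write this yourself.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def paginate(fetch_page: Callable[[Optional[str]], Dict[str, Any]], items_key: str) -> Iterator[Any]:
    """Yield items across pages, following next_page_token until it is missing."""
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        yield from page.get(items_key, [])
        token = page.get("next_page_token")
        if not token:
            return
```

Because the generator is lazy, slicing it with itertools.islice stops before the next page is requested, which is why islice is a better way to take the first N items than relying on limit.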

Error Handling

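from databricks.sdk.errors import DatabricksError, NotFound, PermissionDenied, ResourceAlreadyExists

try:
    cluster = w.clusters.get(cluster_id="invalid-id")
except NotFound:
    print("Cluster not found")
except PermissionDenied:
    print("Access denied")
except DatabricksError as e:
    # Base class for the SDK's other API errors
    print(f"API error: {e}")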

When Uncertain

If I'm unsure about a method, I should:

Check the documentation URL pattern:

Common categories:

  • Clusters: /workspace/compute/clusters.html

  • Jobs: /workspace/jobs/jobs.html

  • Tables: /workspace/catalog/tables.html

  • Warehouses: /workspace/sql/warehouses.html

  • Serving: /workspace/serving/serving_endpoints.html

Fetch and verify before providing guidance on parameters or return types.

Quick Reference Links

  • Authentication: https://databricks-sdk-py.readthedocs.io/en/latest/authentication.html

  • Clusters: https://databricks-sdk-py.readthedocs.io/en/latest/workspace/compute/clusters.html

  • Jobs: https://databricks-sdk-py.readthedocs.io/en/latest/workspace/jobs/jobs.html

  • SQL Warehouses: https://databricks-sdk-py.readthedocs.io/en/latest/workspace/sql/warehouses.html

  • Statement Execution: https://databricks-sdk-py.readthedocs.io/en/latest/workspace/sql/statement_execution.html

  • Tables: https://databricks-sdk-py.readthedocs.io/en/latest/workspace/catalog/tables.html

  • Catalogs: https://databricks-sdk-py.readthedocs.io/en/latest/workspace/catalog/catalogs.html

  • Schemas: https://databricks-sdk-py.readthedocs.io/en/latest/workspace/catalog/schemas.html

  • Volumes: https://databricks-sdk-py.readthedocs.io/en/latest/workspace/catalog/volumes.html

  • Files: https://databricks-sdk-py.readthedocs.io/en/latest/workspace/files/files.html

  • Serving Endpoints: https://databricks-sdk-py.readthedocs.io/en/latest/workspace/serving/serving_endpoints.html

  • Vector Search: https://databricks-sdk-py.readthedocs.io/en/latest/workspace/vectorsearch/vector_search_indexes.html

  • Pipelines: https://databricks-sdk-py.readthedocs.io/en/latest/workspace/pipelines/pipelines.html

  • Secrets: https://databricks-sdk-py.readthedocs.io/en/latest/workspace/workspace/secrets.html

  • DBUtils: https://databricks-sdk-py.readthedocs.io/en/latest/dbutils.html

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
