# Databricks Lakeflow Jobs

## Overview
Databricks Jobs orchestrate data workflows with multi-task DAGs, flexible triggers, and comprehensive monitoring. Jobs support diverse task types and can be managed via Python SDK, CLI, or Asset Bundles.
## Reference Files

| Use Case | Reference File |
|---|---|
| Configure task types (notebook, Python, SQL, dbt, etc.) | task-types.md |
| Set up triggers and schedules | triggers-schedules.md |
| Configure notifications and health monitoring | notifications-monitoring.md |
| Complete working examples | examples.md |
## Quick Start

### Python SDK

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.jobs import Task, NotebookTask, Source

w = WorkspaceClient()

job = w.jobs.create(
    name="my-etl-job",
    tasks=[
        Task(
            task_key="extract",
            notebook_task=NotebookTask(
                notebook_path="/Workspace/Users/user@example.com/extract",
                source=Source.WORKSPACE,
            ),
        )
    ],
)
print(f"Created job: {job.job_id}")
```
### CLI

```bash
databricks jobs create --json '{
  "name": "my-etl-job",
  "tasks": [{
    "task_key": "extract",
    "notebook_task": {
      "notebook_path": "/Workspace/Users/user@example.com/extract",
      "source": "WORKSPACE"
    }
  }]
}'
```
### Asset Bundles (DABs)

```yaml
# resources/jobs.yml
resources:
  jobs:
    my_etl_job:
      name: "[${bundle.target}] My ETL Job"
      tasks:
        - task_key: extract
          notebook_task:
            notebook_path: ../src/notebooks/extract.py
```
## Core Concepts

### Multi-Task Workflows

Jobs support DAG-based task dependencies:

```yaml
tasks:
  - task_key: extract
    notebook_task:
      notebook_path: ../src/extract.py

  - task_key: transform
    depends_on:
      - task_key: extract
    notebook_task:
      notebook_path: ../src/transform.py

  - task_key: load
    depends_on:
      - task_key: transform
    run_if: ALL_SUCCESS  # Only run if all dependencies succeed
    notebook_task:
      notebook_path: ../src/load.py
```
`run_if` conditions (see the SDK sketch after this list):

- `ALL_SUCCESS` (default) - Run when all dependencies succeed
- `ALL_DONE` - Run when all dependencies complete (success or failure)
- `AT_LEAST_ONE_SUCCESS` - Run when at least one dependency succeeds
- `NONE_FAILED` - Run when no dependencies failed
- `ALL_FAILED` - Run when all dependencies failed
- `AT_LEAST_ONE_FAILED` - Run when at least one dependency failed
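The same dependency pattern can be expressed through the Python SDK. A minimal sketch, assuming placeholder notebook paths:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.jobs import (
    NotebookTask, RunIf, Source, Task, TaskDependency,
)

w = WorkspaceClient()

job = w.jobs.create(
    name="etl-with-dependencies",
    tasks=[
        Task(
            task_key="extract",
            notebook_task=NotebookTask(
                notebook_path="/Workspace/Shared/extract",  # placeholder path
                source=Source.WORKSPACE,
            ),
        ),
        Task(
            task_key="load",
            depends_on=[TaskDependency(task_key="extract")],
            run_if=RunIf.ALL_SUCCESS,  # the default; shown here for clarity
            notebook_task=NotebookTask(
                notebook_path="/Workspace/Shared/load",  # placeholder path
                source=Source.WORKSPACE,
            ),
        ),
    ],
)
```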
### Task Types Summary

| Task Type | Use Case | Reference |
|---|---|---|
| `notebook_task` | Run notebooks | task-types.md#notebook-task |
| `spark_python_task` | Run Python scripts | task-types.md#spark-python-task |
| `python_wheel_task` | Run Python wheels | task-types.md#python-wheel-task |
| `sql_task` | Run SQL queries/files | task-types.md#sql-task |
| `dbt_task` | Run dbt projects | task-types.md#dbt-task |
| `pipeline_task` | Trigger DLT/SDP pipelines | task-types.md#pipeline-task |
| `spark_jar_task` | Run Spark JARs | task-types.md#spark-jar-task |
| `run_job_task` | Trigger other jobs | task-types.md#run-job-task |
| `for_each_task` | Loop over inputs | task-types.md#for-each-task |
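As one illustration of the loop type, a `for_each_task` fans a nested task out over a JSON array of inputs. A hedged SDK sketch (the notebook path, inputs, and the `region` parameter name are illustrative):

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.jobs import ForEachTask, NotebookTask, Source, Task

w = WorkspaceClient()

job = w.jobs.create(
    name="fan-out-job",
    tasks=[
        Task(
            task_key="process_each_region",
            for_each_task=ForEachTask(
                inputs='["us-east", "us-west", "eu-central"]',  # JSON array string
                concurrency=2,  # run up to two iterations in parallel
                task=Task(
                    task_key="process_one_region",
                    notebook_task=NotebookTask(
                        notebook_path="/Workspace/Shared/process_region",  # placeholder
                        source=Source.WORKSPACE,
                        # {{input}} resolves to the current iteration's value
                        base_parameters={"region": "{{input}}"},
                    ),
                ),
            ),
        ),
    ],
)
```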
### Trigger Types Summary

| Trigger Type | Use Case | Reference |
|---|---|---|
| `schedule` | Cron-based scheduling | triggers-schedules.md#cron-schedule |
| `trigger.periodic` | Interval-based | triggers-schedules.md#periodic-trigger |
| `trigger.file_arrival` | File arrival events | triggers-schedules.md#file-arrival-trigger |
| `trigger.table_update` | Table change events | triggers-schedules.md#table-update-trigger |
| `continuous` | Always-running jobs | triggers-schedules.md#continuous-jobs |
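For example, a cron schedule can be attached at job creation with the SDK. A minimal sketch (the cron expression and notebook path are placeholders):

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.jobs import (
    CronSchedule, NotebookTask, PauseStatus, Source, Task,
)

w = WorkspaceClient()

job = w.jobs.create(
    name="nightly-job",
    schedule=CronSchedule(
        quartz_cron_expression="0 0 2 * * ?",  # 02:00 daily, Quartz syntax
        timezone_id="UTC",
        pause_status=PauseStatus.UNPAUSED,  # must be UNPAUSED to fire
    ),
    tasks=[
        Task(
            task_key="nightly_task",
            notebook_task=NotebookTask(
                notebook_path="/Workspace/Shared/nightly",  # placeholder path
                source=Source.WORKSPACE,
            ),
        ),
    ],
)
```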
## Compute Configuration

### Job Clusters (Recommended)

Define reusable cluster configurations:

```yaml
job_clusters:
  - job_cluster_key: shared_cluster
    new_cluster:
      spark_version: "15.4.x-scala2.12"
      node_type_id: "i3.xlarge"
      num_workers: 2
      spark_conf:
        spark.speculation: "true"

tasks:
  - task_key: my_task
    job_cluster_key: shared_cluster
    notebook_task:
      notebook_path: ../src/notebook.py
```
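The same shared job cluster can be sketched through the Python SDK (Spark version and node type mirror the YAML above; the notebook path is a placeholder):

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.compute import ClusterSpec
from databricks.sdk.service.jobs import JobCluster, NotebookTask, Source, Task

w = WorkspaceClient()

job = w.jobs.create(
    name="shared-cluster-job",
    job_clusters=[
        JobCluster(
            job_cluster_key="shared_cluster",
            new_cluster=ClusterSpec(
                spark_version="15.4.x-scala2.12",
                node_type_id="i3.xlarge",  # AWS node type; adjust per cloud
                num_workers=2,
            ),
        ),
    ],
    tasks=[
        Task(
            task_key="my_task",
            job_cluster_key="shared_cluster",  # reuses the cluster above
            notebook_task=NotebookTask(
                notebook_path="/Workspace/Shared/notebook",  # placeholder path
                source=Source.WORKSPACE,
            ),
        ),
    ],
)
```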
### Autoscaling Clusters

```yaml
new_cluster:
  spark_version: "15.4.x-scala2.12"
  node_type_id: "i3.xlarge"
  autoscale:
    min_workers: 2
    max_workers: 8
```
### Existing Cluster

```yaml
tasks:
  - task_key: my_task
    existing_cluster_id: "0123-456789-abcdef12"
    notebook_task:
      notebook_path: ../src/notebook.py
```
### Serverless Compute

For notebook and Python tasks, omit the cluster configuration to use serverless:

```yaml
tasks:
  - task_key: serverless_task
    notebook_task:
      notebook_path: ../src/notebook.py
    # No cluster config = serverless
```
## Job Parameters

### Define Parameters

```yaml
parameters:
  - name: env
    default: "dev"
  - name: date
    default: "{{start_date}}"  # Dynamic value reference
```
### Access in Notebook

```python
# In a notebook, job parameters arrive as widgets
env = dbutils.widgets.get("env")
date = dbutils.widgets.get("date")
```
### Pass to Tasks

```yaml
tasks:
  - task_key: my_task
    notebook_task:
      notebook_path: ../src/notebook.py
      base_parameters:
        env: "{{job.parameters.env}}"
        custom_param: "value"
```
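Job-level parameters can likewise be declared through the SDK. A minimal sketch using `JobParameterDefinition` (names and defaults mirror the YAML above; the notebook path is a placeholder):

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.jobs import (
    JobParameterDefinition, NotebookTask, Source, Task,
)

w = WorkspaceClient()

job = w.jobs.create(
    name="parameterized-job",
    parameters=[
        JobParameterDefinition(name="env", default="dev"),
        JobParameterDefinition(name="date", default="{{start_date}}"),
    ],
    tasks=[
        Task(
            task_key="my_task",
            notebook_task=NotebookTask(
                notebook_path="/Workspace/Shared/notebook",  # placeholder path
                source=Source.WORKSPACE,
            ),
        ),
    ],
)
```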
## Common Operations

### Python SDK Operations

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# List jobs
jobs = w.jobs.list()

# Get job details
job = w.jobs.get(job_id=12345)

# Run job now
run = w.jobs.run_now(job_id=12345)

# Run with parameters
run = w.jobs.run_now(
    job_id=12345,
    job_parameters={"env": "prod", "date": "2024-01-15"},
)

# Cancel run
w.jobs.cancel_run(run_id=run.run_id)

# Delete job
w.jobs.delete(job_id=12345)
```
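`run_now` returns a waiter, so callers that need a synchronous result can block on it. A sketch (the job ID and timeout are placeholders):

```python
from datetime import timedelta

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# .result() polls until the run reaches a terminal state
# (or the timeout expires).
run = w.jobs.run_now(
    job_id=12345,  # placeholder job ID
    job_parameters={"env": "prod"},
).result(timeout=timedelta(minutes=30))

print(f"Run {run.run_id} finished with state: {run.state.result_state}")
```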
### CLI Operations

```bash
# List jobs
databricks jobs list

# Get job details
databricks jobs get 12345

# Run job
databricks jobs run-now 12345

# Run with parameters
databricks jobs run-now 12345 --job-params '{"env": "prod"}'

# Cancel run
databricks jobs cancel-run 67890

# Delete job
databricks jobs delete 12345
```
### Asset Bundle Operations

```bash
# Validate configuration
databricks bundle validate

# Deploy job
databricks bundle deploy

# Run job
databricks bundle run my_job_resource_key

# Deploy to specific target
databricks bundle deploy -t prod

# Destroy resources
databricks bundle destroy
```
## Permissions (DABs)

```yaml
resources:
  jobs:
    my_job:
      name: "My Job"
      permissions:
        - level: CAN_VIEW
          group_name: "data-analysts"
        - level: CAN_MANAGE_RUN
          group_name: "data-engineers"
        - level: CAN_MANAGE
          user_name: "admin@example.com"
```
Permission levels (see the SDK sketch after this list):

- `CAN_VIEW` - View job and run history
- `CAN_MANAGE_RUN` - View, trigger, and cancel runs
- `CAN_MANAGE` - Full control including edit and delete
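Outside of bundles, the same grants can be applied programmatically. A hedged sketch using the SDK's jobs permissions API (the job ID and principals are placeholders):

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.jobs import (
    JobAccessControlRequest, JobPermissionLevel,
)

w = WorkspaceClient()

# Note: this API takes the job ID as a string.
w.jobs.set_permissions(
    job_id="12345",  # placeholder job ID
    access_control_list=[
        JobAccessControlRequest(
            group_name="data-analysts",
            permission_level=JobPermissionLevel.CAN_VIEW,
        ),
        JobAccessControlRequest(
            group_name="data-engineers",
            permission_level=JobPermissionLevel.CAN_MANAGE_RUN,
        ),
        JobAccessControlRequest(
            user_name="admin@example.com",
            permission_level=JobPermissionLevel.CAN_MANAGE,
        ),
    ],
)
```

Note that `set_permissions` replaces the job's entire access control list, while `update_permissions` merges changes into it.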
## Common Issues

| Issue | Solution |
|---|---|
| Job cluster startup slow | Use job clusters with `job_cluster_key` so the cluster is reused across tasks |
| Task dependencies not working | Verify `task_key` references in `depends_on` match exactly |
| Schedule not triggering | Check `pause_status: UNPAUSED` and a valid timezone |
| File arrival not detecting | Ensure the path has proper permissions and uses a cloud storage URL |
| Table update trigger missing events | Verify the table is in Unity Catalog with proper grants |
| Parameter not accessible | Use `dbutils.widgets.get()` in notebooks |
| "admins" group error | Permissions for the `admins` group cannot be modified on jobs |
| Serverless task fails | Ensure the task type supports serverless (notebook, Python) |
## Related Skills

- `databricks-asset-bundles` - Deploy jobs via Databricks Asset Bundles
- `databricks-spark-declarative-pipelines` - Configure pipelines triggered by jobs
## Resources

- Jobs API Reference
- Jobs Documentation
- DABs Job Task Types
- Bundle Examples Repository