Automating Azure Databricks with Databricks SDK (Python) – Step by Step Guide

When working with Azure Databricks, creating clusters, jobs, repos, and secrets manually from the UI does not scale. For real-world data engineering projects, automation is essential.

In this post, I explain how to automate Azure Databricks resource creation using the Databricks SDK for Python, step by step, in a clear and practical way.


Why use Databricks SDK?

  • Infrastructure as Code (IaC) for Databricks
  • Repeatable and consistent environments
  • Easy CI/CD integration
  • Less manual UI work, fewer errors

Prerequisites

  • An existing Azure Databricks workspace
  • Permission to create clusters, jobs, repos, and secrets
  • Python 3.9 or above
  • Databricks Personal Access Token (PAT)

Step 1: Create Python Virtual Environment

python -m venv .venv
source .venv/bin/activate   # macOS / Linux
.venv\Scripts\activate      # Windows

Step 2: Install Databricks SDK

pip install databricks-sdk

Step 3: Configure Authentication

Set the Databricks workspace URL and personal access token (PAT) as environment variables.

export DATABRICKS_HOST="https://adb-<workspace-id>.<region>.azuredatabricks.net"
export DATABRICKS_TOKEN="<your_personal_access_token>"

This allows the SDK to authenticate securely without hardcoding credentials.
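Before constructing a client, it can help to fail fast with a clear message if these variables are missing. This small helper is my own convenience function, not part of the SDK:

```python
import os

REQUIRED_VARS = ("DATABRICKS_HOST", "DATABRICKS_TOKEN")

def check_databricks_env():
    """Raise a clear error if the SDK's default env-var auth is not configured."""
    missing = [v for v in REQUIRED_VARS if not os.environ.get(v)]
    if missing:
        raise EnvironmentError(f"Missing environment variables: {', '.join(missing)}")
```

Call it at the top of your automation scripts so a misconfigured shell fails immediately instead of mid-run.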


Step 4: Test Connection to Databricks

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
me = w.current_user.me()
print("Connected as:", me.user_name)

If this runs successfully, your SDK connection is working.


Step 5: Create a Databricks Cluster using SDK

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

cluster = w.clusters.create(
    cluster_name="demo-sdk-cluster",
    spark_version="13.3.x-scala2.12",
    node_type_id="Standard_DS3_v2",
    num_workers=1,
    autotermination_minutes=20
).result()   # block until the cluster reaches the RUNNING state

print("Cluster ID:", cluster.cluster_id)

This creates a cluster programmatically instead of using the Databricks UI.


Step 6: Create a Databricks Job

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.jobs import Task, NotebookTask
from databricks.sdk.service.compute import ClusterSpec

w = WorkspaceClient()

job = w.jobs.create(
    name="demo-sdk-job",
    tasks=[
        Task(
            task_key="run_notebook",
            notebook_task=NotebookTask(
                notebook_path="/Shared/demo_notebook"
            ),
            new_cluster=ClusterSpec(
                spark_version="13.3.x-scala2.12",
                node_type_id="Standard_DS3_v2",
                num_workers=1
            )
        )
    ]
)

print("Job ID:", job.job_id)

This is the recommended way to run production workloads using job clusters.
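Creating the job only registers it; it does not run anything yet. A minimal sketch for triggering a one-off run with the SDK's `run_now` (the wrapper function is my own; `w` is the `WorkspaceClient` from the step above):

```python
def run_job_and_wait(w, job_id):
    """Trigger a one-off run of a job and block until it finishes.

    `w` is a databricks.sdk.WorkspaceClient; `run_now(...).result()` waits
    for the run to reach a terminal state and returns the final run details.
    """
    return w.jobs.run_now(job_id=job_id).result()
```

In a CI/CD pipeline this is typically the last step: create (or update) the job, then trigger it and inspect the returned run state.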


Step 7: Create Databricks Repo (Git Integration)

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

repo = w.repos.create(
    url="https://github.com/your-org/your-repo.git",
    provider="gitHub",
    path="/Repos/<username>/demo-sdk-repo"   # repos live in a folder under /Repos
)

print("Repo created:", repo.id)

This enables Git-based development directly inside Databricks.
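After creation, the same API can move the repo to the head of a branch (effectively a fetch and checkout), which is useful for syncing after a merge. A small sketch (the wrapper is my own; the branch name is an assumption):

```python
def checkout_branch(w, repo_id, branch):
    """Point a Databricks repo at the head of the given branch.

    `w` is a databricks.sdk.WorkspaceClient; `repo_id` is the id returned
    by w.repos.create.
    """
    w.repos.update(repo_id=repo_id, branch=branch)
```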


Step 8: Create Secret Scope and Secrets

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

w.secrets.create_scope(scope="demo-scope")
w.secrets.put_secret(
    scope="demo-scope",
    key="storage_key",
    string_value="my-secret-value"
)

Secrets should always be stored securely instead of being hardcoded.
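Inside a notebook the value is read with `dbutils.secrets.get(scope=..., key=...)`; in cluster Spark configs and environment variables, secrets are referenced with the `{{secrets/<scope>/<key>}}` placeholder, which Databricks resolves at cluster start. A tiny helper (my own, not part of the SDK) that builds that reference string:

```python
def secret_ref(scope: str, key: str) -> str:
    """Build the {{secrets/<scope>/<key>}} placeholder Databricks resolves at cluster start."""
    return f"{{{{secrets/{scope}/{key}}}}}"

# Hypothetical usage as a Spark config value (storage account name is a placeholder):
# spark_conf = {
#     "fs.azure.account.key.<storage-account>.dfs.core.windows.net":
#         secret_ref("demo-scope", "storage_key")
# }
```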


Best Practices

  • Use job clusters instead of all-purpose clusters
  • Store configuration in YAML/JSON files
  • Make scripts idempotent (check before create)
  • Integrate SDK scripts into CI/CD pipelines
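The idempotency point above can be sketched as a small get-or-create wrapper. The helper name and shape are my own, not an SDK API; it looks up a cluster by name before creating one (note that cluster names are not unique in Databricks, so this takes the first match):

```python
def get_or_create_cluster(w, name, **cluster_kwargs):
    """Return the ID of a cluster with this name, creating it only if absent.

    `w` is a databricks.sdk.WorkspaceClient; `cluster_kwargs` are passed
    through to w.clusters.create (spark_version, node_type_id, ...).
    """
    for c in w.clusters.list():
        if c.cluster_name == name:
            return c.cluster_id          # reuse the existing cluster
    created = w.clusters.create(cluster_name=name, **cluster_kwargs).result()
    return created.cluster_id
```

Re-running a setup script with helpers like this leaves the workspace unchanged instead of piling up duplicate resources.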

Conclusion

The Databricks SDK allows data engineers to fully automate workspace setup, making environments reproducible, secure, and production-ready.

If you are serious about Databricks in enterprise projects, SDK-based automation is not optional — it is essential.


Happy Automating 🚀
