Deploying Azure Databricks with Private Endpoints Using Terraform
A production-ready, multi-environment Terraform configuration for deploying Azure Databricks with full network isolation. VNet injection, private endpoints for storage and workspace, automated subnet delegation management, and a PowerShell deployment script that handles the hard parts for you.
Why Private Endpoints for Databricks?
If you're running Databricks in an enterprise environment, especially in government or financial services, you'll quickly hit this requirement: no public network access. The default Databricks deployment exposes the workspace, storage accounts, and data plane to the public internet. For regulated industries, that's a non-starter.
The solution is a combination of VNet injection (Databricks clusters run inside your own virtual network) and private endpoints (all data plane access goes through private IPs). This gives you complete network isolation while maintaining full Databricks functionality.
The challenge? Getting all the pieces right. Subnet delegations, NSG associations, private DNS zones, storage account firewall rules, access connector identities, and the Databricks workspace itself all need to be configured in the correct order with the correct dependencies. Miss one piece and the deployment fails with cryptic error messages.
This Terraform configuration handles all of it, across multiple environments, with a single variable change.
Architecture Overview
┌──────────────────────────────────────────────────────────┐
│ Your Azure Subscription │
│ │
│ ┌─────────────────────── VNet ────────────────────────┐ │
│ │ │ │
│ │ ┌────────────┐ ┌────────────┐ ┌──────────────┐ │ │
│ │ │ Databricks │ │ Databricks │ │ Private │ │ │
│ │ │ Public │ │ Private │ │ Endpoint │ │ │
│ │ │ Subnet │ │ Subnet │ │ Subnet │ │ │
│ │ │ (delegated)│ │ (delegated)│ │ (no deleg.) │ │ │
│ │ └─────┬──────┘ └─────┬──────┘ └──────┬───────┘ │ │
│ │ └───────┬───────┘ │ │ │
│ │ │ │ │ │
│ │ ┌──────────▼──────────┐ Private Endpoints: │ │
│ │ │ Databricks │ • STG1 Blob + DFS │ │
│ │ │ Workspace │ • STG2 Blob + DFS │ │
│ │ │ (Premium, No PIP) │ • Databricks UI/API │ │
│ │ └──────────┬──────────┘ │ │ │
│ │ │ │ │ │
│ │ ┌───────────┼───────────┐ │ │ │
│ │ ▼ ▼ │ │ │
│ │ ┌──────┐ ┌──────┐ │ │ │
│ │ │ STG1 │ ◄────────────│ STG2 │ ◄─────────┘ │ │
│ │ │(Data)│ Access │ (UC) │ Access │ │
│ │ │ │ Connector │ │ Connector │ │
│ │ └──────┘ (MI) └──────┘ (MI) │ │
│ └──────────────────────────────────────────────────────┘ │
│ │
│ Private DNS Zones: │
│ • privatelink.blob.core.windows.net │
│ • privatelink.dfs.core.windows.net │
│ • privatelink.azuredatabricks.net │
└───────────────────────────────────────────────────────────┘
What Gets Deployed
A single terraform apply creates all of the following:
- Databricks Workspace (Premium SKU) with VNet injection, no public IP, and public access disabled
- Two Storage Accounts with HNS enabled (Data Lake Gen2), public access denied, and default-deny network rules
- Six Storage Containers for data ingestion, Unity Catalog, and environment-specific workloads
- Two Access Connectors with System-Assigned Managed Identities for secure, keyless storage access
- Five Private Endpoints (2x Blob, 2x DFS, 1x Databricks UI/API)
- Private DNS Zone integration for automatic name resolution
- Network Security Group associated with both Databricks subnets
- Eight IAM Role Assignments (Storage Blob + Queue Data Contributor for connectors and admin group)
Prerequisites
Azure Infrastructure (must exist before deployment)
| Resource | Naming Pattern | Purpose |
|---|---|---|
| Virtual Network | {prefix}-{env}-cc-vnet-01 | Network boundary |
| Public Subnet | {prefix}-{env}-databricks-public-snet-01 | Databricks public nodes |
| Private Subnet | {prefix}-{env}-databricks-private-snet-01 | Databricks private nodes |
| PEP Subnet | {prefix}-{env}-pep-snet-01 | Private endpoints |
| Private DNS Zones | In a central/identity subscription | Name resolution |
| Terraform State Storage | Any storage account | State file management |
Permissions
Your service principal needs Contributor on the target subscription and Network Contributor on the VNet resource group. For cross-subscription DNS zones, it also needs Reader on the identity/DNS subscription.
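In practice, the cross-subscription DNS access means running two azurerm provider configurations side by side. A minimal sketch of that pattern (the variable names here are illustrative, not the repo's actual variables):

```hcl
# Default provider targets the workload subscription
provider "azurerm" {
  features {}
  subscription_id = var.subscription_id
}

# Aliased provider targets the central identity/DNS subscription
provider "azurerm" {
  alias           = "dns"
  features {}
  subscription_id = var.dns_subscription_id
}

# DNS zone lookups run against the aliased provider
data "azurerm_private_dns_zone" "blob" {
  provider            = azurerm.dns
  name                = "privatelink.blob.core.windows.net"
  resource_group_name = var.dns_zone_resource_group
}
```

The Reader role on the DNS subscription is enough for these data-source lookups; writing DNS records into the zones would require a role such as Private DNS Zone Contributor.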
Project Structure
databricks-private-endpoints/
├── provider.tf # Azure provider + backend config
├── variables.tf # Environment variable + all naming locals
├── resource_group.tf # Resource group
├── network.tf # VNet/subnet data sources
├── databricks.tf # Workspace + access connectors
├── storage.tf # Storage accounts, containers, IAM
├── private_endpoint.tf # Private endpoints + DNS zone refs
├── nsg.tf # Network security groups
├── deploy.ps1 # Automated deployment script
├── azure-pipelines.yml # CI/CD pipeline (Azure DevOps)
├── azure-auth.env.example # Credential template
├── terraform.tfvars.example # Variable template
└── backend-configs/ # Per-environment state file configs
├── dev.tfbackend
├── test.tfbackend
├── stage.tfbackend
├── analytics.tfbackend
└── poc.tfbackend
Core Terraform Walkthrough
Dynamic Naming with Locals
The entire configuration is driven by a single environment variable. Every resource name is generated dynamically in the locals block:
```hcl
variable "environment" {
  type    = string
  default = "test"

  validation {
    condition     = contains(["dev", "stage", "analytics", "poc", "test"], var.environment)
    error_message = "Environment must be one of: dev, stage, analytics, poc, test."
  }
}

locals {
  resource_group_name       = "contoso-${var.environment}-databricks-rg-01"
  databricks_workspace_name = "contoso-${var.environment}-databricks-wks-01"

  # Storage accounts have a 24-char limit, so abbreviate long env names
  env_abbr  = var.environment == "analytics" ? "anltcs" : var.environment
  stg1_name = "contoso${local.env_abbr}ccdbwingstg001"
  stg2_name = "contoso${local.env_abbr}ccdbwucstg001"
}
```
Azure storage account names must be 3 to 24 characters, lowercase letters and numbers only. The full "analytics" environment name would push the generated names past that limit, so it's abbreviated to "anltcs". This happens automatically in variables.tf.
The Databricks Workspace
The workspace is the centerpiece. Key settings: Premium SKU for Unity Catalog support, VNet injection with no public IP, and public network access disabled:
```hcl
resource "azurerm_databricks_workspace" "main" {
  name                                  = local.databricks_workspace_name
  resource_group_name                   = local.resource_group_name
  location                              = var.location
  sku                                   = "premium"
  public_network_access_enabled         = false
  network_security_group_rules_required = "NoAzureDatabricksRules"

  custom_parameters {
    no_public_ip        = true
    virtual_network_id  = data.azurerm_virtual_network.main.id
    public_subnet_name  = data.azurerm_subnet.databricks_public.name
    private_subnet_name = data.azurerm_subnet.databricks_private.name

    public_subnet_network_security_group_association_id  = azurerm_subnet_network_security_group_association.databricks_public_nsg_association.id
    private_subnet_network_security_group_association_id = azurerm_subnet_network_security_group_association.databricks_private_nsg_association.id
  }
}
```
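The two NSG association IDs referenced in custom_parameters come from nsg.tf. A minimal sketch of how they fit together (the NSG name below is assumed):

```hcl
resource "azurerm_network_security_group" "databricks" {
  name                = "contoso-${var.environment}-databricks-nsg-01" # assumed name
  location            = var.location
  resource_group_name = local.resource_group_name
}

resource "azurerm_subnet_network_security_group_association" "databricks_public_nsg_association" {
  subnet_id                 = data.azurerm_subnet.databricks_public.id
  network_security_group_id = azurerm_network_security_group.databricks.id
}

resource "azurerm_subnet_network_security_group_association" "databricks_private_nsg_association" {
  subnet_id                 = data.azurerm_subnet.databricks_private.id
  network_security_group_id = azurerm_network_security_group.databricks.id
}
```

Passing the association IDs (rather than the raw subnet IDs) into custom_parameters gives Terraform the dependency ordering it needs: the NSG must be attached before the workspace deployment validates the subnets.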
Storage with Managed Identity Access
Instead of using storage account keys (a security anti-pattern), we use Databricks Access Connectors with System-Assigned Managed Identities. Each connector gets Storage Blob Data Contributor and Storage Queue Data Contributor roles on its respective storage account:
```hcl
resource "azurerm_databricks_access_connector" "unity_catalog" {
  name                = local.access_connector_unity_catalog_name
  resource_group_name = local.resource_group_name
  location            = var.location

  identity {
    type = "SystemAssigned"
  }
}

resource "azurerm_role_assignment" "unity_blob" {
  scope                = azurerm_storage_account.stg2.id
  role_definition_name = "Storage Blob Data Contributor"
  principal_id         = azurerm_databricks_access_connector.unity_catalog.identity[0].principal_id
}
```
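The storage accounts themselves live in storage.tf and follow the default-deny pattern described above. A minimal sketch of one of them, assuming the replication type (pick what your environment requires):

```hcl
resource "azurerm_storage_account" "stg2" {
  name                     = local.stg2_name
  resource_group_name      = local.resource_group_name
  location                 = var.location
  account_tier             = "Standard"
  account_replication_type = "LRS" # assumption; choose per environment

  is_hns_enabled                = true  # Data Lake Gen2 (hierarchical namespace)
  public_network_access_enabled = false

  network_rules {
    default_action = "Deny"
    bypass         = ["AzureServices"]
  }
}
```

With default_action = "Deny" and public access off, the only data path left is through the private endpoints, which is exactly the posture the architecture diagram describes.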
Private Endpoints
Five private endpoints cover all data plane access. Each one connects to a private DNS zone for automatic name resolution:
```hcl
resource "azurerm_private_endpoint" "databricks_workspace" {
  name                = "contoso-${var.environment}-pep-databricks-workspace-01"
  location            = var.location
  resource_group_name = local.resource_group_name
  subnet_id           = data.azurerm_subnet.pep.id

  private_service_connection {
    name                           = "databricks-workspace-connection"
    private_connection_resource_id = azurerm_databricks_workspace.main.id
    subresource_names              = ["databricks_ui_api"]
    is_manual_connection           = false
  }

  private_dns_zone_group {
    name                 = "databricks-zone-group"
    private_dns_zone_ids = [data.azurerm_private_dns_zone.databricks_workspace.id]
  }
}
```
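The four storage endpoints follow the same shape, differing only in the target resource and the subresource name. A sketch of one blob endpoint (the endpoint and connection names here are illustrative):

```hcl
resource "azurerm_private_endpoint" "stg1_blob" {
  name                = "contoso-${var.environment}-pep-stg1-blob-01" # assumed name
  location            = var.location
  resource_group_name = local.resource_group_name
  subnet_id           = data.azurerm_subnet.pep.id

  private_service_connection {
    name                           = "stg1-blob-connection"
    private_connection_resource_id = azurerm_storage_account.stg1.id
    subresource_names              = ["blob"] # use ["dfs"] for the Data Lake endpoint
    is_manual_connection           = false
  }

  private_dns_zone_group {
    name                 = "blob-zone-group"
    private_dns_zone_ids = [data.azurerm_private_dns_zone.blob.id]
  }
}
```

Each subresource needs its own endpoint, which is why two HNS-enabled storage accounts produce four endpoints: blob and dfs resolve through different privatelink DNS zones.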
The Deployment Script
The deploy.ps1 PowerShell script ties the whole workflow together. It automates the entire deployment, including one critical feature: automatic subnet delegation management.
Why Subnet Delegations Matter
Azure Databricks with VNet injection has strict subnet requirements:
- Databricks subnets (public and private) MUST be delegated to Microsoft.Databricks/workspaces
- Private Endpoint subnet MUST NOT have any delegation
Get this wrong and you'll see errors like "required public subnet delegation not found" or "PrivateEndpointCreationNotAllowedAsSubnetIsDelegated." The deployment script checks all three subnets before every plan or apply and fixes them automatically.
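The subnets are prerequisites here rather than Terraform-managed, but for reference, this is what a correctly delegated Databricks subnet looks like in HCL (names and address prefix below are illustrative):

```hcl
resource "azurerm_subnet" "databricks_public" {
  name                 = "contoso-dev-databricks-public-snet-01"
  resource_group_name  = "contoso-dev-network-rg-01" # assumed
  virtual_network_name = "contoso-dev-cc-vnet-01"
  address_prefixes     = ["10.0.1.0/24"] # illustrative

  delegation {
    name = "databricks-delegation"

    service_delegation {
      name = "Microsoft.Databricks/workspaces"
      actions = [
        "Microsoft.Network/virtualNetworks/subnets/join/action",
        "Microsoft.Network/virtualNetworks/subnets/prepareNetworkPolicies/action",
        "Microsoft.Network/virtualNetworks/subnets/unprepareNetworkPolicies/action",
      ]
    }
  }
}
```

The private endpoint subnet is the mirror image: no delegation block at all, since Azure refuses to place a private endpoint on a delegated subnet.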
```powershell
# Deploy to any environment with a single command
.\deploy.ps1 -Environment dev -Action init
.\deploy.ps1 -Environment dev -Action plan
.\deploy.ps1 -Environment dev -Action apply

# The script automatically:
# 1. Loads credentials from azure-auth.env
# 2. Authenticates with Azure
# 3. Selects the correct subscription
# 4. Checks and fixes subnet delegations
# 5. Runs the Terraform command
# 6. Reports results
```
Multi-Environment Support
Each environment is completely isolated:
- Separate subscriptions per environment (configured in azure-auth.env)
- Separate state files in Azure Storage (dev.tfstate, test.tfstate, etc.)
- Separate backend configs in the backend-configs/ folder
- Dynamic resource naming prevents any cross-environment conflicts
Switching environments is a single variable change. The deploy script handles subscription switching, state file routing, and credential management automatically.
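Each .tfbackend file is just a set of azurerm backend arguments. A placeholder example for dev (all values here are illustrative):

```hcl
# backend-configs/dev.tfbackend
resource_group_name  = "contoso-tfstate-rg-01"
storage_account_name = "contosotfstate001"
container_name       = "tfstate"
key                  = "dev.tfstate"
```

Under the hood, the deploy script's init step is roughly equivalent to running terraform init -backend-config="backend-configs/dev.tfbackend", which routes state for each environment to its own blob.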
CI/CD Pipeline
The included azure-pipelines.yml defines a multi-stage pipeline: Build/Validate, then deployments to Dev, Test, UAT, and Production in sequence. Each stage uses its own environment-specific backend config, and the production stage sits behind an approval gate.
Troubleshooting Guide
The most common issues you'll encounter, and how to fix them:
Network Intent Policy Errors (During Destroy)
When you destroy a Databricks workspace, Azure leaves behind Network Intent Policies (NIPs) on the subnets. These block NSG removal. The fix: delete the workspace first, wait 5-10 minutes, delete the NIPs manually, then retry destroy.
Failed Workspace State
If a deployment fails midway, the workspace can get stuck in a "Failed" provisioning state. Delete it via Azure CLI, clean the Terraform state, then redeploy. The full troubleshooting guide is included in the repo.
Get the Code
All Terraform files, deployment script, CI/CD pipeline, and troubleshooting guide included.
View on GitHub →

Clone the repo, replace contoso with your organization's prefix in variables.tf, configure your credentials in azure-auth.env, and you're ready to deploy. The entire setup takes about 15 minutes once your networking prerequisites are in place.