Databricks TL;DR: Secrets and Key Management

Secrets and Key Management

/tldr: Using `dbutils.secrets` and Secret Scopes for secure credential management.


1. Secret Scopes: The Core Mechanism

A **Secret Scope** is the primary organizational unit for secrets in Databricks. It defines a namespace where secrets are stored. You access a secret using a combination of the scope name and the secret key.
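As an illustration of the namespace rules, the documented limits on scope and key names (alphanumerics, dashes, underscores, and periods, at most 128 characters) can be checked with a small helper. This is a sketch: the regex encodes my reading of those limits, not a Databricks API.

```python
import re

# Scope and key names: alphanumerics, dashes, underscores, periods; 1-128 chars.
# (This pattern encodes the documented naming limits -- an assumption, not an API call.)
_SECRET_NAME_RE = re.compile(r"^[A-Za-z0-9_.\-]{1,128}$")

def is_valid_secret_name(name: str) -> bool:
    """Return True if `name` is a legal secret scope or key name."""
    return _SECRET_NAME_RE.fullmatch(name) is not None
```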

Types of Secret Scopes

Databricks-Backed Scope

Secrets are encrypted and stored within the Databricks control plane. This is typically easier for quick setup or non-critical environments.

  • **Creation:** Use the Databricks Secrets CLI or API.
  • **Security:** Encryption keys are managed by Databricks.
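A minimal sketch with the Databricks CLI (assuming it is installed and authenticated against your workspace; the scope and key names here are illustrative, and flag syntax varies between CLI versions):

```shell
# Create a Databricks-backed scope
databricks secrets create-scope analytics_scope

# Store a secret in it (avoid putting real values in shell history)
databricks secrets put-secret analytics_scope postgres_password --string-value "example-only"
```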

Azure Key Vault / AWS KMS-Backed Scope

Secrets are stored in an external cloud vault, and Databricks simply provides a reference (a pointer) to them. This is the **standard for production security and compliance.**

  • **Creation:** Requires specifying the Cloud DNS/Resource ID of the external vault.
  • **Security:** Encryption keys and access policies are fully controlled by the cloud provider/user.

Access Control (ACLs)

Access to a Secret Scope is managed via ACLs (Access Control Lists), granting users or groups READ, WRITE, or MANAGE permissions.

  • READ: Required to retrieve secrets via `dbutils.secrets.get()`.
  • WRITE: Required to add or modify secrets in a Databricks-backed scope.
  • MANAGE: Required to change ACLs on the scope.
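These permissions can be granted from the Databricks CLI. A hedged sketch (the principal and scope names are illustrative; exact argument syntax depends on your CLI version):

```shell
# Grant a user READ on a scope
databricks secrets put-acl analytics_scope reader@example.com READ

# Inspect current grants on the scope
databricks secrets list-acls analytics_scope
```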

2. `dbutils.secrets`: Accessing Credentials

From notebook code, secrets are retrieved with the built-in `dbutils.secrets` utility. (Cluster and job configuration can also reference secrets, but `dbutils.secrets` is the notebook-facing API.)

Getting a Secret Value

You need the scope name and the secret key to retrieve the credential.


```python
# Python: retrieve a secret by scope and key
db_password = dbutils.secrets.get(scope="analytics_scope", key="postgres_password")
```

```scala
// Scala: the same call with positional arguments
val dbPassword = dbutils.secrets.get("analytics_scope", "postgres_password")
```

Listing Available Secrets/Scopes

Use these methods for discovery and debugging.


```python
# List all scopes you have access to
dbutils.secrets.listScopes()

# List the secrets (keys and metadata, never values) within a specific scope
dbutils.secrets.list("analytics_scope")
```

**Security Feature:** When a secret value appears in notebook output (e.g., from `print(db_password)`), Databricks automatically redacts it, replacing the value with `[REDACTED]`.
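Putting the pieces together, the usual pattern is to fetch the credential once and pass it straight into a connector's options, never printing it. A hedged sketch (the `jdbc_options` helper, host, and scope/key names are illustrative, not a Databricks API):

```python
def jdbc_options(host: str, database: str, user: str, password: str) -> dict:
    """Assemble Spark JDBC reader options; the password should come from a secret scope."""
    return {
        "url": f"jdbc:postgresql://{host}:5432/{database}",
        "user": user,
        "password": password,
        "driver": "org.postgresql.Driver",
    }

# In a notebook (requires a real scope; shown commented so the sketch is self-contained):
# password = dbutils.secrets.get(scope="analytics_scope", key="postgres_password")
# df = spark.read.format("jdbc").options(**jdbc_options("db.internal", "sales", "etl", password)).load()
```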

3. Cloud-Backed Key Vaults

Integrating with cloud vaults is the best practice. Databricks assumes an identity (e.g., a Service Principal on Azure or an IAM Role on AWS) to securely fetch the secret from the vault on behalf of the user.

Azure Key Vault (AKV) Integration

  • **Prerequisite:** A Service Principal (SPN) with **Get** and **List** permissions on the AKV Secrets.
  • **Mapping:** When creating the scope, you provide the AKV's DNS Name and Resource ID.
  • **Access:** A secret named `storage-key` in AKV is accessed in Databricks as dbutils.secrets.get(scope="akv_scope", key="storage-key").
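With the legacy Databricks CLI, an AKV-backed scope is created by pointing at the vault. A hedged sketch (placeholders in angle brackets are illustrative; newer CLI versions express the same mapping with different syntax):

```shell
databricks secrets create-scope --scope akv_scope \
  --scope-backend-type AZURE_KEYVAULT \
  --resource-id "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.KeyVault/vaults/<vault-name>" \
  --dns-name "https://<vault-name>.vault.azure.net/"
```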

AWS Secrets Manager / KMS Integration

  • **Terminology:** AWS KMS manages encryption *keys*; the AWS service that stores application secrets is **AWS Secrets Manager**.
  • **Scopes:** On AWS, secret scopes are Databricks-backed; there is no Secrets Manager-backed scope type analogous to AKV-backed scopes.
  • **Direct access:** Secrets held in AWS Secrets Manager are instead read from a cluster whose instance profile (IAM Role) grants read permissions on the relevant secrets (e.g., `secretsmanager:GetSecretValue`).

Never hardcode credentials. For production workloads, always read credentials through Secret Scopes, backed by a cloud vault where the platform supports it.

Databricks Fundamentals Series: Secrets and Key Management