Databricks TL;DR: Delta Sharing

Delta Sharing

/tldr: Securely share data across clouds, platforms, and organizations without creating copies.

Data Collaboration · Open Standard · Zero Copy · Ecosystem

1. What is Delta Sharing?

Delta Sharing is an open protocol developed by Databricks for secure data sharing. Its primary goal is to allow data providers to share data with consumers, regardless of which cloud platform (AWS, Azure, GCP) or compute platform (Databricks, Snowflake, Redshift, etc.) they use.

The "No-Copy" Principle

Unlike traditional data sharing methods that require creating and moving data copies (which increases storage costs and staleness), Delta Sharing enables **direct, live access** to the source data in the provider's cloud storage.

  • **Zero ETL:** Eliminates the need for Extract, Transform, Load processes just for sharing.
  • **Freshness:** Consumers always see the latest data updates from the provider's Delta Table.
  • **Security:** All access is governed by the provider's cloud storage security policies and Delta Sharing authentication.

2. Sharing Components and Workflow

Share

A named collection of tables, schemas, or notebooks that the provider wishes to share.

CREATE SHARE eu_sales_share;
ALTER SHARE eu_sales_share ADD TABLE sales_db.europe_data;

Recipient

Represents the consumer organization. It contains the credentials used to access the shared data.

CREATE RECIPIENT analytics_corp; -- Generates a credentials activation link

Grant

The explicit link between a Share and a Recipient, activating access.

GRANT SELECT ON SHARE eu_sales_share TO RECIPIENT analytics_corp;

The Recipient downloads a **credentials file** (a JSON file) which contains the endpoint and token necessary to authenticate with the provider's sharing server.
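
To make the file's structure concrete, the sketch below writes out the fields a Delta Sharing profile (`.share`) file carries as a Python dict; the field names follow the open protocol, while the values are placeholders.

# Illustrative contents of a Delta Sharing profile (.share) file.
# Field names follow the open Delta Sharing protocol; all values are placeholders.
profile_example = {
    "shareCredentialsVersion": 1,
    "endpoint": "https://sharing.example.com/delta-sharing/",  # provider's sharing server
    "bearerToken": "<token issued via the activation link>",
    "expirationTime": "2026-01-01T00:00:00Z",  # optional token expiry
}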

3. Accessing Shared Data on the Consumer Side

Because Delta Sharing is an open standard, any system with a Delta Sharing connector can consume the data.

A. Python / Pandas Integration

Data scientists often use the **`delta-sharing` Python client** to load shared tables directly into a Pandas DataFrame.

Python Example (using Pandas)

# 1. Install the client:
#    pip install delta-sharing

# 2. Import the library and create a client from the downloaded credentials file
import delta_sharing

profile_file = "/path/to/analytics_corp.share"
client = delta_sharing.SharingClient(profile_file)

# 3. List the shared tables
print(client.list_all_tables())

# 4. Load a shared table directly into a Pandas DataFrame
#    URL format: <profile-file-path>#<share>.<schema>.<table>
data_url = f"{profile_file}#eu_sales_share.sales_db.europe_data"
df = delta_sharing.load_as_pandas(data_url)
print(df.head())
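
For tables too large to load into Pandas, the same client library can hand the data to Spark instead. A minimal sketch, assuming an active SparkSession with the Delta Sharing Spark connector available and the same hypothetical profile path as above:

import delta_sharing

# Same "<profile>#<share>.<schema>.<table>" addressing as the Pandas example
data_url = "/path/to/analytics_corp.share#eu_sales_share.sales_db.europe_data"

# Returns a Spark DataFrame; requires a running SparkSession with the
# Delta Sharing Spark connector on the classpath
spark_df = delta_sharing.load_as_spark(data_url)
spark_df.show(5)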

B. Power BI and BI Tools

Power BI, Tableau, and other popular BI tools have native connectors. The user supplies the endpoint URL and bearer token from the credentials file, and the tool treats the shared tables as a live data source.

  • **Integration:** Uses the credentials file (`.share`) to establish a connection.
  • **Live Querying:** Depending on the connector, tools can query the shared data on demand rather than importing a full copy.

C. Cross-Cloud & Cross-Platform

The consumer is not limited to Databricks. They can use the open protocol client in Spark, cloud data warehouses, or local applications, achieving true interoperability.
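
As an illustration, an open-source Spark job outside Databricks can read the same shared table through the connector's DataFrame source. A minimal sketch, assuming the open-source Delta Sharing Spark connector (Maven artifact io.delta:delta-sharing-spark) is on the classpath and reusing the hypothetical profile path from the examples above:

from pyspark.sql import SparkSession

# Assumes the Delta Sharing Spark connector is already on the classpath
spark = SparkSession.builder.appName("delta-sharing-consumer").getOrCreate()

# Same "<profile>#<share>.<schema>.<table>" addressing as the Python client
shared_df = spark.read.format("deltaSharing").load(
    "/path/to/analytics_corp.share#eu_sales_share.sales_db.europe_data"
)
shared_df.show(5)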

Delta Sharing makes data sharing an API call, not a data transfer.

Databricks Fundamentals Series: Delta Sharing