Delta Sharing
/tldr: Securely sharing data across clouds, platforms, and organizations with zero copies.
1. What is Delta Sharing?
Delta Sharing is an open protocol developed by Databricks for secure data sharing. Its primary goal is to allow data providers to share data with consumers, regardless of which cloud platform (AWS, Azure, GCP) or compute platform (Databricks, Snowflake, Redshift, etc.) they use.
The "No-Copy" Principle
Unlike traditional data sharing methods that require creating and moving data copies (which increases storage costs and staleness), Delta Sharing enables **direct, live access** to the source data in the provider's cloud storage.
- **Zero ETL:** Eliminates the need for Extract, Transform, Load processes just for sharing.
- **Freshness:** Consumers always see the latest data updates from the provider's Delta Table.
- **Security:** All access is governed by the provider's cloud storage security policies and Delta Sharing authentication.
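To make the no-copy idea concrete: in the open protocol, a table query does not return the rows themselves. It returns newline-delimited JSON that describes the table and lists short-lived, pre-signed URLs to the Parquet files in the provider's storage, which the client then reads directly. The sketch below parses such a response. The sample lines, storage host, and the helper name `file_urls` are made up for illustration, and a real response carries more fields than shown.

```python
import json

# Made-up response lines in the protocol's newline-delimited JSON shape.
# Real responses carry more fields (schema, partition values, stats, ...).
sample_response = "\n".join([
    json.dumps({"protocol": {"minReaderVersion": 1}}),
    json.dumps({"metaData": {"id": "example-table-id", "format": {"provider": "parquet"}}}),
    json.dumps({"file": {
        "url": "https://storage.example.com/sales/part-0000.parquet?sig=EXAMPLE",
        "id": "file-0",
        "size": 573,
    }}),
    json.dumps({"file": {
        "url": "https://storage.example.com/sales/part-0001.parquet?sig=EXAMPLE",
        "id": "file-1",
        "size": 612,
    }}),
])


def file_urls(ndjson_text):
    """Collect the pre-signed data file URLs from a query response."""
    urls = []
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines between entries
        entry = json.loads(line)
        if "file" in entry:
            urls.append(entry["file"]["url"])
    return urls


for url in file_urls(sample_response):
    print(url)
```

The consumer downloads those files straight from the provider's storage, so no second copy of the table has to be maintained on the consumer side.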
2. Sharing Components and Workflow
Share
A named collection of tables, schemas, or notebooks that the provider wishes to share.
CREATE SHARE eu_sales_share;
ALTER SHARE eu_sales_share ADD TABLE sales_db.europe_data;
Recipient
Represents the consumer organization. It contains the credentials used to access the shared data.
CREATE RECIPIENT analytics_corp;
-- Generates a credentials activation link
Grant
The explicit link between a Share and a Recipient, activating access.
GRANT SELECT ON SHARE eu_sales_share TO RECIPIENT analytics_corp;
Using the activation link, the recipient downloads a **credentials file** (a JSON profile, typically saved with a `.share` extension) that contains the endpoint and bearer token needed to authenticate with the provider's sharing server.
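As a rough illustration of what that file holds, the sketch below writes a placeholder profile to a temporary directory and reads it back, checking the fields a client needs. The endpoint, the token value, and the helper name `load_profile` are assumptions made for the example. A real file comes from the provider and its token should be treated as a secret.

```python
import json
import tempfile
from pathlib import Path

# Fields a Delta Sharing profile file is expected to carry.
REQUIRED_FIELDS = ("shareCredentialsVersion", "endpoint", "bearerToken")


def load_profile(path):
    """Read a profile (.share) file and check that the required fields exist."""
    profile = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [name for name in REQUIRED_FIELDS if name not in profile]
    if missing:
        raise ValueError(f"profile is missing fields: {missing}")
    return profile


# Placeholder values for illustration only, not real credentials.
sample_profile = {
    "shareCredentialsVersion": 1,
    "endpoint": "https://sharing.example.com/delta-sharing/",
    "bearerToken": "<redacted-token>",
}

with tempfile.TemporaryDirectory() as tmp:
    profile_path = Path(tmp) / "analytics_corp.share"
    profile_path.write_text(json.dumps(sample_profile), encoding="utf-8")
    profile = load_profile(profile_path)

print(profile["endpoint"])
```

Validating the file up front gives a clearer error than a failed request later, which can help when a recipient has saved an incomplete download.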
3. Accessing Shared Data on the Consumer Side
Because Delta Sharing is an open standard, any system with a Delta Sharing connector can consume the data.
A. Python / Pandas Integration
Data scientists often use the **`delta-sharing` Python client** to query the shared data directly into a Pandas DataFrame.
Python Example (using Pandas)
# 1. Install the client
# pip install delta-sharing
# 2. Import the library and create the client from the downloaded credentials file
import delta_sharing
profile_file = "/path/to/analytics_corp.share"
client = delta_sharing.SharingClient(profile_file)
# 3. List the shared tables
print(client.list_all_tables())
# 4. Load a shared table directly into a Pandas DataFrame
# Table URL format: <profile-file>#<share>.<schema>.<table>
data_url = f"{profile_file}#eu_sales_share.sales_db.europe_data"
df = delta_sharing.load_as_pandas(data_url)
print(df.head())
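The client addresses a table as `<profile-file>#<share>.<schema>.<table>`, so the share name has to be part of the URL. A small helper can make that explicit and catch malformed names early. This is a sketch: `build_table_url` is a hypothetical helper, not part of the `delta-sharing` package, and rejecting names that contain `.` or `#` is an assumption made here to keep the address unambiguous.

```python
def build_table_url(profile_file, share, schema, table):
    """Build the '<profile-file>#<share>.<schema>.<table>' address for a shared table."""
    for part in (share, schema, table):
        # Assumption for this sketch: names with '.' or '#' would make the
        # address ambiguous, so they are rejected here.
        if not part or "." in part or "#" in part:
            raise ValueError(f"invalid name component: {part!r}")
    return f"{profile_file}#{share}.{schema}.{table}"


url = build_table_url(
    "/path/to/analytics_corp.share",
    share="eu_sales_share",
    schema="sales_db",
    table="europe_data",
)
print(url)
```

The resulting string can be passed to `delta_sharing.load_as_pandas(url)`.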
B. Power BI and BI Tools
Power BI, Tableau, and other popular BI tools offer Delta Sharing connectors. The user supplies the endpoint URL and bearer token from the credentials file, and the tool lists the shared tables as a data source.
- **Integration:** Uses the endpoint and bearer token from the credentials file (`.share`) to establish a connection.
- **Refresh:** The Power BI connector loads shared tables in Import mode, so refreshing the dataset pulls the provider's latest data.
C. Cross-Cloud & Cross-Platform
The consumer is not limited to Databricks. They can use the open protocol client in Spark, cloud data warehouses, or local applications, achieving true interoperability.
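Because the protocol is plain REST, a consumer without a ready-made connector can call the sharing server directly. The sketch below only prepares a table-query request using the standard library and does not send it. The endpoint and token are placeholders, and `build_query_request` is a hypothetical helper name. The path layout and the `limitHint` body field follow the open protocol as I understand it, so check them against the specification before relying on them.

```python
import json
import urllib.request
from urllib.parse import quote


def build_query_request(endpoint, token, share, schema, table, limit_hint=None):
    """Prepare (but do not send) a Delta Sharing table-query request."""
    base = endpoint.rstrip("/")
    # Percent-encode each name so unusual characters cannot break the path.
    path = "/".join([
        "shares", quote(share, safe=""),
        "schemas", quote(schema, safe=""),
        "tables", quote(table, safe=""),
        "query",
    ])
    body = {} if limit_hint is None else {"limitHint": limit_hint}
    return urllib.request.Request(
        f"{base}/{path}",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder endpoint and token, for illustration only.
request = build_query_request(
    "https://sharing.example.com/delta-sharing/",
    "<redacted-token>",
    "eu_sales_share",
    "sales_db",
    "europe_data",
    limit_hint=10,
)
print(request.get_method(), request.full_url)
```

Sending the request with `urllib.request.urlopen(request)` against a real server would return the table's metadata and data file references rather than the rows themselves.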
Delta Sharing makes data sharing an API call, not a data transfer.