Databricks TL;DR: Lakehouse Federation
Back to Databricks main page

Databricks Lakehouse Federation

/tldr: A capability of Unity Catalog to query data in external databases (e.g., MySQL, Snowflake) without importing or replicating the data.

Zero ETL Unified Governance Live Query Access Cross-Platform Joins

1. The Need for Federation (Zero ETL)

Historically, accessing data in external systems required complex ETL/ELT pipelines to move the data into the Lakehouse. Lakehouse Federation eliminates this by allowing Databricks to execute queries directly against the source system.

How it Works

Query Pushdown

Databricks translates the SQL query (or Spark code) into the dialect of the external database, sends only the necessary parts of the query to the source, and retrieves only the results. This maximizes performance and minimizes network transfer.

Live Data Access

Because the query runs directly on the source, you are always accessing the most up-to-date, live data without delay or synchronization issues common with replication.

2. Integration via Unity Catalog

The connection is managed through Unity Catalog, leveraging the familiar 3-level namespace structure and centralized governance.

1. Connection Object

The administrator defines a Connection in Unity Catalog, which stores the necessary credentials and network information (hostname, port, etc.) for the external system.

2. Foreign Catalog

A Foreign Catalog is created, referencing the Connection. This makes the entire external database appear as a top-level Catalog within the Databricks environment.

3. Seamless Querying

Users query the external data just like any Delta table, using the 3-level namespace: <foreign_catalog>.<schema>.<table> (e.g., mysql_db.ecommerce.users).

                
-- Example: Joining Delta Lake (local) data with Federated (MySQL) data
SELECT 
    dlt.customer_id,
    fed.order_date,
    dlt.region
FROM 
    prod.gold.customer_data dlt
INNER JOIN 
    mysql_db.ecommerce.orders fed
ON 
    dlt.customer_id = fed.customer_id;
                
            

3. Supported External Sources

Lakehouse Federation offers native connectors for most popular data platforms, ensuring optimal performance and compatibility.

MySQL

Transactional DBs

PostgreSQL

Open Source DBs

Snowflake

Cloud Data Warehouses

SQL Server

Enterprise DBs

Lakehouse Federation turns your Lakehouse into a unified, single point of access for all enterprise data.

Databricks Fundamentals Series: Lakehouse Federation