Databricks Lakehouse Federation
/tldr: A capability of Unity Catalog to query data in external databases (e.g., MySQL, Snowflake) without importing or replicating the data.
1. The Need for Federation (Zero ETL)
Historically, accessing data in external systems required complex ETL/ELT pipelines to move the data into the Lakehouse. Lakehouse Federation eliminates this by allowing Databricks to execute queries directly against the source system.
How it Works
Query Pushdown
Databricks translates the SQL query (or Spark code) into the dialect of the external database, sends only the necessary parts of the query to the source, and retrieves only the results. This maximizes performance and minimizes network transfer.
Live Data Access
Because the query runs directly on the source, you are always accessing the most up-to-date, live data without delay or synchronization issues common with replication.
2. Integration via Unity Catalog
The connection is managed through Unity Catalog, leveraging the familiar 3-level namespace structure and centralized governance.
1. Connection Object
The administrator defines a Connection in Unity Catalog, which stores the necessary credentials and network information (hostname, port, etc.) for the external system.
2. Foreign Catalog
A Foreign Catalog is created, referencing the Connection. This makes the entire external database appear as a top-level Catalog within the Databricks environment.
3. Seamless Querying
Users query the external data just like any Delta table, using the 3-level namespace: <foreign_catalog>.<schema>.<table> (e.g., mysql_db.ecommerce.users).
-- Example: Joining Delta Lake (local) data with Federated (MySQL) data
SELECT
dlt.customer_id,
fed.order_date,
dlt.region
FROM
prod.gold.customer_data dlt
INNER JOIN
mysql_db.ecommerce.orders fed
ON
dlt.customer_id = fed.customer_id;
3. Supported External Sources
Lakehouse Federation offers native connectors for most popular data platforms, ensuring optimal performance and compatibility.
MySQL
Transactional DBs
PostgreSQL
Open Source DBs
Snowflake
Cloud Data Warehouses
SQL Server
Enterprise DBs
Lakehouse Federation turns your Lakehouse into a unified, single point of access for all enterprise data.