Unity Catalog (UC)
/tldr: A unified governance solution for data, analytics, and AI, providing a single source of truth for all metadata and access control.
1. Core Architecture: The 3-Level Namespace
Unity Catalog standardizes data organization across all workspaces and compute resources using a clear, SQL-compliant hierarchy. This structure ensures consistent naming and access control.
<catalog>.<schema>.<table>
1. Catalog (Top Level)
The top level of the hierarchy, typically used to separate environments (dev, prod), data domains (finance, marketing), or business units. Every catalog belongs to a Unity Catalog metastore.
2. Schema (Database)
The second level, equivalent to traditional databases. Used to group related data assets within a Catalog, e.g., raw, staging, or gold tables.
3. Table (Data Asset)
The final level, representing the actual data asset (table, view, or volume). Tables and views are queried using the fully qualified name, e.g., prod.gold.customer_data.
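As a rough illustration of the three-level name, the Python sketch below splits a fully qualified name into its parts and re-emits it with backtick quoting. The `TableName` class and `parse_table_name` helper are hypothetical names invented for this example and are not part of any Databricks API; the parser assumes unquoted names with no dots inside a part.

```python
from typing import NamedTuple


class TableName(NamedTuple):
    """A fully qualified three-level name: catalog.schema.table."""
    catalog: str
    schema: str
    table: str

    def quoted(self) -> str:
        # Backtick-quote each part so names with hyphens or spaces stay valid;
        # an embedded backtick is escaped by doubling it.
        return ".".join("`" + part.replace("`", "``") + "`" for part in self)


def parse_table_name(name: str) -> TableName:
    """Split an unquoted 'catalog.schema.table' string into its three parts."""
    parts = name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected <catalog>.<schema>.<table>, got {name!r}")
    return TableName(*parts)


name = parse_table_name("prod.gold.customer_data")
print(name.catalog, name.schema, name.table)  # prod gold customer_data
print(name.quoted())                          # `prod`.`gold`.`customer_data`
```

Rejecting anything that is not exactly three parts catches the common mistake of passing a two-level (schema.table) name left over from a pre-UC workspace.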
2. Centralized Governance & Security
UC enforces security centrally using ANSI SQL standard GRANT statements, replacing previous table access control lists (ACLs) that were tied to individual workspaces.
Identity and Access Management
Access is managed against a centralized set of identities (users, service principals, and groups) synced from your identity provider (e.g., Microsoft Entra ID, formerly Azure AD, or Okta).
Standard SQL Grants
Permissions are granted at the Catalog, Schema, or Table level and are inherited by child objects; Column/Row-level restrictions are implemented with Dynamic Views. Grants persist across all workspaces and compute resources.
GRANT SELECT ON TABLE prod.gold.orders TO `data-analysts`
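A SELECT grant on its own is usually not enough: to read a table, a principal also needs USE CATALOG on the parent catalog and USE SCHEMA on the parent schema. The sketch below generates the full set of statements for one table. The `read_grants` helper is a hypothetical name for this example, and it assumes an unquoted three-level table name.

```python
def read_grants(table: str, principal: str) -> list[str]:
    """Return the GRANT statements a principal needs to read one table."""
    catalog, schema, _ = table.split(".")  # assumes an unquoted 3-level name
    who = f"`{principal}`"  # backticks allow hyphens in the group name
    return [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO {who}",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO {who}",
        f"GRANT SELECT ON TABLE {table} TO {who}",
    ]


for stmt in read_grants("prod.gold.orders", "data-analysts"):
    print(stmt + ";")
```

The printed statements could be pasted into a SQL editor or passed one at a time to `spark.sql(...)` in a notebook.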
3. Automated Lineage and Discovery
Automated Lineage Capture
UC automatically captures data lineage across queries run in any language (SQL, Python, Scala, R) within Databricks. This shows how data moves from source tables to final tables, including notebook/workflow names.
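Conceptually, the captured lineage is a directed graph from source tables to derived tables. The sketch below models such a graph in plain Python and walks it to list every upstream source of a table. The edge data is invented for illustration and is not read from Unity Catalog; `UPSTREAM` and `all_sources` are hypothetical names.

```python
from collections import deque

# Hypothetical lineage edges: target table -> tables it was built from.
# These names are made up for illustration and are not read from UC.
UPSTREAM = {
    "prod.gold.customer_data": ["prod.staging.customers", "prod.staging.orders"],
    "prod.staging.customers": ["prod.raw.crm_export"],
    "prod.staging.orders": ["prod.raw.order_events"],
}


def all_sources(table: str) -> list[str]:
    """Breadth-first walk returning every upstream table, nearest first."""
    seen, order, queue = {table}, [], deque([table])
    while queue:
        for parent in UPSTREAM.get(queue.popleft(), []):
            if parent not in seen:  # guard against revisiting shared parents
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


print(all_sources("prod.gold.customer_data"))
```

A walk like this answers the typical impact-analysis question: which raw tables ultimately feed a given gold table.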
Discovery (Search & Sharing)
Because all metadata lives in one metastore, users can search for data assets across the entire organization by name, tag, or description.
4. Data Sharing via Delta Sharing
Unity Catalog is the management layer for Delta Sharing, an open standard for secure, cross-platform data sharing.
Key Features
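- **Open Protocol:** Data can be shared with recipients on any cloud (AWS, Azure, GCP) and in any tool that supports the protocol (Tableau, Power BI, etc.), not just Databricks.
- **No Replication:** Data remains in the provider's cloud storage. The recipient's client calls the sharing server's REST API, which returns short-lived pre-signed URLs for reading the underlying files directly.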
- **Governance:** UC manages the shares, recipients, and tokens used for accessing the shared data.
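On the recipient side, the open-source `delta-sharing` Python connector addresses a shared table as `<profile-file>#<share>.<schema>.<table>`, where the profile file holds the endpoint and token issued by the provider. The sketch below only builds that address. The `sharing_table_url` helper and all names passed to it are hypothetical, and rejecting dots inside a name part is a simplifying assumption of this example.

```python
def sharing_table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build '<profile-file>#<share>.<schema>.<table>' for the delta-sharing client."""
    for part in (share, schema, table):
        # Simplifying assumption: name parts contain neither '.' nor '#'.
        if not part or "." in part or "#" in part:
            raise ValueError(f"invalid name component: {part!r}")
    return f"{profile_path}#{share}.{schema}.{table}"


# All names below are hypothetical placeholders.
url = sharing_table_url("config.share", "sales_share", "gold", "orders")
print(url)  # config.share#sales_share.gold.orders

# With the connector installed (pip install delta-sharing) and a real profile
# file from the provider, the table could then be loaded with:
#   import delta_sharing
#   df = delta_sharing.load_as_pandas(url)
```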
Unity Catalog is the central brain for security, governance, and discovery across your entire Lakehouse.