Principle:Eventual Inc Daft Catalog Registration
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, Data_Catalog |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Catalog registration connects external data catalogs to a Daft session so that tables from multiple catalog systems can be accessed through a single interface.
Description
Catalog registration attaches external catalog systems (such as Apache Iceberg, Unity Catalog, and Gravitino) to a Daft session, enabling SQL queries and DataFrame operations against cataloged tables. Once a catalog is attached to a session, its tables can be referenced by name through the session's unified namespace. The session maintains a registry of all attached catalogs and resolves table references by searching through them. This allows users to work with tables from different catalog backends using a single, consistent API without needing to manage individual catalog connections separately.
Usage
Use catalog registration when you need to connect to one or more external data catalogs (e.g., Apache Iceberg, Unity Catalog, Gravitino) and query their tables through Daft's session interface. It is the entry point for any workflow that discovers and accesses tables managed by external metadata stores.
Theoretical Basis
Catalog registration follows the catalog federation pattern, where multiple external metadata stores are unified under a single session namespace. This pattern is common in data lake architectures where data is spread across multiple catalog systems.
The general workflow is:
1. Create or obtain a catalog instance (e.g., Iceberg catalog, Unity catalog)
2. Attach the catalog to a Daft session, optionally under an alias
3. The session wraps non-Daft catalogs into a unified Catalog interface
4. Tables from the attached catalog become accessible by name
5. SQL queries and DataFrame operations resolve table names through the session's catalog registry
This design decouples the data access layer from the specific catalog implementation, allowing Daft to support new catalog backends without changing the query interface.
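The workflow above can be modeled with a small, library-independent sketch. It is a toy illustration of the catalog federation pattern, not Daft's implementation: `Catalog`, `DictCatalogAdapter`, `ToySession`, and the rule that unqualified names are searched in attachment order are all hypothetical names and assumptions made for this example, and plain dictionaries stand in for external catalog backends.

```python
from typing import Any, Dict, List, Protocol


class Catalog(Protocol):
    """Unified interface the toy session expects (hypothetical)."""

    def get_table(self, name: str) -> Any: ...

    def list_tables(self) -> List[str]: ...


class DictCatalogAdapter:
    """Wraps a plain dict 'backend' so it satisfies the Catalog interface."""

    def __init__(self, tables: Dict[str, Any]) -> None:
        self._tables = tables

    def get_table(self, name: str) -> Any:
        return self._tables[name]

    def list_tables(self) -> List[str]:
        return sorted(self._tables)


class ToySession:
    """Keeps a registry of attached catalogs and resolves table names."""

    def __init__(self) -> None:
        # dicts preserve insertion order, so this is also the search order
        self._catalogs: Dict[str, Catalog] = {}

    def attach(self, catalog: Any, alias: str) -> None:
        if alias in self._catalogs:
            raise ValueError(f"catalog alias already attached: {alias}")
        # Step 3: wrap a foreign backend into the unified interface
        if isinstance(catalog, dict):
            catalog = DictCatalogAdapter(catalog)
        self._catalogs[alias] = catalog

    def list_catalogs(self) -> List[str]:
        return list(self._catalogs)

    def resolve(self, identifier: str) -> Any:
        # Qualified form "alias.table": go straight to the named catalog
        alias, sep, table = identifier.partition(".")
        if sep and alias in self._catalogs:
            return self._catalogs[alias].get_table(table)
        # Unqualified form: search attached catalogs in attachment order
        for catalog in self._catalogs.values():
            if identifier in catalog.list_tables():
                return catalog.get_table(identifier)
        raise KeyError(f"table not found: {identifier}")


# Steps 1-2: obtain two stand-in backends and attach them under aliases
session = ToySession()
session.attach({"orders": [1, 2, 3]}, alias="iceberg_cat")
session.attach({"users": ["ana", "bo"]}, alias="unity_cat")

# Steps 4-5: tables are reachable by name through the session registry
print(session.resolve("iceberg_cat.orders"))
print(session.resolve("users"))
```

Because the session depends only on the `Catalog` interface, supporting another backend in this sketch means writing one more adapter; `resolve` and its callers stay unchanged, which is the decoupling described above.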