Principle:Eventual Inc Daft Iceberg Catalog Creation
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, Data_Catalog |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Iceberg catalog creation is the technique for wrapping a PyIceberg catalog object into a Daft Catalog, enabling table discovery and access through Daft's unified catalog interface.
Description
Iceberg catalog creation takes an existing PyIceberg catalog instance and adapts it into Daft's internal Catalog representation. Users who already have a configured PyIceberg catalog (connected to a metastore such as Hive, Glue, or REST) can use it directly within Daft for table discovery, schema inspection, and data access. The resulting Daft Catalog delegates all metadata operations to the underlying PyIceberg catalog while presenting a consistent interface that integrates with Daft sessions and query planning.
Usage
Use Iceberg catalog creation when you have a PyIceberg catalog instance (e.g., from pyiceberg.catalog.load_catalog()) and want to make its tables available within Daft. This is typically done before attaching the catalog to a session or when you need to browse and read Iceberg tables programmatically.
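A minimal sketch of this workflow follows. It assumes a recent Daft release that exposes `Catalog.from_iceberg` in `daft.catalog`, and a PyIceberg installation that provides `load_catalog`. The helper name `load_daft_catalog` is hypothetical and exists only for illustration; the catalog name and connection properties are supplied by the caller.

```python
from typing import Any


def load_daft_catalog(name: str, **properties: str) -> Any:
    """Load a PyIceberg catalog by name and wrap it as a Daft Catalog.

    Hypothetical helper for illustration. The imports live inside the
    function so this module still loads where the optional dependencies
    (daft, pyiceberg) are not installed.
    """
    # Assumption: Catalog.from_iceberg is available in the installed Daft version.
    from daft.catalog import Catalog
    from pyiceberg.catalog import load_catalog

    # PyIceberg resolves connection settings from its own configuration
    # plus any properties passed here (for example the catalog type and URI).
    iceberg_catalog = load_catalog(name, **properties)

    # Adapt the PyIceberg catalog to Daft's unified Catalog interface.
    return Catalog.from_iceberg(iceberg_catalog)
```

The returned object can then be browsed through Daft's catalog interface (for example by listing its tables) or attached to a session, depending on what the installed Daft version provides.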
Theoretical Basis
This technique follows the adapter pattern, which wraps an external implementation behind a unified interface. The adapter translates between the PyIceberg catalog API and Daft's internal Catalog interface, ensuring that:
1. Table listing and discovery map to PyIceberg's namespace and table enumeration
2. Table loading produces Daft-compatible scan operators
3. Schema conversion handles Iceberg-to-Daft type mapping
4. Catalog metadata (name, properties) is preserved through the adapter
This pattern enables Daft to support multiple catalog backends (Iceberg, Unity, Gravitino) through a single Catalog interface, with each backend implemented as a separate adapter.
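The adapter structure described above can be sketched in plain Python. This is not Daft's implementation: `Catalog`, `StubIcebergCatalog`, `IcebergCatalogAdapter`, and the type-name mapping are all hypothetical stand-ins chosen to show how listing, schema conversion, and metadata pass through an adapter. Producing scan operators for table loading is omitted.

```python
from dataclasses import dataclass, field
from typing import Protocol


class Catalog(Protocol):
    """Unified interface every backend adapter exposes (hypothetical)."""

    @property
    def name(self) -> str: ...
    def list_tables(self) -> list[str]: ...
    def get_schema(self, identifier: str) -> dict[str, str]: ...


@dataclass
class StubIcebergCatalog:
    """Stand-in for an external catalog with its own API shape."""

    name: str
    # namespace tuple -> table name -> {column name: Iceberg type name}
    tables: dict[tuple[str, ...], dict[str, dict[str, str]]] = field(default_factory=dict)

    def list_namespaces(self) -> list[tuple[str, ...]]:
        return sorted(self.tables)

    def list_tables(self, namespace: tuple[str, ...]) -> list[tuple[str, ...]]:
        return [namespace + (table,) for table in sorted(self.tables[namespace])]

    def load_schema(self, identifier: tuple[str, ...]) -> dict[str, str]:
        *namespace, table = identifier
        return self.tables[tuple(namespace)][table]


# Illustrative type mapping only; a real mapping covers many more types.
ICEBERG_TO_ENGINE_TYPES = {"long": "Int64", "string": "Utf8", "double": "Float64"}


class IcebergCatalogAdapter:
    """Translates the stub backend's API into the unified Catalog interface."""

    def __init__(self, inner: StubIcebergCatalog) -> None:
        self._inner = inner

    @property
    def name(self) -> str:
        # Catalog metadata is passed through from the wrapped catalog.
        return self._inner.name

    def list_tables(self) -> list[str]:
        # Discovery walks the backend's namespaces, then the tables in each.
        return [
            ".".join(identifier)
            for namespace in self._inner.list_namespaces()
            for identifier in self._inner.list_tables(namespace)
        ]

    def get_schema(self, identifier: str) -> dict[str, str]:
        # Schema conversion maps backend type names to engine type names.
        raw = self._inner.load_schema(tuple(identifier.split(".")))
        return {column: ICEBERG_TO_ENGINE_TYPES[kind] for column, kind in raw.items()}


backend = StubIcebergCatalog(
    name="demo",
    tables={("sales",): {"orders": {"id": "long", "note": "string"}}},
)
catalog: Catalog = IcebergCatalogAdapter(backend)
print(catalog.list_tables())
print(catalog.get_schema("sales.orders"))
```

Code that consumes `Catalog` never touches the backend's own methods, so supporting another backend means writing another adapter class rather than changing callers.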