
Principle:Eventual Inc Daft Catalog Registration

From Leeroopedia


Knowledge Sources
Domains Data_Engineering, Data_Catalog
Last Updated 2026-02-08 00:00 GMT

Overview

Catalog registration connects external data catalogs to a Daft session so that tables from multiple catalog systems can be accessed through one interface.

Description

Catalog registration attaches external catalog systems (such as Apache Iceberg, Unity Catalog, and Gravitino) to a Daft session, enabling SQL queries and DataFrame operations against cataloged tables. Once a catalog is attached to a session, its tables can be referenced by name through the session's unified namespace. The session maintains a registry of all attached catalogs and resolves table references by searching through them. This allows users to work with tables from different catalog backends using a single, consistent API without needing to manage individual catalog connections separately.
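The name-resolution behaviour described above can be illustrated with a small, library-free sketch. Everything in it is hypothetical: `CatalogRegistry`, `attach`, and `resolve` are illustrative names, not Daft's API, and the rule that an unqualified name resolves to the first match in attach order is an assumption made for the example, since the text only says that the session searches through its attached catalogs.

```python
from typing import Dict


class CatalogRegistry:
    """Hypothetical stand-in for a session's catalog registry (not Daft's API)."""

    def __init__(self) -> None:
        # alias -> {table name -> table object}; dicts keep attach order.
        self._catalogs: Dict[str, Dict[str, object]] = {}

    def attach(self, alias: str, tables: Dict[str, object]) -> None:
        if alias in self._catalogs:
            raise ValueError(f"alias already attached: {alias!r}")
        self._catalogs[alias] = dict(tables)

    def resolve(self, name: str) -> object:
        # Qualified reference of the form "alias.table".
        alias, dot, table = name.partition(".")
        if dot and alias in self._catalogs:
            try:
                return self._catalogs[alias][table]
            except KeyError:
                raise LookupError(f"no table {table!r} in catalog {alias!r}") from None
        # Unqualified reference: search attached catalogs in attach order
        # (assumed policy; the first match wins).
        for tables in self._catalogs.values():
            if name in tables:
                return tables[name]
        raise LookupError(f"table not found: {name!r}")


registry = CatalogRegistry()
registry.attach("lake", {"orders": "lake/orders"})
registry.attach("warehouse", {"orders": "warehouse/orders", "users": "warehouse/users"})

print(registry.resolve("warehouse.orders"))  # qualified lookup
print(registry.resolve("users"))             # unqualified, found by search
```

Qualifying a name with its catalog alias avoids any ambiguity when two attached catalogs expose a table with the same name.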

Usage

Use catalog registration when you need to connect to one or more external data catalogs (e.g., Iceberg, Unity, Gravitino) and query their tables through Daft's session interface. This is the entry point for any workflow that involves discovering and accessing tables managed by external metadata stores.

Theoretical Basis

Catalog registration follows the catalog federation pattern, where multiple external metadata stores are unified under a single session namespace. This pattern is common in data lake architectures where data is spread across multiple catalog systems.

The general workflow is:

1. Create or obtain a catalog instance (e.g., an Iceberg or Unity catalog).
2. Attach the catalog to a Daft session, optionally under an alias.
3. The session wraps any non-Daft catalog in its unified Catalog interface.
4. Tables from the attached catalog become accessible by name.
5. SQL queries and DataFrame operations resolve table names through the session's catalog registry.
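The steps above can be sketched as a toy model of the federation pattern. This is not Daft's implementation: `UnifiedCatalog`, `ExternalClient`, `ExternalClientAdapter`, `SketchSession`, and all of their methods are hypothetical names chosen for illustration, and the default alias (the lowercased class name) is an assumption of the sketch.

```python
from typing import Dict, List, Optional, Protocol


class UnifiedCatalog(Protocol):
    """Hypothetical unified interface that the session works against."""

    def list_tables(self) -> List[str]: ...
    def get_table(self, name: str) -> object: ...


class ExternalClient:
    """Hypothetical third-party catalog client with its own method names."""

    def __init__(self, tables: Dict[str, object]) -> None:
        self._tables = tables

    def table_identifiers(self) -> List[str]:
        return sorted(self._tables)

    def load_table(self, identifier: str) -> object:
        return self._tables[identifier]


class ExternalClientAdapter:
    """Wraps ExternalClient so that it satisfies UnifiedCatalog (step 3)."""

    def __init__(self, client: ExternalClient) -> None:
        self._client = client

    def list_tables(self) -> List[str]:
        return self._client.table_identifiers()

    def get_table(self, name: str) -> object:
        return self._client.load_table(name)


class SketchSession:
    """Toy session holding a registry of attached catalogs."""

    def __init__(self) -> None:
        self._catalogs: Dict[str, UnifiedCatalog] = {}

    def attach_catalog(self, catalog: object, alias: Optional[str] = None) -> str:
        # Step 3: wrap foreign catalog objects; pass through conforming ones.
        if isinstance(catalog, ExternalClient):
            wrapped: UnifiedCatalog = ExternalClientAdapter(catalog)
        elif hasattr(catalog, "list_tables") and hasattr(catalog, "get_table"):
            wrapped = catalog  # already speaks the unified interface
        else:
            raise TypeError(f"unsupported catalog type: {type(catalog).__name__}")
        # Step 2: register under the alias (assumed default: class name).
        name = alias or type(catalog).__name__.lower()
        self._catalogs[name] = wrapped
        return name

    def list_tables(self, alias: str) -> List[str]:
        return self._catalogs[alias].list_tables()

    def read_table(self, qualified_name: str) -> object:
        # Steps 4 and 5: resolve "alias.table" through the registry.
        alias, _, table = qualified_name.partition(".")
        return self._catalogs[alias].get_table(table)


# Step 1: obtain a catalog instance from the (hypothetical) external system.
client = ExternalClient({"events": [{"id": 1}, {"id": 2}]})

session = SketchSession()
attached_as = session.attach_catalog(client, alias="ice")
print(attached_as, session.list_tables("ice"))
print(session.read_table("ice.events"))
```

Because the session only calls `list_tables` and `get_table`, supporting another backend in this sketch means writing one more adapter; `read_table` and its callers stay unchanged.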

This design decouples the data access layer from the specific catalog implementation, allowing Daft to support new catalog backends without changing the query interface.

Related Pages

Implemented By
