Implementation:Apache Paimon CatalogFactory Create for Ray
| Knowledge Sources | |
|---|---|
| Domains | Data_Lake, Distributed_Computing |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
Concrete tool for creating Paimon catalog instances in preparation for Ray distributed operations.
Description
Uses CatalogFactory.create() to establish a catalog connection, then catalog.get_table() to obtain a table reference. In a Ray workflow, this setup runs on the driver node before read/write tasks are distributed to workers. The CatalogFactory.create() call is the same as in non-Ray usage; only the resulting table is used differently, for Ray-specific operations (to_ray(), write_ray()).
Usage
Call CatalogFactory.create() with the appropriate catalog options dictionary, then call catalog.get_table() with the fully qualified table identifier (e.g., db.table). The resulting table reference is then used to create read builders or write builders for Ray operations.
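The two calls described above can be wrapped in a small driver-side helper. The following is a minimal sketch, not part of pypaimon: the function name `open_table_on_driver`, the injected `create_catalog` parameter, and the `db.table` format check are all hypothetical and exist only for illustration. Passing the factory as an argument keeps the sketch runnable without a live catalog.

```python
from typing import Any, Callable, Dict


def open_table_on_driver(
    catalog_options: Dict[str, Any],
    identifier: str,
    create_catalog: Callable[[Dict[str, Any]], Any],
) -> Any:
    """Create a catalog and resolve a table before any Ray work is scheduled.

    `create_catalog` is a hypothetical injection point; in real use it
    would be CatalogFactory.create.
    """
    # Hypothetical sanity check: expect a 'db.table' style identifier.
    database, sep, table_name = identifier.partition(".")
    if not sep or not database or not table_name:
        raise ValueError(f"expected 'db.table', got {identifier!r}")

    # Step 1: build the catalog from the options dictionary.
    catalog = create_catalog(catalog_options)
    # Step 2: resolve the table reference used later for Ray reads/writes.
    return catalog.get_table(identifier)
```

In real use, the call would presumably look like `open_table_on_driver(catalog_options, 'my_db.my_table', CatalogFactory.create)`.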
Code Reference
Source Location
paimon-python/pypaimon/catalog/catalog_factory.py:L28-44
Signature
class CatalogFactory:
    @staticmethod
    def create(catalog_options: Dict) -> Catalog:
        ...

class Catalog(ABC):
    @abstractmethod
    def get_table(self, identifier: Union[str, Identifier]) -> 'Table':
        ...
Import
from pypaimon.catalog.catalog_factory import CatalogFactory
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| catalog_options | Dict | Yes | Catalog configuration, e.g. 'metastore', 'uri', 'warehouse' |
| identifier | Union[str, Identifier] | Yes | Fully qualified table identifier (e.g., 'db.table') |
Outputs
| Name | Type | Description |
|---|---|---|
| catalog | Catalog | Catalog instance connected to the configured metastore |
| table | FileStoreTable | Table reference for read/write operations |
Usage Examples
Basic Usage
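from pypaimon.catalog.catalog_factory import CatalogFactory

# Options for a REST catalog; the URI and token are placeholder values.
catalog_options = {
    'metastore': 'rest',
    'uri': 'http://localhost:8080',
    'token': 'my-token',
}

# Run on the Ray driver: create the catalog, then resolve the table.
catalog = CatalogFactory.create(catalog_options)
table = catalog.get_table('my_db.my_table')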
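Once the table reference exists, the Ray-specific step mentioned in the description (to_ray()) would follow. The helper below is a hedged sketch of that step, not a confirmed API walkthrough: the function name `read_as_ray_dataset` is hypothetical, and the sketch assumes the table exposes a read builder whose scan plans splits and whose read object offers `to_ray(splits)`. Check the method names against the installed pypaimon version before relying on them.

```python
from typing import Any


def read_as_ray_dataset(table: Any) -> Any:
    """Plan a scan on the driver and hand the splits to Ray.

    Assumes the table offers new_read_builder(), and that the builder's
    scan/read objects offer plan().splits() and to_ray(splits).
    """
    read_builder = table.new_read_builder()
    # Planning happens on the driver and yields the splits to read.
    splits = read_builder.new_scan().plan().splits()
    # Assumption: to_ray() takes the planned splits and returns a Ray dataset.
    return read_builder.new_read().to_ray(splits)
```

With the `table` from the basic usage example, this would be called as `dataset = read_as_ray_dataset(table)`.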