Implementation:Apache Paimon Catalog Create Database and Table
| Knowledge Sources | |
|---|---|
| Domains | Data_Lake, Table_Format |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
Concrete tools for creating databases and tables with schema definitions in Apache Paimon.
Description
The Catalog abstract class provides create_database() and create_table() methods for establishing databases and tables in the catalog. Schema objects can be constructed directly from DataField lists or converted from PyArrow schemas via Schema.from_pyarrow_schema(). The get_table() method retrieves a FileStoreTable reference for subsequent read and write operations. Schema conversion from PyArrow handles the mapping between PyArrow types and Paimon internal types, including support for nested types, timestamps, and decimal precision.
Usage
Use these tools after obtaining a catalog instance from CatalogFactory.create(). First create a database with create_database(), then define a Schema and create a table with create_table(). Finally, retrieve the table reference with get_table() to perform read or write operations.
Code Reference
Source Location
- Repository: Apache Paimon
- File: paimon-python/pypaimon/catalog/catalog.py
- Lines: L29-56
- File: paimon-python/pypaimon/schema/schema.py
- Lines: L28-88
Signature
class Catalog(ABC):

    @abstractmethod
    def create_database(self, name: str, ignore_if_exists: bool, properties: Optional[dict] = None):
        pass

    @abstractmethod
    def create_table(self, identifier: Union[str, Identifier], schema: Schema, ignore_if_exists: bool):
        pass

    @abstractmethod
    def get_table(self, identifier: Union[str, Identifier]) -> 'Table':
        pass


class Schema:

    def __init__(self, fields: Optional[List[DataField]] = None,
                 partition_keys: Optional[List[str]] = None,
                 primary_keys: Optional[List[str]] = None,
                 options: Optional[Dict] = None,
                 comment: Optional[str] = None):
        ...

    @staticmethod
    def from_pyarrow_schema(pa_schema: pa.Schema,
                            partition_keys: Optional[List[str]] = None,
                            primary_keys: Optional[List[str]] = None,
                            options: Optional[Dict] = None,
                            comment: Optional[str] = None):
        ...
Import
from pypaimon.catalog.catalog import Catalog
from pypaimon.schema.schema import Schema
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| name | str | Yes | Database name for create_database() |
| identifier | Union[str, Identifier] | Yes | Fully qualified table name (e.g., 'db.table') for create_table() and get_table() |
| schema | Schema | Yes | Table schema with fields, partition_keys, primary_keys, and options for create_table() |
| ignore_if_exists | bool | Yes | If True, skip creation when the database or table already exists |
| properties | Optional[dict] | No | Additional database properties for create_database() |
| pa_schema | pa.Schema | Yes (for from_pyarrow_schema) | PyArrow schema to convert to a Paimon schema |
| partition_keys | Optional[List[str]] | No | Column names used for partitioning |
| primary_keys | Optional[List[str]] | No | Column names forming the primary key |
| options | Optional[Dict] | No | Table options such as {'bucket': '2'} |
| comment | Optional[str] | No | Optional table comment |
Outputs
| Name | Type | Description |
|---|---|---|
| create_database return | None | Creates the database in the catalog (side effect) |
| create_table return | None | Creates the table in the catalog (side effect) |
| get_table return | Table | A FileStoreTable instance for subsequent read/write operations |
| Schema.from_pyarrow_schema return | Schema | A Paimon Schema constructed from the given PyArrow schema |
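The `ignore_if_exists` contract from the Inputs table can be illustrated with a toy in-memory catalog. This is only a sketch: `ToyCatalog` and `AlreadyExistsError` are hypothetical names, not pypaimon classes, and raising an error when `ignore_if_exists=False` is an assumption about typical catalog behaviour rather than something stated on this page.

```python
class AlreadyExistsError(Exception):
    """Hypothetical error type for this sketch; not a pypaimon class."""


class ToyCatalog:
    """In-memory stand-in that mimics only the ignore_if_exists contract."""

    def __init__(self):
        self._databases = {}  # database name -> {table name -> schema}

    def create_database(self, name, ignore_if_exists, properties=None):
        if name in self._databases:
            if ignore_if_exists:
                return  # existing database is left untouched
            raise AlreadyExistsError(f"database {name!r} already exists")
        self._databases[name] = {}

    def create_table(self, identifier, schema, ignore_if_exists):
        # Split a fully qualified 'db.table' name into its two parts.
        database, table = identifier.split(".", 1)
        tables = self._databases[database]
        if table in tables:
            if ignore_if_exists:
                return  # creation is skipped; the stored schema is kept
            raise AlreadyExistsError(f"table {identifier!r} already exists")
        tables[table] = schema

    def get_table(self, identifier):
        database, table = identifier.split(".", 1)
        return self._databases[database][table]


catalog = ToyCatalog()
catalog.create_database("my_db", ignore_if_exists=True)
catalog.create_table("my_db.my_table", {"id": "BIGINT"}, ignore_if_exists=True)
# Repeating the call with a different schema is skipped, not applied.
catalog.create_table("my_db.my_table", {"id": "STRING"}, ignore_if_exists=True)
stored = catalog.get_table("my_db.my_table")
```

The point of the sketch is that `ignore_if_exists=True` makes setup code safe to re-run, but it does not update an existing table: a second `create_table()` with a different schema leaves the original definition in place.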
Usage Examples
Basic Usage
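import pyarrow as pa
from pypaimon.catalog.catalog_factory import CatalogFactory
from pypaimon.schema.schema import Schema

# Obtain a catalog instance; the warehouse path is a placeholder, adjust it to your setup
catalog = CatalogFactory.create({'warehouse': '/tmp/paimon_warehouse'})

# Define schema from PyArrow
pa_schema = pa.schema([
    ('id', pa.int64()),
    ('name', pa.string()),
    ('value', pa.float64()),
])
# The primary key includes the partition key, which Paimon expects for
# fixed-bucket primary-key tables
schema = Schema.from_pyarrow_schema(
    pa_schema,
    partition_keys=['name'],
    primary_keys=['id', 'name'],
    options={'bucket': '2'}
)

# Create database and table; both calls are skipped if the objects already exist
catalog.create_database('my_db', ignore_if_exists=True)
catalog.create_table('my_db.my_table', schema, ignore_if_exists=True)

# Retrieve the table reference for subsequent reads and writes
table = catalog.get_table('my_db.my_table')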