Implementation:Apache Paimon Catalog Create Database and Table

Knowledge Sources	Apache Paimon
Domains	Data_Lake, Table_Format
Last Updated	2026-02-07 00:00 GMT

Overview

Concrete tools for creating databases and schema-defined tables through the Apache Paimon Python API (pypaimon).

Description

The Catalog abstract class provides create_database() and create_table() methods for establishing databases and tables in the catalog. Schema objects can be constructed directly from DataField lists or converted from PyArrow schemas via Schema.from_pyarrow_schema(). The get_table() method retrieves a FileStoreTable reference for subsequent read and write operations. Schema conversion from PyArrow handles the mapping between PyArrow types and Paimon internal types, including support for nested types, timestamps, and decimal precision.
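To make the type mapping concrete, here is a minimal sketch of how PyArrow type names might be translated into Paimon SQL type names. It is not pypaimon's converter: the function `arrow_type_to_paimon` and its lookup tables are hypothetical, the input is the string form of a PyArrow type (for example `str(pa.int64())`), and the unit-to-precision table for timestamps is an assumption.

```python
import re

# Hypothetical lookup from PyArrow type names to Paimon SQL type names.
# Illustrative only; the real conversion lives in pypaimon's schema module.
_SIMPLE_TYPES = {
    "bool": "BOOLEAN",
    "int32": "INT",
    "int64": "BIGINT",
    "float": "FLOAT",
    "double": "DOUBLE",
    "string": "STRING",
}

# Assumed mapping from Arrow time units to fractional-second precision.
_TIMESTAMP_PRECISION = {"s": 0, "ms": 3, "us": 6, "ns": 9}


def arrow_type_to_paimon(arrow_type: str) -> str:
    """Translate the string form of a PyArrow type into a Paimon type name."""
    if arrow_type in _SIMPLE_TYPES:
        return _SIMPLE_TYPES[arrow_type]

    # e.g. "timestamp[us]" -> "TIMESTAMP(6)"
    match = re.fullmatch(r"timestamp\[(s|ms|us|ns)\]", arrow_type)
    if match:
        return f"TIMESTAMP({_TIMESTAMP_PRECISION[match.group(1)]})"

    # e.g. "decimal128(10, 2)" -> "DECIMAL(10, 2)"
    match = re.fullmatch(r"decimal128\((\d+),\s*(\d+)\)", arrow_type)
    if match:
        return f"DECIMAL({match.group(1)}, {match.group(2)})"

    # Nested case, e.g. "list<item: int64>" -> "ARRAY<BIGINT>"
    match = re.fullmatch(r"list<\w+: (.+)>", arrow_type)
    if match:
        return f"ARRAY<{arrow_type_to_paimon(match.group(1))}>"

    raise ValueError(f"unsupported Arrow type: {arrow_type}")
```

The recursive branch for lists shows why nested types need special handling: the element type has to be converted before the outer type can be named.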

Usage

Use these tools after obtaining a catalog instance from CatalogFactory.create(). First create a database with create_database(), then define a Schema and create a table with create_table(). Finally, retrieve the table reference with get_table() to perform read or write operations.
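The three-step workflow above can be wrapped in a small helper. The sketch below is illustrative: `ensure_table` is a hypothetical name, not part of pypaimon, and the demonstration uses a `unittest.mock.Mock` stand-in rather than a real catalog so that it runs without a warehouse.

```python
from unittest.mock import Mock


def ensure_table(catalog, database: str, table: str, schema):
    """Create the database and table if needed, then return the table handle.

    `catalog` is any object exposing create_database(), create_table(), and
    get_table() with the signatures shown on this page.
    """
    identifier = f"{database}.{table}"
    catalog.create_database(database, ignore_if_exists=True)
    catalog.create_table(identifier, schema, ignore_if_exists=True)
    return catalog.get_table(identifier)


# Demonstration with a stand-in catalog; a real one would come from
# CatalogFactory.create().
fake_catalog = Mock()
fake_schema = object()
handle = ensure_table(fake_catalog, "my_db", "my_table", fake_schema)

# Names of the catalog methods that were invoked, in call order.
called = [call[0] for call in fake_catalog.mock_calls]
```

Passing `ignore_if_exists=True` to both creation calls makes the helper safe to run repeatedly, which is convenient for setup code in jobs that may be restarted.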

Code Reference

Source Location

Repository: Apache Paimon
File: paimon-python/pypaimon/catalog/catalog.py
Lines: L29-56
File: paimon-python/pypaimon/schema/schema.py
Lines: L28-88

Signature

class Catalog(ABC):
    @abstractmethod
    def create_database(self, name: str, ignore_if_exists: bool, properties: Optional[dict] = None):
        pass

    @abstractmethod
    def create_table(self, identifier: Union[str, Identifier], schema: Schema, ignore_if_exists: bool):
        pass

    @abstractmethod
    def get_table(self, identifier: Union[str, Identifier]) -> 'Table':
        pass

class Schema:
    def __init__(self, fields: Optional[List[DataField]] = None,
                 partition_keys: Optional[List[str]] = None,
                 primary_keys: Optional[List[str]] = None,
                 options: Optional[Dict] = None,
                 comment: Optional[str] = None):
        ...  # body omitted

    @staticmethod
    def from_pyarrow_schema(pa_schema: pa.Schema,
                            partition_keys: Optional[List[str]] = None,
                            primary_keys: Optional[List[str]] = None,
                            options: Optional[Dict] = None,
                            comment: Optional[str] = None):
        ...  # body omitted

Import

from pypaimon.catalog.catalog import Catalog
from pypaimon.schema.schema import Schema

I/O Contract

Inputs

Name	Type	Required	Description
name	str	Yes	Database name for `create_database()`
identifier	Union[str, Identifier]	Yes	Fully qualified table name (e.g., `'db.table'`) for `create_table()` and `get_table()`
schema	Schema	Yes	Table schema with fields, partition_keys, primary_keys, and options for `create_table()`
ignore_if_exists	bool	Yes	If `True`, skip creation when the database or table already exists
properties	Optional[dict]	No	Additional database properties for `create_database()`
pa_schema	pa.Schema	Yes (for from_pyarrow_schema)	PyArrow schema to convert to Paimon schema
partition_keys	Optional[List[str]]	No	Column names used for partitioning
primary_keys	Optional[List[str]]	No	Column names forming the primary key
options	Optional[Dict]	No	Table options such as `{'bucket': '2'}`
comment	Optional[str]	No	Optional table comment

Outputs

Name	Type	Description
create_database return	None	Creates the database in the catalog (side effect)
create_table return	None	Creates the table in the catalog (side effect)
get_table return	Table	A `FileStoreTable` instance for subsequent read/write operations
Schema.from_pyarrow_schema return	Schema	A Paimon `Schema` constructed from the given PyArrow schema
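The contract in the tables above, in particular the role of `ignore_if_exists` and the `'db.table'` identifier form, can be illustrated with a toy in-memory catalog. This is a sketch under stated assumptions, not how pypaimon stores metadata: `InMemoryCatalog` and `AlreadyExistsError` are hypothetical names, the "table" returned is simply the stored schema object, and raising an error when `ignore_if_exists` is `False` is an assumed behavior inferred from the parameter description.

```python
from typing import Dict, Optional


class AlreadyExistsError(Exception):
    """Hypothetical error type; pypaimon's own exception classes may differ."""


class InMemoryCatalog:
    """Toy catalog that keeps databases and tables in nested dictionaries."""

    def __init__(self) -> None:
        self._databases: Dict[str, Dict[str, object]] = {}

    def create_database(self, name: str, ignore_if_exists: bool,
                        properties: Optional[dict] = None) -> None:
        # `properties` is accepted for signature parity but ignored here.
        if name in self._databases:
            if ignore_if_exists:
                return  # leave the existing database untouched
            raise AlreadyExistsError(f"database {name!r} already exists")
        self._databases[name] = {}

    def create_table(self, identifier: str, schema: object,
                     ignore_if_exists: bool) -> None:
        database, table = identifier.split(".", 1)
        tables = self._databases[database]  # KeyError if the database is missing
        if table in tables:
            if ignore_if_exists:
                return  # keep the schema that was registered first
            raise AlreadyExistsError(f"table {identifier!r} already exists")
        tables[table] = schema

    def get_table(self, identifier: str) -> object:
        database, table = identifier.split(".", 1)
        return self._databases[database][table]
```

In this sketch a second `create_table()` call with `ignore_if_exists=True` does not replace the existing schema; it returns without doing anything, which matches the "skip creation" wording in the Inputs table.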

Usage Examples

Basic Usage

import pyarrow as pa
from pypaimon.schema.schema import Schema

# Define schema from PyArrow
pa_schema = pa.schema([
    ('id', pa.int64()),
    ('name', pa.string()),
    ('value', pa.float64()),
])

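schema = Schema.from_pyarrow_schema(
    pa_schema,
    partition_keys=['name'],
    # Primary keys include the partition key; Paimon generally expects this
    # for primary-key tables with a fixed bucket count.
    primary_keys=['id', 'name'],
    options={'bucket': '2'}
)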

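# Create database and table.
# `catalog` is assumed to be an instance obtained from CatalogFactory.create().
catalog.create_database('my_db', ignore_if_exists=True)
catalog.create_table('my_db.my_table', schema, ignore_if_exists=True)

# Retrieve the table reference for subsequent reads and writes.
table = catalog.get_table('my_db.my_table')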

Related Pages

Implements Principle

Principle:Apache_Paimon_Schema_Definition_and_Table_Creation

Requires Environment

Environment:Apache_Paimon_Python_Core_Runtime
