
Implementation:Apache Paimon Catalog Create Database and Table

From Leeroopedia


Knowledge Sources
Domains: Data_Lake, Table_Format
Last Updated: 2026-02-07 00:00 GMT

Overview

Concrete tools for creating databases and schema-defined tables in an Apache Paimon catalog from Python.

Description

The Catalog abstract class declares create_database() and create_table() for establishing databases and tables in a catalog, and get_table() for retrieving a table reference (a FileStoreTable) for subsequent read and write operations. A Schema can be constructed directly from a list of DataField objects or converted from a PyArrow schema via Schema.from_pyarrow_schema(). The conversion maps PyArrow types to Paimon's internal types, including nested types, timestamps, and decimals with their precision.

Usage

Use these tools after obtaining a catalog instance from CatalogFactory.create(). First create a database with create_database(), then define a Schema and create a table with create_table(). Finally, retrieve the table reference with get_table() to perform read or write operations.
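The call order and the effect of ignore_if_exists can be illustrated without a real warehouse. The sketch below uses ToyCatalog, a hypothetical in-memory stand-in that is not part of pypaimon. It mirrors the three method names from this page and assumes that a duplicate creation with ignore_if_exists=False raises an error; the exception types raised by a real catalog may differ.

```python
from typing import Dict, Optional


class ToyCatalog:
    """Hypothetical in-memory stand-in for a catalog; not part of pypaimon."""

    def __init__(self) -> None:
        # database name -> {table name -> schema object}
        self._databases: Dict[str, Dict[str, object]] = {}

    def create_database(self, name: str, ignore_if_exists: bool,
                        properties: Optional[dict] = None) -> None:
        # properties is accepted for signature parity but ignored here.
        if name in self._databases:
            if ignore_if_exists:
                return  # already present: leave it untouched
            raise ValueError(f"database {name!r} already exists")
        self._databases[name] = {}

    def create_table(self, identifier: str, schema: object,
                     ignore_if_exists: bool) -> None:
        database, table = identifier.split(".", 1)
        tables = self._databases[database]  # KeyError if the database is missing
        if table in tables:
            if ignore_if_exists:
                return
            raise ValueError(f"table {identifier!r} already exists")
        tables[table] = schema

    def get_table(self, identifier: str) -> object:
        database, table = identifier.split(".", 1)
        return self._databases[database][table]


catalog = ToyCatalog()
catalog.create_database("my_db", ignore_if_exists=True)
catalog.create_database("my_db", ignore_if_exists=True)  # second call is a no-op
catalog.create_table("my_db.my_table", {"fields": ["id", "name"]},
                     ignore_if_exists=True)
table = catalog.get_table("my_db.my_table")
```

Passing ignore_if_exists=True makes setup code safe to re-run, which suits scripts and notebooks that may execute more than once against the same warehouse.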

Code Reference

Source Location

  • Repository: Apache Paimon
  • File: paimon-python/pypaimon/catalog/catalog.py
  • Lines: L29-56
  • File: paimon-python/pypaimon/schema/schema.py
  • Lines: L28-88

Signature

class Catalog(ABC):
    @abstractmethod
    def create_database(self, name: str, ignore_if_exists: bool, properties: Optional[dict] = None):
        pass

    @abstractmethod
    def create_table(self, identifier: Union[str, Identifier], schema: Schema, ignore_if_exists: bool):
        pass

    @abstractmethod
    def get_table(self, identifier: Union[str, Identifier]) -> 'Table':
        pass

class Schema:
    def __init__(self, fields: Optional[List[DataField]] = None,
                 partition_keys: Optional[List[str]] = None,
                 primary_keys: Optional[List[str]] = None,
                 options: Optional[Dict] = None,
                 comment: Optional[str] = None):
        ...  # body omitted

    @staticmethod
    def from_pyarrow_schema(pa_schema: pa.Schema,
                            partition_keys: Optional[List[str]] = None,
                            primary_keys: Optional[List[str]] = None,
                            options: Optional[Dict] = None,
                            comment: Optional[str] = None):
        ...  # body omitted

Import

from pypaimon.catalog.catalog import Catalog
from pypaimon.schema.schema import Schema

I/O Contract

Inputs

Name | Type | Required | Description
name | str | Yes | Database name for create_database()
identifier | Union[str, Identifier] | Yes | Fully qualified table name (e.g., 'db.table') for create_table() and get_table()
schema | Schema | Yes | Table schema with fields, partition_keys, primary_keys, and options for create_table()
ignore_if_exists | bool | Yes | If True, skip creation when the database or table already exists
properties | Optional[dict] | No | Additional database properties for create_database()
pa_schema | pa.Schema | Yes (for from_pyarrow_schema) | PyArrow schema to convert to Paimon schema
partition_keys | Optional[List[str]] | No | Column names used for partitioning
primary_keys | Optional[List[str]] | No | Column names forming the primary key
options | Optional[Dict] | No | Table options such as {'bucket': '2'}
comment | Optional[str] | No | Optional table comment
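Because partition_keys and primary_keys are plain lists of column names, a misspelled name is easy to miss before the table is created. The sketch below is a local pre-check only. find_unknown_keys is a hypothetical helper, not a pypaimon function, and it makes no claim about what validation Paimon itself performs.

```python
from typing import List, Optional, Sequence


def find_unknown_keys(columns: Sequence[str],
                      partition_keys: Optional[List[str]] = None,
                      primary_keys: Optional[List[str]] = None) -> List[str]:
    """Return key names that match no column (hypothetical helper)."""
    known = set(columns)
    requested = (partition_keys or []) + (primary_keys or [])
    # Keep first-seen order and report each unknown name once.
    unknown: List[str] = []
    for key in requested:
        if key not in known and key not in unknown:
            unknown.append(key)
    return unknown


columns = ["id", "name", "value"]
ok = find_unknown_keys(columns, partition_keys=["name"], primary_keys=["id"])
bad = find_unknown_keys(columns, partition_keys=["dt"], primary_keys=["id", "dt"])
print(ok, bad)
```

With a PyArrow schema, the column list can be taken from pa_schema.names before calling Schema.from_pyarrow_schema().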

Outputs

Name | Type | Description
create_database return | None | Creates the database in the catalog (side effect)
create_table return | None | Creates the table in the catalog (side effect)
get_table return | Table | A FileStoreTable instance for subsequent read/write operations
Schema.from_pyarrow_schema return | Schema | A Paimon Schema constructed from the given PyArrow schema

Usage Examples

Basic Usage

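import pyarrow as pa
from pypaimon import CatalogFactory  # import path assumed; adjust to your pypaimon version
from pypaimon.schema.schema import Schema

# Obtain a catalog instance (the warehouse path is a placeholder)
catalog = CatalogFactory.create({'warehouse': 'file:///tmp/paimon_warehouse'})

# Define schema from PyArrow
pa_schema = pa.schema([
    ('id', pa.int64()),
    ('name', pa.string()),
    ('value', pa.float64()),
])

# With a fixed bucket count, Paimon generally expects the primary key
# to include every partition key, so 'name' is listed in both.
schema = Schema.from_pyarrow_schema(
    pa_schema,
    partition_keys=['name'],
    primary_keys=['id', 'name'],
    options={'bucket': '2'}
)

# Create database and table; ignore_if_exists=True makes both calls safe to repeat
catalog.create_database('my_db', ignore_if_exists=True)
catalog.create_table('my_db.my_table', schema, ignore_if_exists=True)

# Retrieve the table reference for subsequent reads and writes
table = catalog.get_table('my_db.my_table')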
