Implementation: Apache Paimon Schema From PyArrow
| Knowledge Sources | |
|---|---|
| Domains | Data_Lake, Data_Ingestion |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
A concrete utility for building Paimon Schema objects from PyArrow schemas during table creation.
Description
Schema.from_pyarrow_schema() converts a PyArrow schema to a Paimon Schema with additional metadata (partition keys, primary keys, table options). It validates blob type constraints and required options. This method is used in Ray ingestion pipelines to ensure the target table schema matches the expected data format.
The conversion process:
- Iterates over each field in the PyArrow schema.
- Converts each PyArrow field to its corresponding Paimon DataField representation.
- Attaches metadata such as partition keys, primary keys, and table options.
- Validates constraints (e.g., blob types require specific configurations).
- Returns a fully constructed Schema instance ready for catalog.create_table().
Usage
Use Schema.from_pyarrow_schema() when creating a new Paimon table whose schema is defined using PyArrow types. This is the standard approach in Ray-based ingestion pipelines where the data source schema is known in PyArrow format.
Code Reference
Source Location
- Repository: Apache Paimon
- File: paimon-python/pypaimon/schema/schema.py:L28-88
Signature
@dataclass
class Schema:
    def __init__(self, fields: Optional[List[DataField]] = None,
                 partition_keys: Optional[List[str]] = None,
                 primary_keys: Optional[List[str]] = None,
                 options: Optional[Dict] = None,
                 comment: Optional[str] = None): ...

    @staticmethod
    def from_pyarrow_schema(pa_schema: pa.Schema,
                            partition_keys: Optional[List[str]] = None,
                            primary_keys: Optional[List[str]] = None,
                            options: Optional[Dict] = None,
                            comment: Optional[str] = None) -> Schema: ...
Import
from pypaimon.schema.schema import Schema
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| pa_schema | pa.Schema | Yes | PyArrow schema defining the column names and types for the Paimon table. |
| partition_keys | Optional[List[str]] | No | List of column names to use as partition keys. Controls physical data layout for partition pruning. |
| primary_keys | Optional[List[str]] | No | List of column names to use as primary keys. Enables upsert semantics and deduplication. |
| options | Optional[Dict] | No | Table options as a dictionary of string key-value pairs (e.g., {'bucket': '4'}). Controls storage behavior. |
| comment | Optional[str] | No | Optional human-readable description of the table. |
Outputs
| Name | Type | Description |
|---|---|---|
| schema | Schema | A Paimon Schema instance containing the converted fields, partition keys, primary keys, and options. Ready for use with catalog.create_table(). |
Usage Examples
Basic Usage
import pyarrow as pa
from pypaimon.schema.schema import Schema
pa_schema = pa.schema([
('user_id', pa.int64()),
('event', pa.string()),
('timestamp', pa.int64()),
('value', pa.float64()),
])
schema = Schema.from_pyarrow_schema(
pa_schema,
primary_keys=['user_id'],
options={'bucket': '4'}
)
# `catalog` is assumed to be an existing Catalog instance created earlier in the pipeline
catalog.create_table('analytics.events', schema, ignore_if_exists=True)
table = catalog.get_table('analytics.events')
With Partition Keys
import pyarrow as pa
from pypaimon.schema.schema import Schema
pa_schema = pa.schema([
('order_id', pa.int64()),
('customer_id', pa.int64()),
('order_date', pa.string()),
('amount', pa.float64()),
])
schema = Schema.from_pyarrow_schema(
pa_schema,
partition_keys=['order_date'],
primary_keys=['order_id'],
options={'bucket': '2'},
comment='Daily order data partitioned by date'
)
# `catalog` is assumed to be an existing Catalog instance created earlier in the pipeline
catalog.create_table('sales.orders', schema, ignore_if_exists=True)