Implementation:Apache Paimon TableSchema Python

Knowledge Sources	Apache_Paimon
Domains	Schema Management, Metadata Storage
Last Updated	2026-02-08 00:00 GMT

Overview

TableSchema represents the versioned schema definition of a Paimon table including fields, partition keys, primary keys, options, and metadata.

Description

The TableSchema class is a dataclass that encapsulates the complete schema definition for an Apache Paimon table. It carries a schema format version used to track compatibility across Paimon releases (PAIMON_07_VERSION = 1 for 0.7, PAIMON_08_VERSION = 2 for 0.8, and CURRENT_VERSION = 3), the field definitions together with the highest field ID assigned so far, the partition and primary key specifications, the table options, and an optional comment.
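Tracking the highest field ID lets a schema change hand out a fresh ID to a new column instead of reusing one. The sketch below illustrates the idea only: `Field` and `highest_field_id` are hypothetical names, the fields are flat (nested types are ignored), and the -1 default for an empty schema is an assumption of this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Field:
    """Hypothetical flat field; real DataField objects may nest other fields."""
    id: int
    name: str


def highest_field_id(fields: List[Field]) -> int:
    # -1 signals that no field ID has been assigned yet (an assumption of this sketch)
    return max((f.id for f in fields), default=-1)


fields = [Field(0, "id"), Field(1, "name"), Field(2, "age")]
# A newly added column would take the next unused ID
next_id = highest_field_id(fields) + 1
```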

The class provides methods for checking cross-partition update scenarios, converting to Schema objects, and serialization/deserialization from JSON format. It handles backward compatibility by applying default values for missing configuration options when reading older schema versions.
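The backward-compatibility step can be pictured as follows. This is a minimal sketch, not the pypaimon implementation: `parse_schema_options` is a hypothetical helper, and the specific defaults (`bucket` set to "1" for version 1, `file.format` set to "orc" for versions up to 2) are assumptions about which options older schema versions may omit.

```python
import json
from typing import Dict

PAIMON_07_VERSION = 1
PAIMON_08_VERSION = 2
CURRENT_VERSION = 3


def parse_schema_options(json_str: str) -> Dict[str, str]:
    """Hypothetical helper: read a schema JSON and apply legacy option defaults."""
    data = json.loads(json_str)
    # Assumption: a schema without a version is treated as the oldest one
    version = data.get("version", PAIMON_07_VERSION)
    options = dict(data.get("options", {}))
    # Only fill in values the stored schema did not set explicitly
    if version <= PAIMON_07_VERSION:
        options.setdefault("bucket", "1")
    if version <= PAIMON_08_VERSION:
        options.setdefault("file.format", "orc")
    return options


legacy = parse_schema_options('{"version": 1, "id": 0, "options": {}}')
current = parse_schema_options('{"version": 3, "id": 0, "options": {"bucket": "4"}}')
```

Using `setdefault` keeps any value the older schema stored explicitly, so the defaults only cover options that are missing.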

TableSchema can be read from a file path through FileIO and records the time of the schema modification in time_millis. The cross_partition_update method compares the primary keys against the partition keys: when the primary keys do not include every partition key, the same primary key can occur in more than one partition, so an update may cross partition boundaries.
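A minimal standalone sketch of this check, assuming it returns True only when both key lists are non-empty and the primary keys omit at least one partition key. The function below is illustrative and is not the pypaimon source.

```python
from typing import List


def cross_partition_update(primary_keys: List[str], partition_keys: List[str]) -> bool:
    # Without primary keys or without partitions the question does not arise
    if not primary_keys or not partition_keys:
        return False
    # True when at least one partition key is missing from the primary keys
    return not all(key in primary_keys for key in partition_keys)


same_partition = cross_partition_update(["id", "dt"], ["dt"])
crosses = cross_partition_update(["id"], ["dt"])
no_primary_key = cross_partition_update([], ["dt"])
```

Under this assumption, a table keyed by `id` and partitioned by `dt` can see the same `id` arrive in a different `dt` partition, which is the cross-partition case.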

Usage

Use TableSchema when persisting table metadata, reading schema definitions from storage, implementing schema evolution logic, or managing table versioning in Apache Paimon tables.

Code Reference

Source Location

Repository: Apache_Paimon
File: paimon-python/pypaimon/schema/table_schema.py

Signature

@dataclass
class TableSchema:
    """Versioned table schema with fields, keys, and options."""

    PAIMON_07_VERSION = 1
    PAIMON_08_VERSION = 2
    CURRENT_VERSION = 3

    version: int
    id: int
    fields: List[DataField]
    highest_field_id: int
    partition_keys: List[str]
    primary_keys: List[str]
    options: Dict[str, str]
    comment: Optional[str]
    time_millis: int

    def cross_partition_update(self) -> bool:
        """Check if primary keys allow cross-partition updates."""

    def to_schema(self) -> Schema:
        """Convert to Schema object."""

    @staticmethod
    def from_schema(schema_id: int, schema: Schema) -> "TableSchema":
        """Create TableSchema from Schema object."""

    @staticmethod
    def from_path(file_io: FileIO, schema_path: str):
        """Read TableSchema from file path."""

    @staticmethod
    def from_json(json_str: str):
        """Parse TableSchema from JSON string."""

    def copy(self, new_options: Optional[Dict[str, str]] = None) -> "TableSchema":
        """Create a copy with optional new options."""

Import

from pypaimon.schema.table_schema import TableSchema

I/O Contract

Inputs

Name	Type	Required	Description
version	int	Yes	Schema version number
id	int	Yes	Schema ID
fields	List[DataField]	Yes	Field definitions
highest_field_id	int	Yes	Highest field ID used
partition_keys	List[str]	Yes	Partition column names
primary_keys	List[str]	Yes	Primary key column names
options	Dict[str, str]	Yes	Table configuration options
comment	Optional[str]	No	Optional table comment
time_millis	int	Yes	Schema creation timestamp in milliseconds

Outputs

Name	Type	Description
table_schema	TableSchema	Complete table schema definition
schema	Schema	Converted Schema object
cross_partition	bool	Whether cross-partition updates are allowed

Usage Examples

from pypaimon.schema.table_schema import TableSchema
from pypaimon.schema.data_types import DataField, AtomicType
from pypaimon.schema.schema import Schema

# Read schema from file (file_io is an existing FileIO instance, assumed to be in scope)
table_schema = TableSchema.from_path(file_io, "schema/schema-1")

# Create schema from Schema object
schema = Schema(
    fields=[
        DataField(0, "id", AtomicType("BIGINT")),
        DataField(1, "name", AtomicType("STRING")),
        DataField(2, "age", AtomicType("INT"))
    ],
    partition_keys=["id"],
    primary_keys=["id"],
    options={"bucket": "4"}
)
table_schema = TableSchema.from_schema(schema_id=1, schema=schema)

# Check cross-partition update capability
can_update_across_partitions = table_schema.cross_partition_update()

# Convert to Schema
schema_obj = table_schema.to_schema()

# Create a copy with new options
new_schema = table_schema.copy(new_options={"bucket": "8"})

# Parse from JSON (the string below is an elided placeholder; supply a complete schema JSON document)
json_str = '{"version": 3, "id": 1, "fields": [...], ...}'
parsed_schema = TableSchema.from_json(json_str)

print(f"Schema version: {table_schema.version}")
print(f"Fields: {[f.name for f in table_schema.fields]}")
print(f"Partition keys: {table_schema.partition_keys}")
print(f"Primary keys: {table_schema.primary_keys}")
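The copy call in the example can be pictured with a small stand-in dataclass. This is a sketch under stated assumptions: `MiniSchema` is hypothetical, and whether the real copy method replaces or merges options, or how it treats None, is not stated on this page. Here a provided dict replaces the options and None keeps them.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List, Optional


@dataclass
class MiniSchema:
    """Hypothetical stand-in for TableSchema with only a few fields."""
    id: int
    primary_keys: List[str]
    options: Dict[str, str] = field(default_factory=dict)

    def copy(self, new_options: Optional[Dict[str, str]] = None) -> "MiniSchema":
        # Assumption: None keeps the current options; a dict replaces them
        options = dict(self.options) if new_options is None else dict(new_options)
        return replace(self, options=options)


original = MiniSchema(id=1, primary_keys=["id"], options={"bucket": "4"})
resized = original.copy(new_options={"bucket": "8"})
```

Copying the options dict means later changes to the copy do not leak back into the original schema object.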
