Implementation:Apache Paimon TableSchema Python

From Leeroopedia


Knowledge Sources
Domains: Schema Management, Metadata Storage
Last Updated: 2026-02-08 00:00 GMT

Overview

TableSchema represents the versioned schema definition of a Paimon table, including its fields, partition keys, primary keys, options, and metadata.

Description

The TableSchema class is a dataclass that encapsulates the complete schema definition of an Apache Paimon table. Its version field records the schema format version: 1 (PAIMON_07_VERSION, corresponding to Paimon 0.7), 2 (PAIMON_08_VERSION, corresponding to Paimon 0.8), or 3 (CURRENT_VERSION). The class also holds the field definitions together with the highest field ID assigned so far, the partition and primary key specifications, the table options, and an optional comment.
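Tracking the highest field ID separately from the field list matters once columns are dropped. The standalone sketch below shows the idea: a new column takes highest_field_id + 1 rather than len(fields), so an ID freed by a dropped column is not handed out again. SimpleField and add_column are hypothetical names used for illustration; this is not pypaimon's schema evolution code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SimpleField:
    """Hypothetical minimal field: id, name and a type string."""
    id: int
    name: str
    type: str


def add_column(fields: List[SimpleField], highest_field_id: int,
               name: str, type_: str) -> Tuple[List[SimpleField], int]:
    """Append a column with a fresh ID and return (new_fields, new_highest_id).

    The ID is derived from highest_field_id, not from len(fields), so IDs
    that belonged to dropped columns are never reused.
    """
    new_id = highest_field_id + 1
    return fields + [SimpleField(new_id, name, type_)], new_id


# Field 1 was dropped earlier, but highest_field_id still remembers it was used.
fields = [SimpleField(0, "id", "BIGINT"), SimpleField(2, "age", "INT")]
fields, highest = add_column(fields, highest_field_id=2, name="email", type_="STRING")
print([(f.id, f.name) for f in fields], highest)
```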

The class provides methods for checking whether updates can cross partition boundaries, converting to and from Schema objects, and serializing to and deserializing from JSON. For backward compatibility, when it reads a schema written with an older schema version, it fills in default values for configuration options that the older format may not contain.
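A minimal sketch of version-based default backfilling follows. The specific option keys and values (bucket = "1" for schema version 1 and earlier, file.format = "orc" for version 2 and earlier) are assumptions chosen for illustration; check pypaimon's from_json for the actual defaults. apply_legacy_defaults is a hypothetical helper name.

```python
from typing import Dict

PAIMON_07_VERSION = 1
PAIMON_08_VERSION = 2
CURRENT_VERSION = 3

# ASSUMPTION: illustrative defaults for options older schema versions may omit.
# Each entry is (last schema version that needs it, option key, default value).
LEGACY_DEFAULTS = [
    (PAIMON_07_VERSION, "bucket", "1"),
    (PAIMON_08_VERSION, "file.format", "orc"),
]


def apply_legacy_defaults(version: int, options: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of options with defaults filled in for old schema versions."""
    upgraded = dict(options)  # do not mutate the caller's dict
    for max_version, key, default in LEGACY_DEFAULTS:
        # A default applies to schemas written at or before max_version and
        # only when the key is absent; explicitly stored values always win.
        if version <= max_version and key not in upgraded:
            upgraded[key] = default
    return upgraded


print(apply_legacy_defaults(PAIMON_07_VERSION, {}))
print(apply_legacy_defaults(PAIMON_08_VERSION, {"bucket": "4"}))
print(apply_legacy_defaults(CURRENT_VERSION, {}))
```

Returning a new dict keeps the parsed JSON untouched, which makes it easy to compare what was stored on disk with what the reader actually uses.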

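TableSchema can be read from a file path through FileIO (from_path), and its time_millis field records when each schema version was created. The cross_partition_update method compares the primary keys with the partition keys: when a table has both, and the primary keys do not contain every partition key, a record with a given key can move from one partition to another, so updates cross partition boundaries and the method returns True.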
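The check can be sketched as a standalone function. This is an illustration of the rule described above, not pypaimon's source; in particular, returning False when either key list is empty is an assumption (a table without primary keys or without partitions has nothing to move between partitions).

```python
from typing import List


def cross_partition_update(primary_keys: List[str], partition_keys: List[str]) -> bool:
    """Return True if some partition key is missing from the primary keys.

    In that case a record with a given primary key can land in a different
    partition after an update, so updates cross partition boundaries.
    """
    if not primary_keys or not partition_keys:
        # ASSUMPTION: no primary key or no partitioning means no cross-partition update.
        return False
    return not set(partition_keys).issubset(primary_keys)


print(cross_partition_update(["id"], ["id"]))        # PK contains the partition key
print(cross_partition_update(["id"], ["dt"]))        # "dt" is not part of the PK
print(cross_partition_update(["id", "dt"], ["dt"]))  # PK includes "dt"
print(cross_partition_update([], ["dt"]))            # no primary key
```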

Usage

Use TableSchema when persisting table metadata, reading schema definitions from storage, implementing schema evolution logic, or managing schema versions of Apache Paimon tables.

Code Reference

Source Location

Signature

@dataclass
class TableSchema:
    """Versioned table schema with fields, keys, and options."""

    PAIMON_07_VERSION = 1
    PAIMON_08_VERSION = 2
    CURRENT_VERSION = 3

    version: int
    id: int
    fields: List[DataField]
    highest_field_id: int
    partition_keys: List[str]
    primary_keys: List[str]
    options: Dict[str, str]
    comment: Optional[str]
    time_millis: int

    def cross_partition_update(self) -> bool:
        """Check if primary keys allow cross-partition updates."""

    def to_schema(self) -> Schema:
        """Convert to Schema object."""

    @staticmethod
    def from_schema(schema_id: int, schema: Schema) -> "TableSchema":
        """Create TableSchema from Schema object."""

    @staticmethod
    def from_path(file_io: FileIO, schema_path: str):
        """Read TableSchema from file path."""

    @staticmethod
    def from_json(json_str: str):
        """Parse TableSchema from JSON string."""

    def copy(self, new_options: Optional[Dict[str, str]] = None) -> "TableSchema":
        """Create a copy with optional new options."""
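The docstring of copy does not say whether new options replace or merge with the existing ones. The sketch below assumes replacement, with None keeping the current options, and uses dataclasses.replace on a hypothetical MiniTableSchema that holds only a subset of TableSchema's fields. It is an illustration of that assumption, not the library's implementation.

```python
import dataclasses
import time
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class MiniTableSchema:
    """Hypothetical stand-in with a subset of TableSchema's fields."""
    version: int
    id: int
    partition_keys: List[str]
    primary_keys: List[str]
    options: Dict[str, str]
    comment: Optional[str]
    time_millis: int

    def copy(self, new_options: Optional[Dict[str, str]] = None) -> "MiniTableSchema":
        # ASSUMPTION: None keeps the current options; a dict replaces them wholesale.
        options = dict(self.options if new_options is None else new_options)
        # All other fields (including id and time_millis) are carried over unchanged.
        return dataclasses.replace(self, options=options)


base = MiniTableSchema(3, 1, [], ["id"], {"bucket": "4"}, None, int(time.time() * 1000))
bigger = base.copy({"bucket": "8"})
print(base.options, bigger.options, bigger.id == base.id)
```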

Import

from pypaimon.schema.table_schema import TableSchema

I/O Contract

Inputs

Name | Type | Required | Description
version | int | Yes | Schema format version (1 = Paimon 0.7, 2 = Paimon 0.8, 3 = current)
id | int | Yes | Schema ID
fields | List[DataField] | Yes | Field definitions
highest_field_id | int | Yes | Highest field ID assigned so far
partition_keys | List[str] | Yes | Partition column names
primary_keys | List[str] | Yes | Primary key column names
options | Dict[str, str] | Yes | Table configuration options
comment | Optional[str] | Yes (may be None) | Optional table comment
time_millis | int | Yes | Schema creation timestamp, in milliseconds
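The inputs listed above imply a few consistency rules: no field ID may exceed highest_field_id, field IDs should be unique, and every partition or primary key should name an existing field. The hypothetical check_schema_inputs helper below sketches those rules on plain lists; whether pypaimon enforces each of them, and where, is not stated on this page.

```python
from typing import List, Sequence


def check_schema_inputs(field_ids: Sequence[int], field_names: Sequence[str],
                        highest_field_id: int, partition_keys: List[str],
                        primary_keys: List[str]) -> List[str]:
    """Hypothetical consistency checks on TableSchema inputs; returns problems found."""
    problems = []
    if field_ids and max(field_ids) > highest_field_id:
        problems.append("highest_field_id is lower than an existing field id")
    if len(set(field_ids)) != len(field_ids):
        problems.append("duplicate field ids")
    names = set(field_names)
    for key in partition_keys + primary_keys:
        if key not in names:
            problems.append(f"key column {key!r} is not a field")
    return problems


print(check_schema_inputs([0, 1, 2], ["id", "name", "age"], 2, ["id"], ["id"]))
print(check_schema_inputs([0, 1], ["id", "name"], 0, ["dt"], ["id"]))
```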

Outputs

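Name | Type | Produced by | Description
table_schema | TableSchema | from_schema, from_path, from_json, copy | Complete table schema definition
schema | Schema | to_schema | Converted Schema object
cross_partition | bool | cross_partition_update | True if updates can move a key across partitions (the primary keys do not contain every partition key)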

Usage Examples

from pypaimon.schema.data_types import AtomicType, DataField
from pypaimon.schema.schema import Schema
from pypaimon.schema.table_schema import TableSchema

# Read a schema from a file; file_io is assumed to be an existing FileIO
# instance for the table's storage (it is not created in this snippet)
table_schema = TableSchema.from_path(file_io, "schema/schema-1")

# Create a TableSchema from a Schema object
schema = Schema(
    fields=[
        DataField(0, "id", AtomicType("BIGINT")),
        DataField(1, "name", AtomicType("STRING")),
        DataField(2, "age", AtomicType("INT")),
    ],
    partition_keys=["id"],
    primary_keys=["id"],
    options={"bucket": "4"},
)
table_schema = TableSchema.from_schema(schema_id=1, schema=schema)

# Check cross-partition update capability; "id" is both the partition key and
# the primary key, so the primary keys contain every partition key (False here)
can_update_across_partitions = table_schema.cross_partition_update()

# Convert back to a Schema object
schema_obj = table_schema.to_schema()

# Create a copy with new options
new_schema = table_schema.copy(new_options={"bucket": "8"})

# Parse from JSON; json_str must hold the full content of a schema file.
# The "[...]" and "..." below are elisions, so this literal is not valid JSON as written.
json_str = '{"version": 3, "id": 1, "fields": [...], ...}'
parsed_schema = TableSchema.from_json(json_str)

print(f"Schema version: {table_schema.version}")
print(f"Fields: {[f.name for f in table_schema.fields]}")
print(f"Partition keys: {table_schema.partition_keys}")
print(f"Primary keys: {table_schema.primary_keys}")