Implementation:Apache Paimon TableSchema Python
| Knowledge Sources | |
|---|---|
| Domains | Schema Management, Metadata Storage |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
TableSchema represents the versioned schema definition of a Paimon table, including its fields, partition keys, primary keys, options, and metadata.
Description
The TableSchema class is a dataclass that encapsulates the complete schema definition of an Apache Paimon table. It records a schema format version (1 for Paimon 0.7, 2 for Paimon 0.8, and 3 for the current format) so that schema files written by older releases remain readable. It also holds the field definitions together with the highest field ID assigned so far, the partition and primary key specifications, the table options, and an optional comment.
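The role of highest_field_id can be illustrated with a small model of column evolution. This sketch assumes that a newly added column receives highest_field_id + 1 and that dropping a column never lowers the counter, so field IDs are never reused. SimpleField, add_field, and drop_field are hypothetical helpers, not pypaimon APIs.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SimpleField:
    # Hypothetical stand-in for pypaimon's DataField: an ID, a name, and a type name
    id: int
    name: str
    type_name: str


def add_field(fields: List[SimpleField], highest_field_id: int,
              name: str, type_name: str) -> Tuple[List[SimpleField], int]:
    """Append a column with a fresh ID; return the new field list and counter.

    Assumption: new IDs come from highest_field_id + 1, so an ID is never
    reused even if the column that held it was dropped earlier.
    """
    new_id = highest_field_id + 1
    return fields + [SimpleField(new_id, name, type_name)], new_id


def drop_field(fields: List[SimpleField], name: str) -> List[SimpleField]:
    # Dropping a column removes it from the list but leaves the counter untouched
    return [f for f in fields if f.name != name]


fields = [SimpleField(0, "id", "BIGINT"), SimpleField(1, "name", "STRING")]
highest = 1
fields = drop_field(fields, "name")
fields, highest = add_field(fields, highest, "email", "STRING")
print([(f.id, f.name) for f in fields])  # [(0, 'id'), (2, 'email')]
```

Because ID 1 is not handed out again, data files written before the drop cannot be mistaken for the new "email" column.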
The class provides methods for checking whether the table needs cross-partition updates, converting to and from Schema objects, and JSON serialization and deserialization. For backward compatibility, it applies default values for configuration options that are missing from schemas written in older format versions.
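The backward-compatibility step can be sketched as version-dependent defaulting of options. This sketch assumes the Python class mirrors the Java SchemaSerializer, which, as we understand it, treats a missing version as the Paimon 0.7 format, sets bucket to 1 for version-1 schemas that lack it, and sets file.format to orc for schemas at version 2 or lower that lack it. Check table_schema.py before relying on these exact values; apply_legacy_defaults is a hypothetical name.

```python
import json
from typing import Any, Dict

# Version constants as listed in the TableSchema signature
PAIMON_07_VERSION = 1
PAIMON_08_VERSION = 2


def apply_legacy_defaults(data: Dict[str, Any]) -> Dict[str, str]:
    """Return the options of a parsed schema JSON with legacy defaults filled in.

    Assumed rules (mirroring Java's SchemaSerializer): a missing "version"
    means the 0.7 format; 0.7 schemas default to bucket=1; 0.7 and 0.8
    schemas default to file.format=orc. Explicitly set options always win.
    """
    version = data.get("version", PAIMON_07_VERSION)
    options = dict(data.get("options", {}))
    if version <= PAIMON_07_VERSION:
        options.setdefault("bucket", "1")
    if version <= PAIMON_08_VERSION:
        options.setdefault("file.format", "orc")
    return options


legacy = json.loads('{"id": 0, "options": {}}')
print(apply_legacy_defaults(legacy))  # {'bucket': '1', 'file.format': 'orc'}
current = json.loads('{"version": 3, "id": 0, "options": {"bucket": "4"}}')
print(apply_legacy_defaults(current))  # {'bucket': '4'}
```

Filling defaults at read time means an old table keeps behaving as it did when it was created, even if the defaults in newer releases have changed.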
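The from_path method reads a schema file through a FileIO instance, and time_millis records when the schema was created, in epoch milliseconds. The cross_partition_update method returns True only when the table defines both primary keys and partition keys and at least one partition key is not part of the primary key. In that case, updating a record can move it from one partition to another, so updates cross partition boundaries.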
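The rule behind cross_partition_update can be written as a standalone function. This is a sketch modeled on the Java TableSchema.crossPartitionUpdate() logic; the function name and its plain-list parameters are illustrative, not the pypaimon API.

```python
from typing import List


def is_cross_partition_update(primary_keys: List[str], partition_keys: List[str]) -> bool:
    """Return True when a primary key may move between partitions.

    Tables without primary keys or without partitions never need it; otherwise
    it is needed as soon as one partition key is missing from the primary key.
    """
    if not primary_keys or not partition_keys:
        return False
    return not set(partition_keys).issubset(primary_keys)


print(is_cross_partition_update(["id"], ["id"]))        # False
print(is_cross_partition_update(["id"], ["dt"]))        # True
print(is_cross_partition_update(["id", "dt"], ["dt"]))  # False
print(is_cross_partition_update([], ["dt"]))            # False
```

When every partition key is part of the primary key, the partition of a record is fixed by its key, which is why no cross-partition handling is needed.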
Usage
Use TableSchema when persisting table metadata, reading schema definitions from storage, implementing schema evolution logic, or managing table versioning in Apache Paimon tables.
Code Reference
Source Location
- Repository: Apache_Paimon
- File: paimon-python/pypaimon/schema/table_schema.py
Signature
```python
@dataclass
class TableSchema:
    """Versioned table schema with fields, keys, and options."""

    PAIMON_07_VERSION = 1
    PAIMON_08_VERSION = 2
    CURRENT_VERSION = 3

    version: int
    id: int
    fields: List[DataField]
    highest_field_id: int
    partition_keys: List[str]
    primary_keys: List[str]
    options: Dict[str, str]
    comment: Optional[str]
    time_millis: int

    def cross_partition_update(self) -> bool:
        """Check if primary keys allow cross-partition updates."""

    def to_schema(self) -> Schema:
        """Convert to Schema object."""

    @staticmethod
    def from_schema(schema_id: int, schema: Schema) -> "TableSchema":
        """Create TableSchema from Schema object."""

    @staticmethod
    def from_path(file_io: FileIO, schema_path: str):
        """Read TableSchema from file path."""

    @staticmethod
    def from_json(json_str: str):
        """Parse TableSchema from JSON string."""

    def copy(self, new_options: Optional[Dict[str, str]] = None) -> "TableSchema":
        """Create a copy with optional new options."""
```
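An options-only copy like the copy method above can be pictured with dataclasses.replace, the standard-library way to derive a modified dataclass instance. In this sketch OptionsHolder is a hypothetical, trimmed-down stand-in for TableSchema, and the rule that None keeps the current options is an assumption, not confirmed pypaimon behavior.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional


@dataclass
class OptionsHolder:
    # Hypothetical, trimmed-down stand-in for TableSchema
    id: int
    primary_keys: List[str]
    options: Dict[str, str]

    def copy(self, new_options: Optional[Dict[str, str]] = None) -> "OptionsHolder":
        # Assumption: None keeps the current options; otherwise they are replaced.
        # replace() builds a new instance; other attributes are shared, not deep-copied.
        if new_options is None:
            return replace(self)
        return replace(self, options=dict(new_options))


original = OptionsHolder(1, ["id"], {"bucket": "4"})
changed = original.copy(new_options={"bucket": "8"})
print(original.options, changed.options)  # {'bucket': '4'} {'bucket': '8'}
print(changed.primary_keys is original.primary_keys)  # True
```

Because replace makes a shallow copy, the copy shares lists such as primary_keys with the original; mutating them in place would affect both objects.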
Import
```python
from pypaimon.schema.table_schema import TableSchema
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| version | int | Yes | Schema format version (1 = Paimon 0.7, 2 = Paimon 0.8, 3 = current) |
| id | int | Yes | Schema ID; each schema change produces a new ID |
| fields | List[DataField] | Yes | Column definitions, each with its own field ID |
| highest_field_id | int | Yes | Highest field ID assigned so far |
| partition_keys | List[str] | Yes | Partition column names (may be empty) |
| primary_keys | List[str] | Yes | Primary key column names (may be empty) |
| options | Dict[str, str] | Yes | Table configuration options, e.g. bucket |
| comment | Optional[str] | No | Table comment |
| time_millis | int | Yes | Schema creation time in epoch milliseconds |
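The inputs above are what a serialized schema file carries. The following sketch reads such a file and maps its keys onto the constructor arguments, which is roughly the from_path then from_json flow. It assumes the camelCase key names used in Paimon schema files (highestFieldId, partitionKeys, primaryKeys, timeMillis); SchemaRecord and read_schema_file are hypothetical, fields stay as raw dicts instead of DataField objects, and a local file stands in for FileIO.

```python
import json
import os
import tempfile
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class SchemaRecord:
    # Hypothetical mirror of the TableSchema inputs; fields stay as raw JSON dicts
    version: int
    id: int
    fields: List[Dict[str, Any]]
    highest_field_id: int
    partition_keys: List[str]
    primary_keys: List[str]
    options: Dict[str, str]
    comment: Optional[str]
    time_millis: int


def read_schema_file(path: str) -> SchemaRecord:
    # Read the file as UTF-8 text, then map camelCase JSON keys to snake_case fields
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return SchemaRecord(
        version=data["version"],
        id=data["id"],
        fields=data["fields"],
        highest_field_id=data["highestFieldId"],
        partition_keys=data["partitionKeys"],
        primary_keys=data["primaryKeys"],
        options=data["options"],
        comment=data.get("comment"),  # optional key
        time_millis=data["timeMillis"],
    )


sample = {
    "version": 3, "id": 0,
    "fields": [{"id": 0, "name": "id", "type": "BIGINT NOT NULL"},
               {"id": 1, "name": "name", "type": "STRING"}],
    "highestFieldId": 1, "partitionKeys": [], "primaryKeys": ["id"],
    "options": {"bucket": "4"}, "timeMillis": 1700000000000,
}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "schema-0")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(sample, f)
    record = read_schema_file(path)
print(record.highest_field_id, record.primary_keys, record.comment)  # 1 ['id'] None
```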
Outputs
| Name | Type | Description |
|---|---|---|
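| table_schema | TableSchema | Complete schema definition returned by from_schema, from_path, from_json, and copy |
| schema | Schema | Schema object returned by to_schema |
| cross_partition | bool | Result of cross_partition_update: True when the primary keys do not contain all partition keys |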
Usage Examples
```python
from pypaimon.schema.table_schema import TableSchema
from pypaimon.schema.data_types import DataField, AtomicType
from pypaimon.schema.schema import Schema

# Read a schema file. file_io is assumed to be an existing FileIO instance for
# the table's storage; schema files live at <table-path>/schema/schema-<id>.
table_schema = TableSchema.from_path(file_io, "schema/schema-1")

# Create a TableSchema from a Schema object
schema = Schema(
    fields=[
        DataField(0, "id", AtomicType("BIGINT")),
        DataField(1, "name", AtomicType("STRING")),
        DataField(2, "age", AtomicType("INT")),
    ],
    partition_keys=["id"],
    primary_keys=["id"],
    options={"bucket": "4"},
)
table_schema = TableSchema.from_schema(schema_id=1, schema=schema)

# Check cross-partition updates; the primary key contains the only partition
# key here, so this is expected to be False
can_update_across_partitions = table_schema.cross_partition_update()

# Convert back to a Schema
schema_obj = table_schema.to_schema()

# Create a copy with a different bucket count (full option map passed)
new_schema = table_schema.copy(new_options={**table_schema.options, "bucket": "8"})

# Parse from JSON. The string below is a placeholder: replace it with the
# complete contents of a schema file before running.
json_str = '{"version": 3, "id": 1, "fields": [...], ...}'
parsed_schema = TableSchema.from_json(json_str)

print(f"Schema version: {table_schema.version}")
print(f"Fields: {[f.name for f in table_schema.fields]}")
print(f"Partition keys: {table_schema.partition_keys}")
print(f"Primary keys: {table_schema.primary_keys}")
```