Implementation:Apache Paimon TableSchema

Knowledge Sources	Apache_Paimon
Domains	Schema Management, Table Metadata
Last Updated	2026-02-08 00:00 GMT

Overview

TableSchema represents the complete, versioned schema of a Paimon table, including fields, partition keys, primary keys, bucket configuration, options, and metadata.

Description

TableSchema is the central data model for table metadata in Apache Paimon. It extends beyond the user-facing Schema class by adding critical metadata such as a numeric schema ID, highest field ID tracking (essential for schema evolution), version information, and timestamp. The class validates schema constraints during construction, ensuring that primary keys are not identical to partition keys and that bucket keys are properly configured.
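The role of highest field ID tracking can be illustrated with a small plain-Java sketch that does not depend on Paimon. The class FieldIdSketch and its methods are hypothetical names introduced here; the sketch assumes that a new column receives the next unused ID and that the ID of a dropped column is never handed out again, which is what lets data files written under older schemas be matched by field ID.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration only; this is not a Paimon class.
class FieldIdSketch {
    // Field name -> field ID, kept in column order.
    private final Map<String, Integer> fields = new LinkedHashMap<>();
    // Highest ID ever assigned; -1 means no field has been added yet.
    private int highestFieldId = -1;

    // A new column always gets highestFieldId + 1.
    int addColumn(String name) {
        if (fields.containsKey(name)) {
            throw new IllegalArgumentException("Duplicate column: " + name);
        }
        highestFieldId++;
        fields.put(name, highestFieldId);
        return highestFieldId;
    }

    // Dropping a column leaves highestFieldId untouched, so its ID is never reused.
    void dropColumn(String name) {
        if (fields.remove(name) == null) {
            throw new IllegalArgumentException("Unknown column: " + name);
        }
    }

    int highestFieldId() {
        return highestFieldId;
    }

    List<Integer> fieldIds() {
        return new ArrayList<>(fields.values());
    }
}
```

In this sketch, adding user_id, name, and dt, then dropping dt and adding country, gives country the ID 3 instead of recycling 2.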

The class defines version constants (PAIMON_07_VERSION=1, PAIMON_08_VERSION=2, CURRENT_VERSION=3) so that schemas written by older Paimon releases remain readable. Bucket keys are either explicitly configured via options or automatically derived from the trimmed primary keys (the primary keys excluding partition keys). The class also provides projection utilities for extracting sub-schemas that represent the partition keys, bucket keys, and primary keys.
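The derivation of bucket keys described above can be sketched in plain Java without Paimon on the classpath. BucketKeySketch is a hypothetical helper, and the option name "bucket-key" with a comma-separated value is an assumption of this sketch; the real class may parse and validate options differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; this is not a Paimon class.
class BucketKeySketch {
    // Primary keys with the partition keys removed, preserving order.
    static List<String> trimmedPrimaryKeys(List<String> primaryKeys, List<String> partitionKeys) {
        List<String> trimmed = new ArrayList<>(primaryKeys);
        trimmed.removeAll(partitionKeys);
        // If nothing is left, every partition could hold only one record.
        if (!primaryKeys.isEmpty() && trimmed.isEmpty()) {
            throw new IllegalStateException(
                    "Primary keys " + primaryKeys + " must not be the same as partition keys " + partitionKeys);
        }
        return trimmed;
    }

    // Explicit bucket keys win; otherwise fall back to the trimmed primary keys.
    // The option name "bucket-key" is an assumption of this sketch.
    static List<String> bucketKeys(
            Map<String, String> options, List<String> primaryKeys, List<String> partitionKeys) {
        String configured = options.get("bucket-key");
        if (configured != null && !configured.trim().isEmpty()) {
            List<String> keys = new ArrayList<>();
            for (String key : configured.split(",")) {
                keys.add(key.trim());
            }
            return keys;
        }
        return trimmedPrimaryKeys(primaryKeys, partitionKeys);
    }
}
```

With primary keys (dt, user_id) and partition key dt, the sketch yields user_id as the only bucket key unless an explicit bucket key is configured.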

TableSchema instances are immutable and thread-safe, and they are serialized via JsonSerdeUtil for storage in catalog metadata. Every table operation in Paimon (reading, writing, compaction, and schema evolution) depends on TableSchema to understand the table's structure, partitioning strategy, and key layout. The class includes helper methods for converting to and from the user-facing Schema class and for projecting specific field subsets.

Usage

Use TableSchema when working with table metadata at the storage layer, when implementing catalog operations, or when full schema information including IDs and versioning is required. For schema definition and evolution, use the Schema class, which is later converted to TableSchema.

Code Reference

Source Location

Repository: Apache_Paimon
File: paimon-api/src/main/java/org/apache/paimon/schema/TableSchema.java
Lines: 46-389

Signature

public class TableSchema implements Serializable {
    public static final int PAIMON_07_VERSION = 1;
    public static final int PAIMON_08_VERSION = 2;
    public static final int CURRENT_VERSION = 3;

    public TableSchema(
        long id,
        List<DataField> fields,
        int highestFieldId,
        List<String> partitionKeys,
        List<String> primaryKeys,
        Map<String, String> options,
        @Nullable String comment
    );

    public long id();
    public int version();
    public List<DataField> fields();
    public List<String> fieldNames();
    public int highestFieldId();
    public List<String> partitionKeys();
    public List<String> primaryKeys();
    public List<String> trimmedPrimaryKeys();
    public List<String> bucketKeys();
    public int numBuckets();
    public Map<String, String> options();
    public RowType logicalRowType();
    public TableSchema project(@Nullable List<String> writeCols);
}

Import

import org.apache.paimon.schema.TableSchema;

I/O Contract

Inputs

Name	Type	Required	Description
id	long	yes	Schema version identifier
fields	List<DataField>	yes	List of table fields with IDs and types
highestFieldId	int	yes	Highest field ID used (for evolution)
partitionKeys	List<String>	yes	Partition key field names
primaryKeys	List<String>	yes	Primary key field names
options	Map<String, String>	yes	Table configuration options
comment	String	no	Table comment/description

Outputs

Name	Type	Description
tableSchema	TableSchema	Complete table schema with validated metadata
rowType	RowType	Logical row type for data processing
projectedSchema	TableSchema	Schema projected to specific columns

Usage Examples

Creating a TableSchema

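import java.util.Arrays;
import java.util.Collections;
import java.util.List;

import org.apache.paimon.schema.TableSchema;
import org.apache.paimon.types.DataField;
import org.apache.paimon.types.DataTypes;

// Define fields with IDs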
List<DataField> fields = Arrays.asList(
    new DataField(0, "user_id", DataTypes.BIGINT()),
    new DataField(1, "name", DataTypes.STRING()),
    new DataField(2, "dt", DataTypes.STRING())
);

// Create schema with partition and primary keys
TableSchema schema = new TableSchema(
    1L,                              // schema ID
    fields,                          // fields
    2,                               // highest field ID
    Arrays.asList("dt"),             // partition keys
    Arrays.asList("dt", "user_id"),  // primary keys
    Collections.singletonMap("bucket", "8"),
    "User dimension table"
);

Querying Schema Metadata

// Get schema properties
long schemaId = schema.id();
List<String> fieldNames = schema.fieldNames();
List<String> partitions = schema.partitionKeys();

// Get trimmed primary keys (excluding partition keys)
List<String> trimmedPks = schema.trimmedPrimaryKeys();
// Returns: ["user_id"] (excludes "dt")

// Get bucket configuration
List<String> bucketKeys = schema.bucketKeys();
int numBuckets = schema.numBuckets();

// Check for cross-partition updates
boolean crossPartition = schema.crossPartitionUpdate();

Working with Row Types

// Get complete row type
RowType fullType = schema.logicalRowType();

// Get partition key types
RowType partitionType = schema.logicalPartitionType();

// Get primary key types
RowType pkType = schema.logicalPrimaryKeysType();
RowType trimmedPkType = schema.logicalTrimmedPrimaryKeysType();

// Get bucket key types
RowType bucketKeyType = schema.logicalBucketKeyType();

Schema Projection

// Project schema to specific columns
TableSchema projected = schema.project(
    Arrays.asList("user_id", "name")
);

// Get projection indexes
int[] indexes = schema.projection(
    Arrays.asList("name", "user_id")
);
// Returns: [1, 0]
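
The index projection shown above amounts to a name-to-position lookup, which can be sketched in plain Java. ProjectionSketch is a hypothetical class, not Paimon's implementation; rejecting unknown names is a choice made in this sketch and not a claim about how TableSchema behaves.

```java
import java.util.List;

// Hypothetical helper for illustration only; this is not a Paimon class.
class ProjectionSketch {
    // Maps each projected name to its position in the full field list.
    static int[] projection(List<String> fieldNames, List<String> projectedFieldNames) {
        int[] indexes = new int[projectedFieldNames.size()];
        for (int i = 0; i < indexes.length; i++) {
            String name = projectedFieldNames.get(i);
            int index = fieldNames.indexOf(name);
            if (index < 0) {
                // Sketch-level choice: fail fast on names that are not in the schema.
                throw new IllegalArgumentException("Unknown field: " + name);
            }
            indexes[i] = index;
        }
        return indexes;
    }
}
```

The result follows the order of the requested names, not the order of the table's fields, which is why requesting (name, user_id) yields positions 1 and 0.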

Schema Conversion

// Convert to the user-facing Schema (schema is the TableSchema created above)
Schema userSchema = schema.toSchema();

// Create a TableSchema from a Schema
TableSchema newTableSchema = TableSchema.create(
    1L,        // schema ID
    userSchema // user schema
);

// Serialize to JSON via toString(), then deserialize
String json = schema.toString();
TableSchema restored = TableSchema.fromJson(json);
