Implementation:Apache Paimon TableSchema

From Leeroopedia


Knowledge Sources
Domains: Schema Management, Table Metadata
Last Updated: 2026-02-08 00:00 GMT

Overview

TableSchema represents the complete, versioned schema of a Paimon table, including fields, partition keys, primary keys, bucket configuration, options, and metadata.

Description

TableSchema is the central data model for table metadata in Apache Paimon. It goes beyond the user-facing Schema class by adding a numeric schema ID, the highest field ID assigned so far (essential for schema evolution), version information, and a timestamp. The class validates schema constraints during construction, ensuring that the primary keys are not identical to the partition keys and that bucket keys are properly configured.
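To illustrate why tracking the highest field ID matters for schema evolution, here is a minimal sketch in plain Java. It is not Paimon code: the class FieldIdSketch and its methods are hypothetical, and it assumes only that a new column receives the next unused ID and that a dropped column's ID is never handed out again.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch (not Paimon's implementation) of field ID allocation. */
final class FieldIdSketch {
    /** Field name -> field ID, kept in column order. */
    final Map<String, Integer> idsByName;
    /** Highest ID ever assigned, including IDs of columns that were later dropped. */
    final int highestFieldId;

    FieldIdSketch(Map<String, Integer> idsByName, int highestFieldId) {
        // Defensive copy so each schema version stays immutable.
        this.idsByName = Collections.unmodifiableMap(new LinkedHashMap<>(idsByName));
        this.highestFieldId = highestFieldId;
    }

    /** Returns a new version in which the added column takes the next unused ID. */
    FieldIdSketch addColumn(String name) {
        if (idsByName.containsKey(name)) {
            throw new IllegalArgumentException("Column already exists: " + name);
        }
        Map<String, Integer> next = new LinkedHashMap<>(idsByName);
        int newId = highestFieldId + 1;
        next.put(name, newId);
        return new FieldIdSketch(next, newId);
    }

    /** Returns a new version without the column; the ID counter is left untouched. */
    FieldIdSketch dropColumn(String name) {
        Map<String, Integer> next = new LinkedHashMap<>(idsByName);
        next.remove(name);
        return new FieldIdSketch(next, highestFieldId);
    }
}
```

Under these assumptions, dropping a column with ID 1 from a schema whose highest ID is 2 and then adding a column of the same name yields ID 3, so data files written under the old ID 1 are not mistaken for the new column.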

The class defines version constants (PAIMON_07_VERSION=1, PAIMON_08_VERSION=2, CURRENT_VERSION=3) for backward compatibility with schemas written by older Paimon releases. Bucket keys are either configured explicitly via options or derived automatically from the trimmed primary keys (the primary keys excluding partition keys). The schema also provides projection utilities for extracting sub-schemas that represent the partition keys, bucket keys, and primary keys.
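The key derivation described above can be sketched in plain Java as follows. This is an illustration rather than Paimon's source: the class KeyDerivationSketch is hypothetical, and the option key "bucket-key" with a comma-separated value is an assumption about how explicit bucket keys are configured.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch (not Paimon's implementation) of trimmed primary keys and bucket keys. */
final class KeyDerivationSketch {

    /** Primary keys with partition keys removed; rejects primary keys that leave nothing behind. */
    static List<String> trimmedPrimaryKeys(List<String> primaryKeys, List<String> partitionKeys) {
        List<String> trimmed = new ArrayList<>(primaryKeys);
        trimmed.removeAll(partitionKeys);
        if (!primaryKeys.isEmpty() && trimmed.isEmpty()) {
            throw new IllegalArgumentException(
                "Primary keys " + primaryKeys
                    + " must not be identical to partition keys " + partitionKeys);
        }
        return Collections.unmodifiableList(trimmed);
    }

    /** Explicit bucket keys from options if present, otherwise the trimmed primary keys. */
    static List<String> bucketKeys(
            List<String> primaryKeys, List<String> partitionKeys, Map<String, String> options) {
        String configured = options.get("bucket-key"); // assumed option key
        if (configured != null && !configured.trim().isEmpty()) {
            List<String> keys = new ArrayList<>();
            for (String key : configured.split(",")) {
                keys.add(key.trim());
            }
            return Collections.unmodifiableList(keys);
        }
        return trimmedPrimaryKeys(primaryKeys, partitionKeys);
    }
}
```

With primary keys [dt, user_id] and partition keys [dt], the sketch yields trimmed primary keys [user_id], which also serve as the bucket keys unless the options name other columns.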

TableSchema instances are immutable and thread-safe, and are serialized via JsonSerdeUtil for storage in catalog metadata. Every table operation in Paimon (reading, writing, compaction, schema evolution) depends on TableSchema to understand the table's structure, partitioning strategy, and key layout. The class includes helper methods for converting to and from the user-facing Schema class and for projecting specific field subsets.

Usage

Use TableSchema when working with table metadata at the storage layer, when implementing catalog operations, or when full schema information, including IDs and versioning, is required. For schema definition and evolution, use the Schema class, which is later converted to a TableSchema.

Code Reference

Source Location

Signature

public class TableSchema implements Serializable {
    public static final int PAIMON_07_VERSION = 1;
    public static final int PAIMON_08_VERSION = 2;
    public static final int CURRENT_VERSION = 3;

    public TableSchema(
        long id,
        List<DataField> fields,
        int highestFieldId,
        List<String> partitionKeys,
        List<String> primaryKeys,
        Map<String, String> options,
        @Nullable String comment
    );

    public long id();
    public int version();
    public List<DataField> fields();
    public List<String> fieldNames();
    public int highestFieldId();
    public List<String> partitionKeys();
    public List<String> primaryKeys();
    public List<String> trimmedPrimaryKeys();
    public List<String> bucketKeys();
    public int numBuckets();
    public Map<String, String> options();
    public RowType logicalRowType();
    public TableSchema project(@Nullable List<String> writeCols);
}

Import

import org.apache.paimon.schema.TableSchema;

I/O Contract

Inputs

Name | Type | Required | Description
id | long | yes | Schema version identifier
fields | List<DataField> | yes | List of table fields with IDs and types
highestFieldId | int | yes | Highest field ID used (for evolution)
partitionKeys | List<String> | yes | Partition key field names
primaryKeys | List<String> | yes | Primary key field names
options | Map<String, String> | yes | Table configuration options
comment | String | no | Table comment/description

Outputs

Name | Type | Description
tableSchema | TableSchema | Complete table schema with validated metadata
rowType | RowType | Logical row type for data processing
projectedSchema | TableSchema | Schema projected to specific columns

Usage Examples

Creating a TableSchema

import java.util.Arrays;
import java.util.Collections;
import java.util.List;

import org.apache.paimon.schema.TableSchema;
import org.apache.paimon.types.DataField;
import org.apache.paimon.types.DataTypes;

// Define fields with IDs
List<DataField> fields = Arrays.asList(
    new DataField(0, "user_id", DataTypes.BIGINT()),
    new DataField(1, "name", DataTypes.STRING()),
    new DataField(2, "dt", DataTypes.STRING())
);

// Create schema with partition and primary keys
TableSchema schema = new TableSchema(
    1L,                              // schema ID
    fields,                          // fields
    2,                               // highest field ID
    Arrays.asList("dt"),             // partition keys
    Arrays.asList("dt", "user_id"),  // primary keys
    Collections.singletonMap("bucket", "8"),  // options: 8 buckets
    "User dimension table"           // comment
);

Querying Schema Metadata

// Get schema properties
long schemaId = schema.id();
List<String> fieldNames = schema.fieldNames();
List<String> partitions = schema.partitionKeys();

// Get trimmed primary keys (excluding partition keys)
List<String> trimmedPks = schema.trimmedPrimaryKeys();
// Returns: ["user_id"] (excludes "dt")

// Get bucket configuration
List<String> bucketKeys = schema.bucketKeys();
int numBuckets = schema.numBuckets();

// Check for cross-partition updates
boolean crossPartition = schema.crossPartitionUpdate();

Working with Row Types

// Get complete row type
RowType fullType = schema.logicalRowType();

// Get partition key types
RowType partitionType = schema.logicalPartitionType();

// Get primary key types
RowType pkType = schema.logicalPrimaryKeysType();
RowType trimmedPkType = schema.logicalTrimmedPrimaryKeysType();

// Get bucket key types
RowType bucketKeyType = schema.logicalBucketKeyType();

Schema Projection

// Project schema to specific columns
TableSchema projected = schema.project(
    Arrays.asList("user_id", "name")
);

// Get projection indexes
int[] indexes = schema.projection(
    Arrays.asList("name", "user_id")
);
// Returns: [1, 0]
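
The index lookup behind a call like projection(...) can be sketched in plain Java. This is a hypothetical illustration, not Paimon's source: ProjectionSketch is an invented name, and it assumes each requested name is resolved to its position in the schema's field list, in the order requested.

```java
import java.util.List;

/** Hypothetical sketch (not Paimon's implementation) of resolving field names to indexes. */
final class ProjectionSketch {

    /** Maps each projected field name to its position in fieldNames, preserving request order. */
    static int[] projection(List<String> fieldNames, List<String> projectedNames) {
        int[] indexes = new int[projectedNames.size()];
        for (int i = 0; i < projectedNames.size(); i++) {
            String name = projectedNames.get(i);
            int index = fieldNames.indexOf(name);
            if (index < 0) {
                throw new IllegalArgumentException("Unknown field: " + name);
            }
            indexes[i] = index;
        }
        return indexes;
    }
}
```

For the fields [user_id, name, dt], projecting [name, user_id] gives [1, 0], matching the result shown in the example above.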

Schema Conversion

// Convert to user-facing Schema
Schema userSchema = schema.toSchema();

// Create TableSchema from Schema
TableSchema newTableSchema = TableSchema.create(
    1L,        // schema ID
    userSchema // user schema
);

// Round-trip through JSON (this example relies on toString() producing the JSON form)
String json = schema.toString();
TableSchema restored = TableSchema.fromJson(json);
