Implementation:Apache Paimon TableSchema
| Knowledge Sources | |
|---|---|
| Domains | Schema Management, Table Metadata |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
TableSchema represents the complete, versioned schema of a Paimon table, including fields, partition keys, primary keys, bucket configuration, options, and metadata.
Description
TableSchema is the central data model for table metadata in Apache Paimon. It carries everything in the user-facing Schema class and adds a numeric schema ID, the highest field ID in use (essential for schema evolution), a format version, and a timestamp. The class validates schema constraints during construction, ensuring that primary keys are not identical to partition keys and that bucket keys are properly configured.
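Why the highest field ID matters for evolution can be shown with a small sketch. The class below is a hypothetical, mutable stand-in (FieldIdSketch and its methods are not part of Paimon, and TableSchema itself is immutable). It assumes a new column receives the highest ID plus one and that dropping a column never lowers the counter, so an ID is never reused for a different column.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of field-ID allocation; not Paimon's actual code. */
final class FieldIdSketch {
    // Field name -> stable field ID, kept in column order.
    private final Map<String, Integer> idsByName = new LinkedHashMap<>();
    // Highest ID ever handed out; -1 means no field has been added yet.
    private int highestFieldId = -1;

    /** Adds a column and gives it the next unused ID. */
    int addColumn(String name) {
        if (idsByName.containsKey(name)) {
            throw new IllegalArgumentException("Column already exists: " + name);
        }
        highestFieldId++;
        idsByName.put(name, highestFieldId);
        return highestFieldId;
    }

    /** Drops a column; the counter is deliberately left untouched. */
    void dropColumn(String name) {
        idsByName.remove(name);
    }

    /** Returns the highest ID handed out so far. */
    int highestFieldId() {
        return highestFieldId;
    }

    /** Returns the ID of a column, or null if the column does not exist. */
    Integer idOf(String name) {
        return idsByName.get(name);
    }

    public static void main(String[] args) {
        FieldIdSketch sketch = new FieldIdSketch();
        sketch.addColumn("user_id");
        sketch.addColumn("name");
        sketch.addColumn("dt");
        sketch.dropColumn("dt");
        sketch.addColumn("dt");
        // The re-added column gets a fresh ID rather than reusing the dropped one.
        System.out.println(sketch.idOf("dt")); // prints 3
    }
}
```

Because the counter only grows, data files written under an older schema can still be matched to columns by ID even after a column with the same name has been dropped and re-created.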
The class defines version constants (PAIMON_07_VERSION=1, PAIMON_08_VERSION=2, CURRENT_VERSION=3) for backward compatibility with schemas written by older Paimon releases. Bucket keys are either configured explicitly via options or derived automatically from the trimmed primary keys (the primary keys with partition keys removed). The class also provides projection utilities that extract the row types of the partition keys, bucket keys, and primary keys.
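The derivation of trimmed primary keys and bucket keys described above can be sketched in plain Java. This is a minimal illustration, not Paimon's implementation: BucketKeySketch is a hypothetical class, and it assumes the explicit setting is a comma-separated value under the "bucket-key" option and that a primary key with nothing left after trimming is rejected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of bucket-key derivation; not Paimon's actual code. */
final class BucketKeySketch {

    /** Primary keys with every partition key removed, order preserved. */
    static List<String> trimmedPrimaryKeys(List<String> primaryKeys, List<String> partitionKeys) {
        List<String> trimmed = new ArrayList<>(primaryKeys);
        trimmed.removeAll(partitionKeys);
        // Assumption: a primary key fully covered by the partition keys is invalid.
        if (!primaryKeys.isEmpty() && trimmed.isEmpty()) {
            throw new IllegalStateException(
                    "Primary keys " + primaryKeys
                            + " must not be identical to partition keys " + partitionKeys);
        }
        return trimmed;
    }

    /** Explicit "bucket-key" option if present, otherwise the trimmed primary keys. */
    static List<String> bucketKeys(
            Map<String, String> options, List<String> primaryKeys, List<String> partitionKeys) {
        String explicit = options.get("bucket-key");
        if (explicit != null && !explicit.trim().isEmpty()) {
            List<String> keys = new ArrayList<>();
            for (String key : explicit.split(",")) {
                keys.add(key.trim());
            }
            return keys;
        }
        return trimmedPrimaryKeys(primaryKeys, partitionKeys);
    }
}
```

With primary keys ["dt", "user_id"] and partition keys ["dt"], the trimmed primary keys are ["user_id"], which also become the bucket keys unless the "bucket-key" option says otherwise.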
TableSchema instances are immutable and thread-safe, and they are serialized via JsonSerdeUtil for storage in catalog metadata. Every table operation in Paimon (reading, writing, compaction, schema evolution) depends on TableSchema to understand the table's structure, partitioning strategy, and key layout. The class includes helper methods for converting to and from the user-facing Schema class and for projecting specific field subsets.
Usage
Use TableSchema when working with table metadata at the storage layer, implementing catalog operations, or whenever full schema information, including IDs and versioning, is required. For schema definition and evolution, use the Schema class, which is later converted to a TableSchema.
Code Reference
Source Location
- Repository: Apache_Paimon
- File: paimon-api/src/main/java/org/apache/paimon/schema/TableSchema.java
- Lines: 46-389
Signature
public class TableSchema implements Serializable {
public static final int PAIMON_07_VERSION = 1;
public static final int PAIMON_08_VERSION = 2;
public static final int CURRENT_VERSION = 3;
public TableSchema(
long id,
List<DataField> fields,
int highestFieldId,
List<String> partitionKeys,
List<String> primaryKeys,
Map<String, String> options,
@Nullable String comment
);
public long id();
public int version();
public List<DataField> fields();
public List<String> fieldNames();
public int highestFieldId();
public List<String> partitionKeys();
public List<String> primaryKeys();
public List<String> trimmedPrimaryKeys();
public List<String> bucketKeys();
public int numBuckets();
public Map<String, String> options();
public RowType logicalRowType();
public TableSchema project(@Nullable List<String> writeCols);
}
Import
import org.apache.paimon.schema.TableSchema;
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| id | long | yes | Schema ID identifying this revision of the table schema (distinct from the format version returned by version()) |
| fields | List<DataField> | yes | List of table fields with IDs and types |
| highestFieldId | int | yes | Highest field ID used (for evolution) |
| partitionKeys | List<String> | yes | Partition key field names |
| primaryKeys | List<String> | yes | Primary key field names |
| options | Map<String, String> | yes | Table configuration options |
| comment | String | no | Table comment/description |
Outputs
| Name | Type | Description |
|---|---|---|
| tableSchema | TableSchema | Complete table schema with validated metadata |
| rowType | RowType | Logical row type for data processing |
| projectedSchema | TableSchema | Schema projected to specific columns |
Usage Examples
Creating a TableSchema
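import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import org.apache.paimon.schema.TableSchema;
import org.apache.paimon.types.DataField;
import org.apache.paimon.types.DataTypes;
// Define fields with IDs
List<DataField> fields = Arrays.asList(
new DataField(0, "user_id", DataTypes.BIGINT()),
new DataField(1, "name", DataTypes.STRING()),
new DataField(2, "dt", DataTypes.STRING())
);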
// Create schema with partition and primary keys
TableSchema schema = new TableSchema(
1L, // schema ID
fields, // fields
2, // highest field ID
Arrays.asList("dt"), // partition keys
Arrays.asList("dt", "user_id"), // primary keys
Collections.singletonMap("bucket", "8"), // options
"User dimension table" // comment
);
Querying Schema Metadata
// Get schema properties
long schemaId = schema.id();
List<String> fieldNames = schema.fieldNames();
List<String> partitions = schema.partitionKeys();
// Get trimmed primary keys (excluding partition keys)
List<String> trimmedPks = schema.trimmedPrimaryKeys();
// Returns: ["user_id"] (excludes "dt")
// Get bucket configuration
List<String> bucketKeys = schema.bucketKeys();
int numBuckets = schema.numBuckets();
// Check for cross-partition updates
// (expected to be false here, since the primary key includes the partition key "dt")
boolean crossPartition = schema.crossPartitionUpdate();
Working with Row Types
// Get complete row type
RowType fullType = schema.logicalRowType();
// Get partition key types
RowType partitionType = schema.logicalPartitionType();
// Get primary key types
RowType pkType = schema.logicalPrimaryKeysType();
RowType trimmedPkType = schema.logicalTrimmedPrimaryKeysType();
// Get bucket key types
RowType bucketKeyType = schema.logicalBucketKeyType();
Schema Projection
// Project schema to specific columns
TableSchema projected = schema.project(
Arrays.asList("user_id", "name")
);
// Get projection indexes
int[] indexes = schema.projection(
Arrays.asList("name", "user_id")
);
// Returns: [1, 0]
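How the index array in the projection example lines up with field positions can be illustrated without Paimon on the classpath. The sketch below is a hypothetical helper (ProjectionSketch is not a Paimon class) that looks up each requested name in the ordered list of field names. Rejecting unknown names with an exception is an assumption of this sketch, not a statement about the library's behavior.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of name-to-index projection; not Paimon's actual code. */
final class ProjectionSketch {

    /** Maps each projected field name to its position in the full field list. */
    static int[] projection(List<String> fieldNames, List<String> projectedNames) {
        int[] indexes = new int[projectedNames.size()];
        for (int i = 0; i < projectedNames.size(); i++) {
            String name = projectedNames.get(i);
            int index = fieldNames.indexOf(name);
            // Assumption of this sketch: unknown names are an error.
            if (index < 0) {
                throw new IllegalArgumentException("Unknown field: " + name);
            }
            indexes[i] = index;
        }
        return indexes;
    }

    public static void main(String[] args) {
        List<String> fieldNames = Arrays.asList("user_id", "name", "dt");
        int[] indexes = projection(fieldNames, Arrays.asList("name", "user_id"));
        System.out.println(Arrays.toString(indexes)); // prints [1, 0]
    }
}
```

The output order follows the requested names rather than the table's column order, which is why asking for "name" before "user_id" yields [1, 0].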
Schema Conversion
// Convert to user-facing Schema
Schema userSchema = schema.toSchema();
// Create TableSchema from Schema
TableSchema newTableSchema = TableSchema.create(
1L, // schema ID
userSchema // user schema
);
// Round-trip through JSON: toString() produces the JSON form, fromJson() parses it
String json = schema.toString();
TableSchema restored = TableSchema.fromJson(json);