Implementation:Apache Paimon KeySerializer
| Knowledge Sources | |
|---|---|
| Domains | Global Index, Serialization |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
KeySerializer provides serialization, deserialization, and comparison operations for B-tree index keys with type-specific implementations.
Description
KeySerializer is an abstract interface defining the contract for handling keys in B-tree global indexes. It provides three core operations: serialize (converting objects to bytes), deserialize (converting bytes back to objects), and create_comparator (producing a comparison function for ordering keys). The interface has concrete implementations for different data types.
StringSerializer handles string keys by encoding them to UTF-8 bytes for storage and decoding back to strings for use. The comparator performs lexicographic comparison of string values. This is the most common key type for text-based indexes.
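The string handling described above can be sketched with the standard library alone. The helper names `encode_key`, `decode_key`, and `compare_keys` are illustrative and are not part of pypaimon. The sketch assumes the comparator follows Python's built-in string ordering and returns -1, 0, or 1.

```python
def encode_key(key: str) -> bytes:
    """Encode a string key as UTF-8 bytes (hypothetical helper)."""
    return key.encode("utf-8")


def decode_key(data: bytes) -> str:
    """Decode UTF-8 bytes back into a string key (hypothetical helper)."""
    return data.decode("utf-8")


def compare_keys(a: str, b: str) -> int:
    """Three-way comparison based on Python's built-in string ordering."""
    return (a > b) - (a < b)


raw = encode_key("café")
print(raw)                           # b'caf\xc3\xa9'
print(decode_key(raw))               # café
print(compare_keys("alice", "bob"))  # -1
```

Non-ASCII characters occupy more than one byte after encoding, so the serialized length can exceed the number of characters in the key.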
LongSerializer handles BIGINT (64-bit integer) keys using Python's struct module. Each key is packed as a signed 64-bit little-endian integer ('<q' format, 8 bytes) and keys are compared numerically, which gives fixed-width storage and cheap comparison for integer keys.
IntSerializer handles INT (32-bit integer) keys the same way, packing each key as a signed 32-bit little-endian integer ('<i' format, 4 bytes). Comparison is also numerical.
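As a rough illustration of the fixed-width integer encodings described above, the following sketch uses `struct` directly with the same format strings. The function names are hypothetical and are not the pypaimon implementation. The last line shows that little-endian bytes do not sort in numeric order, which is consistent with comparing decoded values rather than raw bytes.

```python
import struct


def pack_long(value: int) -> bytes:
    """Pack a signed 64-bit integer as 8 little-endian bytes."""
    return struct.pack("<q", value)


def unpack_long(data: bytes) -> int:
    """Unpack 8 little-endian bytes into a signed 64-bit integer."""
    return struct.unpack("<q", data)[0]


def pack_int(value: int) -> bytes:
    """Pack a signed 32-bit integer as 4 little-endian bytes."""
    return struct.pack("<i", value)


def unpack_int(data: bytes) -> int:
    """Unpack 4 little-endian bytes into a signed 32-bit integer."""
    return struct.unpack("<i", data)[0]


print(pack_long(12345).hex())         # 3930000000000000
print(pack_int(12345).hex())          # 39300000
print(unpack_long(pack_long(-1)))     # -1

# Byte-wise order differs from numeric order for little-endian data:
# the bytes for 256 compare lower than the bytes for 1.
print(pack_long(256) < pack_long(1))  # True
```

Values outside the signed range of the chosen format make `struct.pack` raise `struct.error`, so an INT-sized encoding cannot silently truncate a larger key.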
The create_serializer factory function maps Paimon DataTypes to appropriate serializers. Currently supported types are STRING/VARCHAR/CHAR (StringSerializer), BIGINT (LongSerializer), and INT (IntSerializer). Other types are not yet supported and will raise a ValueError.
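The type-to-serializer mapping above can be pictured as a simple lookup. The sketch below is hypothetical: it works on type-name strings rather than Paimon DataType objects, returns serializer class names instead of instances, and assumes that a length parameter such as `(10)` can be stripped from the name. It is not the actual `create_serializer` code.

```python
# Hypothetical lookup table mirroring the mapping described above.
_SERIALIZER_BY_TYPE = {
    "STRING": "StringSerializer",
    "VARCHAR": "StringSerializer",
    "CHAR": "StringSerializer",
    "BIGINT": "LongSerializer",
    "INT": "IntSerializer",
}


def serializer_name_for(type_name: str) -> str:
    """Return the serializer class name for a type name, or raise ValueError."""
    # Assumption: drop a trailing parameter list such as "(10)" and ignore case.
    base = type_name.split("(", 1)[0].strip().upper()
    try:
        return _SERIALIZER_BY_TYPE[base]
    except KeyError:
        raise ValueError(f"Unsupported key type for B-tree index: {type_name}") from None


print(serializer_name_for("VARCHAR(10)"))  # StringSerializer
print(serializer_name_for("bigint"))       # LongSerializer
```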
Usage
KeySerializer is used internally by the global index system when creating or reading B-tree index files. The serializer is selected based on the key column's data type and used to convert between Python objects and the binary format stored in SST files.
Code Reference
Source Location
- Repository: Apache_Paimon
- File: paimon-python/pypaimon/globalindex/btree/key_serializer.py
Signature
```python
class KeySerializer(ABC):
    """
    Interface for serializing and deserializing B-tree index keys.
    """

    @abstractmethod
    def serialize(self, key: object) -> bytes:
        """Serialize a key to bytes."""
        pass

    @abstractmethod
    def deserialize(self, data: bytes) -> object:
        """Deserialize bytes to a key."""
        pass

    @abstractmethod
    def create_comparator(self) -> Callable[[object, object], int]:
        """Create a comparator function for keys."""
        pass


class StringSerializer(KeySerializer):
    """Serializer for STRING type."""

    def serialize(self, key: object) -> bytes:
        ...

    def deserialize(self, data: bytes) -> object:
        ...

    def create_comparator(self) -> Callable[[object, object], int]:
        ...


class LongSerializer(KeySerializer):
    """Serializer for BIGINT type."""
    ...


class IntSerializer(KeySerializer):
    """Serializer for INT type."""
    ...


def create_serializer(data_type: DataType) -> KeySerializer:
    """Create appropriate serializer based on data type."""
    ...
```
Import
```python
from pypaimon.globalindex.btree.key_serializer import KeySerializer, create_serializer
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| key | object | Yes | Key value to serialize (type-specific) |
| data | bytes | Yes | Serialized key bytes to deserialize |
| data_type | DataType | Yes | Paimon DataType for serializer creation |
Outputs
| Name | Type | Description |
|---|---|---|
| serialized | bytes | Serialized key bytes |
| deserialized | object | Deserialized key value |
| comparator | Callable | Function comparing two keys, returns -1/0/1 |
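A comparator of the shape listed in the outputs table (two keys in, -1/0/1 out) can be plugged into Python's sorting functions through `functools.cmp_to_key`. The `compare` function below is a stand-in written for this sketch, not the comparator returned by `create_comparator`.

```python
from functools import cmp_to_key


def compare(a, b) -> int:
    """Stand-in three-way comparator returning -1, 0, or 1."""
    return (a > b) - (a < b)


keys = [42, -7, 0, 13]
sorted_keys = sorted(keys, key=cmp_to_key(compare))
print(sorted_keys)  # [-7, 0, 13, 42]
```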
Usage Examples
```python
from pypaimon.globalindex.btree.key_serializer import create_serializer
from pypaimon.schema.data_types import StringType, BigIntType

# String key example
string_type = StringType()
string_serializer = create_serializer(string_type)

key1 = "alice"
key1_bytes = string_serializer.serialize(key1)
print(f"Serialized: {key1_bytes}")  # b'alice'

key1_restored = string_serializer.deserialize(key1_bytes)
print(f"Deserialized: {key1_restored}")  # 'alice'

# Compare keys
comparator = string_serializer.create_comparator()
result = comparator("alice", "bob")
print(f"Compare result: {result}")  # -1 (alice < bob)

# Integer key example
bigint_type = BigIntType()
long_serializer = create_serializer(bigint_type)

key2 = 12345
key2_bytes = long_serializer.serialize(key2)
print(f"Serialized: {key2_bytes.hex()}")  # Little-endian bytes

key2_restored = long_serializer.deserialize(key2_bytes)
print(f"Deserialized: {key2_restored}")  # 12345


# Use in index operations
def write_index_entry(key, value, serializer):
    key_bytes = serializer.serialize(key)
    # Write to index file.
    # write_to_block is a placeholder for the actual block writer; it is not defined here.
    write_to_block(key_bytes, value)


# Binary search with comparator.
# Assumes entries is sorted by key and each entry exposes a .key attribute.
def search_index(target_key, entries, serializer):
    comparator = serializer.create_comparator()
    left, right = 0, len(entries) - 1
    while left <= right:
        mid = (left + right) // 2
        mid_key = entries[mid].key
        cmp = comparator(mid_key, target_key)
        if cmp == 0:
            return entries[mid]
        elif cmp < 0:
            left = mid + 1
        else:
            right = mid - 1
    return None
```