Implementation:Apache Paimon KeySerializer

From Leeroopedia


Knowledge Sources
Domains: Global Index, Serialization
Last Updated: 2026-02-08 00:00 GMT

Overview

KeySerializer provides serialization, deserialization, and comparison operations for B-tree index keys with type-specific implementations.

Description

KeySerializer is an abstract base class that defines the contract for handling keys in B-tree global indexes. It declares three core operations: serialize (converting a key object to bytes), deserialize (converting bytes back to a key object), and create_comparator (producing a comparison function for ordering keys). Concrete implementations exist for string, BIGINT, and INT keys.

StringSerializer handles string keys by encoding them to UTF-8 bytes for storage and decoding back to strings for use. The comparator performs lexicographic comparison of string values. This is the most common key type for text-based indexes.
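The behavior described above can be sketched in a few lines. This is a minimal illustration, not the pypaimon source: the function names `serialize_str`, `deserialize_str`, and `compare_str` are hypothetical, and the comparator is assumed to follow Python's native string ordering (by Unicode code point).

```python
def serialize_str(key: str) -> bytes:
    """Encode a string key as UTF-8 bytes."""
    return key.encode("utf-8")


def deserialize_str(data: bytes) -> str:
    """Decode UTF-8 bytes back into a string key."""
    return data.decode("utf-8")


def compare_str(a: str, b: str) -> int:
    """Return -1, 0, or 1 using Python's lexicographic string ordering."""
    return (a > b) - (a < b)


encoded = serialize_str("alice")      # b'alice'
restored = deserialize_str(encoded)   # 'alice'
order = compare_str("alice", "bob")   # -1
```

The `(a > b) - (a < b)` idiom is a common way to produce a three-way result in Python 3, which has no built-in `cmp`.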

LongSerializer handles BIGINT (64-bit integer) keys using little-endian encoding via Python's struct module. Keys are packed as signed 64-bit integers ('<q' format), so each serialized key occupies exactly 8 bytes, and keys are compared numerically.
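One reason numeric comparison matters here: little-endian bytes do not sort in the same order as the integers they encode. The sketch below illustrates this with the standard `struct` module. The helper names (`serialize_long`, `deserialize_long`, `compare_long`) are hypothetical stand-ins and are not taken from pypaimon.

```python
import struct


def serialize_long(key: int) -> bytes:
    """Pack a signed 64-bit integer in little-endian order ('<q')."""
    return struct.pack("<q", key)


def deserialize_long(data: bytes) -> int:
    """Unpack 8 little-endian bytes into a signed 64-bit integer."""
    return struct.unpack("<q", data)[0]


def compare_long(a: int, b: int) -> int:
    """Three-way numeric comparison returning -1, 0, or 1."""
    return (a > b) - (a < b)


small = serialize_long(1)    # b'\x01\x00\x00\x00\x00\x00\x00\x00'
large = serialize_long(256)  # b'\x00\x01\x00\x00\x00\x00\x00\x00'

# Raw byte comparison gets the order wrong: the low-order byte comes first.
bytes_say_small_first = small < large  # False

# Decoding first and comparing numerically gives the expected order.
numbers_say_small_first = (
    compare_long(deserialize_long(small), deserialize_long(large)) < 0
)  # True
```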

IntSerializer handles INT (32-bit integer) keys in the same way, using little-endian signed 32-bit encoding ('<i' format, 4 bytes per key). Comparison is also numerical. It suits keys that fit in the signed 32-bit range.
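The practical difference from the 64-bit format is key width and range. The following sketch uses only the standard `struct` module. The names `serialize_int` and `deserialize_int` are hypothetical, and how pypaimon itself reports out-of-range keys is not stated in this page.

```python
import struct

INT_MAX = 2**31 - 1  # largest value representable in the '<i' format


def serialize_int(key: int) -> bytes:
    """Pack a signed 32-bit integer in little-endian order ('<i')."""
    return struct.pack("<i", key)


def deserialize_int(data: bytes) -> int:
    """Unpack 4 little-endian bytes into a signed 32-bit integer."""
    return struct.unpack("<i", data)[0]


encoded = serialize_int(-1)          # b'\xff\xff\xff\xff' (two's complement)
width_int = len(encoded)             # 4 bytes per INT key
width_long = struct.calcsize("<q")   # 8 bytes per BIGINT key

# struct rejects values outside the signed 32-bit range.
try:
    serialize_int(INT_MAX + 1)
    overflowed = False
except struct.error:
    overflowed = True
```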

The create_serializer factory function maps Paimon DataTypes to appropriate serializers. Currently supported types are STRING/VARCHAR/CHAR (StringSerializer), BIGINT (LongSerializer), and INT (IntSerializer). Other types are not yet supported and will raise a ValueError.
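The dispatch described above can be approximated without pypaimon installed. In this sketch the data type is represented by a plain type-name string rather than a Paimon DataType object, and `create_codec` and the `Codec` alias are hypothetical names. It mirrors only the mapping and the ValueError behavior stated in the text, not the actual implementation.

```python
import struct
from typing import Callable, Tuple

# A (serialize, deserialize) pair standing in for a serializer object.
Codec = Tuple[Callable[[object], bytes], Callable[[bytes], object]]


def _string_codec() -> Codec:
    return (lambda k: k.encode("utf-8"), lambda b: b.decode("utf-8"))


def _struct_codec(fmt: str) -> Codec:
    return (lambda k: struct.pack(fmt, k), lambda b: struct.unpack(fmt, b)[0])


def create_codec(type_name: str) -> Codec:
    """Map a type name to a codec, rejecting unsupported types."""
    name = type_name.upper()
    if name in ("STRING", "VARCHAR", "CHAR"):
        return _string_codec()
    if name == "BIGINT":
        return _struct_codec("<q")
    if name == "INT":
        return _struct_codec("<i")
    raise ValueError(f"Unsupported key type for B-tree index: {type_name}")


to_bytes, from_bytes = create_codec("BIGINT")
round_trip = from_bytes(to_bytes(12345))  # 12345

try:
    create_codec("DOUBLE")
    rejected = False
except ValueError:
    rejected = True
```

Failing fast on unsupported types keeps an index from being written with keys that cannot be ordered or read back consistently.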

Usage

KeySerializer is used internally by the global index system when creating or reading B-tree index files. The serializer is selected based on the key column's data type and used to convert between Python objects and the binary format stored in SST files.

Code Reference

Source Location

Signature

from abc import ABC, abstractmethod
from typing import Callable


class KeySerializer(ABC):
    """
    Interface for serializing and deserializing B-tree index keys.
    """

    @abstractmethod
    def serialize(self, key: object) -> bytes:
        """Serialize a key to bytes."""
        pass

    @abstractmethod
    def deserialize(self, data: bytes) -> object:
        """Deserialize bytes to a key."""
        pass

    @abstractmethod
    def create_comparator(self) -> Callable[[object, object], int]:
        """Create a comparator function for keys."""
        pass


class StringSerializer(KeySerializer):
    """Serializer for STRING type."""

    def serialize(self, key: object) -> bytes:
        ...

    def deserialize(self, data: bytes) -> object:
        ...

    def create_comparator(self) -> Callable[[object, object], int]:
        ...


class LongSerializer(KeySerializer):
    """Serializer for BIGINT type."""
    ...


class IntSerializer(KeySerializer):
    """Serializer for INT type."""
    ...


def create_serializer(data_type: DataType) -> KeySerializer:
    """Create appropriate serializer based on data type."""
    ...

Import

from pypaimon.globalindex.btree.key_serializer import KeySerializer, create_serializer

I/O Contract

Inputs

| Name | Type | Required | Description |
|------|------|----------|-------------|
| key | object | Yes | Key value to serialize (type-specific) |
| data | bytes | Yes | Serialized key bytes to deserialize |
| data_type | DataType | Yes | Paimon DataType for serializer creation |

Outputs

| Name | Type | Description |
|------|------|-------------|
| serialized | bytes | Serialized key bytes |
| deserialized | object | Deserialized key value |
| comparator | Callable | Function comparing two keys, returns -1/0/1 |
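A comparator with the -1/0/1 contract listed above should plug into Python's standard sorting tools through `functools.cmp_to_key`. The sketch below uses a hypothetical stand-in, `compare_keys`, in place of the function returned by create_comparator.

```python
from functools import cmp_to_key


def compare_keys(a, b) -> int:
    """Stand-in comparator: -1 if a < b, 0 if equal, 1 if a > b."""
    return (a > b) - (a < b)


keys = [42, -7, 0, 1000]

# cmp_to_key adapts a two-argument comparator to the key= protocol.
ordered = sorted(keys, key=cmp_to_key(compare_keys))  # [-7, 0, 42, 1000]
```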

Usage Examples

from pypaimon.globalindex.btree.key_serializer import create_serializer
from pypaimon.schema.data_types import StringType, BigIntType

# String key example
string_type = StringType()
string_serializer = create_serializer(string_type)

key1 = "alice"
key1_bytes = string_serializer.serialize(key1)
print(f"Serialized: {key1_bytes}")  # b'alice'

key1_restored = string_serializer.deserialize(key1_bytes)
print(f"Deserialized: {key1_restored}")  # 'alice'

# Compare keys
comparator = string_serializer.create_comparator()
result = comparator("alice", "bob")
print(f"Compare result: {result}")  # -1 (alice < bob)

# Integer key example
bigint_type = BigIntType()
long_serializer = create_serializer(bigint_type)

key2 = 12345
key2_bytes = long_serializer.serialize(key2)
print(f"Serialized: {key2_bytes.hex()}")  # 3930000000000000 (little-endian '<q')

key2_restored = long_serializer.deserialize(key2_bytes)
print(f"Deserialized: {key2_restored}")  # 12345

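# Use in index operations
def write_index_entry(key, value, serializer, write_to_block):
    # write_to_block is a caller-supplied callable that appends the
    # (key_bytes, value) pair to the index file; it is a placeholder here.
    key_bytes = serializer.serialize(key)
    write_to_block(key_bytes, value)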

# Binary search with comparator (entries must already be sorted by key)
def search_index(target_key, entries, serializer):
    comparator = serializer.create_comparator()
    left, right = 0, len(entries) - 1

    while left <= right:
        mid = (left + right) // 2
        mid_key = entries[mid].key
        cmp = comparator(mid_key, target_key)

        if cmp == 0:
            return entries[mid]
        elif cmp < 0:
            left = mid + 1
        else:
            right = mid - 1

    return None
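To tie serialization and lookup together, here is a self-contained sketch that searches entries whose keys are stored as serialized bytes, decoding each probed key before comparing it. The `lookup` function and the `(key_bytes, value)` tuple layout are assumptions made for illustration and do not reflect the actual SST block format.

```python
import struct
from typing import List, Optional, Tuple


def lookup(entries: List[Tuple[bytes, str]], target: int) -> Optional[str]:
    """Binary-search entries sorted by numeric key; keys are '<q' bytes."""
    left, right = 0, len(entries) - 1
    while left <= right:
        mid = (left + right) // 2
        # Decode the stored bytes before comparing: byte order != numeric order.
        mid_key = struct.unpack("<q", entries[mid][0])[0]
        if mid_key == target:
            return entries[mid][1]
        if mid_key < target:
            left = mid + 1
        else:
            right = mid - 1
    return None


# Build a small in-memory "block": sort numerically, then serialize the keys.
entries = [(struct.pack("<q", k), f"row-{k}") for k in sorted([300, -5, 12, 7])]

hit = lookup(entries, 12)   # 'row-12'
miss = lookup(entries, 8)   # None
```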
