Implementation:Apache Paimon BlockReader

Knowledge Sources	Apache_Paimon
Domains	Global Index, B-Tree
Last Updated	2026-02-08 00:00 GMT

Overview

BlockReader reads key-value entries from B-tree index blocks with support for aligned and unaligned block formats.

Description

BlockReader is an abstract base class for reading blocks in B-tree index files. A block is a contiguous chunk of key-value entries stored in a compact binary format. The reader supports two storage layouts: aligned blocks, in which every entry has the same fixed size, and unaligned blocks, in which entries vary in size and an offset index is needed for random access.

AlignedBlockReader handles blocks in which every entry occupies exactly the same number of bytes. The block trailer stores this record size, so the byte offset of any entry is its record position multiplied by the record size. This gives constant-time random access without any per-entry metadata.
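The position arithmetic for aligned blocks can be illustrated with a minimal sketch. It assumes `data` is the block with the trailer already stripped and that records are packed back to back with no padding. The helpers `aligned_offset` and `aligned_record` are hypothetical and are not part of pypaimon.

```python
def aligned_offset(record_position: int, record_size: int) -> int:
    """Byte offset of a record in an aligned block: position * record_size."""
    return record_position * record_size


def aligned_record(data: bytes, record_position: int, record_size: int) -> bytes:
    """Return the raw bytes of one fixed-size record (hypothetical helper).

    Assumes `data` excludes the trailer, so the record count is simply
    len(data) // record_size.
    """
    record_count = len(data) // record_size
    if not 0 <= record_position < record_count:
        raise IndexError(f"record {record_position} out of range ({record_count} records)")
    start = aligned_offset(record_position, record_size)
    return data[start:start + record_size]


# Three 4-byte records packed back to back.
data = b"aaaabbbbcccc"
print(aligned_offset(2, 4))        # 8
print(aligned_record(data, 1, 4))  # b'bbbb'
```

Because the record size is the only metadata needed, the record count can be derived from the data length rather than stored separately.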

UnalignedBlockReader handles blocks with variable-size entries. It relies on an offset index stored at the end of the block, between the entry data and the trailer, that holds one 4-byte offset per entry. Here the trailer stores the number of records, so the index length is record_count * 4 bytes. The layout spends 4 extra bytes per entry in exchange for allowing entries of any size.
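A minimal sketch of offset-index lookup follows. The source only states that each offset is 4 bytes; the little-endian signed 32-bit encoding used here is an assumption, and the helpers `unaligned_offset` and `unaligned_record` are hypothetical. The sketch also finds a record's end from the next record's offset, whereas a real reader may instead decode entry lengths from the entry bytes themselves.

```python
import struct

OFFSET_SIZE = 4  # each index entry is a 4-byte offset


def unaligned_offset(index: bytes, record_position: int) -> int:
    """Read a record's start offset from the offset index.

    Assumes little-endian signed 32-bit offsets (an assumption, not
    confirmed by the source).
    """
    record_count = len(index) // OFFSET_SIZE
    if not 0 <= record_position < record_count:
        raise IndexError(f"record {record_position} out of range ({record_count} records)")
    (offset,) = struct.unpack_from("<i", index, record_position * OFFSET_SIZE)
    return offset


def unaligned_record(data: bytes, index: bytes, record_position: int) -> bytes:
    """Slice one variable-size record: it ends where the next record starts."""
    start = unaligned_offset(index, record_position)
    record_count = len(index) // OFFSET_SIZE
    if record_position + 1 < record_count:
        end = unaligned_offset(index, record_position + 1)
    else:
        end = len(data)  # the last record runs to the end of the data region
    return data[start:end]


data = b"aabbbbc"                        # records: b"aa", b"bbbb", b"c"
index = struct.pack("<3i", 0, 2, 6)      # record_count * 4 = 12 bytes
print(len(index))                        # 12
print(unaligned_record(data, index, 1))  # b'bbbb'
print(unaligned_record(data, index, 2))  # b'c'
```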

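Both reader types are read through BlockIterator, which supports sequential iteration and key lookup. BlockIterator.seek_to(target_key) performs a binary search over record positions: the reader's seek_to(record_position) translates each probed position into a byte offset within the block, and the comparator decides which half of the range to continue in. The iterator ends up positioned at the first key >= target_key and returns True only on an exact match.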
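The lower-bound binary search behind a key seek can be sketched as follows, assuming the keys in a block are sorted according to the comparator. The function `seek_lower_bound` and its `key_at` hook are hypothetical: `key_at` stands in for whatever maps a record position to its key, which is where the aligned or unaligned position arithmetic would plug in.

```python
from typing import Callable, Tuple


def seek_lower_bound(
    record_count: int,
    key_at: Callable[[int], bytes],
    target: bytes,
    comparator: Callable[[bytes, bytes], int],
) -> Tuple[int, bool]:
    """Return (position, exact_match) for the first key >= target.

    A position equal to record_count means every key is smaller than target.
    """
    low, high = 0, record_count
    while low < high:
        mid = (low + high) // 2
        if comparator(key_at(mid), target) < 0:
            low = mid + 1   # key at mid is too small; search the upper half
        else:
            high = mid      # key at mid is a candidate; search the lower half
    found = low < record_count and comparator(key_at(low), target) == 0
    return low, found


def bytes_comparator(a: bytes, b: bytes) -> int:
    """Lexicographic byte comparison returning -1, 0 or 1."""
    return (a > b) - (a < b)


keys = [b"apple", b"banana", b"cherry"]
print(seek_lower_bound(len(keys), keys.__getitem__, b"banana", bytes_comparator))     # (1, True)
print(seek_lower_bound(len(keys), keys.__getitem__, b"blueberry", bytes_comparator))  # (2, False)
print(seek_lower_bound(len(keys), keys.__getitem__, b"zebra", bytes_comparator))      # (3, False)
```

Each probe costs one position-to-offset translation and one key comparison, so a seek touches O(log n) entries regardless of the block layout.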

Usage

BlockReader is used internally by SstFileReader to read data and index blocks. The factory method BlockReader.create automatically selects the appropriate reader type based on the block trailer. Users typically interact with blocks through BlockIterator rather than directly with the reader.
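The trailer-based dispatch that BlockReader.create performs can be sketched as follows. The source only says that the trailer identifies the layout and stores either the record size (aligned) or the record count (unaligned). The concrete trailer layout used here, a 4-byte little-endian int followed by a 1-byte flag with 0 meaning aligned and 1 unaligned, is an assumption, as are the names `parse_block` and `ParsedBlock`. The sketch returns the split pieces rather than reader objects so that it does not depend on pypaimon's constructors.

```python
import struct
from dataclasses import dataclass
from typing import Optional

TRAILER_SIZE = 5  # assumed: 4-byte int + 1-byte layout flag
ALIGNED = 0       # assumed flag values, not taken from pypaimon
UNALIGNED = 1


@dataclass(frozen=True)
class ParsedBlock:
    aligned: bool
    data: bytes
    record_size: Optional[int] = None  # set for aligned blocks
    index: Optional[bytes] = None      # set for unaligned blocks


def parse_block(block: bytes) -> ParsedBlock:
    """Split a block into data (and index) using the assumed trailer layout."""
    if len(block) < TRAILER_SIZE:
        raise ValueError("block is shorter than its trailer")
    flag = block[-1]
    (value,) = struct.unpack_from("<i", block, len(block) - TRAILER_SIZE)
    body = block[:-TRAILER_SIZE]
    if flag == ALIGNED:
        # The trailer int is the fixed record size.
        return ParsedBlock(True, body, record_size=value)
    if flag == UNALIGNED:
        # The trailer int is the record count; the index is record_count * 4 bytes.
        split = len(body) - value * 4
        if split < 0:
            raise ValueError("index longer than block body")
        return ParsedBlock(False, body[:split], index=body[split:])
    raise ValueError(f"unknown layout flag {flag}")


aligned_block = b"aaaabbbb" + struct.pack("<i", 4) + bytes([ALIGNED])
p = parse_block(aligned_block)
print(p.aligned, p.data, p.record_size)  # True b'aaaabbbb' 4

unaligned_block = (b"aabbbbc" + struct.pack("<3i", 0, 2, 6)
                   + struct.pack("<i", 3) + bytes([UNALIGNED]))
q = parse_block(unaligned_block)
print(q.aligned, q.data, len(q.index))   # False b'aabbbbc' 12
```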

Code Reference

Source Location

Repository: Apache_Paimon
File: paimon-python/pypaimon/globalindex/btree/block_reader.py

Signature

class BlockReader:
    """Reader for a block."""

    def __init__(
        self,
        block: bytes,
        record_count: int,
        comparator: Callable[[bytes, bytes], int]
    ):
        ...

    def block_input(self) -> MemorySliceInput:
        """Create a MemorySliceInput for this block."""
        ...

    def iterator(self) -> 'BlockIterator':
        """Create a BlockIterator for this reader."""
        ...

    def seek_to(self, record_position: int) -> int:
        """Seek to slice position from record position."""
        ...

    @staticmethod
    def create(
        block: bytes,
        comparator: Optional[Callable[[bytes, bytes], int]] = None
    ) -> 'BlockReader':
        """Create appropriate BlockReader based on block format."""
        ...


class AlignedBlockReader(BlockReader):
    """Block reader for aligned blocks (fixed record size)."""

    def __init__(
        self,
        data: bytes,
        record_size: int,
        comparator: Optional[Callable[[bytes, bytes], int]] = None
    ):
        ...


class UnalignedBlockReader(BlockReader):
    """Block reader for unaligned blocks (uses index)."""

    def __init__(
        self,
        data: bytes,
        index: bytes,
        comparator: Optional[Callable[[bytes, bytes], int]] = None
    ):
        ...


class BlockIterator:
    """Iterator for block entries."""

    def __iter__(self):
        ...

    def __next__(self) -> BlockEntry:
        ...

    def seek_to(self, target_key: bytes) -> bool:
        """Seek to first key >= target using binary search."""
        ...

Import

from pypaimon.globalindex.btree.block_reader import BlockReader, BlockIterator

I/O Contract

Inputs

Name	Type	Required	Description
block	bytes	Yes	Block data, including the trailer
comparator	Callable[[bytes, bytes], int]	No	Key comparison function returning a negative, zero, or positive int; defaults to None

Outputs

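Name	Type	Description
BlockReader	BlockReader	AlignedBlockReader or UnalignedBlockReader instance returned by BlockReader.create
BlockIterator	BlockIterator	Iterator over block entries, returned by reader.iterator()
BlockEntry	BlockEntry	Key-value entry yielded by the iterator (exposes key and value)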

Usage Examples

from pypaimon.globalindex.btree.block_reader import BlockReader

# Read one block's bytes, including its trailer.
# read_block_from_file is a placeholder for your own file access code.
block_data = read_block_from_file(offset, size)

# Comparator returning <0, 0 or >0. Comparing the raw bytes gives the same
# order as comparing the decoded UTF-8 strings and never fails on invalid UTF-8.
def key_comparator(key1: bytes, key2: bytes) -> int:
    return (key1 > key2) - (key1 < key2)

# BlockReader.create picks AlignedBlockReader or UnalignedBlockReader
# based on the block trailer
reader = BlockReader.create(block_data, key_comparator)

# Iterate over all entries
for entry in reader.iterator():
    print(f"Key: {entry.key}, Value: {entry.value}")

# Seek to a specific key using a fresh iterator
iterator = reader.iterator()
target_key = b"search_key"
found = iterator.seek_to(target_key)

if found:
    print("Exact match found")
else:
    print("Positioned at first key >= target")

# Read the remaining entries, continuing from the seek position
# (assumes __iter__ returns the iterator itself; process_entry is a placeholder)
for entry in iterator:
    process_entry(entry)
