Implementation:Apache Paimon BlockReader
| Knowledge Sources | |
|---|---|
| Domains | Global Index, B-Tree |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
BlockReader reads key-value entries from B-tree index blocks with support for aligned and unaligned block formats.
Description
BlockReader is an abstract base class for reading blocks in B-tree index files. A block is a contiguous chunk of key-value entries stored in a compact binary format. The reader supports two storage layouts: aligned blocks, where every entry has the same fixed size, and unaligned blocks, where entries vary in size and an offset index is required for random access.
AlignedBlockReader handles blocks where each entry occupies exactly the same number of bytes. The block trailer stores the record size, allowing direct calculation of any entry's position by multiplying the record position by the record size. This provides efficient random access with minimal metadata overhead.
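The offset arithmetic for aligned blocks can be sketched as follows. This is a minimal illustration and not the pypaimon implementation: the helper names aligned_entry_offset and read_aligned_entry are hypothetical, and data is assumed to be the block body with the trailer already removed.

```python
def aligned_entry_offset(record_position: int, record_size: int) -> int:
    """Byte offset of a fixed-size entry inside an aligned block body."""
    if record_position < 0:
        raise ValueError("record_position must be non-negative")
    return record_position * record_size


def read_aligned_entry(data: bytes, record_position: int, record_size: int) -> bytes:
    """Slice one fixed-size entry out of the block body (hypothetical helper)."""
    start = aligned_entry_offset(record_position, record_size)
    end = start + record_size
    if end > len(data):
        raise IndexError("record_position is past the end of the block")
    return data[start:end]


# Three 4-byte entries packed back to back.
data = b"aaaabbbbcccc"
second = read_aligned_entry(data, record_position=1, record_size=4)
```

Because the offset is a single multiplication, no per-entry metadata is needed beyond the record size stored in the trailer.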
UnalignedBlockReader handles blocks with variable-size entries. It uses an index stored at the end of the block (before the trailer) that contains 4-byte offsets for each entry. The trailer stores the number of records, and the index length is calculated as record_count * 4 bytes. This approach trades some space overhead for flexibility in entry sizes.
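The index arithmetic for unaligned blocks can be sketched in a similar way. The helper names below are hypothetical, body is assumed to be the block with its trailer already removed, and little-endian byte order for the 4-byte offsets is an assumption that the description above does not confirm.

```python
import struct
from typing import Tuple

OFFSET_WIDTH = 4  # each index slot is a 4-byte offset, as described above


def split_unaligned_block(body: bytes, record_count: int) -> Tuple[bytes, bytes]:
    """Split a trailer-less block body into (entry data, offset index)."""
    index_length = record_count * OFFSET_WIDTH
    if index_length > len(body):
        raise ValueError("record_count is too large for this block")
    split = len(body) - index_length
    return body[:split], body[split:]


def entry_offset(index: bytes, record_position: int) -> int:
    """Look up the byte offset of one entry in the offset index."""
    # "<i" is a little-endian signed 32-bit integer; the byte order is an assumption.
    return struct.unpack_from("<i", index, record_position * OFFSET_WIDTH)[0]


# Three variable-size entries (1, 3 and 2 bytes) followed by their offsets.
body = b"a" + b"bcd" + b"ef" + struct.pack("<3i", 0, 1, 4)
data, index = split_unaligned_block(body, record_count=3)
third_offset = entry_offset(index, 2)
```

The index costs 4 bytes per entry, which is the space overhead traded for variable entry sizes.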
Both reader types support iteration through BlockIterator, which provides sequential access and binary search. Its seek_to method uses the comparator together with the reader's random access to position the iterator at the first key greater than or equal to the target.
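The seek behaviour corresponds to a lower-bound binary search driven by the comparator. The sketch below runs over a plain in-memory list of keys to stay self-contained; the function names lower_bound and compare_bytes are hypothetical, and an actual reader would fetch each probed key from the block by record position rather than from a list.

```python
from typing import Callable, Sequence


def lower_bound(
    keys: Sequence[bytes],
    target: bytes,
    comparator: Callable[[bytes, bytes], int],
) -> int:
    """Return the position of the first key >= target, or len(keys) if none."""
    lo, hi = 0, len(keys)
    while lo < hi:
        mid = (lo + hi) // 2
        if comparator(keys[mid], target) < 0:
            lo = mid + 1  # keys[mid] sorts before target: search the right half
        else:
            hi = mid      # keys[mid] >= target: keep it as a candidate
    return lo


def compare_bytes(a: bytes, b: bytes) -> int:
    """Three-way comparison on raw bytes: negative, zero or positive."""
    return (a > b) - (a < b)


keys = [b"apple", b"cherry", b"mango"]
exact = lower_bound(keys, b"cherry", compare_bytes)
between = lower_bound(keys, b"banana", compare_bytes)
past_end = lower_bound(keys, b"zebra", compare_bytes)
```

An exact match can then be detected by comparing the key at the returned position with the target, which matches the boolean result described for seek_to.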
Usage
BlockReader is used internally by SstFileReader to read data and index blocks. The factory method BlockReader.create automatically selects the appropriate reader type based on the block trailer. Users typically interact with blocks through BlockIterator rather than directly with the reader.
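The kind of dispatch performed by the factory can be sketched as follows. The trailer layout used here (a 4-byte little-endian integer followed by a 1-byte layout flag, with flag values 0 and 1) is purely hypothetical and chosen for illustration; the actual trailer format is defined by the Paimon source, not by this sketch.

```python
import struct

# Hypothetical trailer: 4-byte integer + 1-byte layout flag (5 bytes, no padding).
TRAILER = struct.Struct("<iB")
ALIGNED, UNALIGNED = 0, 1  # hypothetical flag values


def describe_block(block: bytes) -> dict:
    """Inspect the (hypothetical) trailer and report how the block is laid out."""
    if len(block) < TRAILER.size:
        raise ValueError("block is shorter than its trailer")
    body_length = len(block) - TRAILER.size
    value, flag = TRAILER.unpack_from(block, body_length)
    if flag == ALIGNED:
        # For aligned blocks the trailer integer is the record size.
        if value <= 0:
            raise ValueError("record size must be positive")
        return {"layout": "aligned", "record_size": value,
                "record_count": body_length // value}
    if flag == UNALIGNED:
        # For unaligned blocks the trailer integer is the record count,
        # and the offset index occupies record_count * 4 bytes before the trailer.
        return {"layout": "unaligned", "record_count": value,
                "data_length": body_length - value * 4}
    raise ValueError(f"unknown layout flag: {flag}")


aligned_block = b"aaaabbbb" + TRAILER.pack(4, ALIGNED)
unaligned_block = b"abcdef" + struct.pack("<3i", 0, 1, 4) + TRAILER.pack(3, UNALIGNED)
aligned_info = describe_block(aligned_block)
unaligned_info = describe_block(unaligned_block)
```

Keeping the layout decision in one place means callers never need to know which reader subclass they received.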
Code Reference
Source Location
- Repository: Apache_Paimon
- File: paimon-python/pypaimon/globalindex/btree/block_reader.py
Signature
class BlockReader:
    """Reader for a block."""

    def __init__(
        self,
        block: bytes,
        record_count: int,
        comparator: Callable[[bytes, bytes], int]
    ):
        ...

    def block_input(self) -> MemorySliceInput:
        """Create a MemorySliceInput for this block."""
        ...

    def iterator(self) -> 'BlockIterator':
        """Create a BlockIterator for this reader."""
        ...

    def seek_to(self, record_position: int) -> int:
        """Seek to slice position from record position."""
        ...

    @staticmethod
    def create(
        block: bytes,
        comparator: Optional[Callable[[bytes, bytes], int]] = None
    ) -> 'BlockReader':
        """Create appropriate BlockReader based on block format."""
        ...


class AlignedBlockReader(BlockReader):
    """Block reader for aligned blocks (fixed record size)."""

    def __init__(
        self,
        data: bytes,
        record_size: int,
        comparator: Optional[Callable[[bytes, bytes], int]] = None
    ):
        ...


class UnalignedBlockReader(BlockReader):
    """Block reader for unaligned blocks (uses index)."""

    def __init__(
        self,
        data: bytes,
        index: bytes,
        comparator: Optional[Callable[[bytes, bytes], int]] = None
    ):
        ...


class BlockIterator:
    """Iterator for block entries."""

    def __iter__(self):
        ...

    def __next__(self) -> BlockEntry:
        ...

    def seek_to(self, target_key: bytes) -> bool:
        """Seek to first key >= target using binary search."""
        ...
Import
from pypaimon.globalindex.btree.block_reader import BlockReader, BlockIterator
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| block | bytes | Yes | Block data including trailer |
| comparator | Callable[[bytes, bytes], int] | No | Key comparison function |
Outputs
| Name | Type | Description |
|---|---|---|
| BlockReader | BlockReader | Aligned or Unaligned reader instance |
| BlockIterator | BlockIterator | Iterator over block entries |
| BlockEntry | BlockEntry | Key-value entry from block |
Usage Examples
from pypaimon.globalindex.btree.block_reader import BlockReader

# Read block data (including trailer).
# read_block_from_file is a placeholder for the caller's own I/O code.
block_data = read_block_from_file(offset, size)


# Comparator that orders keys as UTF-8 strings
def key_comparator(key1: bytes, key2: bytes) -> int:
    str1 = key1.decode('utf-8')
    str2 = key2.decode('utf-8')
    # Returns -1, 0 or 1
    return (str1 > str2) - (str1 < str2)


# Create appropriate reader based on block format
reader = BlockReader.create(block_data, key_comparator)

# Iterate over all entries
iterator = reader.iterator()
for entry in iterator:
    print(f"Key: {entry.key}, Value: {entry.value}")

# Seek to specific key
iterator = reader.iterator()
target_key = b"search_key"
found = iterator.seek_to(target_key)
if found:
    print("Exact match found")
else:
    print("Positioned at first key >= target")

# Read remaining entries.
# process_entry is a placeholder for the caller's own handling.
while iterator.has_next():
    entry = next(iterator)
    process_entry(entry)