Implementation:Apache Paimon BTreeIndexReader
| Knowledge Sources | |
|---|---|
| Domains | Indexing, Query Optimization |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
BTreeIndexReader implements the GlobalIndexReader interface for B-tree indexes. It evaluates predicates by querying SST-format B-tree index files and returns the matching row IDs as RoaringBitmap64 sets.
Description
On initialization, BTreeIndexReader reads the `BTreeIndexMeta` (containing min/max keys and null flag), parses the `BTreeFileFooter` to locate index blocks, bloom filter blocks, and null bitmap blocks, and creates an `SstFileReader` for navigating the B-tree structure.
Each `visit_*` method (`visit_equal`, `visit_less_than`, `visit_greater_than`, `visit_in`, `visit_between`, `visit_is_null`, `visit_is_not_null`, etc.) returns a `GlobalIndexResult` wrapping a supplier function that performs the actual index query when invoked.
Null handling is separate: nulls are stored in a dedicated null bitmap block (read lazily via `_read_null_bitmap()`, with CRC32 verification), and `visit_is_null()` returns that bitmap while `visit_is_not_null()` returns all non-null rows via `_all_non_null_rows()`.
String pattern predicates (starts_with, ends_with, contains) currently fall back to returning all non-null rows.
The core operation is `_range_query()`, which creates an `SstFileIterator`, seeks to the lower bound key using binary search in index blocks, iterates through data blocks comparing keys with the upper bound, and collects matching row IDs from deserialized values into a RoaringBitmap64. The reader supports inclusive/exclusive bounds for range queries and uses the `KeySerializer` for key serialization/deserialization and comparison.
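The seek-then-scan behaviour described for `_range_query()` can be illustrated with a minimal sketch. It is not the reader's actual code: a sorted in-memory list stands in for the SST index and data blocks, a Python `set` stands in for `RoaringBitmap64`, and the function name `range_query` and the use of `None` for an unbounded side are assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right
from typing import List, Optional, Set, Tuple


def range_query(
    entries: List[Tuple[int, List[int]]],
    from_key: Optional[int],
    to_key: Optional[int],
    from_inclusive: bool = True,
    to_inclusive: bool = True,
) -> Set[int]:
    """Collect row IDs for keys inside the requested range.

    `entries` is a hypothetical stand-in for the sorted (key, row_ids)
    pairs held in the index file. None means "unbounded" (an assumption).
    """
    keys = [key for key, _ in entries]

    # Seek: binary search for the first position that can match.
    if from_key is None:
        start = 0
    elif from_inclusive:
        start = bisect_left(keys, from_key)
    else:
        start = bisect_right(keys, from_key)

    matched: Set[int] = set()
    # Scan forward in key order until the upper bound is crossed.
    for key, row_ids in entries[start:]:
        if to_key is not None:
            if key > to_key or (key == to_key and not to_inclusive):
                break
        matched.update(row_ids)
    return matched


index = [(10, [0, 4]), (18, [1]), (30, [2, 5]), (65, [3]), (70, [6])]
in_range = range_query(index, 18, 65)  # 18 <= key <= 65
exclusive = range_query(index, 18, 65, from_inclusive=False, to_inclusive=False)
```

Choosing `bisect_left` for an inclusive lower bound and `bisect_right` for an exclusive one means the scan never has to re-check the lower bound; only the upper bound is compared on each step.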
This implementation enables B-tree-based global index filtering in the Python SDK. Predicates are pushed down to the index level, so rows whose IDs are absent from the returned bitmap can be skipped when scanning data files.
Usage
BTreeIndexReader is instantiated by the table scan layer when B-tree indexes are available for query predicates; applications typically do not use it directly.
Code Reference
Source Location
- Repository: Apache_Paimon
- File: paimon-python/pypaimon/globalindex/btree/btree_index_reader.py
Signature
class BTreeIndexReader(GlobalIndexReader):
    FOOTER_ENCODED_LENGTH = 48

    def __init__(self, key_serializer: KeySerializer, file_io: FileIO,
                 index_path: str, io_meta: GlobalIndexIOMeta): ...
    def visit_equal(self, field_ref: FieldRef, literal: object) -> Optional[GlobalIndexResult]: ...
    def visit_less_than(self, field_ref: FieldRef, literal: object) -> Optional[GlobalIndexResult]: ...
    def visit_greater_than(self, field_ref: FieldRef, literal: object) -> Optional[GlobalIndexResult]: ...
    def visit_in(self, field_ref: FieldRef, literals: List[object]) -> Optional[GlobalIndexResult]: ...
    def visit_between(self, field_ref: FieldRef, min_v: object, max_v: object) -> Optional[GlobalIndexResult]: ...
    def visit_is_null(self, field_ref: FieldRef) -> Optional[GlobalIndexResult]: ...
    def visit_is_not_null(self, field_ref: FieldRef) -> Optional[GlobalIndexResult]: ...
    def _range_query(self, from_key: object, to_key: object,
                     from_inclusive: bool, to_inclusive: bool) -> RoaringBitmap64: ...
    def _read_null_bitmap(self) -> RoaringBitmap64: ...
    def close(self) -> None: ...
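`_read_null_bitmap()` is described as verifying a CRC32 when it reads the null bitmap block. The sketch below shows the general verify-on-read pattern using the standard `zlib.crc32`. The block layout (payload followed by a 4-byte big-endian checksum) and the helper names `pack_block` and `unpack_block` are assumptions for illustration; the actual on-disk layout is not specified here.

```python
import struct
import zlib


def pack_block(payload: bytes) -> bytes:
    """Append a big-endian CRC32 of the payload (assumed layout)."""
    return payload + struct.pack(">I", zlib.crc32(payload) & 0xFFFFFFFF)


def unpack_block(block: bytes) -> bytes:
    """Return the payload, or raise ValueError if the checksum does not match."""
    if len(block) < 4:
        raise ValueError("block too short to hold a CRC32 trailer")
    payload, trailer = block[:-4], block[-4:]
    (expected,) = struct.unpack(">I", trailer)
    actual = zlib.crc32(payload) & 0xFFFFFFFF
    if actual != expected:
        raise ValueError(
            f"CRC32 mismatch: expected {expected:#010x}, got {actual:#010x}"
        )
    return payload


block = pack_block(b"serialized-null-bitmap")
payload = unpack_block(block)

# Flip the bits of one byte to simulate on-disk corruption.
corrupted = bytes([block[0] ^ 0xFF]) + block[1:]
try:
    unpack_block(corrupted)
    corruption_detected = False
except ValueError:
    corruption_detected = True
```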
Import
from pypaimon.globalindex.btree.btree_index_reader import BTreeIndexReader
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| key_serializer | KeySerializer | yes | Serializer for index keys |
| file_io | FileIO | yes | File I/O abstraction |
| index_path | str | yes | Path to index directory |
| io_meta | GlobalIndexIOMeta | yes | Index file metadata (filename, size, schema_id) |
Outputs
| Name | Type | Description |
|---|---|---|
| GlobalIndexResult | GlobalIndexResult | Lazy supplier of RoaringBitmap64 with matching row IDs |
| RoaringBitmap64 | RoaringBitmap64 | Set of row IDs matching the predicate |
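Because `GlobalIndexResult` is described as a lazy supplier, no index I/O needs to happen until the result is actually requested. A minimal sketch of that pattern follows. `LazyResult` is a hypothetical stand-in rather than the real class, a Python `set` replaces `RoaringBitmap64`, and caching the first answer is an assumption, not a documented behaviour.

```python
from typing import Callable, List, Optional, Set


class LazyResult:
    """Hypothetical stand-in for GlobalIndexResult: defers work to a supplier."""

    def __init__(self, supplier: Callable[[], Set[int]]):
        self._supplier = supplier
        self._cached: Optional[Set[int]] = None

    def results(self) -> Set[int]:
        # Run the supplier on first access and reuse its answer afterwards
        # (caching is an assumption made for this sketch).
        if self._cached is None:
            self._cached = self._supplier()
        return self._cached


calls: List[str] = []


def expensive_lookup() -> Set[int]:
    calls.append("lookup")  # record each time the index would be read
    return {3, 7, 11}


result = LazyResult(expensive_lookup)  # nothing has been read yet
calls_before = len(calls)
row_ids = result.results()             # the lookup runs here
result.results()                       # second call reuses the cached set
```

Deferring the query this way lets a planner build results for several predicates and evaluate only those it ends up needing.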
Usage Examples
Equality Predicate
from pypaimon.globalindex.btree.btree_index_reader import BTreeIndexReader
from pypaimon.globalindex.btree.key_serializer import KeySerializer
from pypaimon.common.file_io import LocalFileIO
# DataField, AtomicType, GlobalIndexIOMeta and FieldRef are assumed to be
# imported from their pypaimon modules (paths not shown here).

# Initialize reader
key_serializer = KeySerializer([DataField(0, "user_id", AtomicType("INT"))])
file_io = LocalFileIO()
reader = BTreeIndexReader(
    key_serializer=key_serializer,
    file_io=file_io,
    index_path="/path/to/table/index",
    io_meta=GlobalIndexIOMeta(file_name="btree-001.idx", file_size=1024, schema_id=0)
)

# Query for user_id = 123
result = reader.visit_equal(FieldRef("user_id"), 123)
if result is not None:  # visit_* methods return Optional[GlobalIndexResult]
    matching_row_ids = result.results()  # RoaringBitmap64
    print(f"Found {len(matching_row_ids)} rows")
Range Query
from pypaimon.utils.roaring_bitmap import RoaringBitmap64

# Query for age >= 18 AND age <= 65
result_lower = reader.visit_greater_or_equal(FieldRef("age"), 18)
result_upper = reader.visit_less_or_equal(FieldRef("age"), 65)

# Intersect results
row_ids = RoaringBitmap64.and_(result_lower.results(), result_upper.results())
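Each `visit_*` method is declared as returning `Optional[GlobalIndexResult]`, so code that combines two results may need to handle a missing answer. The sketch below uses plain Python sets in place of `RoaringBitmap64` and assumes, without confirmation from the source, that `None` means the index could not evaluate the predicate and therefore excludes no rows. The helper name `intersect_optional` is hypothetical.

```python
from typing import Optional, Set


def intersect_optional(
    left: Optional[Set[int]], right: Optional[Set[int]]
) -> Optional[Set[int]]:
    """AND two index answers; None is treated as 'no answer, keep every row'."""
    if left is None:
        return right
    if right is None:
        return left
    return left & right


at_least_18 = {1, 2, 3, 5, 6}  # row IDs with age >= 18
at_most_65 = {0, 1, 2, 3, 4, 5}  # row IDs with age <= 65

both = intersect_optional(at_least_18, at_most_65)
one_sided = intersect_optional(at_least_18, None)
unknown = intersect_optional(None, None)
```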
In Predicate
# Query for status IN ('active', 'pending', 'verified')
result = reader.visit_in(
    FieldRef("status"),
    ['active', 'pending', 'verified']
)
matching_row_ids = result.results()
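Semantically, an IN predicate matches the union of the rows matched by each literal. The sketch below shows that equivalence with a dictionary standing in for the index and sets standing in for bitmaps. Whether `visit_in` is implemented as repeated point lookups is not stated in the source, and `lookup_in` is a hypothetical name.

```python
from typing import Dict, Iterable, Set


def lookup_in(index: Dict[str, Set[int]], literals: Iterable[str]) -> Set[int]:
    """Union the row IDs of each literal; absent keys contribute nothing."""
    matched: Set[int] = set()
    for literal in literals:
        matched |= index.get(literal, set())
    return matched


status_index = {"active": {0, 3}, "pending": {1}, "closed": {2, 4}}
rows = lookup_in(status_index, ["active", "pending", "verified"])
```

Here "verified" has no entry in the stand-in index, so it adds no rows to the result.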
Null Checks
# Find rows where email IS NOT NULL
result = reader.visit_is_not_null(FieldRef("email"))
non_null_row_ids = result.results()
# Find rows where email IS NULL
result_null = reader.visit_is_null(FieldRef("email"))
null_row_ids = result_null.results()
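The null and non-null results partition the indexed rows, and the non-null set is also what the string pattern predicates (starts_with, ends_with, contains) are described as falling back to. In that fallback the index result is only a candidate set, so the pattern still has to be checked against the actual values. A small sketch with hypothetical data and plain sets in place of bitmaps:

```python
from typing import Dict, Optional, Set

# Hypothetical column values keyed by row ID; None marks a NULL.
emails: Dict[int, Optional[str]] = {
    0: "ann@example.com",
    1: None,
    2: "bob@example.org",
    3: None,
    4: "ann.lee@example.com",
}

null_rows: Set[int] = {rid for rid, value in emails.items() if value is None}
non_null_rows: Set[int] = set(emails) - null_rows

# Fallback for starts_with: the index only rules out NULL rows ...
candidates = non_null_rows
# ... so the pattern itself is re-checked against the stored values.
matches = {rid for rid in candidates if emails[rid].startswith("ann")}
```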