Implementation:Apache Paimon BTreeIndexReader

Knowledge Sources
Domains: Indexing, Query Optimization
Last Updated: 2026-02-08 00:00 GMT

Overview

BTreeIndexReader implements the GlobalIndexReader interface for B-tree indexes, providing efficient predicate evaluation by querying SST-format B-tree index files to return matching row ID sets represented as RoaringBitmap64.

Description

On initialization, BTreeIndexReader reads the `BTreeIndexMeta` (containing the min/max keys and a null flag), parses the `BTreeFileFooter` to locate the index blocks, bloom filter blocks, and null bitmap blocks, and creates an `SstFileReader` for navigating the B-tree structure.

Each `visit_*` method (visit_equal, visit_less_than, visit_greater_than, visit_in, visit_between, visit_is_null, visit_is_not_null, etc.) returns a `GlobalIndexResult` wrapping a supplier function that performs the actual index query only when invoked. The core operation is `_range_query()`: it creates an SstFileIterator, seeks to the lower-bound key using binary search over the index blocks, iterates through the data blocks while comparing each key with the upper bound, and collects the matching row IDs from the deserialized values into a RoaringBitmap64. Range queries support inclusive and exclusive bounds, and the KeySerializer handles key serialization, deserialization, and comparison.

Null handling is separate: nulls are stored in a dedicated null bitmap block, read lazily via `_read_null_bitmap()` with CRC32 verification. `visit_is_null()` returns that bitmap, while `visit_is_not_null()` returns all non-null rows via `_all_non_null_rows()`.

String pattern predicates (starts_with, ends_with, contains) currently fall back to returning all non-null rows.
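The null-bitmap read with CRC32 verification can be illustrated with a small, self-contained sketch. The byte layout used here (a uint32 count, sorted little-endian uint64 row IDs, then a trailing CRC32 of that payload) is a hypothetical stand-in, not Paimon's on-disk format, and a plain Python set replaces RoaringBitmap64.

```python
import struct
import zlib
from typing import Iterable, Set


def encode_null_bitmap(row_ids: Iterable[int]) -> bytes:
    """Serialize row IDs into a hypothetical block: count, uint64 IDs, CRC32."""
    ids = sorted(set(row_ids))
    payload = struct.pack("<I", len(ids)) + b"".join(struct.pack("<Q", r) for r in ids)
    # Append the CRC32 of the payload so a reader can detect corruption.
    return payload + struct.pack("<I", zlib.crc32(payload))


def read_null_bitmap(block: bytes) -> Set[int]:
    """Verify the trailing CRC32, then decode the row IDs of null entries."""
    if len(block) < 8:
        raise ValueError("null bitmap block too short")
    payload, (stored_crc,) = block[:-4], struct.unpack("<I", block[-4:])
    if zlib.crc32(payload) != stored_crc:
        raise ValueError("null bitmap CRC32 mismatch")
    (count,) = struct.unpack_from("<I", payload, 0)
    if len(payload) != 4 + 8 * count:
        raise ValueError("null bitmap length does not match its count")
    return {struct.unpack_from("<Q", payload, 4 + 8 * i)[0] for i in range(count)}


block = encode_null_bitmap([7, 3, 42])
print(sorted(read_null_bitmap(block)))  # [3, 7, 42]
```

Verifying the checksum before decoding means a corrupted block is rejected outright instead of silently producing a wrong set of null rows.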

This implementation enables B-tree-based global index filtering in the Python SDK. It allows predicates to be pushed down to the index level, which can substantially reduce the number of rows that must be scanned from data files, especially for selective predicates.

Usage

BTreeIndexReader is instantiated by the table scan layer when B-tree indexes are available for the query predicates; applications typically do not use it directly.

Code Reference

Source Location

Signature

class BTreeIndexReader(GlobalIndexReader):
    FOOTER_ENCODED_LENGTH = 48

    def __init__(self, key_serializer: KeySerializer, file_io: FileIO,
                 index_path: str, io_meta: GlobalIndexIOMeta): ...

    def visit_equal(self, field_ref: FieldRef, literal: object) -> Optional[GlobalIndexResult]: ...
    def visit_less_than(self, field_ref: FieldRef, literal: object) -> Optional[GlobalIndexResult]: ...
    def visit_greater_than(self, field_ref: FieldRef, literal: object) -> Optional[GlobalIndexResult]: ...
    def visit_in(self, field_ref: FieldRef, literals: List[object]) -> Optional[GlobalIndexResult]: ...
    def visit_between(self, field_ref: FieldRef, min_v: object, max_v: object) -> Optional[GlobalIndexResult]: ...
    def visit_is_null(self, field_ref: FieldRef) -> Optional[GlobalIndexResult]: ...
    def visit_is_not_null(self, field_ref: FieldRef) -> Optional[GlobalIndexResult]: ...

    def _range_query(self, from_key: object, to_key: object,
                     from_inclusive: bool, to_inclusive: bool) -> RoaringBitmap64: ...
    def _read_null_bitmap(self) -> RoaringBitmap64: ...
    def close(self) -> None: ...
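The seek-then-scan logic of `_range_query()` can be sketched over an in-memory sorted list. This is an illustration, not the pypaimon code: `bisect` plays the role of the binary search over index blocks, plain Python values with natural ordering stand in for KeySerializer comparison, and a set of ints replaces RoaringBitmap64.

```python
from bisect import bisect_left, bisect_right
from typing import Any, List, Sequence, Set, Tuple


def range_query(entries: Sequence[Tuple[Any, List[int]]], from_key: Any, to_key: Any,
                from_inclusive: bool, to_inclusive: bool) -> Set[int]:
    """Collect row IDs whose key lies between from_key and to_key.

    entries must be sorted by unique key, mimicking SST data blocks in which
    each key maps to the row IDs that contain it.
    """
    # Building this list is O(n); a real SST reader binary-searches index blocks.
    keys = [k for k, _ in entries]
    # Seek: find the first position that satisfies the lower bound.
    start = bisect_left(keys, from_key) if from_inclusive else bisect_right(keys, from_key)
    result: Set[int] = set()
    # Scan forward, stopping as soon as a key passes the upper bound.
    for key, row_ids in entries[start:]:
        if key > to_key or (key == to_key and not to_inclusive):
            break
        result.update(row_ids)
    return result


entries = [(10, [0, 4]), (18, [1]), (30, [2, 5]), (65, [3]), (80, [6])]
print(sorted(range_query(entries, 18, 65, True, True)))    # [1, 2, 3, 5]
print(sorted(range_query(entries, 18, 65, False, False)))  # [2, 5]
print(sorted(range_query(entries, 30, 30, True, True)))    # [2, 5]
```

The last call shows that an equality lookup is just the degenerate range [v, v] with both bounds inclusive.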

Import

from pypaimon.globalindex.btree.btree_index_reader import BTreeIndexReader

I/O Contract

Inputs

Name             Type               Required   Description
key_serializer   KeySerializer      yes        Serializer for index keys
file_io          FileIO             yes        File I/O abstraction
index_path       str                yes        Path to the index directory
io_meta          GlobalIndexIOMeta  yes        Index file metadata (file name, file size, schema ID)

Outputs

Type               Description
GlobalIndexResult  Lazy supplier of a RoaringBitmap64 with the matching row IDs; returned (or None) by each visit_* method
RoaringBitmap64    Set of row IDs matching the predicate; obtained by calling results() on a GlobalIndexResult
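The lazy-supplier contract of GlobalIndexResult can be modeled with a small wrapper that defers the query until `results()` is called. The `LazyIndexResult` class below is hypothetical and named for illustration only; it also caches the computed set, which is an assumption, since the source does not say whether pypaimon caches results.

```python
from typing import Callable, List, Optional, Set


class LazyIndexResult:
    """Hypothetical stand-in for GlobalIndexResult: wraps a supplier of row IDs."""

    def __init__(self, supplier: Callable[[], Set[int]]) -> None:
        self._supplier = supplier
        self._cached: Optional[Set[int]] = None

    def results(self) -> Set[int]:
        # The supplier (the actual index query) runs on first access only.
        if self._cached is None:
            self._cached = self._supplier()
        return self._cached


calls: List[int] = []


def expensive_query() -> Set[int]:
    calls.append(1)  # record each time the "index" is actually read
    return {3, 7}


result = LazyIndexResult(expensive_query)
print(len(calls))                # 0: nothing has been read yet
print(sorted(result.results()))  # [3, 7]
result.results()
print(len(calls))                # 1: the supplier ran exactly once
```

Deferring the work means that a result which is never consumed never touches the index file.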

Usage Examples

Equality Predicate

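from pypaimon.globalindex.btree.btree_index_reader import BTreeIndexReader
from pypaimon.globalindex.btree.key_serializer import KeySerializer
from pypaimon.common.file_io import LocalFileIO
from pypaimon.schema.data_types import AtomicType, DataField
# GlobalIndexIOMeta and FieldRef also need to be imported from pypaimon;
# their module paths are not listed on this page.

# Initialize a reader over a B-tree index on the user_id column
key_serializer = KeySerializer([DataField(0, "user_id", AtomicType("INT"))])
file_io = LocalFileIO()
reader = BTreeIndexReader(
    key_serializer=key_serializer,
    file_io=file_io,
    index_path="/path/to/table/index",
    # file_size should be the actual size of the index file in bytes
    io_meta=GlobalIndexIOMeta(file_name="btree-001.idx", file_size=1024, schema_id=0)
)

# Query for user_id = 123; visit_* methods may return None
result = reader.visit_equal(FieldRef("user_id"), 123)
if result is not None:
    matching_row_ids = result.results()  # RoaringBitmap64, computed on demand
    print(f"Found {len(matching_row_ids)} rows")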

Range Query

# Query for 18 <= age <= 65 with a single range scan (BETWEEN bounds are inclusive).
# The reader above indexes user_id; age_reader is assumed to be a
# BTreeIndexReader opened on an index over the age column.
result = age_reader.visit_between(FieldRef("age"), 18, 65)
if result is not None:
    row_ids = result.results()  # RoaringBitmap64

In Predicate

# Query for status IN ('active', 'pending', 'verified').
# status_reader is assumed to be a BTreeIndexReader opened on an index over status.
result = status_reader.visit_in(
    FieldRef("status"),
    ['active', 'pending', 'verified']
)
matching_row_ids = result.results()
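One straightforward way to evaluate an IN predicate on a B-tree is to run one point lookup per distinct literal and union the results. The sketch below assumes that approach; the source does not say how visit_in works internally. It uses parallel sorted `keys` and `row_ids` lists as a stand-in for the SST file, and a set in place of RoaringBitmap64.

```python
from bisect import bisect_left
from typing import Any, Iterable, List, Sequence, Set


def point_lookup(keys: Sequence[Any], row_ids: Sequence[List[int]], key: Any) -> Set[int]:
    """Binary-search for an exact key; return its row IDs or an empty set."""
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return set(row_ids[i])
    return set()


def visit_in(keys: Sequence[Any], row_ids: Sequence[List[int]],
             literals: Iterable[Any]) -> Set[int]:
    """Union of point lookups, one per distinct literal."""
    result: Set[int] = set()
    for literal in set(literals):
        result |= point_lookup(keys, row_ids, literal)
    return result


keys = ["active", "banned", "pending", "verified"]  # sorted, as in a B-tree
row_ids = [[0, 3], [1], [2], [4, 5]]
print(sorted(visit_in(keys, row_ids, ["active", "pending", "verified", "unknown"])))
# [0, 2, 3, 4, 5]
```

Literals that are absent from the index (here "unknown") simply contribute an empty set.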

Null Checks

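# email_reader is assumed to be a BTreeIndexReader opened on an index over email.
# Find rows where email IS NOT NULL
result = email_reader.visit_is_not_null(FieldRef("email"))
non_null_row_ids = result.results()

# Find rows where email IS NULL (served from the dedicated null bitmap block)
result_null = email_reader.visit_is_null(FieldRef("email"))
null_row_ids = result_null.results()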
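Per the description, IS NULL reads the dedicated null bitmap, IS NOT NULL returns all non-null rows, and string-pattern predicates fall back to all non-null rows. Such a fallback is safe only if the index result is treated as a superset of candidates that are re-checked against the data; that assumption, along with the `ToyBTreeIndex` class and its in-memory model, is ours and not taken from pypaimon.

```python
from typing import Dict, List, Optional, Set


class ToyBTreeIndex:
    """Hypothetical model: non-null keys map to row IDs, nulls kept in a separate set."""

    def __init__(self, values: List[Optional[str]]) -> None:
        self.null_rows: Set[int] = set()
        self.key_to_rows: Dict[str, Set[int]] = {}
        for row_id, value in enumerate(values):
            if value is None:
                self.null_rows.add(row_id)
            else:
                self.key_to_rows.setdefault(value, set()).add(row_id)

    def visit_is_null(self) -> Set[int]:
        # Answered from the separate null set, like the null bitmap block.
        return set(self.null_rows)

    def visit_is_not_null(self) -> Set[int]:
        # Every row that has a key in the tree, i.e. all non-null rows.
        return set().union(*self.key_to_rows.values())

    def visit_starts_with(self, prefix: str) -> Set[int]:
        # Fallback: pattern predicates are not answered by the index, so return
        # a superset (all non-null rows); the caller must still filter the data.
        return self.visit_is_not_null()


index = ToyBTreeIndex(["a@x.com", None, "b@y.org", None, "a@z.net"])
print(sorted(index.visit_is_null()))          # [1, 3]
print(sorted(index.visit_is_not_null()))      # [0, 2, 4]
print(sorted(index.visit_starts_with("a@")))  # [0, 2, 4], a superset of the true [0, 4]
```

Because the null and non-null sets are disjoint and together cover every row, IS NULL and IS NOT NULL can each be answered without scanning the other.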
