Implementation:Apache Paimon GlobalIndexEvaluator Evaluate
| Knowledge Sources | |
|---|---|
| Domains | Data_Lake, Vector_Search |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
A concrete tool that evaluates global indexes against filter predicates and vector queries to find matching rows.
Description
GlobalIndexEvaluator.evaluate() processes optional Predicate and VectorSearch objects. The evaluation follows this logic:
- For predicates: Recursively visits AND/OR/leaf nodes, delegating leaf evaluation to cached GlobalIndexReader instances via the visitor pattern. AND nodes produce the intersection of their children's results; OR nodes produce the union. Each leaf is evaluated by every available reader for the referenced field, and the readers' results are combined via and_() (intersection).
- For vector search: Obtains readers for the vector field and calls visit_vector_search() on each. Results from multiple readers are combined via and_() (intersection).
- For hybrid queries: Predicate evaluation runs first; its matching row IDs are then passed to the vector search as include_row_ids, restricting the candidate set before the nearest-neighbor search.
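The evaluation rules above can be illustrated with a small, self-contained model. This is a hypothetical sketch, not pypaimon code: `Leaf`, `And`, `Or`, `visit`, `vector_search` and `evaluate` are illustrative names, Python sets stand in for RoaringBitmap64, a single exact (brute-force L2) search stands in for the FAISS ANN readers, and how unindexed fields propagate through AND/OR (AND skips them, OR becomes unanswerable) is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set, Union


# Toy predicate tree (hypothetical stand-ins, not pypaimon's Predicate classes).
@dataclass
class Leaf:
    field: str
    test: Callable[[object], bool]


@dataclass
class And:
    children: List["Node"]


@dataclass
class Or:
    children: List["Node"]


Node = Union[Leaf, And, Or]
Reader = Callable[[Leaf], Set[int]]  # a "reader" maps a leaf to matching row IDs


def visit(node: Node, readers_for: Dict[str, List[Reader]]) -> Optional[Set[int]]:
    """Recursively evaluate the tree; None means no index can answer it."""
    if isinstance(node, Leaf):
        results = [read(node) for read in readers_for.get(node.field, [])]
        # All readers for the field are combined by intersection, like and_().
        return set.intersection(*results) if results else None
    children = [visit(child, readers_for) for child in node.children]
    if isinstance(node, And):
        # Assumption: unindexed children are skipped, the rest are intersected.
        known = [c for c in children if c is not None]
        return set.intersection(*known) if known else None
    # Or node. Assumption: one unindexed child makes the union unknown.
    if any(c is None for c in children):
        return None
    return set.union(*children)


def vector_search(vectors: Dict[int, List[float]], query: List[float], limit: int,
                  include_row_ids: Optional[Set[int]] = None) -> Set[int]:
    """Exact top-`limit` by squared L2 distance (stand-in for FAISS ANN)."""
    candidates = [rid for rid in vectors
                  if include_row_ids is None or rid in include_row_ids]

    def distance(rid: int) -> float:
        return sum((a - b) ** 2 for a, b in zip(vectors[rid], query))

    return set(sorted(candidates, key=distance)[:limit])


def evaluate(predicate: Optional[Node], query: Optional[List[float]], limit: int,
             readers_for: Dict[str, List[Reader]],
             vectors: Dict[int, List[float]]) -> Optional[Set[int]]:
    """Predicate first; its row IDs then pre-filter the vector search."""
    allowed = visit(predicate, readers_for) if predicate is not None else None
    if query is None:
        return allowed
    return vector_search(vectors, query, limit, include_row_ids=allowed)


rows = {0: {"category": "a", "price": 5},
        1: {"category": "b", "price": 5},
        2: {"category": "a", "price": 50}}
vectors = {0: [0.0, 0.0], 1: [0.1, 0.0], 2: [5.0, 5.0]}


def scan(leaf: Leaf) -> Set[int]:  # brute-force "index" used only for the demo
    return {rid for rid, row in rows.items() if leaf.test(row[leaf.field])}


readers_for = {"category": [scan], "price": [scan]}
is_a = Leaf("category", lambda v: v == "a")
cheap = Leaf("price", lambda v: v < 10)
print(evaluate(And([is_a, cheap]), None, 0, readers_for, vectors))  # {0}
print(evaluate(is_a, [0.1, 0.0], 1, readers_for, vectors))  # {0}
print(evaluate(None, [0.1, 0.0], 1, readers_for, vectors))  # {1}
```

Passing the predicate's row set into the vector search is what makes the hybrid case a pre-filter: row 1 is the closest vector overall, but it is excluded in the second call because it fails the category predicate.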
FaissVectorGlobalIndexReader loads FAISS indexes from storage and performs the actual approximate nearest-neighbor (ANN) search. The vectors are split across multiple index shards, each holding up to size_per_index vectors; a shard is loaded on demand the first time it is needed, and the loaded FAISS index objects are cached for reuse.
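The on-demand shard loading can be sketched as a lazy cache. The class name `LazyShardCache`, the `load_shard` callback and the contiguous-block assignment of vectors to shards are assumptions for illustration; a real loader would deserialize a FAISS index (for example with `faiss.read_index`) instead of returning a placeholder string.

```python
import math
from typing import Callable, Dict, List


class LazyShardCache:
    """Loads index shards on first use and keeps them (hypothetical sketch)."""

    def __init__(self, total_vectors: int, size_per_index: int,
                 load_shard: Callable[[int], object]):
        # Each shard holds at most size_per_index vectors.
        self.num_shards = math.ceil(total_vectors / size_per_index)
        self.size_per_index = size_per_index
        self._load_shard = load_shard
        self._cache: Dict[int, object] = {}

    def shard_for(self, position: int) -> int:
        # Assumption: vectors are assigned to shards in contiguous blocks.
        return position // self.size_per_index

    def get(self, shard_id: int) -> object:
        if shard_id not in self._cache:  # load on demand, then reuse
            self._cache[shard_id] = self._load_shard(shard_id)
        return self._cache[shard_id]

    def close(self) -> None:
        self._cache.clear()


load_calls: List[int] = []


def fake_loader(shard_id: int) -> str:  # placeholder for reading from storage
    load_calls.append(shard_id)
    return f"index-{shard_id}"


cache = LazyShardCache(total_vectors=2500, size_per_index=1000,
                       load_shard=fake_loader)
print(cache.num_shards)  # 3
print(cache.get(cache.shard_for(1500)))  # index-1
cache.get(1)  # served from the cache, no second load
print(load_calls)  # [1]
```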
The evaluator caches readers per field for efficient reuse. The close() method releases all cached reader resources.
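Per-field reader caching with a bulk release in close() follows a common memoization pattern, sketched below. `ReaderCache` and `DummyReader` are hypothetical names, and the cache is keyed by field name rather than by DataField object for simplicity.

```python
from typing import Callable, Collection, Dict, List


class ReaderCache:
    """Per-field reader cache with bulk close (hypothetical stand-in)."""

    def __init__(self, readers_function: Callable[[str], Collection]):
        self._readers_function = readers_function
        self._cache: Dict[str, List] = {}

    def readers_for(self, field_name: str) -> List:
        # Call the factory only the first time a field is requested.
        if field_name not in self._cache:
            self._cache[field_name] = list(self._readers_function(field_name))
        return self._cache[field_name]

    def close(self) -> None:
        # Release every cached reader, then forget them.
        for readers in self._cache.values():
            for reader in readers:
                reader.close()
        self._cache.clear()


class DummyReader:
    def __init__(self, name: str):
        self.name, self.closed = name, False

    def close(self) -> None:
        self.closed = True


factory_calls: List[str] = []


def factory(field_name: str) -> List[DummyReader]:
    factory_calls.append(field_name)
    return [DummyReader(field_name)]


cache = ReaderCache(factory)
first = cache.readers_for("embedding")
assert cache.readers_for("embedding") is first  # reused, not rebuilt
cache.close()
print(factory_calls, first[0].closed)  # ['embedding'] True
```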
Usage
The evaluator is typically created internally by RowRangeGlobalIndexScanner rather than constructed directly. However, direct usage is supported for custom evaluation pipelines.
Code Reference
Source Location
- Repository: Apache Paimon
- File: paimon-python/pypaimon/globalindex/global_index_evaluator.py:L30-209
- File: paimon-python/pypaimon/globalindex/faiss/faiss_vector_reader.py:L44-310
Signature
class GlobalIndexEvaluator:
    def __init__(
        self,
        fields: List[DataField],
        readers_function: Callable[[DataField], Collection[GlobalIndexReader]],
    ):
        ...

    def evaluate(
        self,
        predicate: Optional[Predicate],
        vector_search: Optional[VectorSearch],
    ) -> Optional[GlobalIndexResult]:
        ...

    def close(self) -> None:
        ...
Import
from pypaimon.globalindex.global_index_evaluator import GlobalIndexEvaluator
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| fields | List[DataField] | Yes (constructor) | Table field definitions used to resolve field names to index readers |
| readers_function | Callable[[DataField], Collection[GlobalIndexReader]] | Yes (constructor) | Factory function that returns index readers for a given field |
| predicate | Optional[Predicate] | No | Filter predicate tree (AND/OR/leaf nodes) for structured filtering |
| vector_search | Optional[VectorSearch] | No | Vector similarity query for ANN search |
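The readers_function input only needs to map a field to a collection of readers. A minimal sketch, assuming a precomputed name-to-readers registry; `Field` is a hypothetical stand-in for DataField, and returning an empty collection for an unindexed field is an assumption about how "no applicable index" is signalled.

```python
from dataclasses import dataclass
from typing import Callable, Collection, Dict, List


@dataclass(frozen=True)
class Field:  # hypothetical stand-in for pypaimon's DataField
    name: str


def make_readers_function(
    registry: Dict[str, List[object]],
) -> Callable[[Field], Collection[object]]:
    """Build a readers_function-shaped callable from a name -> readers map."""

    def readers_function(field: Field) -> Collection[object]:
        # Assumption: an empty collection means no global index covers the field.
        return tuple(registry.get(field.name, ()))

    return readers_function


fn = make_readers_function({"embedding": ["faiss-reader"]})
print(fn(Field("embedding")))  # ('faiss-reader',)
print(fn(Field("category")))  # ()
```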
Outputs
| Name | Type | Description |
|---|---|---|
| evaluate() | Optional[GlobalIndexResult] | Result wrapping a RoaringBitmap64 of matching row IDs, or None if no index is applicable |
| evaluate() (vector) | Optional[ScoredGlobalIndexResult] | Extended result variant that includes per-row similarity scores for vector queries |
Usage Examples
Basic Usage
from pypaimon.globalindex.global_index_evaluator import GlobalIndexEvaluator
from pypaimon.globalindex.vector_search import VectorSearch

# The evaluator is typically created internally by RowRangeGlobalIndexScanner.
# Direct usage (`table` and `my_readers_fn` are placeholders you supply):
evaluator = GlobalIndexEvaluator(
    fields=table.fields,
    readers_function=my_readers_fn,
)
try:
    # Vector-only search on the 'embedding' field, at most 10 results
    query = VectorSearch(
        vector=[0.1, 0.2, 0.3],
        limit=10,
        field_name='embedding',
    )
    result = evaluator.evaluate(predicate=None, vector_search=query)
    if result is not None:
        print(f"Found {result.results().get_count()} matching rows")
finally:
    # Always close to release cached reader resources
    evaluator.close()
Hybrid Search (Predicate + Vector)
from pypaimon.globalindex.global_index_evaluator import GlobalIndexEvaluator
from pypaimon.globalindex.vector_search import VectorSearch

# `table`, `my_readers_fn`, `embedding_vector` and `category_filter`
# are placeholders you supply (category_filter is a Predicate)
evaluator = GlobalIndexEvaluator(
    fields=table.fields,
    readers_function=my_readers_fn,
)
try:
    # Hybrid query: the predicate is evaluated first, and its matching
    # row IDs restrict the candidate set of the vector search
    query = VectorSearch(
        vector=embedding_vector,
        limit=20,
        field_name='embedding',
    )
    result = evaluator.evaluate(
        predicate=category_filter,
        vector_search=query,
    )
    if result is not None:
        row_ids = result.results()
        print(f"Hybrid search matched {row_ids.get_count()} rows")
finally:
    # Release cached reader resources even if evaluation fails
    evaluator.close()