Implementation:Apache Paimon GlobalIndexEvaluator Evaluate

From Leeroopedia


Knowledge Sources
Domains: Data_Lake, Vector_Search
Last Updated: 2026-02-07 00:00 GMT

Overview

Concrete tool for evaluating global indexes against predicates and vector queries to find the matching rows.

Description

GlobalIndexEvaluator.evaluate() accepts an optional Predicate and an optional VectorSearch. Evaluation follows this logic:

  • For predicates: Recursively visits AND/OR/leaf nodes, delegating leaf evaluation to cached GlobalIndexReader instances via the visitor pattern. AND nodes produce the intersection of their child results; OR nodes produce the union. Each leaf node is evaluated by every available reader for the referenced field, and the per-reader results are combined via and_() (intersection).
  • For vector search: Obtains readers for the vector field and calls visit_vector_search() on each. Results from multiple readers are combined via and_() (intersection).
  • For hybrid queries: Predicate evaluation runs first, then its results are passed as include_row_ids to the vector search, pre-filtering the candidate set.
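
The three cases above can be sketched with plain Python sets standing in for the row-ID bitmaps. This is a minimal illustration, not the pypaimon implementation: Leaf, And, Or, eval_predicate, vector_search, and the brute-force distance ranking are all hypothetical names introduced here, and the handling of leaves that no reader can answer (returning None) is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set, Union


@dataclass
class Leaf:
    field: str
    value: object


@dataclass
class And:
    children: List["Node"]


@dataclass
class Or:
    children: List["Node"]


Node = Union[Leaf, And, Or]
# Hypothetical reader: returns matching row IDs, or None if it cannot answer.
Reader = Callable[[Leaf], Optional[Set[int]]]


def eval_predicate(node: Node, readers: Dict[str, List[Reader]]) -> Optional[Set[int]]:
    if isinstance(node, Leaf):
        result: Optional[Set[int]] = None
        for reader in readers.get(node.field, []):
            rows = reader(node)
            if rows is not None:
                # Per-reader results are combined by intersection, like and_().
                result = rows if result is None else result & rows
        return result
    parts = [eval_predicate(child, readers) for child in node.children]
    if isinstance(node, And):
        # Assumption: children with no usable index are skipped.
        known = [p for p in parts if p is not None]
        return set.intersection(*known) if known else None
    # OR node. Assumption: one unanswerable child makes the whole OR unanswerable.
    if any(p is None for p in parts):
        return None
    return set().union(*parts)


def vector_search(vectors, query, limit, include_row_ids=None):
    # Brute-force ranking stands in for the ANN search; include_row_ids pre-filters.
    candidates = [
        row_id for row_id in vectors
        if include_row_ids is None or row_id in include_row_ids
    ]

    def sq_dist(row_id):
        return sum((a - b) ** 2 for a, b in zip(vectors[row_id], query))

    return sorted(candidates, key=sq_dist)[:limit]


def evaluate(predicate, query, limit, readers, vectors):
    # Hybrid order: predicate first, then vector search on the filtered candidates.
    allowed = eval_predicate(predicate, readers) if predicate is not None else None
    return vector_search(vectors, query, limit, include_row_ids=allowed)


readers = {
    "category": [lambda leaf: {1, 2, 3} if leaf.value == "a" else {4}],
    "year": [lambda leaf: {2, 3, 4}],
}
vectors = {1: [0.0, 0.0], 2: [1.0, 1.0], 3: [5.0, 5.0], 4: [0.1, 0.1]}
both = And([Leaf("category", "a"), Leaf("year", 2020)])
hybrid_hits = evaluate(both, [0.0, 0.0], 1, readers, vectors)
```

In this toy data, row 1 is the closest vector overall, but the predicate excludes it, so the hybrid query ranks only the rows that survive the filter.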

FaissVectorGlobalIndexReader loads FAISS indexes from storage and performs the actual ANN search. It manages multiple index shards (one shard per size_per_index vectors), loading each shard on demand and caching the loaded FAISS index objects.
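
The load-on-demand and caching behaviour can be sketched as follows. This is an illustrative pattern only: LazyShardCache, fake_loader, and the shard file names are hypothetical and not part of pypaimon, and the loader is a stub so the example runs without FAISS installed.

```python
from typing import Callable, Dict, List


class LazyShardCache:
    """Loads index shards on first access and keeps them for reuse."""

    def __init__(self, shard_paths: List[str], loader: Callable[[str], object]):
        self._shard_paths = shard_paths
        # Assumption: a real loader might wrap faiss.read_index; here it is injected.
        self._loader = loader
        self._loaded: Dict[int, object] = {}

    def get(self, shard_id: int) -> object:
        # Load the shard only the first time it is requested.
        if shard_id not in self._loaded:
            self._loaded[shard_id] = self._loader(self._shard_paths[shard_id])
        return self._loaded[shard_id]

    def close(self) -> None:
        # Drop every cached shard so its memory can be reclaimed.
        self._loaded.clear()


load_calls: List[str] = []


def fake_loader(path: str) -> object:
    # Stub loader that records which shard files were read.
    load_calls.append(path)
    return {"path": path}


cache = LazyShardCache(["shard-0.index", "shard-1.index"], fake_loader)
first = cache.get(1)
second = cache.get(1)  # served from the cache, no second load
```

Injecting the loader keeps the caching logic testable in isolation; shards that a query never touches are never read from storage.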

The evaluator caches readers per field for efficient reuse. The close() method releases all cached reader resources.
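
A per-field reader cache with a releasing close() might look like the sketch below. ReaderCache, FakeReader, and make_readers are hypothetical stand-ins, and keying the cache by field name is an assumption made for the illustration.

```python
from typing import Callable, Dict, Iterable, List


class FakeReader:
    """Stand-in for an index reader that holds a closable resource."""

    def __init__(self, name: str):
        self.name = name
        self.closed = False

    def close(self) -> None:
        self.closed = True


class ReaderCache:
    def __init__(self, readers_function: Callable[[str], Iterable[FakeReader]]):
        self._readers_function = readers_function
        self._cache: Dict[str, List[FakeReader]] = {}

    def readers_for(self, field_name: str) -> List[FakeReader]:
        # Call the factory once per field, then reuse the same reader objects.
        if field_name not in self._cache:
            self._cache[field_name] = list(self._readers_function(field_name))
        return self._cache[field_name]

    def close(self) -> None:
        # Release every cached reader and forget it.
        for readers in self._cache.values():
            for reader in readers:
                reader.close()
        self._cache.clear()


factory_calls: List[str] = []


def make_readers(field_name: str) -> List[FakeReader]:
    factory_calls.append(field_name)
    return [FakeReader(f"{field_name}-0"), FakeReader(f"{field_name}-1")]


reader_cache = ReaderCache(make_readers)
embedding_readers = reader_cache.readers_for("embedding")
same_readers = reader_cache.readers_for("embedding")  # no second factory call
reader_cache.close()
```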

Usage

The evaluator is typically created internally by RowRangeGlobalIndexScanner rather than constructed directly. However, direct usage is supported for custom evaluation pipelines.

Code Reference

Source Location

  • Repository: Apache Paimon
  • File: paimon-python/pypaimon/globalindex/global_index_evaluator.py:L30-209
  • File: paimon-python/pypaimon/globalindex/faiss/faiss_vector_reader.py:L44-310

Signature

class GlobalIndexEvaluator:
    def __init__(
        self,
        fields: List[DataField],
        readers_function: Callable[[DataField], Collection[GlobalIndexReader]],
    ): ...

    def evaluate(
        self,
        predicate: Optional[Predicate],
        vector_search: Optional[VectorSearch],
    ) -> Optional[GlobalIndexResult]: ...

    def close(self) -> None: ...

Import

from pypaimon.globalindex.global_index_evaluator import GlobalIndexEvaluator

I/O Contract

Inputs

Name | Type | Required | Description
fields | List[DataField] | Yes (constructor) | Table field definitions used to resolve field names to index readers
readers_function | Callable[[DataField], Collection[GlobalIndexReader]] | Yes (constructor) | Factory function that returns index readers for a given field
predicate | Optional[Predicate] | No | Filter predicate tree (AND/OR/leaf nodes) for structured filtering
vector_search | Optional[VectorSearch] | No | Vector similarity query for ANN search

Outputs

Name | Type | Description
evaluate() | Optional[GlobalIndexResult] | Result containing RoaringBitmap64 of matching row IDs; returns None if no index is applicable
evaluate() (vector) | Optional[ScoredGlobalIndexResult] | Extended result variant that includes per-row similarity scores for vector queries

Usage Examples

Basic Usage

from pypaimon.globalindex.global_index_evaluator import GlobalIndexEvaluator
from pypaimon.globalindex.vector_search import VectorSearch

# Evaluator is typically created internally by RowRangeGlobalIndexScanner.
# Direct usage; `table` and `my_readers_fn` are assumed to be defined by the caller.
evaluator = GlobalIndexEvaluator(
    fields=table.fields,
    readers_function=my_readers_fn,
)

try:
    # Vector-only search
    query = VectorSearch(
        vector=[0.1, 0.2, 0.3],
        limit=10,
        field_name='embedding',
    )
    result = evaluator.evaluate(predicate=None, vector_search=query)

    # evaluate() returns None when no index is applicable
    if result is not None:
        print(f"Found {result.results().get_count()} matching rows")
finally:
    # Always close to release cached reader resources, even on error
    evaluator.close()

Hybrid Search (Predicate + Vector)

from pypaimon.globalindex.global_index_evaluator import GlobalIndexEvaluator
from pypaimon.globalindex.vector_search import VectorSearch

# `table`, `my_readers_fn`, `embedding_vector`, and `category_filter`
# are assumed to be defined by the caller.
evaluator = GlobalIndexEvaluator(
    fields=table.fields,
    readers_function=my_readers_fn,
)

try:
    # Hybrid query: predicate filters first, then vector search on filtered set
    query = VectorSearch(
        vector=embedding_vector,
        limit=20,
        field_name='embedding',
    )

    result = evaluator.evaluate(
        predicate=category_filter,
        vector_search=query,
    )

    if result is not None:
        row_ids = result.results()
        print(f"Hybrid search matched {row_ids.get_count()} rows")
finally:
    # Release cached reader resources, even on error
    evaluator.close()
