Principle:Apache Paimon Index Evaluation and Scoring

Knowledge Sources	Apache Paimon FAISS Wiki
Domains	Data_Lake, Vector_Search
Last Updated	2026-02-07 00:00 GMT

Overview

Mechanism for evaluating predicates and vector queries against global indexes to produce sets of matching row IDs, with per-row scores for vector queries.

Description

The GlobalIndexEvaluator orchestrates index evaluation by routing predicates and vector queries to the appropriate GlobalIndexReader implementations. The evaluation process handles two distinct query types and their combination:

Predicate Evaluation:

Walks the predicate tree (AND/OR/leaf nodes) recursively.
Evaluates each leaf predicate against the index readers available for the referenced field.
Combines results from multiple readers for the same field using and_() (intersection).
Combines AND nodes with intersection and OR nodes with union on RoaringBitmap row ID sets.
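
The traversal above can be sketched as follows. This is a minimal illustration, not the Paimon implementation: the Leaf, And and Or classes, make_reader and evaluate are hypothetical names, built-in Python sets stand in for RoaringBitmap, and treating None as "no index can answer this predicate" is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set


# Hypothetical predicate tree; the real Paimon predicate classes differ.
@dataclass
class Leaf:
    field: str
    test: Callable[[object], bool]


@dataclass
class And:
    children: list


@dataclass
class Or:
    children: list


# A reader maps a leaf predicate to matching row IDs, or None if it cannot answer.
Reader = Callable[[Leaf], Optional[Set[int]]]


def make_reader(rows: Dict[int, dict], field: str) -> Reader:
    """Toy reader that scans in-memory rows instead of reading an index file."""
    def read(leaf: Leaf) -> Optional[Set[int]]:
        return {rid for rid, row in rows.items() if leaf.test(row[field])}
    return read


def evaluate(node, readers_by_field: Dict[str, List[Reader]]) -> Optional[Set[int]]:
    if isinstance(node, Leaf):
        # Intersect the answers of every reader available for this field.
        result = None
        for reader in readers_by_field.get(node.field, []):
            hit = reader(node)
            if hit is not None:
                result = hit if result is None else result & hit
        return result
    parts = [evaluate(child, readers_by_field) for child in node.children]
    if isinstance(node, And):
        # AND: intersect the children that an index could answer.
        known = [p for p in parts if p is not None]
        if not known:
            return None
        result = set(known[0])
        for p in known[1:]:
            result &= p
        return result
    # OR: a single unanswerable child makes the whole union unknown.
    if any(p is None for p in parts):
        return None
    return set().union(*parts)


rows = {
    0: {"age": 25, "city": "Paris"},
    1: {"age": 40, "city": "Rome"},
    2: {"age": 31, "city": "Paris"},
    3: {"age": 19, "city": "Oslo"},
}
readers = {
    "age": [make_reader(rows, "age")],
    "city": [make_reader(rows, "city")],
}
# city == "Paris" AND (age < 20 OR age > 30)
predicate = And([
    Leaf("city", lambda v: v == "Paris"),
    Or([Leaf("age", lambda v: v < 20), Leaf("age", lambda v: v > 30)]),
])
matching = evaluate(predicate, readers)
print(sorted(matching))
```

In this toy data set the predicate selects only row 2: the city leaf yields rows 0 and 2, the OR node yields rows 1, 2 and 3, and the AND node intersects the two.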

Vector Search Evaluation:

Delegates to FaissVectorGlobalIndexReader, which performs the actual FAISS approximate nearest neighbor (ANN) search.
Returns a ScoredGlobalIndexResult that carries both the row IDs and a similarity score for each row.

Hybrid Evaluation (Predicate + Vector Search):

When both predicates and vector search are present, the predicates are evaluated first.
The resulting row IDs are passed to the vector search as include_row_ids, which pre-filters the search candidates.
This reduces the vector search space, which lowers search cost and ensures that every returned neighbor also satisfies the predicate.
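
The effect of pre-filtering can be sketched with a brute-force search standing in for FAISS. Everything here is illustrative: vector_search and l2_distance are hypothetical helpers, the L2 distance plays the role of the per-row score, and only the include_row_ids name is taken from the description above.

```python
import math
from typing import Dict, List, Optional, Set, Tuple


def l2_distance(a: List[float], b: List[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def vector_search(
    vectors: Dict[int, List[float]],
    query: List[float],
    limit: int,
    include_row_ids: Optional[Set[int]] = None,
) -> List[Tuple[int, float]]:
    """Exact nearest-neighbor scan; a stand-in for the FAISS ANN search."""
    candidates = [
        rid for rid in vectors
        if include_row_ids is None or rid in include_row_ids
    ]
    scored = sorted((l2_distance(vectors[rid], query), rid) for rid in candidates)
    # Return (row_id, score) pairs, closest first.
    return [(rid, dist) for dist, rid in scored[:limit]]


vectors = {
    0: [0.0, 0.0],
    1: [1.0, 0.0],
    2: [0.1, 0.1],
    3: [5.0, 5.0],
}
query = [0.0, 0.0]

# Step 1: row IDs produced by predicate evaluation (assumed here).
predicate_rows = {1, 3}

# Step 2: vector search restricted to those rows.
hybrid = vector_search(vectors, query, limit=1, include_row_ids=predicate_rows)
unfiltered = vector_search(vectors, query, limit=1)
print(hybrid, unfiltered)
```

Without the filter the nearest row is row 0, which fails the predicate. With the filter the search considers only rows 1 and 3 and returns row 1.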

Usage

Use when executing hybrid queries that combine predicate filters with vector similarity search, or when evaluating standalone predicates/vector queries against global indexes.

The evaluator is typically used internally by RowRangeGlobalIndexScanner, but can be used directly for custom evaluation pipelines:

Construct the evaluator with table fields and a reader factory function.
Call evaluate() with optional predicate and vector search parameters.
Process the returned GlobalIndexResult (or ScoredGlobalIndexResult for vector queries).
Call close() to release index reader resources.

Theoretical Basis

Visitor Pattern: The evaluator implements a visitor pattern over the predicate/query tree. Each index reader specializes in one type of query: B-tree readers handle equality and range predicates, FAISS readers handle vector similarity. The visitor pattern decouples predicate tree traversal from index-specific evaluation logic.

Set-Based Result Combination: Results are combined using set operations on RoaringBitmap row ID sets:

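AND (Intersection): For conjunctive predicates, the intersection of result sets ensures all conditions are met. This is typically cheap because a RoaringBitmap intersection only has to process the containers (chunks of the row ID space) that are present in both operands, so its cost is driven largely by the smaller set.
OR (Union): For disjunctive predicates, the union of result sets includes rows matching any condition. A RoaringBitmap union must visit the containers of both operands, so its cost grows with the combined size of the two sets.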

Pre-filtering for Hybrid Search: Pre-filtering vector search with predicate results reduces the search space. Compared with post-filtering, two points matter:

The FAISS index can skip vectors not in the candidate set (when supported by the index type), so no work is spent on rows that the predicate has already excluded.
Where candidates may still be filtered out after the search, the search_factor parameter in the FAISS options multiplies the limit so that more candidates are fetched, ensuring sufficient results remain after filtering.
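
The role of search_factor can be illustrated with a small over-fetching sketch. This is an assumption about how such a multiplier is typically applied, not a description of the Paimon or FAISS code path; search_with_factor is a hypothetical helper and the one-dimensional "vectors" are toy data.

```python
from typing import Dict, List, Set


def search_with_factor(
    values: Dict[int, float],
    query: float,
    limit: int,
    allowed: Set[int],
    search_factor: int,
) -> List[int]:
    """Fetch limit * search_factor nearest rows, then drop rows outside `allowed`."""
    fetch = limit * search_factor
    nearest = sorted(values, key=lambda rid: abs(values[rid] - query))[:fetch]
    kept = [rid for rid in nearest if rid in allowed]
    return kept[:limit]


# Ten rows whose value equals their row ID; only odd rows pass the filter.
values = {rid: float(rid) for rid in range(10)}
allowed = {1, 3, 5, 7, 9}

too_few = search_with_factor(values, 0.0, limit=2, allowed=allowed, search_factor=1)
enough = search_with_factor(values, 0.0, limit=2, allowed=allowed, search_factor=3)
print(too_few, enough)
```

With a factor of 1 only rows 0 and 1 are fetched and a single row survives the filter, so the caller receives fewer results than requested. With a factor of 3 six rows are fetched and the two nearest allowed rows, 1 and 3, are returned.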

Reader Caching: The evaluator caches GlobalIndexReader instances per field for efficient reuse across multiple predicate leaves referencing the same field. This avoids repeatedly loading index files from storage.
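
A per-field reader cache of this kind might look like the sketch below. ReaderCache, FakeReader and factory are hypothetical names used only for illustration; the real evaluator's constructor and reader types are not shown in this page.

```python
from typing import Callable, Dict, List


class FakeReader:
    """Stand-in for an index reader that holds an open index file."""

    def __init__(self, field: str) -> None:
        self.field = field
        self.closed = False

    def close(self) -> None:
        self.closed = True


class ReaderCache:
    def __init__(self, reader_factory: Callable[[str], List[FakeReader]]) -> None:
        self._factory = reader_factory
        self._readers: Dict[str, List[FakeReader]] = {}

    def readers_for(self, field: str) -> List[FakeReader]:
        # Create readers on first use, then reuse them for later leaves.
        if field not in self._readers:
            self._readers[field] = self._factory(field)
        return self._readers[field]

    def close(self) -> None:
        # Release every cached reader exactly once.
        for readers in self._readers.values():
            for reader in readers:
                reader.close()
        self._readers.clear()


opened: List[str] = []


def factory(field: str) -> List[FakeReader]:
    opened.append(field)  # record each (simulated) index load
    return [FakeReader(field)]


cache = ReaderCache(factory)
first = cache.readers_for("age")
second = cache.readers_for("age")  # served from the cache, no second load
cache.readers_for("city")
cache.close()
print(opened)
```

Two lookups for the same field trigger a single factory call, and close() releases all cached readers together, which mirrors the close() step in the usage outline.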

Related Pages

Implemented By

Implementation:Apache_Paimon_GlobalIndexEvaluator_Evaluate
