Implementation:Apache Paimon VectorSearch Construction

From Leeroopedia


Knowledge Sources
Domains: Data_Lake, Vector_Search
Last Updated: 2026-02-07 00:00 GMT

Overview

Concrete tool for creating vector similarity search query objects in Paimon.

Description

VectorSearch is a dataclass that holds the query vector, the limit (top-K), the field name, and an optional include_row_ids bitmap. The vector is automatically converted to a NumPy float32 array for FAISS compatibility. Construction validates that the vector is not None, the limit is positive, and the field name is non-empty.
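
The conversion and validation described above can be pictured with a small stand-in class. This is a minimal sketch, not the pypaimon source: the class name VectorQuerySketch, the use of __post_init__, and the error messages are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Union

import numpy as np


@dataclass
class VectorQuerySketch:
    """Hypothetical stand-in for VectorSearch; not the pypaimon class."""

    vector: Union[List[float], np.ndarray]
    limit: int
    field_name: str

    def __post_init__(self) -> None:
        # Validate first, so None never reaches the NumPy conversion.
        if self.vector is None:
            raise ValueError("vector must not be None")
        if self.limit <= 0:
            raise ValueError(f"limit must be positive, got {self.limit}")
        if not self.field_name:
            raise ValueError("field_name must not be empty")
        # Normalize Python lists and float64 arrays to a float32 array.
        self.vector = np.asarray(self.vector, dtype=np.float32)


query = VectorQuerySketch(vector=[0.1, 0.2, 0.3, 0.4], limit=10, field_name="embedding")
print(query.vector.dtype)  # float32

try:
    VectorQuerySketch(vector=[0.1], limit=0, field_name="embedding")
except ValueError as exc:
    print(exc)  # limit must be positive, got 0
```

Normalizing the dtype once at construction means downstream index code can assume a single array type regardless of how the caller supplied the vector.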

Key methods:

  • with_include_row_ids(): Creates a new VectorSearch with a pre-filter bitmap, enabling hybrid search where only rows in the bitmap are considered.
  • offset_range(): Scopes the search to a specific row range for sharded parallel execution. Creates a new query with adjusted internal offsets.
  • visit(): Delegates to GlobalIndexReader.visit_vector_search(), implementing the visitor pattern for search execution.
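
To illustrate the copy-on-modify behavior and the visitor delegation listed above, here is a minimal sketch built from hypothetical stand-ins. QuerySketch and ReaderSketch are not pypaimon classes, a frozenset stands in for the bitmap, the string result replaces GlobalIndexResult, and returning None for an empty pre-filter is an assumption made only to show an Optional result.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class QuerySketch:
    """Hypothetical stand-in for VectorSearch."""

    vector: Tuple[float, ...]
    limit: int
    field_name: str
    include_row_ids: Optional[FrozenSet[int]] = None  # stand-in for a bitmap

    def with_include_row_ids(self, include_row_ids: FrozenSet[int]) -> "QuerySketch":
        # Return a copy; the original query is left untouched.
        return replace(self, include_row_ids=include_row_ids)

    def visit(self, visitor: "ReaderSketch") -> Optional[str]:
        # Double dispatch: the query selects the reader method that handles it.
        return visitor.visit_vector_search(self)


class ReaderSketch:
    """Hypothetical stand-in for GlobalIndexReader."""

    def visit_vector_search(self, query: QuerySketch) -> Optional[str]:
        if query.include_row_ids is not None and not query.include_row_ids:
            return None  # assumed: an empty pre-filter cannot match anything
        if query.include_row_ids is None:
            scope = "all rows"
        else:
            scope = f"{len(query.include_row_ids)} candidate rows"
        return f"top-{query.limit} on '{query.field_name}' over {scope}"


base = QuerySketch(vector=(0.1, 0.2), limit=10, field_name="embedding")
filtered = base.with_include_row_ids(frozenset({1, 5, 10}))

print(base.include_row_ids)            # None
print(filtered.visit(ReaderSketch()))  # top-10 on 'embedding' over 3 candidate rows
```

Because with_include_row_ids() returns a new object, a single base query can be reused safely across several differently filtered searches.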

Usage

Use VectorSearch to construct vector similarity queries before passing them to GlobalIndexEvaluator or GlobalIndexScanBuilder for execution.

Code Reference

Source Location

  • Repository: Apache Paimon
  • File: paimon-python/pypaimon/globalindex/vector_search.py:L26-92

Signature

@dataclass
class VectorSearch:
    vector: Union[List[float], np.ndarray]
    limit: int
    field_name: str
    include_row_ids: Optional[RoaringBitmap64] = None

    def with_include_row_ids(self, include_row_ids: RoaringBitmap64) -> 'VectorSearch': ...
    def offset_range(self, from_: int, to: int) -> 'VectorSearch': ...
    def visit(self, visitor: GlobalIndexReader) -> Optional[GlobalIndexResult]: ...

Import

from pypaimon.globalindex.vector_search import VectorSearch

I/O Contract

Inputs

Name | Type | Required | Description
vector | Union[List[float], np.ndarray] | Yes | Query embedding vector, auto-converted to numpy float32
limit | int | Yes | Number of nearest neighbors to return (top-K), must be positive
field_name | str | Yes | Name of the vector column in the Paimon table to search against
include_row_ids | Optional[RoaringBitmap64] | No | Pre-filter bitmap of candidate row IDs for hybrid search

Outputs

Name | Type | Description
VectorSearch | dataclass | Configured vector search query object
with_include_row_ids() | VectorSearch | New query with pre-filter bitmap applied
offset_range() | VectorSearch | New query scoped to a specific row range
visit() | Optional[GlobalIndexResult] | Search results with matching row IDs and similarity scores
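
One plausible reading of "scoped to a specific row range" is that the pre-filter is intersected with the range and rebased to shard-local row IDs, with hits mapped back to global IDs afterwards. The sketch below illustrates that idea only. The helper names to_local_ids and to_global_hits are hypothetical, a Python set stands in for the bitmap, and the inclusive bounds and zero-based local IDs are assumptions rather than confirmed pypaimon behavior.

```python
from typing import List, Optional, Set, Tuple


def to_local_ids(include_row_ids: Optional[Set[int]], from_: int, to: int) -> Optional[Set[int]]:
    """Restrict a global pre-filter to [from_, to] and rebase it to shard-local IDs.

    Assumption: both bounds are inclusive and local IDs start at 0.
    """
    if include_row_ids is None:
        return None  # no pre-filter, so there is nothing to rebase
    return {row_id - from_ for row_id in include_row_ids if from_ <= row_id <= to}


def to_global_hits(local_hits: List[Tuple[int, float]], from_: int) -> List[Tuple[int, float]]:
    """Map (local_row_id, score) pairs from one shard back to global row IDs."""
    return [(local_id + from_, score) for local_id, score in local_hits]


candidates = {1, 5, 10, 20, 50, 100}
print(sorted(to_local_ids(candidates, 10, 59)))   # [0, 10, 40]
print(to_global_hits([(0, 0.9), (40, 0.7)], 10))  # [(10, 0.9), (50, 0.7)]
```

Rebasing keeps each shard's search independent of its position in the table, which is what makes running shards in parallel straightforward.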

Usage Examples

Basic Usage

import numpy as np
from pypaimon.globalindex.vector_search import VectorSearch

# Create query from a Python list
query = VectorSearch(
    vector=[0.1, 0.2, 0.3, 0.4],
    limit=10,
    field_name='embedding',
)

# Or from a numpy array
embedding = np.random.randn(768).astype(np.float32)
query = VectorSearch(
    vector=embedding,
    limit=20,
    field_name='embedding',
)

Hybrid Search with Pre-filtering

from pyroaring import BitMap64 as RoaringBitmap64
from pypaimon.globalindex.vector_search import VectorSearch

# Create base vector query
query = VectorSearch(
    vector=[0.1, 0.2, 0.3, 0.4],
    limit=10,
    field_name='embedding',
)

# Apply pre-filter from predicate evaluation
candidate_rows = RoaringBitmap64([1, 5, 10, 20, 50, 100])
filtered_query = query.with_include_row_ids(candidate_rows)
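
To make the effect of the pre-filter concrete, the following sketch runs a brute-force top-K search that scores only the candidate rows. It is a plain-Python stand-in, not the FAISS-backed execution path in Paimon. The function prefiltered_top_k, the in-memory rows dictionary, and the choice of Euclidean distance are all assumptions for illustration.

```python
import heapq
import math
from typing import Dict, Iterable, List, Optional, Sequence, Tuple


def prefiltered_top_k(
    rows: Dict[int, Sequence[float]],
    query: Sequence[float],
    limit: int,
    include_row_ids: Optional[Iterable[int]] = None,
) -> List[Tuple[int, float]]:
    """Return up to `limit` (row_id, distance) pairs, nearest first."""
    # Pre-filter: rows outside the candidate set are never scored.
    allowed = None if include_row_ids is None else set(include_row_ids)
    scored = (
        (row_id, math.dist(query, vec))
        for row_id, vec in rows.items()
        if allowed is None or row_id in allowed
    )
    return heapq.nsmallest(limit, scored, key=lambda pair: pair[1])


rows = {
    1: [0.0, 0.0],
    2: [1.0, 0.0],
    5: [3.0, 4.0],
    7: [0.1, 0.0],
}

# Row 7 is closer to the query than row 2, but it is not a candidate.
print(prefiltered_top_k(rows, [0.0, 0.0], limit=2, include_row_ids=[1, 2, 5]))
# [(1, 0.0), (2, 1.0)]
```

Filtering before ranking guarantees that all returned neighbors satisfy the predicate, whereas filtering after ranking could return fewer than `limit` rows.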
