Implementation:Apache Paimon VectorSearch Construction
| Knowledge Sources | |
|---|---|
| Domains | Data_Lake, Vector_Search |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
A concrete tool for constructing vector similarity search query objects in Apache Paimon's Python API (pypaimon).
Description
VectorSearch is a dataclass that holds the query vector, the limit (top-K), the field name, and an optional include_row_ids bitmap. On construction, the vector is converted to a numpy float32 array for FAISS compatibility. Validation rejects a null vector, a non-positive limit, and an empty field name.
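The conversion and validation described above can be sketched with a dataclass `__post_init__` hook. This is a minimal illustration, not the pypaimon source: `SketchVectorSearch` is a hypothetical name, the error messages are invented, and the standard-library `array('f')` (32-bit floats) stands in for the numpy float32 conversion so the sketch has no third-party dependencies.

```python
from array import array
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class SketchVectorSearch:
    """Hypothetical stand-in for VectorSearch; not the pypaimon class."""

    vector: Union[Sequence[float], array]
    limit: int
    field_name: str

    def __post_init__(self) -> None:
        # Validate before converting, so bad input fails with a clear message.
        if self.vector is None:
            raise ValueError("vector must not be None")
        if self.limit <= 0:
            raise ValueError("limit must be positive")
        if not self.field_name:
            raise ValueError("field_name must not be empty")
        # array('f') stores 32-bit floats. It is used here only as a
        # dependency-free stand-in for a numpy float32 array.
        self.vector = array("f", self.vector)


query = SketchVectorSearch(vector=[0.1, 0.2, 0.3], limit=5, field_name="embedding")
```

In this sketch, passing `limit=0` or `field_name=''` raises `ValueError` before any conversion happens.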
Key methods:
- with_include_row_ids(): Creates a new VectorSearch with a pre-filter bitmap, enabling hybrid search where only rows in the bitmap are considered.
- offset_range(): Scopes the search to a specific row range for sharded parallel execution. Creates a new query with adjusted internal offsets.
- visit(): Delegates to GlobalIndexReader.visit_vector_search(), implementing the visitor pattern for search execution.
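The three methods above can be illustrated with a small, dependency-free sketch. Everything in it is hypothetical: `SketchQuery` and `SketchReader` are stand-ins for VectorSearch and GlobalIndexReader, a `frozenset` replaces the bitmap, and the `offset_range` semantics (keep IDs within an inclusive range, then rebase them to shard-local offsets) are an assumption about what "adjusted internal offsets" means, not confirmed behavior.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, List, Optional


class SketchReader:
    """Hypothetical reader; stands in for GlobalIndexReader."""

    def visit_vector_search(self, query: "SketchQuery") -> List[int]:
        # Toy "index" holding row IDs 0..9: apply the pre-filter, then top-K.
        rows = list(range(10))
        if query.include_row_ids is not None:
            rows = [r for r in rows if r in query.include_row_ids]
        return rows[: query.limit]


@dataclass(frozen=True)
class SketchQuery:
    """Hypothetical stand-in for VectorSearch (vector omitted for brevity)."""

    limit: int
    field_name: str
    include_row_ids: Optional[FrozenSet[int]] = None

    def with_include_row_ids(self, row_ids: FrozenSet[int]) -> "SketchQuery":
        # Return a copy; the original query is left untouched.
        return replace(self, include_row_ids=row_ids)

    def offset_range(self, from_: int, to: int) -> "SketchQuery":
        # Assumed semantics: keep IDs in [from_, to], rebased relative to from_.
        if self.include_row_ids is None:
            return self
        local = frozenset(
            r - from_ for r in self.include_row_ids if from_ <= r <= to
        )
        return replace(self, include_row_ids=local)

    def visit(self, visitor: SketchReader) -> List[int]:
        # Visitor pattern: the query hands itself to the reader.
        return visitor.visit_vector_search(self)


base = SketchQuery(limit=3, field_name="embedding")
filtered = base.with_include_row_ids(frozenset({1, 5, 7, 20}))
shard = filtered.offset_range(5, 9)
```

Because each method returns a new object, `base` can be reused across shards or filters without one caller's changes leaking into another's query.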
Usage
Use VectorSearch to construct vector similarity queries before passing them to GlobalIndexEvaluator or GlobalIndexScanBuilder for execution.
Code Reference
Source Location
- Repository: Apache Paimon
- File: paimon-python/pypaimon/globalindex/vector_search.py:L26-92
Signature
@dataclass
class VectorSearch:
    vector: Union[List[float], np.ndarray]
    limit: int
    field_name: str
    include_row_ids: Optional[RoaringBitmap64] = None

    def with_include_row_ids(self, include_row_ids: RoaringBitmap64) -> 'VectorSearch': ...
    def offset_range(self, from_: int, to: int) -> 'VectorSearch': ...
    def visit(self, visitor: GlobalIndexReader) -> Optional[GlobalIndexResult]: ...
Import
from pypaimon.globalindex.vector_search import VectorSearch
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| vector | Union[List[float], np.ndarray] | Yes | Query embedding vector, auto-converted to numpy float32 |
| limit | int | Yes | Number of nearest neighbors to return (top-K), must be positive |
| field_name | str | Yes | Name of the vector column in the Paimon table to search against |
| include_row_ids | Optional[RoaringBitmap64] | No | Pre-filter bitmap of candidate row IDs for hybrid search |
Outputs
| Name | Type | Description |
|---|---|---|
| VectorSearch | dataclass | Configured vector search query object |
| with_include_row_ids() | VectorSearch | New query with pre-filter bitmap applied |
| offset_range() | VectorSearch | New query scoped to a specific row range |
| visit() | Optional[GlobalIndexResult] | Search results with matching row IDs and similarity scores |
Usage Examples
Basic Usage
import numpy as np
from pypaimon.globalindex.vector_search import VectorSearch

# Create a query from a Python list
query = VectorSearch(
    vector=[0.1, 0.2, 0.3, 0.4],
    limit=10,
    field_name='embedding',
)

# Or from a numpy array
embedding = np.random.randn(768).astype(np.float32)
query = VectorSearch(
    vector=embedding,
    limit=20,
    field_name='embedding',
)
Hybrid Search with Pre-filtering
from pyroaring import BitMap64 as RoaringBitmap64
from pypaimon.globalindex.vector_search import VectorSearch

# Create the base vector query
query = VectorSearch(
    vector=[0.1, 0.2, 0.3, 0.4],
    limit=10,
    field_name='embedding',
)

# Apply a pre-filter obtained from predicate evaluation;
# this returns a new VectorSearch rather than modifying `query`
candidate_rows = RoaringBitmap64([1, 5, 10, 20, 50, 100])
filtered_query = query.with_include_row_ids(candidate_rows)
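
To make the pre-filter semantics concrete, the following brute-force sketch shows what "only rows in the bitmap are considered" means for a top-K search. It is not how Paimon or FAISS execute the query: `brute_force_top_k` is a hypothetical helper, the rows live in a plain dict, a `set` replaces the bitmap, and squared L2 distance is an assumed metric.

```python
import heapq
from typing import Dict, List, Optional, Sequence, Set, Tuple


def brute_force_top_k(
    rows: Dict[int, Sequence[float]],
    query: Sequence[float],
    limit: int,
    include_row_ids: Optional[Set[int]] = None,
) -> List[Tuple[int, float]]:
    """Return up to `limit` (row_id, distance) pairs, nearest first."""

    def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Rows outside the pre-filter are skipped before any distance is computed.
    candidates = (
        (squared_l2(vec, query), row_id)
        for row_id, vec in rows.items()
        if include_row_ids is None or row_id in include_row_ids
    )
    return [(row_id, dist) for dist, row_id in heapq.nsmallest(limit, candidates)]


rows = {1: [0.0, 0.0], 5: [1.0, 0.0], 10: [0.0, 2.0], 20: [3.0, 3.0]}
unfiltered = brute_force_top_k(rows, [0.0, 0.0], limit=2)
filtered = brute_force_top_k(rows, [0.0, 0.0], limit=2, include_row_ids={5, 10, 20})
```

With the filter applied, row 1 is excluded even though it is the closest match, so the top-K is drawn only from the candidate set.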