Implementation:Apache Paimon FaissVectorIndexOptions Configuration


Knowledge Sources
Domains Data_Lake, Vector_Search
Last Updated 2026-02-07 00:00 GMT

Overview

Concrete tool for configuring FAISS vector index parameters in Paimon tables.

Description

FaissVectorIndexOptions is a dataclass that encapsulates all FAISS index configuration. It supports five index types (FLAT, HNSW, IVF, IVF_PQ, IVF_SQ8), two distance metrics (L2, INNER_PRODUCT), and tuning parameters specific to each index type. The from_options() class method builds the configuration from a dictionary of table options, which are stored as table properties under the 'vector.' prefix.

Supporting enums include:

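  • FaissVectorMetric: Defines the distance metric used for similarity computation (L2 for Euclidean distance, INNER_PRODUCT for dot product, which equals cosine similarity only when the vectors are L2-normalized; see the normalize field).
  • FaissIndexType: Defines the index algorithm (FLAT for exact brute-force search; HNSW, IVF, IVF_PQ, and IVF_SQ8 for approximate nearest-neighbor search).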
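
The relationship between INNER_PRODUCT and cosine similarity depends on vector length. The following standalone sketch uses plain Python only (no Paimon or FAISS calls; the helper names are illustrative, not part of either library) to check that the inner product of L2-normalized vectors matches their cosine similarity, while the raw inner product does not:

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def inner_product(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product divided by the product of the vector norms."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return inner_product(a, b) / (norm_a * norm_b)


a = [3.0, 4.0]
b = [4.0, 3.0]

raw_ip = inner_product(a, b)                                # depends on magnitudes
unit_ip = inner_product(l2_normalize(a), l2_normalize(b))   # magnitudes removed
cos = cosine_similarity(a, b)
```

If cosine ranking is the goal, this suggests pairing INNER_PRODUCT with normalize=True, or normalizing embeddings before they are written.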

The dataclass provides sensible defaults for all parameters, making it easy to get started while allowing fine-grained control for production workloads. The to_dict() method serializes the configuration back to a dictionary with vector. prefixed keys for storage as table properties.
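
To illustrate the round trip between table properties and a typed configuration, here is a minimal stand-in. MiniVectorOptions and Metric are hypothetical classes written for this page, not the pypaimon implementation; only the key names 'vector.dim', 'vector.metric', and 'vector.ef-search' are taken from the usage example on this page, and the string-to-int coercion is an assumption about how property values might arrive.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict


class Metric(Enum):  # stand-in for FaissVectorMetric
    L2 = "L2"
    INNER_PRODUCT = "INNER_PRODUCT"


@dataclass
class MiniVectorOptions:  # hypothetical, NOT the pypaimon class
    dimension: int = 128
    metric: Metric = Metric.L2
    ef_search: int = 16

    @classmethod
    def from_options(cls, options: Dict[str, Any]) -> "MiniVectorOptions":
        """Read 'vector.'-prefixed keys, falling back to the dataclass defaults."""
        defaults = cls()
        return cls(
            dimension=int(options.get("vector.dim", defaults.dimension)),
            metric=Metric(options.get("vector.metric", defaults.metric.value)),
            ef_search=int(options.get("vector.ef-search", defaults.ef_search)),
        )

    def to_dict(self) -> Dict[str, Any]:
        """Serialize back to 'vector.'-prefixed keys."""
        return {
            "vector.dim": self.dimension,
            "vector.metric": self.metric.value,
            "vector.ef-search": self.ef_search,
        }


# Missing keys fall back to defaults; serialized output parses back to an equal object.
opts = MiniVectorOptions.from_options(
    {"vector.dim": "768", "vector.metric": "INNER_PRODUCT"}
)
round_tripped = MiniVectorOptions.from_options(opts.to_dict())
```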

Usage

Use FaissVectorIndexOptions when configuring vector indexes on Paimon tables. The configuration can be created either directly via the constructor or from a dictionary of table options using the from_options() class method.

Code Reference

Source Location

  • Repository: Apache Paimon
  • File: paimon-python/pypaimon/globalindex/faiss/faiss_options.py:L26-120

Signature

from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict

class FaissVectorMetric(Enum):
    L2 = "L2"
    INNER_PRODUCT = "INNER_PRODUCT"

class FaissIndexType(Enum):
    FLAT = "FLAT"
    HNSW = "HNSW"
    IVF = "IVF"
    IVF_PQ = "IVF_PQ"
    IVF_SQ8 = "IVF_SQ8"

@dataclass
class FaissVectorIndexOptions:
    dimension: int = 128
    metric: FaissVectorMetric = FaissVectorMetric.L2
    index_type: FaissIndexType = FaissIndexType.IVF_SQ8
    m: int = 32                    # HNSW connections per layer
    ef_construction: int = 40      # HNSW construction parameter
    ef_search: int = 16            # HNSW search parameter
    nlist: int = 100               # IVF cluster count
    nprobe: int = 64               # IVF search breadth
    pq_m: int = 8                  # PQ sub-quantizers
    pq_nbits: int = 8              # PQ bits per sub-quantizer
    size_per_index: int = 2000000  # Vectors per index shard
    training_size: int = 500000    # Vectors for IVF training
    search_factor: int = 10        # Search multiplier for filtering
    normalize: bool = False        # L2 normalize vectors

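    @classmethod
    def from_options(cls, options: Dict[str, Any]) -> 'FaissVectorIndexOptions':
        ...  # body omitted: builds the options from 'vector.'-prefixed keys

    def to_dict(self) -> Dict[str, Any]:
        ...  # body omitted: serializes back to 'vector.'-prefixed keys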

Import

from pypaimon.globalindex.faiss.faiss_options import (
    FaissVectorIndexOptions, FaissVectorMetric, FaissIndexType
)

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| options | Dict[str, Any] | Yes (for from_options) | Dictionary with 'vector.' prefixed keys (e.g., 'vector.dim', 'vector.metric') |
| dimension | int | No (default 128) | Dimensionality of the embedding vectors |
| metric | FaissVectorMetric | No (default L2) | Distance metric for similarity computation (L2 or INNER_PRODUCT) |
| index_type | FaissIndexType | No (default IVF_SQ8) | ANN index algorithm to use |
| m | int | No (default 32) | HNSW: number of connections per layer |
| ef_construction | int | No (default 40) | HNSW: construction-time search depth |
| ef_search | int | No (default 16) | HNSW: query-time search depth |
| nlist | int | No (default 100) | IVF: number of Voronoi cells (clusters) |
| nprobe | int | No (default 64) | IVF: number of cells to search at query time |
| pq_m | int | No (default 8) | IVF_PQ: number of sub-quantizers |
| pq_nbits | int | No (default 8) | IVF_PQ: bits per sub-quantizer |
| size_per_index | int | No (default 2000000) | Maximum vectors per index shard |
| training_size | int | No (default 500000) | Number of vectors used for IVF training |
| search_factor | int | No (default 10) | Multiplier applied to limit for pre-filtering scenarios |
| normalize | bool | No (default False) | Whether to L2 normalize vectors before indexing |
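
The search_factor entry is easiest to picture with a toy example. The sketch below shows one plausible reading: fetch limit * search_factor candidates, apply the filter, then keep the best limit rows. The function filtered_top_k and its arguments are hypothetical and do not reflect how Paimon actually applies the factor.

```python
def filtered_top_k(candidates, predicate, limit, search_factor=10):
    """Over-fetch, filter, then truncate.

    candidates: (row_id, distance) pairs sorted by ascending distance.
    predicate:  callable deciding whether a row_id passes the filter.
    """
    fetched = candidates[: limit * search_factor]  # widen the candidate window
    kept = [c for c in fetched if predicate(c[0])]
    return kept[:limit]


def is_even(row_id):
    """Toy filter that keeps every second row."""
    return row_id % 2 == 0


# 100 fake hits, already sorted by distance.
candidates = [(i, float(i)) for i in range(100)]

# With no over-fetch the filter leaves fewer rows than requested.
without_overfetch = filtered_top_k(candidates, is_even, limit=3, search_factor=1)

# With the default factor there are enough survivors to fill the limit.
with_overfetch = filtered_top_k(candidates, is_even, limit=3, search_factor=10)
```

Under this reading, a more selective filter needs a larger factor to return a full result set, and query cost grows with it.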

Outputs

| Name | Type | Description |
|---|---|---|
| FaissVectorIndexOptions | dataclass | Fully configured FAISS index options instance |
| to_dict() | Dict[str, Any] | Serialized options with 'vector.' prefixed keys for table property storage |

Usage Examples

Basic Usage

from pypaimon.globalindex.faiss.faiss_options import (
    FaissVectorIndexOptions, FaissVectorMetric, FaissIndexType
)

# Configure from table options dictionary
options = {
    'vector.dim': 768,
    'vector.metric': 'INNER_PRODUCT',
    'vector.index-type': 'HNSW',
    'vector.ef-search': 64,
    'vector.m': 32,
}
faiss_options = FaissVectorIndexOptions.from_options(options)

# Or create directly with constructor
faiss_options = FaissVectorIndexOptions(
    dimension=768,
    metric=FaissVectorMetric.INNER_PRODUCT,
    index_type=FaissIndexType.HNSW,
    ef_search=64,
)

# Serialize back to table properties
props = faiss_options.to_dict()
# {'vector.dim': 768, 'vector.metric': 'INNER_PRODUCT', ...}

IVF_PQ Configuration for Large Datasets

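from pypaimon.globalindex.faiss.faiss_options import (
    FaissVectorIndexOptions, FaissVectorMetric, FaissIndexType
)

# Configure IVF_PQ for a billion-scale dataset with memory constraints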
faiss_options = FaissVectorIndexOptions(
    dimension=256,
    metric=FaissVectorMetric.L2,
    index_type=FaissIndexType.IVF_PQ,
    nlist=1024,
    nprobe=32,
    pq_m=16,
    pq_nbits=8,
    size_per_index=5000000,
    training_size=1000000,
)
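
A quick back-of-the-envelope check shows why these values suit a memory-constrained deployment. The sketch below is plain arithmetic, not a Paimon or FAISS call; it assumes float32 input vectors, a total of one billion vectors (an illustrative figure), and that size_per_index caps the vectors per shard as described in the inputs table. It counts PQ codes only and ignores centroids, row ids, and other index overhead.

```python
import math

dimension = 256
pq_m = 16
pq_nbits = 8
size_per_index = 5_000_000
total_vectors = 1_000_000_000  # assumed dataset size, for illustration only

# Product quantization splits each vector into pq_m equal sub-vectors,
# so the dimension must be divisible by pq_m.
assert dimension % pq_m == 0
sub_vector_dim = dimension // pq_m

# Storage per vector: raw float32 versus PQ codes.
raw_bytes_per_vector = dimension * 4
pq_bytes_per_vector = pq_m * pq_nbits // 8
compression_ratio = raw_bytes_per_vector / pq_bytes_per_vector

# Number of index shards and total size of the PQ codes.
num_shards = math.ceil(total_vectors / size_per_index)
codes_gb = total_vectors * pq_bytes_per_vector / 1e9
```

With these numbers each vector shrinks from 1024 bytes to 16 bytes of codes, and the dataset is spread across 200 shards.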
