Implementation:Lance format Lance Java IvfBuildParams

From Leeroopedia


Knowledge Sources
Domains: Java_SDK, Indexing
Last Updated: 2026-02-08 19:33 GMT

Overview

Description

IvfBuildParams is a Java class in the org.lance.index.vector package that defines the parameters for building an IVF (inverted file) index for vector search. IVF training runs k-means clustering on the given vector column to find centroids that partition the vectors into clusters. This is the first step in building several vector index types (IVF_FLAT, IVF_PQ, IVF_SQ, IVF_HNSW_*). The class is immutable and is constructed through a Builder in which every parameter has a default. It also accepts pre-trained centroids, which supports distributed index build workflows.
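As an illustration of what "centroids that partition vectors" means, the sketch below assigns a vector to the partition of its nearest centroid using squared L2 distance. It is a minimal, self-contained example and not Lance code: the class IvfAssignSketch and its methods are hypothetical names, and the real index build performs this step natively with whichever distance type is configured.

```java
// Hypothetical illustration only; not part of the Lance Java SDK.
final class IvfAssignSketch {

    /** Squared L2 distance between two vectors of equal length. */
    static float squaredL2(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Returns the index of the centroid closest to the vector, i.e. its partition. */
    static int nearestPartition(float[][] centroids, float[] vector) {
        int best = -1;
        float bestDist = Float.POSITIVE_INFINITY;
        for (int p = 0; p < centroids.length; p++) {
            float dist = squaredL2(centroids[p], vector);
            if (dist < bestDist) {
                bestDist = dist;
                best = p;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Three toy centroids in two dimensions (made-up values).
        float[][] centroids = { {0f, 0f}, {10f, 10f}, {-5f, 5f} };
        System.out.println(nearestPartition(centroids, new float[] {9f, 8f}));  // prints 1
        System.out.println(nearestPartition(centroids, new float[] {-4f, 4f})); // prints 2
    }
}
```

At query time an IVF index only needs to scan the partitions whose centroids are closest to the query vector, which is why the partition count matters for both build cost and search cost.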

Usage

IvfBuildParams is a required component of VectorIndexParams. It controls how many partitions are created, how centroids are trained (maximum iterations and sample rate), and how the shuffle phase operates (batch size and concurrency). For advanced distributed workflows, pre-trained centroids, typically obtained from VectorTrainer.trainIvfCentroids(), can be supplied via setCentroids().
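To get a feel for how numPartitions and sampleRate interact, the sketch below estimates how many vectors k-means would train on. It rests on an assumption that this page does not state: that roughly sampleRate × numPartitions vectors are sampled, capped at the dataset size, which is how sample_rate is described elsewhere in the Lance documentation. The class and method names are hypothetical.

```java
// Hypothetical helper; the sampling rule is an assumption, not confirmed by this page.
final class IvfTrainingSampleSketch {

    /**
     * Estimated number of vectors used for k-means training, assuming
     * sampleRate vectors are drawn per partition and capped at numRows.
     */
    static long trainingSampleSize(int numPartitions, int sampleRate, long numRows) {
        if (numPartitions <= 0 || sampleRate <= 0 || numRows < 0) {
            throw new IllegalArgumentException("arguments must be positive (numRows non-negative)");
        }
        long wanted = (long) numPartitions * sampleRate; // widen to long to avoid int overflow
        return Math.min(wanted, numRows);
    }

    public static void main(String[] args) {
        // Defaults from the Builder Inputs table: 32 partitions, sample rate 256.
        System.out.println(trainingSampleSize(32, 256, 1_000_000L));  // prints 8192
        // 256 partitions with sample rate 512 on a dataset of 100,000 rows.
        System.out.println(trainingSampleSize(256, 512, 100_000L));   // prints 100000
    }
}
```

Under that assumption, raising either value increases training time roughly in proportion until the whole dataset is being used.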

Code Reference

Source Location

java/src/main/java/org/lance/index/vector/IvfBuildParams.java

Signature

public class IvfBuildParams {
    public int getNumPartitions();
    public int getMaxIters();
    public int getSampleRate();
    public int getShufflePartitionBatches();
    public int getShufflePartitionConcurrency();
    public boolean useResidual();
    public float[] getCentroids();

    public static class Builder {
        public Builder();
        public Builder setNumPartitions(int numPartitions);
        public Builder setMaxIters(int maxIters);
        public Builder setSampleRate(int sampleRate);
        public Builder setShufflePartitionBatches(int shufflePartitionBatches);
        public Builder setShufflePartitionConcurrency(int shufflePartitionConcurrency);
        public Builder setUseResidual(boolean useResidual);
        public Builder setCentroids(float[] centroids);
        public IvfBuildParams build();
    }
}

Import

import org.lance.index.vector.IvfBuildParams;

I/O Contract

Builder Inputs
Parameter | Type | Required | Default | Description
numPartitions | int | No | 32 | Number of IVF partitions (k-means clusters)
maxIters | int | No | 50 | Maximum iterations for k-means clustering
sampleRate | int | No | 256 | Sample rate for training IVF centroids from the dataset
shufflePartitionBatches | int | No | 10240 | Number of batches per shuffle partition; smaller values reduce memory but increase build time
shufflePartitionConcurrency | int | No | 2 | Number of shuffle partitions processed concurrently
useResidual | boolean | No | true | Whether to use residual vectors for k-means clustering
centroids | float[] | No | null | Pre-trained centroids flattened as [numPartitions][dimension]

Accessor Outputs
Method | Return Type | Description
getNumPartitions() | int | Number of IVF partitions
getMaxIters() | int | Max k-means iterations
getSampleRate() | int | Training sample rate
getShufflePartitionBatches() | int | Batches per shuffle partition
getShufflePartitionConcurrency() | int | Concurrent shuffle partitions
useResidual() | boolean | Whether residual is used
getCentroids() | float[] | Pre-trained centroids (null if not set)
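The centroids parameter is a single float[] rather than a two-dimensional array. The sketch below shows one way to produce and read such an array, assuming the "[numPartitions][dimension]" layout means row-major order, so that centroid p occupies indices p × dimension through (p + 1) × dimension − 1. The class CentroidLayoutSketch and its methods are hypothetical helpers and are not part of the Lance API.

```java
import java.util.Arrays;

// Hypothetical helpers; row-major layout is an assumption about the flattened format.
final class CentroidLayoutSketch {

    /** Flattens centroids[p][d] into one row-major array of length numPartitions * dimension. */
    static float[] flatten(float[][] centroids) {
        if (centroids.length == 0) {
            throw new IllegalArgumentException("need at least one centroid");
        }
        int dimension = centroids[0].length;
        float[] flat = new float[centroids.length * dimension];
        for (int p = 0; p < centroids.length; p++) {
            if (centroids[p].length != dimension) {
                throw new IllegalArgumentException("centroid " + p + " has a different dimension");
            }
            System.arraycopy(centroids[p], 0, flat, p * dimension, dimension);
        }
        return flat;
    }

    /** Copies centroid number `partition` back out of the flattened array. */
    static float[] centroidAt(float[] flat, int dimension, int partition) {
        if (dimension <= 0 || flat.length % dimension != 0) {
            throw new IllegalArgumentException("array length must be a multiple of dimension");
        }
        int numPartitions = flat.length / dimension;
        if (partition < 0 || partition >= numPartitions) {
            throw new IndexOutOfBoundsException("partition " + partition);
        }
        int start = partition * dimension;
        return Arrays.copyOfRange(flat, start, start + dimension);
    }

    public static void main(String[] args) {
        float[][] centroids = { {1f, 2f, 3f}, {4f, 5f, 6f} }; // 2 partitions, dimension 3
        float[] flat = flatten(centroids);
        System.out.println(Arrays.toString(flat));                   // [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
        System.out.println(Arrays.toString(centroidAt(flat, 3, 1))); // [4.0, 5.0, 6.0]
    }
}
```

A length check of this kind (array length equal to numPartitions × dimension) is a cheap sanity test to run before handing centroids to setCentroids().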

Usage Examples

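import org.lance.index.vector.IvfBuildParams;
import org.lance.index.vector.VectorIndexParams;
// Assumption: VectorTrainer lives in the same package as IvfBuildParams
import org.lance.index.vector.VectorTrainer;
import org.lance.index.DistanceType;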

// Create IVF params with default settings (32 partitions)
IvfBuildParams defaultIvf = new IvfBuildParams.Builder().build();

// Create IVF params with custom partition count and memory tuning
IvfBuildParams customIvf = new IvfBuildParams.Builder()
    .setNumPartitions(256)
    .setMaxIters(100)
    .setSampleRate(512)
    .setShufflePartitionBatches(5120)
    .setShufflePartitionConcurrency(4)
    .build();

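// Use pre-trained centroids from VectorTrainer.
// `dataset` is assumed to be an already opened Lance dataset with a vector column named "embedding".
float[] centroids = VectorTrainer.trainIvfCentroids(dataset, "embedding", customIvf);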
IvfBuildParams withCentroids = new IvfBuildParams.Builder()
    .setNumPartitions(256)
    .setCentroids(centroids)
    .build();

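// VectorIndexParams convenience factory: IVF_FLAT with 256 partitions and L2 distance,
// created here without passing a separate IvfBuildParams instance
VectorIndexParams vectorParams = VectorIndexParams.ivfFlat(256, DistanceType.L2);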
