Implementation:Lance format Lance Java IvfBuildParams
| Knowledge Sources | |
|---|---|
| Domains | Java_SDK, Indexing |
| Last Updated | 2026-02-08 19:33 GMT |
Overview
Description
IvfBuildParams is a Java class in the org.lance.index.vector package that defines parameters for building an IVF (Inverted File Index) for vector search. IVF training runs k-means clustering on the given vector column to determine centroids that partition vectors into different clusters. This is the first step in several vector index types (IVF_FLAT, IVF_PQ, IVF_SQ, IVF_HNSW_*). The class is immutable and uses a Builder pattern with sensible defaults for all parameters. It also supports providing pre-trained centroids for distributed index build workflows.
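To make the partitioning idea concrete, the following is a minimal sketch of how a vector is mapped to an IVF partition once centroids exist: it is assigned to the nearest centroid. The class `IvfAssignSketch` and its methods are hypothetical illustrations, not part of the Lance API, and the sketch assumes L2 distance and a flat row-major centroid array.

```java
// Hypothetical illustration only; not part of the Lance Java SDK.
final class IvfAssignSketch {

    /** Squared L2 distance between v and the c-th centroid of a flat, row-major centroid array. */
    static float squaredL2(float[] v, float[] centroids, int c) {
        int dim = v.length;
        float sum = 0f;
        for (int i = 0; i < dim; i++) {
            float d = v[i] - centroids[c * dim + i];
            sum += d * d;
        }
        return sum;
    }

    /** Returns the index of the centroid closest to v, i.e. the partition v would fall into. */
    static int assignPartition(float[] v, float[] centroids) {
        if (v.length == 0 || centroids.length % v.length != 0) {
            throw new IllegalArgumentException("centroid array length must be a multiple of the dimension");
        }
        int numPartitions = centroids.length / v.length;
        int best = 0;
        float bestDist = Float.MAX_VALUE;
        for (int c = 0; c < numPartitions; c++) {
            float dist = squaredL2(v, centroids, c);
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }
}
```

With two 2-dimensional centroids `{0, 0}` and `{10, 10}`, the vector `{9, 8}` lands in partition 1 and `{1, -1}` in partition 0.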
Usage
IvfBuildParams is a required component of VectorIndexParams. It controls how many partitions are created, how centroids are trained (iterations, sample rate), and how the shuffle phase operates (batch size, concurrency). For advanced distributed workflows, pre-trained centroids can be set via setCentroids(), typically obtained from VectorTrainer.trainIvfCentroids().
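As a rough guide to how numPartitions and sampleRate interact, the sketch below estimates how many vectors k-means training would read. It assumes the training sample is about `numPartitions * sampleRate` rows, capped at the dataset size; that formula is an assumption about the underlying Lance behavior rather than something stated on this page, and `IvfTrainingSampleSketch` is a hypothetical helper.

```java
// Hypothetical helper; the sampling formula is an assumption, not a documented contract.
final class IvfTrainingSampleSketch {

    /**
     * Estimated number of vectors used to train IVF centroids:
     * numPartitions * sampleRate, but never more than the rows available.
     */
    static long trainingSampleSize(long numRows, int numPartitions, int sampleRate) {
        long wanted = (long) numPartitions * sampleRate; // widen before multiplying to avoid int overflow
        return Math.min(numRows, wanted);
    }
}
```

Under that assumption, the defaults (32 partitions, sample rate 256) would train on at most 8192 vectors, and a dataset smaller than that would be used in full.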
Code Reference
Source Location
java/src/main/java/org/lance/index/vector/IvfBuildParams.java
Signature
public class IvfBuildParams {
    public int getNumPartitions();
    public int getMaxIters();
    public int getSampleRate();
    public int getShufflePartitionBatches();
    public int getShufflePartitionConcurrency();
    public boolean useResidual();
    public float[] getCentroids();

    public static class Builder {
        public Builder();
        public Builder setNumPartitions(int numPartitions);
        public Builder setMaxIters(int maxIters);
        public Builder setSampleRate(int sampleRate);
        public Builder setShufflePartitionBatches(int shufflePartitionBatches);
        public Builder setShufflePartitionConcurrency(int shufflePartitionConcurrency);
        public Builder setUseResidual(boolean useResidual);
        public Builder setCentroids(float[] centroids);
        public IvfBuildParams build();
    }
}
Import
import org.lance.index.vector.IvfBuildParams;
I/O Contract
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| numPartitions | int | No | 32 | Number of IVF partitions (k-means clusters) |
| maxIters | int | No | 50 | Maximum iterations for k-means clustering |
| sampleRate | int | No | 256 | Sample rate for training IVF centroids from the dataset |
| shufflePartitionBatches | int | No | 10240 | Number of batches per shuffle partition; smaller values reduce memory but increase build time |
| shufflePartitionConcurrency | int | No | 2 | Number of shuffle partitions processed concurrently |
| useResidual | boolean | No | true | Whether to use residual vectors for k-means clustering |
| centroids | float[] | No | null | Pre-trained centroids flattened as [numPartitions][dimension] |

| Method | Return Type | Description |
|---|---|---|
| getNumPartitions() | int | Number of IVF partitions |
| getMaxIters() | int | Max k-means iterations |
| getSampleRate() | int | Training sample rate |
| getShufflePartitionBatches() | int | Batches per shuffle partition |
| getShufflePartitionConcurrency() | int | Concurrent shuffle partitions |
| useResidual() | boolean | Whether residual is used |
| getCentroids() | float[] | Pre-trained centroids (null if not set) |
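The centroids parameter is described as flattened as [numPartitions][dimension]. The sketch below shows one way to produce and index such an array, assuming this means row-major order (all components of centroid 0, then centroid 1, and so on). `CentroidLayoutSketch` and its methods are hypothetical helpers, not part of the Lance API.

```java
import java.util.Arrays;

// Hypothetical helpers; row-major layout is an assumption read from "[numPartitions][dimension]".
final class CentroidLayoutSketch {

    /** Flattens centroids[partition][component] into a single row-major float array. */
    static float[] flatten(float[][] centroids) {
        if (centroids.length == 0) {
            throw new IllegalArgumentException("at least one centroid is required");
        }
        int dim = centroids[0].length;
        float[] flat = new float[centroids.length * dim];
        for (int p = 0; p < centroids.length; p++) {
            if (centroids[p].length != dim) {
                throw new IllegalArgumentException("centroid " + p + " has a different dimension");
            }
            System.arraycopy(centroids[p], 0, flat, p * dim, dim);
        }
        return flat;
    }

    /** Copies the centroid of one partition back out of the flat array. */
    static float[] centroidAt(float[] flat, int dimension, int partition) {
        return Arrays.copyOfRange(flat, partition * dimension, (partition + 1) * dimension);
    }
}
```

A useful sanity check before calling setCentroids() is that the flat array length equals numPartitions times the vector dimension.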
Usage Examples
import org.lance.index.vector.IvfBuildParams;
import org.lance.index.vector.VectorIndexParams;
import org.lance.index.DistanceType;
// VectorTrainer must also be imported for the pre-trained centroid example below.

// Create IVF params with default settings (32 partitions)
IvfBuildParams defaultIvf = new IvfBuildParams.Builder().build();

// Create IVF params with custom partition count and memory tuning
IvfBuildParams customIvf = new IvfBuildParams.Builder()
    .setNumPartitions(256)
    .setMaxIters(100)
    .setSampleRate(512)
    .setShufflePartitionBatches(5120)
    .setShufflePartitionConcurrency(4)
    .build();

// Use pre-trained centroids from VectorTrainer
// (`dataset` is an already-open dataset with an "embedding" vector column)
float[] centroids = VectorTrainer.trainIvfCentroids(dataset, "embedding", customIvf);
IvfBuildParams withCentroids = new IvfBuildParams.Builder()
    .setNumPartitions(256)
    .setCentroids(centroids)
    .build();

// Use with VectorIndexParams
VectorIndexParams vectorParams = VectorIndexParams.ivfFlat(256, DistanceType.L2);
Related Pages
- VectorIndexParams - Requires IvfBuildParams as a mandatory component
- VectorTrainer - Can pre-train IVF centroids for distributed builds
- PQBuildParams - Product quantization used with IVF
- HnswBuildParams - HNSW graph built within IVF partitions
- DistanceType - Distance metric used alongside IVF indexing