Implementation:Lance format Lance CreateIndexBuilder
| Knowledge Sources | |
|---|---|
| Domains | Vector_Search, Indexing |
| Last Updated | 2026-02-08 19:00 GMT |
Overview
Concrete tool for training, building, and committing a vector index on a Lance dataset column, provided by the Lance library's index creation API.
Description
CreateIndexBuilder is a builder-pattern struct that orchestrates the full lifecycle of index creation: validation, training, encoding, writing, and committing. It is the primary API for building vector indices and is also used for scalar indices (BTree, Bitmap, Inverted, etc.).
The builder is typically obtained via Dataset::create_index_builder() or constructed directly with CreateIndexBuilder::new(). It implements IntoFuture, so it can be .awaited directly after configuration.
The legacy convenience method Dataset::create_index() delegates to this builder internally.
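The "configure, then await" shape described above can be sketched with the standard library alone. The block below is a minimal illustration, not the Lance implementation: `ToyIndexBuilder`, `ToyIndexMeta`, and `block_on` are hypothetical names introduced only for this example, and the build step is reduced to resolving the default `{column}_idx` name.

```rust
use std::future::{Future, IntoFuture};
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for index metadata (not the Lance type).
#[derive(Debug, PartialEq)]
pub struct ToyIndexMeta {
    pub name: String,
    pub replace: bool,
}

/// Hypothetical consuming builder: each setter takes `self` and returns it.
pub struct ToyIndexBuilder {
    column: String,
    name: Option<String>,
    replace: bool,
}

impl ToyIndexBuilder {
    pub fn new(column: &str) -> Self {
        Self { column: column.to_string(), name: None, replace: false }
    }

    pub fn name(mut self, name: String) -> Self {
        self.name = Some(name);
        self
    }

    pub fn replace(mut self, replace: bool) -> Self {
        self.replace = replace;
        self
    }
}

/// Implementing `IntoFuture` is what lets callers write `builder.await`.
impl IntoFuture for ToyIndexBuilder {
    type Output = Result<ToyIndexMeta, String>;
    type IntoFuture = Pin<Box<dyn Future<Output = Self::Output> + Send>>;

    fn into_future(self) -> Self::IntoFuture {
        Box::pin(async move {
            let ToyIndexBuilder { column, name, replace } = self;
            // Fall back to "{column}_idx" when no explicit name was set.
            let name = name.unwrap_or_else(|| format!("{}_idx", column));
            Ok(ToyIndexMeta { name, replace })
        })
    }
}

/// Waker that does nothing; sufficient for futures that never suspend.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Minimal polling executor for the sketch (a real program would use a runtime).
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => std::thread::yield_now(),
        }
    }
}
```

With this sketch, `block_on(async { ToyIndexBuilder::new("vector").replace(true).await })` drives the builder to completion. No work happens until the builder is awaited, which is why all configuration calls must precede `.await`.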
Usage
Use this builder when you need to:
- Build a vector index on a dataset's embedding column.
- Control the index name, replacement behavior, or training toggle.
- Perform distributed indexing by specifying fragment subsets and custom UUIDs.
- Create an empty index skeleton for later population.
Code Reference
Source Location
- Repository: Lance
- File: rust/lance/src/index/create.rs
- Lines: L46-L57 (struct definition), L59-L111 (builder methods), L114-L434 (execute_uncommitted), L437-L464 (execute/commit)
- Also: rust/lance/src/index.rs, L580-L606 (trait methods on Dataset)
Signature
pub struct CreateIndexBuilder<'a> {
    dataset: &'a mut Dataset,
    columns: Vec<String>,
    index_type: IndexType,
    params: &'a dyn IndexParams,
    name: Option<String>,
    replace: bool,
    train: bool,
    fragments: Option<Vec<u32>>,
    index_uuid: Option<String>,
    preprocessed_data: Option<Box<dyn RecordBatchReader + Send + 'static>>,
}

impl<'a> CreateIndexBuilder<'a> {
    pub fn new(
        dataset: &'a mut Dataset,
        columns: &[&str],
        index_type: IndexType,
        params: &'a dyn IndexParams,
    ) -> Self;
    pub fn name(self, name: String) -> Self;
    pub fn replace(self, replace: bool) -> Self;
    pub fn train(self, train: bool) -> Self;
    pub fn fragments(self, fragment_ids: Vec<u32>) -> Self;
    pub fn index_uuid(self, uuid: String) -> Self;
    pub fn preprocessed_data(self, stream: Box<dyn RecordBatchReader + Send + 'static>) -> Self;
}

// Awaiting the builder executes the build and commit:
impl<'a> IntoFuture for CreateIndexBuilder<'a> {
    type Output = Result<IndexMetadata>;
    // (remaining trait items elided in this summary)
}
Import
use lance::index::create::CreateIndexBuilder;
use lance::dataset::Dataset;
use lance_index::{IndexType, IndexParams, DatasetIndexExt};
use lance::index::vector::VectorIndexParams;
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| dataset | &'a mut Dataset | Yes | Mutable reference to the dataset on which to build the index. |
| columns | &[&str] | Yes | Column names to index. Currently exactly 1 column is supported for vector indices. |
| index_type | IndexType | Yes | The type of index to build: Vector, IvfPq, IvfHnswPq, IvfHnswSq, IvfFlat, etc. |
| params | &'a dyn IndexParams | Yes | Index parameters; for vector indices, pass a &VectorIndexParams. |
| name | Option<String> | No | Custom name for the index. If not specified, defaults to {column}_idx with collision avoidance. |
| replace | bool | No | If true, replaces an existing index with the same name and field. Default: false. |
| train | bool | No | If false, creates an empty index without training. Default: true. Automatically set to false if the dataset is empty. |
| fragments | Option<Vec<u32>> | No | Specific fragment IDs to index. Used for distributed indexing where each worker handles a subset. |
| index_uuid | Option<String> | No | Custom UUID for the index. Used in distributed indexing for deterministic naming. |
| preprocessed_data | Option<Box<dyn RecordBatchReader + Send>> | No | Pre-computed data stream (currently only for BTree scalar indices). |
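The default-name rule for `name` ("{column}_idx with collision avoidance") can be illustrated as follows. The source does not say how collisions are resolved, so the numeric-suffix scheme below is purely an assumption for illustration, and `default_index_name` is a hypothetical helper, not a Lance function.

```rust
use std::collections::HashSet;

/// Hypothetical helper: derive a default index name for `column`.
/// ASSUMPTION: collisions are resolved by appending `_2`, `_3`, ...;
/// the real Lance scheme may differ.
pub fn default_index_name(column: &str, existing: &HashSet<String>) -> String {
    let base = format!("{}_idx", column);
    if !existing.contains(&base) {
        return base;
    }
    // Try successive numeric suffixes until an unused name is found.
    let mut n: u32 = 2;
    loop {
        let candidate = format!("{}_{}", base, n);
        if !existing.contains(&candidate) {
            return candidate;
        }
        n += 1;
    }
}
```

Passing an explicit name through `.name(...)` sidesteps any such derivation and is the safer choice when other tooling needs to refer to the index by name.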
Outputs
| Name | Type | Description |
|---|---|---|
| result | Result<IndexMetadata> | On success, returns metadata for the committed index containing: uuid (unique identifier), name (index name), fields (indexed field IDs), dataset_version (version at build time), fragment_bitmap (which fragments are covered), created_at (timestamp). |
Usage Examples
Building an IVF_PQ index via the builder
use lance::dataset::Dataset;
use lance::index::vector::VectorIndexParams;
use lance_index::{DatasetIndexExt, IndexType};
use lance_linalg::distance::MetricType;
async fn build_index(dataset: &mut Dataset) -> lance::Result<()> {
    let params = VectorIndexParams::ivf_pq(
        256,            // num_partitions
        8,              // num_bits
        16,             // num_sub_vectors
        MetricType::L2, // metric
        50,             // max_iterations
    );
    // Configure the builder, then await it to train, write, and commit the index.
    let index_meta = dataset
        .create_index_builder(&["vector"], IndexType::IvfPq, &params)
        .name("my_vector_idx".to_string())
        .replace(true)
        .await?;
    println!("Created index: {} (uuid={})", index_meta.name, index_meta.uuid);
    Ok(())
}
Building an IVF_HNSW_SQ index via the legacy API
use lance::dataset::Dataset;
use lance::index::vector::VectorIndexParams;
use lance_index::{DatasetIndexExt, IndexType};
use lance_index::vector::ivf::IvfBuildParams;
use lance_index::vector::hnsw::builder::HnswBuildParams;
use lance_index::vector::sq::SQBuildParams;
use lance_linalg::distance::MetricType;
async fn build_hnsw_sq(dataset: &mut Dataset) -> lance::Result<()> {
    let params = VectorIndexParams::with_ivf_hnsw_sq_params(
        MetricType::Cosine,
        IvfBuildParams::new(512),
        HnswBuildParams::default(),
        SQBuildParams::default(),
    );
    // The legacy method takes all options positionally and delegates to the builder.
    dataset
        .create_index(
            &["vector"],
            IndexType::IvfHnswSq,
            Some("hnsw_sq_idx".to_string()),
            &params,
            true, // replace
        )
        .await?;
    Ok(())
}
Distributed indexing on specific fragments
use lance::dataset::Dataset;
use lance::index::vector::VectorIndexParams;
use lance_index::{DatasetIndexExt, IndexType};
use lance_linalg::distance::MetricType;
async fn distributed_build(dataset: &mut Dataset, fragment_ids: Vec<u32>) -> lance::Result<()> {
    let params = VectorIndexParams::ivf_pq(256, 8, 16, MetricType::L2, 50);
    // Each worker builds on its assigned fragments
    let index_meta = dataset
        .create_index_builder(&["vector"], IndexType::IvfPq, &params)
        .fragments(fragment_ids)
        .index_uuid("550e8400-e29b-41d4-a716-446655440000".to_string())
        .await?;
    println!("Indexed fragments: {:?}", index_meta.fragment_bitmap);
    Ok(())
}
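How fragment IDs are divided among workers is left to the caller. One possible scheme, sketched below with the standard library only, is a round-robin split; `partition_fragments` is a hypothetical helper and not part of Lance, and other assignment policies (for example, balancing by fragment row count) may suit a given deployment better.

```rust
/// Hypothetical helper: split fragment IDs into `num_workers` subsets,
/// assigning IDs round-robin in input order.
pub fn partition_fragments(fragment_ids: &[u32], num_workers: usize) -> Vec<Vec<u32>> {
    assert!(num_workers > 0, "need at least one worker");
    let mut buckets: Vec<Vec<u32>> = vec![Vec::new(); num_workers];
    for (i, &id) in fragment_ids.iter().enumerate() {
        // The fragment at position i goes to worker i mod num_workers.
        buckets[i % num_workers].push(id);
    }
    buckets
}
```

Each resulting subset would then be handed to one worker and passed to `.fragments(...)` on that worker's builder.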
Related Pages
Implements Principle
Requires Environment
- Environment:Lance_format_Lance_Rust_Toolchain
- Environment:Lance_format_Lance_SIMD_And_Platform_Requirements