Implementation:Lance format Lance CreateIndexBuilder

From Leeroopedia


Knowledge Sources
Domains: Vector_Search, Indexing
Last Updated: 2026-02-08 19:00 GMT

Overview

Concrete tool for training, building, and committing a vector index on a Lance dataset column, provided by the Lance library's index creation API.

Description

CreateIndexBuilder is a builder-pattern struct that orchestrates the full lifecycle of index creation: validation, training, encoding, writing, and committing. It is the primary API for building vector indices and is also used for scalar indices (BTree, Bitmap, Inverted, etc.).

The builder is typically obtained via Dataset::create_index_builder() or constructed directly with CreateIndexBuilder::new(). It implements IntoFuture, so it can be .awaited directly after configuration.

The legacy convenience method Dataset::create_index() delegates to this builder internally.
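
To illustrate how a consuming builder can be awaited directly, here is a minimal, dependency-free sketch of the pattern. It does not use Lance: `ToyIndexBuilder`, `ToyIndexMetadata`, and `block_on` are hypothetical names invented for this illustration, and the real `CreateIndexBuilder` does far more work (validation, training, writing, committing) inside its future.

```rust
use std::future::{ready, Future, IntoFuture, Ready};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for the metadata returned by a committed index.
#[derive(Debug, PartialEq)]
pub struct ToyIndexMetadata {
    pub name: String,
    pub replaced: bool,
}

/// Hypothetical builder; every setter consumes `self` and returns it.
pub struct ToyIndexBuilder {
    column: String,
    name: Option<String>,
    replace: bool,
}

impl ToyIndexBuilder {
    pub fn new(column: &str) -> Self {
        Self { column: column.to_string(), name: None, replace: false }
    }

    pub fn name(mut self, name: String) -> Self {
        self.name = Some(name);
        self
    }

    pub fn replace(mut self, replace: bool) -> Self {
        self.replace = replace;
        self
    }
}

/// Implementing `IntoFuture` is what lets callers write `builder.await`.
impl IntoFuture for ToyIndexBuilder {
    type Output = Result<ToyIndexMetadata, String>;
    type IntoFuture = Ready<Self::Output>;

    fn into_future(self) -> Self::IntoFuture {
        let ToyIndexBuilder { column, name, replace } = self;
        // Fall back to `{column}_idx` when no name was configured.
        let name = name.unwrap_or_else(|| format!("{}_idx", column));
        ready(Ok(ToyIndexMetadata { name, replaced: replace }))
    }
}

/// Waker that does nothing; enough for futures that are ready immediately.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Tiny busy-polling executor so the sketch runs without an async runtime.
pub fn block_on<F: IntoFuture>(work: F) -> F::Output {
    let mut fut = Box::pin(work.into_future());
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

Inside an async function the same builder would simply be written as `ToyIndexBuilder::new("vector").replace(true).await`; the `block_on` helper exists only so the sketch can run without an async runtime.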

Usage

Use this builder when you need to:

  • Build a vector index on a dataset's embedding column.
  • Control the index name, replacement behavior, or training toggle.
  • Perform distributed indexing by specifying fragment subsets and custom UUIDs.
  • Create an empty index skeleton for later population.

Code Reference

Source Location

  • Repository: Lance
  • File: rust/lance/src/index/create.rs
  • Lines: L46-L57 (struct definition), L59-L111 (builder methods), L114-L434 (execute_uncommitted), L437-L464 (execute/commit)
  • Also: rust/lance/src/index.rs L580-L606 (trait methods on Dataset)

Signature

pub struct CreateIndexBuilder<'a> {
    dataset: &'a mut Dataset,
    columns: Vec<String>,
    index_type: IndexType,
    params: &'a dyn IndexParams,
    name: Option<String>,
    replace: bool,
    train: bool,
    fragments: Option<Vec<u32>>,
    index_uuid: Option<String>,
    preprocessed_data: Option<Box<dyn RecordBatchReader + Send + 'static>>,
}

impl<'a> CreateIndexBuilder<'a> {
    pub fn new(
        dataset: &'a mut Dataset,
        columns: &[&str],
        index_type: IndexType,
        params: &'a dyn IndexParams,
    ) -> Self;

    pub fn name(self, name: String) -> Self;
    pub fn replace(self, replace: bool) -> Self;
    pub fn train(self, train: bool) -> Self;
    pub fn fragments(self, fragment_ids: Vec<u32>) -> Self;
    pub fn index_uuid(self, uuid: String) -> Self;
    pub fn preprocessed_data(self, stream: Box<dyn RecordBatchReader + Send + 'static>) -> Self;
}

// Awaiting the builder executes the build and commit:
impl<'a> IntoFuture for CreateIndexBuilder<'a> {
    type Output = Result<IndexMetadata>;
}

Import

use lance::index::create::CreateIndexBuilder;
use lance::dataset::Dataset;
use lance_index::{IndexType, IndexParams, DatasetIndexExt};
use lance::index::vector::VectorIndexParams;

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| dataset | &'a mut Dataset | Yes | Mutable reference to the dataset on which to build the index. |
| columns | &[&str] | Yes | Column names to index. Currently exactly 1 column is supported for vector indices. |
| index_type | IndexType | Yes | The type of index to build: Vector, IvfPq, IvfHnswPq, IvfHnswSq, IvfFlat, etc. |
| params | &'a dyn IndexParams | Yes | Index parameters; for vector indices, pass a &VectorIndexParams. |
| name | Option<String> | No | Custom name for the index. If not specified, defaults to {column}_idx with collision avoidance. |
| replace | bool | No | If true, replaces an existing index with the same name and field. Default: false. |
| train | bool | No | If false, creates an empty index without training. Default: true. Automatically set to false if the dataset is empty. |
| fragments | Option<Vec<u32>> | No | Specific fragment IDs to index. Used for distributed indexing where each worker handles a subset. |
| index_uuid | Option<String> | No | Custom UUID for the index. Used in distributed indexing for deterministic naming. |
| preprocessed_data | Option<Box<dyn RecordBatchReader + Send>> | No | Pre-computed data stream (currently only for BTree scalar indices). |
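
The defaulting rules for name and train can be sketched in isolation. The following is only an illustration: the page says the default name is {column}_idx "with collision avoidance" but does not say how collisions are resolved, so the numeric-suffix scheme below is an assumption, and `default_index_name` and `effective_train` are hypothetical helpers, not Lance functions.

```rust
use std::collections::HashSet;

/// Derives a default index name of the form `{column}_idx`.
/// ASSUMPTION: collisions are resolved by appending `_2`, `_3`, ...;
/// the actual scheme used by Lance may differ.
pub fn default_index_name(column: &str, existing: &HashSet<String>) -> String {
    let base = format!("{}_idx", column);
    if !existing.contains(&base) {
        return base;
    }
    let mut n = 2u32;
    loop {
        let candidate = format!("{}_{}", base, n);
        if !existing.contains(&candidate) {
            return candidate;
        }
        n += 1;
    }
}

/// Training only runs when the caller asked for it and the dataset has rows,
/// mirroring "automatically set to false if the dataset is empty".
pub fn effective_train(requested: bool, num_rows: usize) -> bool {
    requested && num_rows > 0
}
```

With no existing indices, `default_index_name("vector", ...)` yields `vector_idx`; passing an explicit name through the builder's name() setter sidesteps the defaulting entirely.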

Outputs

| Name | Type | Description |
|---|---|---|
| result | Result<IndexMetadata> | On success, returns metadata for the committed index containing: uuid (unique identifier), name (index name), fields (indexed field IDs), dataset_version (version at build time), fragment_bitmap (which fragments are covered), created_at (timestamp). |
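
Because the returned metadata records which fragments an index covers, a caller can check whether any fragments still need indexing. The sketch below shows that set difference using only the standard library. A `BTreeSet<u32>` stands in for the bitmap type, which this page does not specify, and `uncovered_fragments` is a hypothetical helper rather than part of Lance.

```rust
use std::collections::BTreeSet;

/// Returns, in ascending order, the fragment IDs that exist in the dataset
/// but are not covered by the index.
/// ASSUMPTION: `covered` is a plain set standing in for `fragment_bitmap`.
pub fn uncovered_fragments(dataset_fragments: &[u32], covered: &BTreeSet<u32>) -> Vec<u32> {
    let mut missing: Vec<u32> = dataset_fragments
        .iter()
        .copied()
        .filter(|id| !covered.contains(id))
        .collect();
    missing.sort_unstable();
    missing.dedup();
    missing
}
```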

Usage Examples

Building an IVF_PQ index via the builder

use lance::dataset::Dataset;
use lance::index::vector::VectorIndexParams;
use lance_index::{DatasetIndexExt, IndexType};
use lance_linalg::distance::MetricType;

async fn build_index(dataset: &mut Dataset) -> lance::Result<()> {
    let params = VectorIndexParams::ivf_pq(
        256,             // num_partitions
        8,               // num_bits
        16,              // num_sub_vectors
        MetricType::L2,  // metric
        50,              // max_iterations
    );

    let index_meta = dataset
        .create_index_builder(&["vector"], IndexType::IvfPq, &params)
        .name("my_vector_idx".to_string())
        .replace(true)
        .await?;

    println!("Created index: {} (uuid={})", index_meta.name, index_meta.uuid);
    Ok(())
}

Building an IVF_HNSW_SQ index via the legacy API

use lance::dataset::Dataset;
use lance::index::vector::VectorIndexParams;
use lance_index::{DatasetIndexExt, IndexType};
use lance_index::vector::ivf::IvfBuildParams;
use lance_index::vector::hnsw::builder::HnswBuildParams;
use lance_index::vector::sq::SQBuildParams;
use lance_linalg::distance::MetricType;

async fn build_hnsw_sq(dataset: &mut Dataset) -> lance::Result<()> {
    let params = VectorIndexParams::with_ivf_hnsw_sq_params(
        MetricType::Cosine,
        IvfBuildParams::new(512),
        HnswBuildParams::default(),
        SQBuildParams::default(),
    );

    dataset
        .create_index(
            &["vector"],
            IndexType::IvfHnswSq,
            Some("hnsw_sq_idx".to_string()),
            &params,
            true,  // replace
        )
        .await?;
    Ok(())
}

Distributed indexing on specific fragments

use lance::dataset::Dataset;
use lance::index::vector::VectorIndexParams;
use lance_index::{DatasetIndexExt, IndexType};
use lance_linalg::distance::MetricType;

async fn distributed_build(dataset: &mut Dataset, fragment_ids: Vec<u32>) -> lance::Result<()> {
    let params = VectorIndexParams::ivf_pq(256, 8, 16, MetricType::L2, 50);

    // Each worker builds on its assigned fragments
    let index_meta = dataset
        .create_index_builder(&["vector"], IndexType::IvfPq, &params)
        .fragments(fragment_ids)
        .index_uuid("550e8400-e29b-41d4-a716-446655440000".to_string())
        .await?;

    println!("Indexed fragments: {:?}", index_meta.fragment_bitmap);
    Ok(())
}
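
How fragment IDs are divided among workers is left to the caller. As one possible approach, the following dependency-free sketch assigns fragments round-robin; `assign_fragments` is a hypothetical helper, and round-robin is an assumption chosen for simplicity, not a scheme prescribed by Lance.

```rust
/// Splits fragment IDs into `num_workers` groups in round-robin order.
/// Panics if `num_workers` is zero.
pub fn assign_fragments(fragment_ids: &[u32], num_workers: usize) -> Vec<Vec<u32>> {
    assert!(num_workers > 0, "need at least one worker");
    let mut groups: Vec<Vec<u32>> = vec![Vec::new(); num_workers];
    for (i, id) in fragment_ids.iter().enumerate() {
        // Fragment at position i goes to worker i mod num_workers.
        groups[i % num_workers].push(*id);
    }
    groups
}
```

Each resulting group could then be handed to one worker and passed to the builder's fragments() setter. Round-robin keeps group sizes within one fragment of each other, but it ignores fragment row counts, so a size-aware split may balance work better on uneven datasets.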

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
