Implementation:Lance format Lance Optimize Indices

From Leeroopedia


Knowledge Sources
Domains: Data_Engineering, Storage_Optimization
Last Updated: 2026-02-08 19:00 GMT

Overview

Concrete tool for merging delta index segments into larger, consolidated indices to improve query performance, provided by the Lance library.

Description

Dataset::optimize_indices is implemented via the DatasetIndexExt trait. It loads all current index metadata, groups indices by name, and for each group calls the internal merge_indices function. The merge function opens each delta index segment, determines the unindexed fragments, and either creates a new delta covering the unindexed data or merges multiple existing deltas into a single index. The result is committed as an Operation::CreateIndex transaction that records both the newly created indices and the removed (merged) ones.

System indices (such as the fragment reuse index) are automatically excluded from optimization.
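The grouping and coverage steps described above can be sketched with a toy model. This is a minimal illustration and not Lance's code: `SegmentMeta`, its `is_system` flag, `group_by_name`, and `unindexed_fragments` are all hypothetical names invented here, and real index metadata carries more information than these three fields.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified stand-in for one index segment's metadata.
#[derive(Debug, Clone, PartialEq)]
pub struct SegmentMeta {
    /// Index name; delta segments of the same index share it.
    pub name: String,
    /// IDs of the fragments this segment covers.
    pub fragment_ids: Vec<u32>,
    /// Assumed marker for system indices, which are skipped.
    pub is_system: bool,
}

/// Group segments by index name, leaving out system indices.
pub fn group_by_name(segments: &[SegmentMeta]) -> BTreeMap<String, Vec<SegmentMeta>> {
    let mut groups: BTreeMap<String, Vec<SegmentMeta>> = BTreeMap::new();
    for seg in segments.iter().filter(|s| !s.is_system) {
        groups.entry(seg.name.clone()).or_default().push(seg.clone());
    }
    groups
}

/// Fragments of the dataset that no segment in the group covers yet.
pub fn unindexed_fragments(all_fragments: &[u32], group: &[SegmentMeta]) -> Vec<u32> {
    all_fragments
        .iter()
        .copied()
        .filter(|f| !group.iter().any(|s| s.fragment_ids.contains(f)))
        .collect()
}
```

In this model, a dataset with fragments 0 through 3 and two segments of one index covering fragments 0, 1, and 2 leaves fragment 3 unindexed. That is the data a new delta would cover, or that would be folded into a merged index.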

Usage

Call optimize_indices after data appends or compaction operations to consolidate delta index segments and maintain query performance. It can be targeted at specific index names or applied to all indices.

Code Reference

Source Location

  • Repository: Lance
  • Files: rust/lance/src/index.rs (L845-L912); rust/lance/src/index/append.rs (L44-L58, merge_indices)

Signature

// Declared on the DatasetIndexExt trait, which Dataset implements
// (attributes and other trait methods omitted)
pub trait DatasetIndexExt {
    async fn optimize_indices(
        &mut self,
        options: &OptimizeOptions,
    ) -> Result<()>;
}

Import

use lance_index::optimize::OptimizeOptions;
use lance_index::DatasetIndexExt;

I/O Contract

Inputs

Name | Type | Required | Description
self | &mut Dataset | Yes | Mutable reference to the dataset whose indices will be optimized.
options | &OptimizeOptions | Yes | Configuration controlling the merge behavior.

OptimizeOptions fields:

Field | Type | Default | Description
num_indices_to_merge | Option<usize> | None | Number of delta indices to merge per column. If None, Lance decides automatically. If Some(N), the latest N deltas plus unindexed data are merged.
index_names | Option<Vec<String>> | None | Specific index names to optimize. If None, all indices are optimized.
retrain | bool | false | If true, retrain the index from scratch on current data instead of merging deltas. Ignores num_indices_to_merge.
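How the three fields interact can be sketched as a small decision function. This is a toy model written for illustration only: `ToyOptions`, `Plan`, `plan_for`, and `is_targeted` are hypothetical names, the automatic choice Lance makes for None is left opaque, and clamping N to the number of existing deltas is an assumption rather than documented behavior.

```rust
/// Hypothetical mirror of the three documented fields.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct ToyOptions {
    pub num_indices_to_merge: Option<usize>,
    pub index_names: Option<Vec<String>>,
    pub retrain: bool,
}

/// What the toy model would do for one index.
#[derive(Debug, PartialEq)]
pub enum Plan {
    /// Rebuild from scratch; num_indices_to_merge is ignored.
    Retrain,
    /// Merge the latest N deltas plus unindexed data.
    MergeLatest(usize),
    /// Leave the choice to the library (not modeled here).
    Automatic,
}

/// Pick a plan for an index that currently has `existing_deltas` segments.
pub fn plan_for(opts: &ToyOptions, existing_deltas: usize) -> Plan {
    if opts.retrain {
        return Plan::Retrain;
    }
    match opts.num_indices_to_merge {
        // Assumption: never ask for more deltas than exist.
        Some(n) => Plan::MergeLatest(n.min(existing_deltas)),
        None => Plan::Automatic,
    }
}

/// An index is targeted when index_names is None or lists its name.
pub fn is_targeted(opts: &ToyOptions, index_name: &str) -> bool {
    match &opts.index_names {
        None => true,
        Some(names) => names.iter().any(|n| n == index_name),
    }
}
```

With default options every index is targeted and the plan is `Automatic`. Setting `retrain: true` yields `Retrain` regardless of `num_indices_to_merge`, matching the table's note that the field is ignored in that case.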

Outputs

Name | Type | Description
Result<()> | Result | Returns Ok(()) on success. The method commits an Operation::CreateIndex transaction internally. If no optimization is needed (no delta segments to merge), returns Ok(()) without creating a new version.

Usage Examples

use lance::Dataset;
use lance_index::optimize::OptimizeOptions;
use lance_index::DatasetIndexExt;

async fn optimize_all_indices(dataset: &mut Dataset) -> lance::Result<()> {
    // Merge all delta indices automatically
    let options = OptimizeOptions::default();
    dataset.optimize_indices(&options).await?;
    Ok(())
}

async fn optimize_specific_index(dataset: &mut Dataset) -> lance::Result<()> {
    // Optimize only the "my_vector_idx" index, merging its latest 3 deltas plus any unindexed data
    let options = OptimizeOptions {
        num_indices_to_merge: Some(3),
        index_names: Some(vec!["my_vector_idx".to_string()]),
        ..Default::default()
    };
    dataset.optimize_indices(&options).await?;
    Ok(())
}

Related Pages

Implements Principle
