Implementation:Lance format Lance Optimize Indices
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, Storage_Optimization |
| Last Updated | 2026-02-08 19:00 GMT |
Overview
Concrete tool for merging delta index segments into larger, consolidated indices to improve query performance, provided by the Lance library.
Description
Dataset::optimize_indices is implemented via the DatasetIndexExt trait. It loads all current index metadata, groups indices by name, and for each group calls the internal merge_indices function. The merge function opens each delta index segment, determines the unindexed fragments, and either creates a new delta covering the unindexed data or merges multiple existing deltas into a single index. The result is committed as an Operation::CreateIndex transaction that records both the newly created indices and the removed (merged) ones.
System indices (such as the fragment reuse index) are automatically excluded from optimization.
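The grouping and coverage steps described above can be sketched with standard-library types only. This is an illustrative model, not Lance's implementation: `SegmentMeta`, `group_by_name`, and `unindexed_fragments` are hypothetical names, and the `is_system` flag stands in for however Lance actually identifies system indices.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical stand-in for one index segment's metadata (not a Lance type).
#[derive(Debug, Clone)]
struct SegmentMeta {
    /// Index name; all delta segments of one index share it.
    name: String,
    /// Assumption: marks system indices such as the fragment reuse index.
    is_system: bool,
    /// Fragment ids this segment already covers.
    covered_fragments: Vec<u32>,
}

/// Group segments by index name, skipping system indices.
fn group_by_name(segments: &[SegmentMeta]) -> BTreeMap<String, Vec<&SegmentMeta>> {
    let mut groups: BTreeMap<String, Vec<&SegmentMeta>> = BTreeMap::new();
    for seg in segments.iter().filter(|s| !s.is_system) {
        groups.entry(seg.name.clone()).or_default().push(seg);
    }
    groups
}

/// Fragments present in the dataset but not covered by any segment in the group.
fn unindexed_fragments(dataset_fragments: &[u32], group: &[&SegmentMeta]) -> Vec<u32> {
    let covered: BTreeSet<u32> = group
        .iter()
        .flat_map(|s| s.covered_fragments.iter().copied())
        .collect();
    dataset_fragments
        .iter()
        .copied()
        .filter(|id| !covered.contains(id))
        .collect()
}
```

In this model, a group whose `unindexed_fragments` result is non-empty is a candidate for a new delta, and a group with several segments is a candidate for merging.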
Usage
Call optimize_indices after data appends or compaction operations to consolidate delta index segments and maintain query performance. It can be targeted at specific index names or applied to all indices.
Code Reference
Source Location
- Repository: Lance
- File: rust/lance/src/index.rs (L845-L912), rust/lance/src/index/append.rs (L44-L58 for merge_indices)
- Lines: See above
Signature
// Via DatasetIndexExt trait
impl Dataset {
    pub async fn optimize_indices(
        &mut self,
        options: &OptimizeOptions,
    ) -> Result<()>
}
Import
use lance_index::optimize::OptimizeOptions;
use lance_index::DatasetIndexExt;
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| self | &mut Dataset | Yes | Mutable reference to the dataset whose indices will be optimized. |
| options | &OptimizeOptions | Yes | Configuration controlling the merge behavior. |
OptimizeOptions fields:
| Field | Type | Default | Description |
|---|---|---|---|
| num_indices_to_merge | Option<usize> | None | Number of delta indices to merge per column. If None, Lance decides automatically. If Some(N), the latest N deltas plus unindexed data are merged. |
| index_names | Option<Vec<String>> | None | Specific index names to optimize. If None, all indices are optimized. |
| retrain | bool | false | If true, retrain the index from scratch on current data instead of merging deltas. Ignores num_indices_to_merge. |
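How the three fields interact can be sketched as a small decision function. This is a model of the documented behavior only: `OptionsModel`, `Plan`, and `plan_for` are hypothetical names, not part of lance_index. Clamping `N` to the number of existing deltas is an assumption, and the automatic case is left opaque because the table does not say how Lance chooses.

```rust
/// Hypothetical mirror of the documented option fields (not the lance_index type).
/// The derived Default matches the documented defaults: None, None, false.
#[derive(Debug, Clone, Default)]
struct OptionsModel {
    num_indices_to_merge: Option<usize>,
    index_names: Option<Vec<String>>,
    retrain: bool,
}

/// What the model decides to do for one named index.
#[derive(Debug, PartialEq)]
enum Plan {
    /// The index name was not selected by `index_names`.
    Skip,
    /// Rebuild from scratch; `num_indices_to_merge` is ignored.
    Retrain,
    /// Merge the latest N deltas together with unindexed data.
    MergeLatest(usize),
    /// Left to the library; its heuristic is not modeled here.
    Automatic,
}

fn plan_for(index_name: &str, delta_count: usize, opts: &OptionsModel) -> Plan {
    // A name filter, when present, restricts which indices are touched.
    if let Some(names) = &opts.index_names {
        if !names.iter().any(|n| n == index_name) {
            return Plan::Skip;
        }
    }
    // Retraining takes precedence over any merge count.
    if opts.retrain {
        return Plan::Retrain;
    }
    match opts.num_indices_to_merge {
        // Assumption: never ask for more deltas than exist.
        Some(n) => Plan::MergeLatest(n.min(delta_count)),
        None => Plan::Automatic,
    }
}
```

With default options every index falls into `Plan::Automatic`. Setting `retrain: true` yields `Plan::Retrain` regardless of the merge count, matching the note that retraining ignores num_indices_to_merge.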
Outputs
| Name | Type | Description |
|---|---|---|
| (return value) | Result<()> | Returns Ok(()) on success. The method commits an Operation::CreateIndex transaction internally. If no optimization is needed (no delta segments to merge), returns Ok(()) without creating a new version. |
Usage Examples
use lance::Dataset;
use lance_index::optimize::OptimizeOptions;
use lance_index::DatasetIndexExt;

async fn optimize_all_indices(dataset: &mut Dataset) -> lance::Result<()> {
    // Merge all delta indices automatically
    let options = OptimizeOptions::default();
    dataset.optimize_indices(&options).await?;
    Ok(())
}

async fn optimize_specific_index(dataset: &mut Dataset) -> lance::Result<()> {
    // Merge only the "my_vector_idx" index, combining up to 3 deltas
    let options = OptimizeOptions {
        num_indices_to_merge: Some(3),
        index_names: Some(vec!["my_vector_idx".to_string()]),
        ..Default::default()
    };
    dataset.optimize_indices(&options).await?;
    Ok(())
}