Implementation:Lance format Lance Create Inverted Index

From Leeroopedia


Knowledge Sources
Domains Information_Retrieval, Full_Text_Search
Last Updated 2026-02-08 19:00 GMT

Overview

Concrete tool for creating a full-text search inverted index on a Lance dataset column, provided by the lance crate through the DatasetIndexExt trait.

Description

The create_index method on Dataset (provided by the DatasetIndexExt trait) is the primary entry point for building an inverted index. It accepts the column name, index type, optional name, tokenizer parameters, and a replace flag. Internally, it streams all data from the specified column through the tokenization pipeline configured in InvertedIndexParams, builds posting lists, and writes the resulting index to the dataset's object store.
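The build pipeline described above (tokenize each value, then accumulate posting lists) can be illustrated with a toy, in-memory sketch. It is not Lance's implementation: the `tokenize` and `build_posting_lists` functions are hypothetical names, the tokenizer is a simple lowercase-and-split stand-in for whatever InvertedIndexParams configures, and nothing is streamed or written to an object store.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in tokenizer: lowercase and split on any
/// non-alphanumeric character. Lance's real pipeline is configured
/// through InvertedIndexParams and may behave differently.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Build a map from term to posting list, where each posting is
/// (row id, number of times the term occurs in that row).
fn build_posting_lists(docs: &[&str]) -> BTreeMap<String, Vec<(u64, u32)>> {
    let mut index: BTreeMap<String, Vec<(u64, u32)>> = BTreeMap::new();
    for (row_id, doc) in docs.iter().enumerate() {
        // Count term frequencies within this single row first.
        let mut freqs: BTreeMap<String, u32> = BTreeMap::new();
        for token in tokenize(doc) {
            *freqs.entry(token).or_insert(0) += 1;
        }
        // Rows are visited in order, so each posting list stays sorted by row id.
        for (token, freq) in freqs {
            index.entry(token).or_default().push((row_id as u64, freq));
        }
    }
    index
}
```

Calling `build_posting_lists(&["Lance is fast", "fast fast index"])` yields a posting list for "fast" that covers both rows, with a frequency of 2 in the second row.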

The method validates that exactly one column is specified and that the column has a supported text data type before beginning the build process.
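A minimal sketch of this kind of up-front validation follows. The `ColumnType` enum, the `validate_fts_columns` function, and the error messages are all hypothetical, and treating only `Utf8` and `LargeUtf8` as supported is an assumption of the sketch; the actual set of supported types and the error values are defined by Lance.

```rust
/// Hypothetical subset of column types, used only for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ColumnType {
    Utf8,
    LargeUtf8,
    Int64,
}

/// Return the single column to index, or an error message if the request
/// names zero or several columns, or a column that is missing or not text.
fn validate_fts_columns<'a>(
    columns: &[&'a str],
    type_of: impl Fn(&str) -> Option<ColumnType>,
) -> Result<&'a str, String> {
    // Exactly one column must be specified.
    let &[column] = columns else {
        return Err(format!("expected exactly one column, got {}", columns.len()));
    };
    // The column must exist and hold text (assumed here: Utf8 or LargeUtf8).
    match type_of(column) {
        Some(ColumnType::Utf8) | Some(ColumnType::LargeUtf8) => Ok(column),
        Some(other) => Err(format!("column '{column}' has unsupported type {other:?}")),
        None => Err(format!("column '{column}' does not exist")),
    }
}
```

The point of checking before the build starts is that a bad request fails immediately, rather than after the column has already been scanned.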

Usage

Call create_index after writing text data to a Lance dataset. The index is persisted alongside the dataset and automatically used by subsequent full-text search queries.

Code Reference

Source Location

  • Repository: Lance
  • File: rust/lance/src/index.rs
  • Lines: 590-606 (entry point via DatasetIndexExt trait)
  • Related files:
    • rust/lance-index/src/scalar/inverted.rs lines 46-71 (train_inverted_index)
    • rust/lance-index/src/scalar/inverted/builder.rs lines 73-98 (InvertedIndexBuilder)

Signature

// From the DatasetIndexExt trait implementation on Dataset
async fn create_index(
    &mut self,
    columns: &[&str],
    index_type: IndexType,
    name: Option<String>,
    params: &dyn IndexParams,
    replace: bool,
) -> Result<IndexMetadata>

Import

use lance::Dataset;
use lance_index::DatasetIndexExt;
use lance_index::IndexType;
use lance_index::scalar::InvertedIndexParams;

I/O Contract

Inputs

Name | Type | Required | Description
columns | &[&str] | Yes | Column names to index; must contain exactly one text column with a supported data type
index_type | IndexType | Yes | Must be IndexType::Inverted for full-text search
name | Option<String> | No | Optional human-readable name for the index; auto-generated if None
params | &dyn IndexParams | Yes | Must be an &InvertedIndexParams instance controlling tokenization behavior
replace | bool | Yes | If true, replace any existing index on the same column; if false, fail if an index already exists

Outputs

Name | Type | Description
result | Result<IndexMetadata> | Metadata about the created index on success, or an error if creation fails

Environment Variables

Variable | Default | Description
LANCE_FTS_NUM_SHARDS | Number of compute CPUs | Number of parallel sharding workers for index building
LANCE_FTS_PARTITION_SIZE | 256 (MiB) | Maximum uncompressed partition size before flushing to disk
LANCE_FTS_TARGET_SIZE | 4096 (MiB) | Target merged partition size after the merge stage
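As an illustration of how such tuning variables and their defaults fit together, here is a small standard-library sketch. The `FtsBuildSettings` struct and both functions are hypothetical names. Using `std::thread::available_parallelism` as the "number of compute CPUs" and falling back to the default when a value is unparseable are assumptions of this sketch, so Lance's own handling may differ.

```rust
use std::env;
use std::thread;

/// Parse an optional raw value as an unsigned integer, using `default`
/// when the value is missing or not a valid number (a choice of this sketch).
fn parse_or_default(raw: Option<String>, default: usize) -> usize {
    raw.and_then(|v| v.trim().parse::<usize>().ok())
        .unwrap_or(default)
}

/// Hypothetical container for the three tuning knobs listed in the table.
#[derive(Debug)]
struct FtsBuildSettings {
    num_shards: usize,
    partition_size_mib: usize,
    target_size_mib: usize,
}

/// Read the three variables from the process environment, applying the
/// documented defaults for any variable that is unset.
fn read_fts_build_settings() -> FtsBuildSettings {
    // Assumption: approximate "number of compute CPUs" with available parallelism.
    let cpus = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    FtsBuildSettings {
        num_shards: parse_or_default(env::var("LANCE_FTS_NUM_SHARDS").ok(), cpus),
        partition_size_mib: parse_or_default(env::var("LANCE_FTS_PARTITION_SIZE").ok(), 256),
        target_size_mib: parse_or_default(env::var("LANCE_FTS_TARGET_SIZE").ok(), 4096),
    }
}
```

In practice the variables would be set in the environment of the process that calls create_index, for example by exporting `LANCE_FTS_NUM_SHARDS=4` before launching it.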

Usage Examples

Basic Index Creation

use lance::Dataset;
use lance_index::DatasetIndexExt;
use lance_index::IndexType;
use lance_index::scalar::InvertedIndexParams;

async fn build_fts_index(dataset: &mut Dataset) -> lance_core::Result<()> {
    let params = InvertedIndexParams::default();
    dataset
        .create_index(
            &["doc"],
            IndexType::Inverted,
            None,
            &params,
            true, // replace existing index
        )
        .await?;
    Ok(())
}

Named Index with Custom Tokenizer

use lance::Dataset;
use lance_index::DatasetIndexExt;
use lance_index::IndexType;
use lance_index::scalar::InvertedIndexParams;

async fn build_custom_index(dataset: &mut Dataset) -> lance_core::Result<()> {
    let params = InvertedIndexParams::default()
        .base_tokenizer("whitespace".to_string())
        .stem(false)
        .with_position(true);

    dataset
        .create_index(
            &["content"],
            IndexType::Inverted,
            Some("content_fts_idx".to_string()),
            &params,
            false, // do not replace
        )
        .await?;
    Ok(())
}
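To make the effect of these options more concrete, the following toy sketch shows what splitting on whitespace only and recording token positions could look like, and why stored positions matter for phrase-style matching. Both functions are hypothetical and use only the standard library. They do not call Lance's tokenizer, and they skip other steps such as lowercasing that the real pipeline may apply.

```rust
/// Split on whitespace only and record each token's ordinal position.
/// Punctuation stays attached to its token, e.g. "full-text" is one token.
fn whitespace_tokens_with_positions(text: &str) -> Vec<(String, u32)> {
    text.split_whitespace()
        .enumerate()
        .map(|(pos, tok)| (tok.to_string(), pos as u32))
        .collect()
}

/// Check whether `second` occurs immediately after `first`.
/// This kind of adjacency test is only possible when positions are stored.
fn is_adjacent_pair(tokens: &[(String, u32)], first: &str, second: &str) -> bool {
    tokens.iter().any(|(t, p)| {
        t.as_str() == first
            && tokens
                .iter()
                .any(|(u, q)| u.as_str() == second && *q == *p + 1)
    })
}
```

For the input "full-text search  in Lance", the sketch produces four tokens, and `is_adjacent_pair(&tokens, "search", "in")` holds while the reversed pair does not.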

Complete Example from the Repository

// From rust/examples/src/full_text_search.rs
use lance::Dataset;
use lance_index::DatasetIndexExt;
use lance_index::IndexType;
use lance_index::scalar::InvertedIndexParams;

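// Excerpt: runs inside an async context, after data has been written,
// with `dataset` bound to a mutable Dataset that has a "doc" column.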
let params = InvertedIndexParams::default();
dataset
    .create_index(
        &["doc"],
        IndexType::Inverted,
        None,
        &params,
        true,
    )
    .await
    .unwrap();
