Implementation:Lance format Lance Create Inverted Index
| Knowledge Sources | |
|---|---|
| Domains | Information_Retrieval, Full_Text_Search |
| Last Updated | 2026-02-08 19:00 GMT |
Overview
Concrete tool for creating a full-text search inverted index on a Lance dataset column, provided by the lance crate through the DatasetIndexExt trait.
Description
The create_index method on Dataset (provided by the DatasetIndexExt trait) is the primary entry point for building an inverted index. It accepts the column name, index type, optional name, tokenizer parameters, and a replace flag. Internally, it streams all data from the specified column through the tokenization pipeline configured in InvertedIndexParams, builds posting lists, and writes the resulting index to the dataset's object store.
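The tokenize-then-build-posting-lists flow described above can be illustrated with a small, standard-library-only sketch. This is not Lance's implementation: `Posting`, `tokenize`, and `build_posting_lists` are hypothetical names, the tokenizer simply lowercases and splits on whitespace, and everything is held in memory rather than streamed and written to an object store.

```rust
use std::collections::BTreeMap;

/// One entry of a posting list: a document's row id plus the token
/// positions at which the term occurs in that document.
/// (Hypothetical type for illustration; not a Lance type.)
#[derive(Debug, Clone, PartialEq)]
pub struct Posting {
    pub row_id: u64,
    pub positions: Vec<u32>,
}

/// Simplified stand-in for a tokenization pipeline:
/// split on whitespace and lowercase each token.
pub fn tokenize(text: &str) -> Vec<String> {
    text.split_whitespace().map(|t| t.to_lowercase()).collect()
}

/// Builds a term -> posting list map from (row_id, text) pairs.
/// Documents are expected in row-id order, so a term's most recent
/// posting is the only one that can belong to the current document.
pub fn build_posting_lists(docs: &[(u64, &str)]) -> BTreeMap<String, Vec<Posting>> {
    let mut index: BTreeMap<String, Vec<Posting>> = BTreeMap::new();
    for &(row_id, text) in docs {
        for (pos, token) in tokenize(text).into_iter().enumerate() {
            let postings = index.entry(token).or_default();
            let same_doc = postings.last().map_or(false, |p| p.row_id == row_id);
            if same_doc {
                // Term already seen in this document: record another position.
                postings.last_mut().unwrap().positions.push(pos as u32);
            } else {
                // First occurrence of the term in this document.
                postings.push(Posting { row_id, positions: vec![pos as u32] });
            }
        }
    }
    index
}
```

Looking up a term in the returned map yields the rows that contain it, which is the basic operation a full-text query performs against an inverted index.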
The method validates that exactly one column is specified and that the column has a supported text data type before beginning the build process.
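A minimal sketch of this kind of pre-build check follows. It is an illustration under stated assumptions rather than the library's code: `ColumnType`, `ValidationError`, and `validate_fts_columns` are hypothetical, the schema is modeled as a plain slice of name/type pairs, and treating only `Utf8` and `LargeUtf8` as supported is an assumption made for the example.

```rust
/// Hypothetical, simplified stand-in for a column's data type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ColumnType {
    Utf8,
    LargeUtf8,
    Int64,
    Float32,
}

/// Hypothetical error type describing why validation failed.
#[derive(Debug, PartialEq)]
pub enum ValidationError {
    WrongColumnCount(usize),
    UnknownColumn(String),
    UnsupportedType { column: String, data_type: ColumnType },
}

/// Returns the single column to index, or an error if the request does not
/// name exactly one column or the column is not a (assumed) text type.
pub fn validate_fts_columns<'a>(
    columns: &[&'a str],
    schema: &[(&str, ColumnType)],
) -> Result<&'a str, ValidationError> {
    // Exactly one column must be specified.
    let column = match columns {
        [single] => *single,
        other => return Err(ValidationError::WrongColumnCount(other.len())),
    };
    // The column must exist in the schema.
    let data_type = schema
        .iter()
        .find(|(name, _)| *name == column)
        .map(|(_, ty)| *ty)
        .ok_or_else(|| ValidationError::UnknownColumn(column.to_string()))?;
    // Assumption: only these two string types count as supported text types.
    match data_type {
        ColumnType::Utf8 | ColumnType::LargeUtf8 => Ok(column),
        other => Err(ValidationError::UnsupportedType {
            column: column.to_string(),
            data_type: other,
        }),
    }
}
```

Running the check before any data is streamed means a bad request fails fast, without paying for a partial index build.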
Usage
Call create_index after writing text data to a Lance dataset. The index is persisted alongside the dataset and automatically used by subsequent full-text search queries.
Code Reference
Source Location
- Repository: Lance
- File: rust/lance/src/index.rs
- Lines: 590-606 (entry point via DatasetIndexExt trait)
- Related files:
  - rust/lance-index/src/scalar/inverted.rs lines 46-71 (train_inverted_index)
  - rust/lance-index/src/scalar/inverted/builder.rs lines 73-98 (InvertedIndexBuilder)
Signature
// From the DatasetIndexExt trait implementation on Dataset
async fn create_index(
    &mut self,
    columns: &[&str],
    index_type: IndexType,
    name: Option<String>,
    params: &dyn IndexParams,
    replace: bool,
) -> Result<IndexMetadata>
Import
use lance::Dataset;
use lance_index::DatasetIndexExt;
use lance_index::IndexType;
use lance_index::scalar::InvertedIndexParams;
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| columns | &[&str] | Yes | Column names to index; must contain exactly one text column with a supported data type |
| index_type | IndexType | Yes | Must be IndexType::Inverted for full-text search |
| name | Option<String> | No | Optional human-readable name for the index; auto-generated if None |
| params | &dyn IndexParams | Yes | Must be an &InvertedIndexParams instance controlling tokenization behavior |
| replace | bool | Yes | If true, replace any existing index on the same column; if false, fail if an index already exists |
Outputs
| Name | Type | Description |
|---|---|---|
| result | Result<IndexMetadata> | Metadata about the created index on success, or an error if creation fails |
Environment Variables
| Variable | Default | Description |
|---|---|---|
| LANCE_FTS_NUM_SHARDS | Number of compute CPUs | Number of parallel sharding workers for index building |
| LANCE_FTS_PARTITION_SIZE | 256 (MiB) | Maximum uncompressed partition size before flushing to disk |
| LANCE_FTS_TARGET_SIZE | 4096 (MiB) | Target merged partition size after the merge stage |
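As a rough illustration of how these variables and their defaults fit together, the sketch below resolves the three settings with the standard library only. How Lance actually reads and parses them is not shown in this page, so `FtsBuildConfig`, `parse_or`, and the fallback-on-invalid-value behavior are assumptions; `std::thread::available_parallelism` is used here as an approximation of "number of compute CPUs".

```rust
use std::env;
use std::str::FromStr;
use std::thread;

/// Hypothetical container for the three build settings listed above.
#[derive(Debug, Clone, PartialEq)]
pub struct FtsBuildConfig {
    pub num_shards: usize,
    pub partition_size_mib: u64,
    pub target_size_mib: u64,
}

/// Parses `raw` if present and valid; otherwise returns `default`.
/// (Falling back silently on invalid input is an assumption of this sketch.)
fn parse_or<T: FromStr>(raw: Option<&str>, default: T) -> T {
    raw.and_then(|s| s.trim().parse().ok()).unwrap_or(default)
}

impl FtsBuildConfig {
    /// Resolves settings through an arbitrary lookup function, which keeps
    /// the logic testable without touching the process environment.
    pub fn from_lookup(lookup: impl Fn(&str) -> Option<String>) -> Self {
        // Approximation of "number of compute CPUs".
        let cpus = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
        Self {
            num_shards: parse_or(lookup("LANCE_FTS_NUM_SHARDS").as_deref(), cpus),
            partition_size_mib: parse_or(lookup("LANCE_FTS_PARTITION_SIZE").as_deref(), 256),
            target_size_mib: parse_or(lookup("LANCE_FTS_TARGET_SIZE").as_deref(), 4096),
        }
    }

    /// Resolves settings from the process environment.
    pub fn from_env() -> Self {
        Self::from_lookup(|key| env::var(key).ok())
    }
}
```

Calling `FtsBuildConfig::from_env()` before an index build is one way to log which values will be in effect on a given machine.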
Usage Examples
Basic Index Creation
use lance::Dataset;
use lance_index::DatasetIndexExt;
use lance_index::IndexType;
use lance_index::scalar::InvertedIndexParams;
async fn build_fts_index(dataset: &mut Dataset) -> lance_core::Result<()> {
    // Default parameters select the default tokenization pipeline.
    let params = InvertedIndexParams::default();
    dataset
        .create_index(
            &["doc"],
            IndexType::Inverted,
            None, // let the index name be auto-generated
            &params,
            true, // replace existing index
        )
        .await?;
    Ok(())
}
Index with Named Index and Custom Tokenizer
use lance::Dataset;
use lance_index::DatasetIndexExt;
use lance_index::IndexType;
use lance_index::scalar::InvertedIndexParams;
async fn build_custom_index(dataset: &mut Dataset) -> lance_core::Result<()> {
    // Whitespace tokenizer, no stemming, token positions stored in the index.
    let params = InvertedIndexParams::default()
        .base_tokenizer("whitespace".to_string())
        .stem(false)
        .with_position(true);
    dataset
        .create_index(
            &["content"],
            IndexType::Inverted,
            Some("content_fts_idx".to_string()),
            &params,
            false, // do not replace
        )
        .await?;
    Ok(())
}
Complete Example from the Repository
// From rust/examples/src/full_text_search.rs
use lance::Dataset;
use lance_index::DatasetIndexExt;
use lance_index::IndexType;
use lance_index::scalar::InvertedIndexParams;
// After writing data to the dataset...
let params = InvertedIndexParams::default();
dataset
    .create_index(
        &["doc"],
        IndexType::Inverted,
        None,
        &params,
        true,
    )
    .await
    .unwrap();