Implementation:Lance format Lance Java InvertedIndexParams

Knowledge Sources	Lance
Domains	Java_SDK, Indexing
Last Updated	2026-02-08 19:33 GMT

Overview

Description

InvertedIndexParams is a final Java class in the org.lance.index.scalar package that provides a builder for configuring inverted (full-text) scalar index parameters. Its builder produces a ScalarIndexParams instance with the index type set to "inverted". The builder exposes text analysis options including tokenizer selection, language configuration, stemming, stop word removal, ASCII folding, N-gram settings, and position storage. All parameters are optional, and the configuration is serialized to JSON internally.
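As a rough illustration of how an all-optional builder can be serialized to JSON, the sketch below emits only the values that were explicitly set. It is a hypothetical stand-in, not the Lance class: the name OptionalParamsJsonSketch, the JSON key names, and the choice to omit unset keys are all assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical stand-in, NOT the Lance implementation: a builder whose fields
// are all optional and whose JSON output contains only the values that were set.
final class OptionalParamsJsonSketch {
    // Insertion-ordered so the generated JSON is deterministic.
    private final Map<String, Object> values = new LinkedHashMap<>();

    // The key names used here are assumptions for illustration only.
    OptionalParamsJsonSketch baseTokenizer(String v) { values.put("base_tokenizer", v); return this; }
    OptionalParamsJsonSketch stem(boolean v) { values.put("stem", v); return this; }
    OptionalParamsJsonSketch maxTokenLength(int v) { values.put("max_token_length", v); return this; }
    OptionalParamsJsonSketch customStopWords(List<String> v) { values.put("custom_stop_words", List.copyOf(v)); return this; }

    // Renders the set values as a compact JSON object; unset options are absent.
    String toJson() {
        return values.entrySet().stream()
            .map(e -> quote(e.getKey()) + ":" + render(e.getValue()))
            .collect(Collectors.joining(",", "{", "}"));
    }

    private static String render(Object v) {
        if (v instanceof String) return quote((String) v);
        if (v instanceof List) {
            return ((List<?>) v).stream()
                .map(OptionalParamsJsonSketch::render)
                .collect(Collectors.joining(",", "[", "]"));
        }
        return String.valueOf(v); // booleans and numbers
    }

    // Minimal escaping (backslash and double quote only); control characters are not handled.
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}
```

With nothing set, toJson() returns {}; after baseTokenizer("simple").stem(true) it returns {"base_tokenizer":"simple","stem":true}.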

Usage

Use InvertedIndexParams.builder() to configure inverted index options. The build() method returns a ScalarIndexParams object. This is the primary way to configure full-text search indices in the Lance Java SDK, supporting multiple tokenizer types (simple, whitespace, raw, ngram, lindera, jieba) and language-aware text processing.
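To make the text-processing options more concrete, the following sketch applies lowercasing, ASCII folding, and stop word removal to a string in plain Java. It only illustrates what these options mean conceptually. The class TextAnalysisSketch, the splitting rule, and the order of the steps are assumptions; the actual analysis is performed by Lance's native tokenizers and may differ.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative only, NOT Lance's analyzer: a tiny pipeline showing what
// lowercasing, ASCII folding, and stop word removal do to a piece of text.
final class TextAnalysisSketch {
    static List<String> analyze(String text, boolean lowerCase, boolean asciiFolding, Set<String> stopWords) {
        List<String> out = new ArrayList<>();
        // Assumed splitting rule: break on any run of non-letter, non-digit characters.
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) continue; // produced when the text starts with a separator
            if (lowerCase) token = token.toLowerCase(Locale.ROOT);
            if (asciiFolding) {
                // Approximate folding: decompose accented characters, then drop combining marks.
                token = Normalizer.normalize(token, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
            }
            if (!stopWords.contains(token)) out.add(token);
        }
        return out;
    }
}
```

For example, analyze("The Café is open", true, true, Set.of("the", "is")) returns [cafe, open] under these assumptions.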

Code Reference

Source Location

java/src/main/java/org/lance/index/scalar/InvertedIndexParams.java

Signature

public final class InvertedIndexParams {
    public static Builder builder();

    public static final class Builder {
        public Builder baseTokenizer(String baseTokenizer);
        public Builder language(String language);
        public Builder withPosition(boolean withPosition);
        public Builder maxTokenLength(Integer maxTokenLength);
        public Builder lowerCase(boolean lowerCase);
        public Builder stem(boolean stem);
        public Builder removeStopWords(boolean removeStopWords);
        public Builder customStopWords(List<String> customStopWords);
        public Builder asciiFolding(boolean asciiFolding);
        public Builder minNgramLength(int minNgramLength);
        public Builder maxNgramLength(int maxNgramLength);
        public Builder prefixOnly(boolean prefixOnly);
        public Builder skipMerge(boolean skipMerge);
        public ScalarIndexParams build();
    }
}

Import

import org.lance.index.scalar.InvertedIndexParams;

I/O Contract

Builder Inputs
Parameter	Type	Required	Default	Validation	Description
`baseTokenizer`	`String`	No	Not set (Rust default: "simple")	Non-null, non-empty	Tokenizer identifier: "simple", "whitespace", "raw", "ngram", or an identifier beginning with "lindera/" or "jieba/"
`language`	`String`	No	Not set	Non-null, non-empty	Language for stemming and stop words (e.g., "English")
`withPosition`	`boolean`	No	Not set	None	Whether to store token positions in the index
`maxTokenLength`	`Integer`	No	Not set	Must be positive	Maximum token length
`lowerCase`	`boolean`	No	Not set	None	Whether to lowercase tokens
`stem`	`boolean`	No	Not set	None	Whether to apply stemming
`removeStopWords`	`boolean`	No	Not set	None	Whether to remove stop words
`customStopWords`	`List<String>`	No	Not set	Non-null	Custom stop word list overriding built-in language defaults
`asciiFolding`	`boolean`	No	Not set	None	Whether to apply ASCII folding (e.g., converting accented characters to their ASCII equivalents)
`minNgramLength`	`int`	No	Not set	Must be positive	Minimum N-gram length (ngram tokenizer only)
`maxNgramLength`	`int`	No	Not set	Must be positive, >= minNgramLength	Maximum N-gram length (ngram tokenizer only)
`prefixOnly`	`boolean`	No	Not set	None	Generate only prefix N-grams (ngram tokenizer only)
`skipMerge`	`boolean`	No	Not set	None	Skip partition merge after indexing (for distributed builds)
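The Validation column above can be read as a handful of simple precondition checks. The sketch below expresses them as standalone helpers. It is not the SDK's code: the class InvertedParamsValidationSketch, the method names, and the exception types and messages are assumptions chosen for illustration.

```java
import java.util.Objects;

// Hypothetical helpers, NOT part of the Lance SDK: they mirror the rules in the
// Validation column. Exception types and messages are assumptions.
final class InvertedParamsValidationSketch {
    // "Non-null, non-empty" (baseTokenizer, language)
    static String requireNonEmpty(String name, String value) {
        Objects.requireNonNull(value, name + " must not be null");
        if (value.isEmpty()) {
            throw new IllegalArgumentException(name + " must not be empty");
        }
        return value;
    }

    // "Must be positive" (maxTokenLength, minNgramLength, maxNgramLength)
    static int requirePositive(String name, int value) {
        if (value <= 0) {
            throw new IllegalArgumentException(name + " must be positive, got " + value);
        }
        return value;
    }

    // "Must be positive, >= minNgramLength" (maxNgramLength relative to minNgramLength)
    static void requireNgramRange(int minNgramLength, int maxNgramLength) {
        requirePositive("minNgramLength", minNgramLength);
        requirePositive("maxNgramLength", maxNgramLength);
        if (maxNgramLength < minNgramLength) {
            throw new IllegalArgumentException(
                "maxNgramLength (" + maxNgramLength + ") must be >= minNgramLength (" + minNgramLength + ")");
        }
    }
}
```

Under these assumptions, requireNgramRange(2, 4) passes, while requireNgramRange(4, 2) and requirePositive("maxTokenLength", 0) are rejected.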

Build Output
Return Type	Description
`ScalarIndexParams`	A scalar index params object with type `"inverted"` and JSON configuration

Usage Examples

import org.lance.index.scalar.InvertedIndexParams;
import org.lance.index.scalar.ScalarIndexParams;
import org.lance.index.IndexParams;
import java.util.Arrays;

// Create a basic inverted index with defaults
ScalarIndexParams basic = InvertedIndexParams.builder().build();

// Create an English full-text search index with stemming and stop word removal
ScalarIndexParams english = InvertedIndexParams.builder()
    .language("English")
    .stem(true)
    .removeStopWords(true)
    .lowerCase(true)
    .withPosition(true)
    .build();

// Create an N-gram index for substring matching
ScalarIndexParams ngram = InvertedIndexParams.builder()
    .baseTokenizer("ngram")
    .minNgramLength(2)
    .maxNgramLength(4)
    .prefixOnly(false)
    .build();

// Create an index with custom stop words
ScalarIndexParams custom = InvertedIndexParams.builder()
    .language("English")
    .removeStopWords(true)
    .customStopWords(Arrays.asList("the", "a", "an", "custom"))
    .asciiFolding(true)
    .build();

// Wrap in IndexParams
IndexParams params = IndexParams.builder()
    .setScalarIndexParams(english)
    .build();
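
For intuition about the N-gram settings used in the example above, the sketch below generates the N-grams of a single token for a given length range, optionally restricted to prefixes. It is a plain-Java illustration, not Lance's tokenizer: the class NgramSketch, the emission order, and the use of UTF-16 code units as the length unit are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only, NOT Lance's tokenizer: enumerates the N-grams of one token.
final class NgramSketch {
    static List<String> ngrams(String token, int min, int max, boolean prefixOnly) {
        if (min <= 0 || max < min) {
            throw new IllegalArgumentException("require 0 < min <= max");
        }
        List<String> out = new ArrayList<>();
        // With prefixOnly, only grams starting at index 0 are produced.
        int lastStart = prefixOnly ? 0 : token.length() - min;
        for (int start = 0; start <= lastStart; start++) {
            // Lengths are counted in UTF-16 code units (an assumption of this sketch).
            for (int len = min; len <= max && start + len <= token.length(); len++) {
                out.add(token.substring(start, start + len));
            }
        }
        return out;
    }
}
```

With min 2 and max 4, ngrams("lance", 2, 4, false) yields la, lan, lanc, an, anc, ance, nc, nce, ce, while ngrams("lance", 2, 4, true) yields only la, lan, lanc. This is why a non-prefix N-gram index can serve substring matching, at the cost of more indexed terms.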

Related Pages

ScalarIndexParams - The output type produced by InvertedIndexParams.Builder.build()
IndexParams - Top-level container that holds scalar index params
BTreeIndexParams - Another scalar index builder for B-Tree indices
IndexType - Contains the INVERTED and NGRAM enum constants
