Implementation:Lance format Lance Java InvertedIndexParams
| Knowledge Sources | |
|---|---|
| Domains | Java_SDK, Indexing |
| Last Updated | 2026-02-08 19:33 GMT |
Overview
Description
InvertedIndexParams is a final Java class in the org.lance.index.scalar package that provides a builder-style configuration for inverted (full-text) scalar index parameters. It constructs a ScalarIndexParams instance with the index type set to "inverted". The builder exposes a comprehensive set of text analysis options including tokenizer selection, language configuration, stemming, stop word removal, ASCII folding, N-gram settings, and position storage. All parameters are optional and serialized to JSON internally.
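The "optional and serialized to JSON" behavior can be pictured with a small stand-in builder. The sketch below is not the Lance class: `OptionalParamsSketch`, its hand-rolled `toJson()`, and the snake_case key names are assumptions made for illustration. It only shows the general pattern in which an option that was never set is left out of the JSON, so the native side can apply its own default.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical stand-in, NOT the Lance implementation.
// Only options that were explicitly set end up in the JSON output.
final class OptionalParamsSketch {
  // LinkedHashMap keeps keys in the order the setters were called.
  private final Map<String, Object> configured = new LinkedHashMap<>();

  OptionalParamsSketch baseTokenizer(String value) {
    configured.put("base_tokenizer", value); // key name is an assumption
    return this;
  }

  OptionalParamsSketch stem(boolean value) {
    configured.put("stem", value); // key name is an assumption
    return this;
  }

  OptionalParamsSketch minNgramLength(int value) {
    configured.put("min_ngram_length", value); // key name is an assumption
    return this;
  }

  // Minimal JSON rendering for strings, booleans, and integers.
  // String escaping is deliberately omitted to keep the sketch short.
  String toJson() {
    StringJoiner json = new StringJoiner(",", "{", "}");
    for (Map.Entry<String, Object> entry : configured.entrySet()) {
      Object value = entry.getValue();
      String rendered = (value instanceof String) ? "\"" + value + "\"" : String.valueOf(value);
      json.add("\"" + entry.getKey() + "\":" + rendered);
    }
    return json.toString();
  }
}
```

With this stand-in, a builder that had no setters called renders an empty JSON object, and each setter call adds exactly one key.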
Usage
Use InvertedIndexParams.builder() to configure inverted index options. The build() method returns a ScalarIndexParams object. This is the primary way to configure full-text search indices in the Lance Java SDK, supporting multiple tokenizer types (simple, whitespace, raw, ngram, lindera, jieba) and language-aware text processing.
Code Reference
Source Location
java/src/main/java/org/lance/index/scalar/InvertedIndexParams.java
Signature
public final class InvertedIndexParams {
  public static Builder builder();

  public static final class Builder {
    public Builder baseTokenizer(String baseTokenizer);
    public Builder language(String language);
    public Builder withPosition(boolean withPosition);
    public Builder maxTokenLength(Integer maxTokenLength);
    public Builder lowerCase(boolean lowerCase);
    public Builder stem(boolean stem);
    public Builder removeStopWords(boolean removeStopWords);
    public Builder customStopWords(List<String> customStopWords);
    public Builder asciiFolding(boolean asciiFolding);
    public Builder minNgramLength(int minNgramLength);
    public Builder maxNgramLength(int maxNgramLength);
    public Builder prefixOnly(boolean prefixOnly);
    public Builder skipMerge(boolean skipMerge);
    public ScalarIndexParams build();
  }
}
Import
import org.lance.index.scalar.InvertedIndexParams;
I/O Contract
| Parameter | Type | Required | Default | Validation | Description |
|---|---|---|---|---|---|
| `baseTokenizer` | `String` | No | Not set (Rust default: "simple") | Non-null, non-empty | Tokenizer identifier: "simple", "whitespace", "raw", "ngram", "lindera/*", "jieba/*" |
| `language` | `String` | No | Not set | Non-null, non-empty | Language for stemming and stop words (e.g., "English") |
| `withPosition` | `boolean` | No | Not set | None | Whether to store token positions in the index |
| `maxTokenLength` | `Integer` | No | Not set | Must be positive | Maximum token length |
| `lowerCase` | `boolean` | No | Not set | None | Whether to lowercase tokens |
| `stem` | `boolean` | No | Not set | None | Whether to apply stemming |
| `removeStopWords` | `boolean` | No | Not set | None | Whether to remove stop words |
| `customStopWords` | `List<String>` | No | Not set | Non-null | Custom stop word list overriding built-in language defaults |
| `asciiFolding` | `boolean` | No | Not set | None | Whether to apply ASCII folding (e.g., accented characters to ASCII) |
| `minNgramLength` | `int` | No | Not set | Must be positive | Minimum N-gram length (ngram tokenizer only) |
| `maxNgramLength` | `int` | No | Not set | Must be positive, >= minNgramLength | Maximum N-gram length (ngram tokenizer only) |
| `prefixOnly` | `boolean` | No | Not set | None | Generate only prefix N-grams (ngram tokenizer only) |
| `skipMerge` | `boolean` | No | Not set | None | Skip partition merge after indexing (for distributed builds) |
| Return Type | Description |
|---|---|
| `ScalarIndexParams` | A scalar index params object with type "inverted" and JSON configuration |
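The N-gram validation rules listed for `minNgramLength` and `maxNgramLength` can be sketched as a standalone check. This is an illustration under stated assumptions, not the library's code: the `NgramRangeCheck` class and its `validate` method are hypothetical, nullable `Integer` arguments stand in for "Not set", and the source does not say whether Lance performs these checks in the setters or in `build()`.

```java
// Hypothetical helper, NOT part of the Lance Java SDK.
final class NgramRangeCheck {
  // A null argument models an option that was never configured.
  static void validate(Integer minNgramLength, Integer maxNgramLength) {
    if (minNgramLength != null && minNgramLength <= 0) {
      throw new IllegalArgumentException("minNgramLength must be positive");
    }
    if (maxNgramLength != null && maxNgramLength <= 0) {
      throw new IllegalArgumentException("maxNgramLength must be positive");
    }
    // The cross-field rule only applies when both bounds are present.
    if (minNgramLength != null && maxNgramLength != null && maxNgramLength < minNgramLength) {
      throw new IllegalArgumentException("maxNgramLength must be >= minNgramLength");
    }
  }
}
```

In this sketch, `validate(2, 4)` and `validate(null, null)` return normally, while `validate(4, 2)` and `validate(0, null)` throw `IllegalArgumentException`.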
Usage Examples
import org.lance.index.scalar.InvertedIndexParams;
import org.lance.index.scalar.ScalarIndexParams;
import org.lance.index.IndexParams;
import java.util.Arrays;

// Create a basic inverted index with defaults
ScalarIndexParams basic = InvertedIndexParams.builder().build();

// Create an English full-text search index with stemming and stop word removal
ScalarIndexParams english = InvertedIndexParams.builder()
    .language("English")
    .stem(true)
    .removeStopWords(true)
    .lowerCase(true)
    .withPosition(true)
    .build();

// Create an N-gram index for substring matching
ScalarIndexParams ngram = InvertedIndexParams.builder()
    .baseTokenizer("ngram")
    .minNgramLength(2)
    .maxNgramLength(4)
    .prefixOnly(false)
    .build();

// Create an index with custom stop words
ScalarIndexParams custom = InvertedIndexParams.builder()
    .language("English")
    .removeStopWords(true)
    .customStopWords(Arrays.asList("the", "a", "an", "custom"))
    .asciiFolding(true)
    .build();

// Wrap in IndexParams
IndexParams params = IndexParams.builder()
    .setScalarIndexParams(english)
    .build();
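To make the `minNgramLength`, `maxNgramLength`, and `prefixOnly` options more concrete, the following sketch generates character N-grams for a single string. It is a generic illustration of the technique, not Lance's tokenizer: the `NgramSketch` class is hypothetical, it works on UTF-16 code units rather than full Unicode characters, and the order in which Lance actually emits grams is not claimed here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, NOT the tokenizer used by Lance.
final class NgramSketch {
  // Returns substrings of `text` with lengths in [min, max].
  // With prefixOnly=true, only grams starting at index 0 are kept.
  static List<String> ngrams(String text, int min, int max, boolean prefixOnly) {
    if (min <= 0 || max < min) {
      throw new IllegalArgumentException("require 0 < min <= max");
    }
    List<String> grams = new ArrayList<>();
    // Last start index that can still fit a gram of the minimum length.
    int lastStart = prefixOnly ? 0 : text.length() - min;
    for (int start = 0; start <= lastStart; start++) {
      for (int len = min; len <= max && start + len <= text.length(); len++) {
        grams.add(text.substring(start, start + len));
      }
    }
    return grams;
  }
}
```

For the input "lance" with lengths 2 to 3, this sketch yields `[la, lan, an, anc, nc, nce, ce]`; with `prefixOnly` enabled it yields only `[la, lan]`, which suits prefix-style matching at a smaller index size.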
Related Pages
- ScalarIndexParams - The output type produced by InvertedIndexParams.Builder.build()
- IndexParams - Top-level container that holds scalar index params
- BTreeIndexParams - Another scalar index builder for B-Tree indices
- IndexType - Contains the INVERTED and NGRAM enum constants