Heuristic:Lance format Lance BM25 FTS Configuration
| Knowledge Sources | |
|---|---|
| Domains | Full_Text_Search, Optimization |
| Last Updated | 2026-02-08 19:00 GMT |
Overview
BM25 scoring defaults (K1=1.2, B=0.75) and FTS partition sizing (256 MiB partitions, 4 GiB merge target, 128-element block size) for Lance full-text search.
Description
Lance implements full-text search using an inverted index with BM25 scoring (Okapi BM25). The BM25 parameters K1 and B control term saturation and field length normalization respectively. The FTS indexing system uses sharded partitions with configurable sizes for balancing build speed, query performance, and memory usage. Block size is fixed at 128 for index compatibility.
Usage
Apply this heuristic when configuring full-text search indices, tuning FTS query relevance, or scaling FTS for large datasets. The BM25 parameters are standard defaults suitable for most text corpora. Partition sizing can be tuned via environment variables for production workloads: increase partition size for better query performance (fewer partitions to search), decrease for lower memory usage during indexing.
The Insight (Rule of Thumb)
BM25 Scoring
- K1: 1.2. Controls how much each additional occurrence of a term adds to relevance (term-frequency saturation). Standard BM25 default.
- B: 0.75. Controls field length normalization: 0 = no normalization, 1 = full normalization.
- Trade-off: Higher K1 makes term frequency more important. Higher B penalizes longer documents more. These are the standard Okapi BM25 defaults used across most search engines.
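To make the trade-off concrete, the following sketch evaluates the term-frequency part of the textbook Okapi BM25 formula with the defaults above. It is an illustration only: the function name `tf_component` and the example lengths are hypothetical, the IDF factor is left out, and Lance's scorer may differ in detail from the textbook form.

```rust
// BM25 defaults quoted on this page.
const K1: f32 = 1.2;
const B: f32 = 0.75;

/// Term-frequency part of textbook Okapi BM25 (the IDF factor is omitted).
/// `tf` is the term count in the document; `doc_len` and `avg_doc_len`
/// are field lengths in tokens. Hypothetical helper, not a Lance API.
fn tf_component(tf: f32, doc_len: f32, avg_doc_len: f32) -> f32 {
    // Length normalization: B = 0 ignores length, B = 1 applies it fully.
    let norm = 1.0 - B + B * (doc_len / avg_doc_len);
    // Saturating term frequency: approaches K1 + 1 as tf grows.
    tf * (K1 + 1.0) / (tf + K1 * norm)
}

fn main() {
    for tf in [1.0_f32, 2.0, 5.0, 20.0] {
        println!(
            "tf={tf:>4}: average-length doc {:.3}, 4x-length doc {:.3}",
            tf_component(tf, 100.0, 100.0),
            tf_component(tf, 400.0, 100.0)
        );
    }
}
```

With these defaults, a single occurrence in an average-length document contributes exactly 1.0 before IDF, and the contribution can never exceed K1 + 1 = 2.2 however often the term repeats. The same term count in a document four times the average length scores lower because B = 0.75 applies most of the length penalty.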
FTS Partition Sizing
- LANCE_FTS_NUM_SHARDS: Defaults to the number of CPUs available for compute-intensive work. Higher = faster indexing, more memory.
- LANCE_FTS_PARTITION_SIZE: 256 MiB (uncompressed). Higher = better query performance, more memory.
- LANCE_FTS_TARGET_SIZE: 4,096 MiB (uncompressed) after merging. Controls merge threshold.
- BLOCK_SIZE: 128 (fixed, from BitPacker4x::BLOCK_LEN). WARNING: Changing this breaks index compatibility.
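- DEFAULT_MAX_EXPANSIONS: 50. Query-time default rather than a sizing knob: caps how many terms a single query term may expand to.
- DEFAULT_WAND_FACTOR: 1.0. Query-time default rather than a sizing knob: the factor applied to the WAND pruning threshold.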
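The three environment variables in the list above can be resolved with a fallback-to-default pattern, sketched below. This is not Lance's code: `parse_or`, `FtsSizing`, and `resolve_sizing` are hypothetical names, the sizes are assumed to be given in MiB, and `available_parallelism` is only a stand-in for Lance's CPU-based shard default.

```rust
use std::env;
use std::str::FromStr;

/// Parse an optional raw setting, falling back to `default` when the value
/// is missing or not a valid number.
fn parse_or<T: FromStr>(raw: Option<&str>, default: T) -> T {
    raw.and_then(|v| v.trim().parse().ok()).unwrap_or(default)
}

/// Hypothetical holder for the three tunables; not a Lance type.
#[derive(Debug, PartialEq)]
struct FtsSizing {
    num_shards: usize,
    partition_size_mib: u64,
    target_size_mib: u64,
}

/// Resolve the tunables through `lookup`, which maps a variable name to its
/// raw value (the process environment in `main`, a stub in tests).
fn resolve_sizing(lookup: impl Fn(&str) -> Option<String>) -> FtsSizing {
    // Stand-in for Lance's CPU-based default (an assumption of this sketch).
    let cpus = std::thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    FtsSizing {
        num_shards: parse_or(lookup("LANCE_FTS_NUM_SHARDS").as_deref(), cpus),
        partition_size_mib: parse_or(lookup("LANCE_FTS_PARTITION_SIZE").as_deref(), 256),
        target_size_mib: parse_or(lookup("LANCE_FTS_TARGET_SIZE").as_deref(), 4096),
    }
}

fn main() {
    let sizing = resolve_sizing(|name| env::var(name).ok());
    println!("{sizing:?}");
}
```

Passing the lookup as a closure keeps the parsing logic testable without mutating the process environment.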
Reasoning
K1=1.2 and B=0.75 are the standard BM25 defaults established by Robertson et al. and used by Lucene, Elasticsearch, and most modern search engines. They provide good relevance for general-purpose text search without domain-specific tuning.
The 256 MiB partition size balances memory usage during indexing with query performance. Larger partitions mean fewer seek operations during queries but require more RAM. The 4 GiB merge target prevents excessive partition fragmentation. The 128-element block size is dictated by the BitPacker4x SIMD implementation and cannot be changed without breaking all existing FTS indices.
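A back-of-envelope estimate shows how the two sizes interact. The sketch below assumes partitions fill to their size limit and that merging packs them up to the target; the real merge policy may behave differently, and `estimate_partitions` is a hypothetical helper rather than part of Lance.

```rust
const PARTITION_SIZE_MIB: u64 = 256; // LANCE_FTS_PARTITION_SIZE default
const TARGET_SIZE_MIB: u64 = 4096; // LANCE_FTS_TARGET_SIZE default

/// Ceiling division for positive integers.
fn ceil_div(n: u64, d: u64) -> u64 {
    (n + d - 1) / d
}

/// Rough partition counts for `index_mib` MiB of uncompressed index data:
/// (partitions written during the build, partitions left after merging).
fn estimate_partitions(index_mib: u64, partition_mib: u64, target_mib: u64) -> (u64, u64) {
    (
        ceil_div(index_mib, partition_mib),
        ceil_div(index_mib, target_mib),
    )
}

fn main() {
    let (built, merged) =
        estimate_partitions(10 * 1024, PARTITION_SIZE_MIB, TARGET_SIZE_MIB);
    println!("10 GiB: {built} build partitions, about {merged} after merging");
}
```

Under these assumptions, 10 GiB of uncompressed index data is written as 40 partitions of 256 MiB and ends up as roughly 3 partitions after merging, which is the number a query then has to visit.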
The environment variable approach allows production tuning without recompilation, reflecting Lance's design philosophy of sensible defaults with runtime configurability.
Code Evidence
BM25 constants from `rust/lance-index/src/scalar/inverted/scorer.rs:24-25`:
pub const K1: f32 = 1.2;
pub const B: f32 = 0.75;
FTS query defaults (max expansions and WAND factor) from the scanner:
const DEFAULT_MAX_EXPANSIONS: usize = 50;
const DEFAULT_WAND_FACTOR: f32 = 1.0;
Block size warning from inverted index builder:
// BLOCK_SIZE = 128 (BitPacker4x::BLOCK_LEN)
// WARNING: Changing breaks index compatibility
// Each block contains 128 row IDs and 128 frequencies
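The block layout described in that comment can be pictured as parallel chunking of a posting list. The sketch below is illustrative only: `to_blocks` is a hypothetical helper, the `u64`/`u32` element types are assumptions, and it says nothing about how Lance actually encodes a partial final block.

```rust
const BLOCK_SIZE: usize = 128; // value quoted on this page (BitPacker4x::BLOCK_LEN)

/// Split parallel row-ID and frequency arrays into 128-element blocks.
/// The last block may be shorter when the length is not a multiple of 128.
fn to_blocks<'a>(row_ids: &'a [u64], freqs: &'a [u32]) -> Vec<(&'a [u64], &'a [u32])> {
    assert_eq!(row_ids.len(), freqs.len(), "one frequency per row ID");
    row_ids
        .chunks(BLOCK_SIZE)
        .zip(freqs.chunks(BLOCK_SIZE))
        .collect()
}

fn main() {
    let row_ids: Vec<u64> = (0..1000).collect();
    let freqs = vec![1u32; 1000];
    let blocks = to_blocks(&row_ids, &freqs);
    let full = blocks
        .iter()
        .filter(|(ids, _)| ids.len() == BLOCK_SIZE)
        .count();
    let tail = blocks.last().map_or(0, |(ids, _)| ids.len());
    println!("{} blocks, {} full, tail of {}", blocks.len(), full, tail);
}
```

A posting list of 1,000 entries yields 7 full blocks plus a tail of 104 entries. Because every reader assumes the same 128-element boundaries, changing the constant would misalign all blocks in an existing index, which is why the value is fixed.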