Implementation:Lance format Lance Scanner Full Text Search
| Knowledge Sources | |
|---|---|
| Domains | Information_Retrieval, Full_Text_Search |
| Last Updated | 2026-02-08 19:00 GMT |
Overview
Concrete tool for executing full-text search queries against a Lance dataset's inverted index, provided by the Scanner type in the lance crate.
Description
The full_text_search method on Scanner accepts a FullTextSearchQuery and configures the scan pipeline to execute a full-text search against the inverted index. The query is compiled into a DataFusion execution plan that reads the inverted index, applies BM25 scoring, and returns results as a stream of record batches with _rowid and _score columns. When combined with other Scanner operations like project, the Scanner automatically joins the FTS results with the underlying data to retrieve the requested columns.
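The BM25 scoring step mentioned above can be illustrated with a small standard-library sketch. It is not Lance's implementation: the constants `K1 = 1.2` and `B = 0.75`, the Lucene-style IDF form, and the names `CorpusStats`, `idf`, and `bm25_term_score` are all assumptions chosen for illustration.

```rust
/// Hypothetical corpus statistics needed for BM25; not a Lance type.
pub struct CorpusStats {
    pub num_docs: f32,
    pub avg_doc_len: f32,
}

/// Inverse document frequency, using the form ln(1 + (N - n + 0.5) / (n + 0.5)).
/// Rare terms (small `docs_with_term`) receive a larger weight.
pub fn idf(stats: &CorpusStats, docs_with_term: f32) -> f32 {
    (1.0 + (stats.num_docs - docs_with_term + 0.5) / (docs_with_term + 0.5)).ln()
}

/// BM25 contribution of one query term to one document's score.
pub fn bm25_term_score(
    stats: &CorpusStats,
    docs_with_term: f32,
    term_freq: f32,
    doc_len: f32,
) -> f32 {
    const K1: f32 = 1.2; // assumed term-frequency saturation parameter
    const B: f32 = 0.75; // assumed length-normalization parameter
    // Documents longer than average are penalized, shorter ones are favored.
    let length_norm = 1.0 - B + B * doc_len / stats.avg_doc_len;
    idf(stats, docs_with_term) * term_freq * (K1 + 1.0) / (term_freq + K1 * length_norm)
}
```

A document's score for a multi-term query would be the sum of `bm25_term_score` over the query terms; that sum plays the role of the `_score` column in this sketch.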
Usage
Call full_text_search on a Scanner obtained from Dataset::scan(). An inverted index must exist on the target column prior to calling this method.
Code Reference
Source Location
- Repository: Lance
- File: rust/lance/src/dataset/scanner.rs
- Lines: 989-1004
Signature
impl Scanner {
    pub fn full_text_search(&mut self, query: FullTextSearchQuery) -> Result<&mut Self>
}
Import
use lance::Dataset;
use lance_index::scalar::FullTextSearchQuery;
use lance_index::scalar::inverted::query::{FtsQuery, MatchQuery, PhraseQuery, BoostQuery, BooleanQuery, MultiMatchQuery};
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| query | FullTextSearchQuery | Yes | The full-text search query containing the FtsQuery, optional limit, and optional wand_factor |
The FullTextSearchQuery struct contains:
| Field | Type | Required | Description |
|---|---|---|---|
| query | FtsQuery | Yes | The query expression (Match, Phrase, Boost, MultiMatch, or Boolean) |
| limit | Option<i64> | No | Maximum number of results to return; None for unlimited |
| wand_factor | Option<f32> | No | WAND pruning aggressiveness; default 1.0, higher values trade recall for speed |
The FtsQuery enum variants:
| Variant | Struct | Key Fields |
|---|---|---|
| Match(MatchQuery) | MatchQuery | column: Option<String>, terms: String, boost: f32 (default 1.0), fuzziness: Option<u32>, max_expansions: usize (default 50), operator: Operator (default Or), prefix_length: u32 |
| Phrase(PhraseQuery) | PhraseQuery | column: Option<String>, terms: String, slop: u32 (default 0) |
| Boost(BoostQuery) | BoostQuery | positive: Box<FtsQuery>, negative: Box<FtsQuery>, negative_boost: f32 (default 0.5) |
| MultiMatch(MultiMatchQuery) | MultiMatchQuery | match_queries: Vec<MatchQuery> (each targeting a different column) |
| Boolean(BooleanQuery) | BooleanQuery | must: Vec<FtsQuery>, should: Vec<FtsQuery>, must_not: Vec<FtsQuery> |
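To make the recursive shape of the variants easier to see, the table can be written out as plain Rust types. This is a standalone mirror for illustration only: the real definitions live in `lance_index::scalar::inverted::query` and may differ in derives, visibility, and additional fields. The `Operator::And` variant, the defaults chosen for `fuzziness` and `prefix_length`, and the `leaf_count` helper are assumptions, not part of the documented API.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Operator {
    And, // assumed counterpart to the documented default `Or`
    Or,
}

#[derive(Debug, Clone, PartialEq)]
pub struct MatchQuery {
    pub column: Option<String>,
    pub terms: String,
    pub boost: f32,
    pub fuzziness: Option<u32>,
    pub max_expansions: usize,
    pub operator: Operator,
    pub prefix_length: u32,
}

impl MatchQuery {
    /// Builds a match query with the defaults listed in the table.
    pub fn new(terms: &str) -> Self {
        Self {
            column: None,
            terms: terms.to_owned(),
            boost: 1.0,
            fuzziness: Some(0), // assumption: the table gives no default
            max_expansions: 50,
            operator: Operator::Or,
            prefix_length: 0, // assumption: the table gives no default
        }
    }
}

#[derive(Debug, Clone, PartialEq)]
pub struct PhraseQuery {
    pub column: Option<String>,
    pub terms: String,
    pub slop: u32,
}

#[derive(Debug, Clone, PartialEq)]
pub struct BoostQuery {
    pub positive: Box<FtsQuery>,
    pub negative: Box<FtsQuery>,
    pub negative_boost: f32,
}

#[derive(Debug, Clone, PartialEq)]
pub struct MultiMatchQuery {
    pub match_queries: Vec<MatchQuery>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct BooleanQuery {
    pub must: Vec<FtsQuery>,
    pub should: Vec<FtsQuery>,
    pub must_not: Vec<FtsQuery>,
}

#[derive(Debug, Clone, PartialEq)]
pub enum FtsQuery {
    Match(MatchQuery),
    Phrase(PhraseQuery),
    Boost(BoostQuery),
    MultiMatch(MultiMatchQuery),
    Boolean(BooleanQuery),
}

impl FtsQuery {
    /// Hypothetical helper: counts the Match and Phrase leaves in a query tree.
    pub fn leaf_count(&self) -> usize {
        match self {
            FtsQuery::Match(_) | FtsQuery::Phrase(_) => 1,
            FtsQuery::Boost(b) => b.positive.leaf_count() + b.negative.leaf_count(),
            FtsQuery::MultiMatch(m) => m.match_queries.len(),
            FtsQuery::Boolean(b) => b
                .must
                .iter()
                .chain(&b.should)
                .chain(&b.must_not)
                .map(FtsQuery::leaf_count)
                .sum(),
        }
    }
}
```

Boost and Boolean hold nested `FtsQuery` values, so queries form a tree whose leaves are Match and Phrase queries (MultiMatch being a flat list of Match leaves).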
Outputs
| Name | Type | Description |
|---|---|---|
| _rowid | UInt64 | The row identifier of each matching document |
| _score | Float32 | The BM25 relevance score for each matching document |
Additional columns from the dataset are included if project was called on the Scanner.
Usage Examples
Simple Term Search
use lance::Dataset;
use lance_index::scalar::FullTextSearchQuery;
async fn search(dataset: &Dataset) -> lance_core::Result<()> {
    let query = FullTextSearchQuery::new("machine learning".to_owned())
        .limit(Some(10));
    let results = dataset
        .scan()
        .full_text_search(query)?
        .try_into_batch()
        .await?;
    // results contains _rowid and _score columns
    Ok(())
}
Fuzzy Search
use lance_index::scalar::FullTextSearchQuery;
// Fuzzy match with max edit distance of 2
let query = FullTextSearchQuery::new_fuzzy("machin lerning".to_owned(), Some(2))
    .limit(Some(20));
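The "max edit distance" in a fuzzy query refers to the number of single-character insertions, deletions, or substitutions needed to turn one term into another. The sketch below computes the classic Levenshtein distance with the standard library only. How Lance expands fuzzy terms internally is not described here, so `edit_distance` and `within_fuzziness` are hypothetical helpers that illustrate the concept rather than the library's code path.

```rust
/// Levenshtein edit distance between two strings, counted in Unicode scalar values.
pub fn edit_distance(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] is the distance between the first i chars of `a` and the first j chars of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        // Turning i + 1 chars of `a` into the empty string takes i + 1 deletions.
        let mut curr = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let substitution = prev[j] + usize::from(ca != cb);
            let deletion = prev[j + 1] + 1;
            let insertion = curr[j] + 1;
            curr.push(substitution.min(deletion).min(insertion));
        }
        prev = curr;
    }
    prev[b.len()]
}

/// Returns true when `query` is within `max_distance` edits of `indexed`.
pub fn within_fuzziness(indexed: &str, query: &str, max_distance: usize) -> bool {
    edit_distance(indexed, query) <= max_distance
}
```

Under this definition, both misspelled terms in the query above ("machin", "lerning") are one edit away from "machine" and "learning", so they fall within a maximum distance of 2.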
Phrase Search with Slop
use lance_index::scalar::FullTextSearchQuery;
use lance_index::scalar::inverted::query::{FtsQuery, PhraseQuery};
let phrase = PhraseQuery::new("neural network".to_owned()).with_slop(1);
let query = FullTextSearchQuery::new_query(FtsQuery::Phrase(phrase))
    .limit(Some(10));
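Slop loosens a phrase query by tolerating extra tokens between the phrase terms. The following sketch uses a deliberately simplified model: terms must appear in the original order, and the number of extra tokens inside the matched span must not exceed `slop`. Whether Lance also permits reordering under slop is not stated here, and `phrase_matches` with its whitespace tokenization is a hypothetical stand-in for the real tokenizer and position index.

```rust
/// Returns true if `phrase` occurs in `doc` with at most `slop` extra tokens
/// between its terms. Simplified model: order must be preserved.
pub fn phrase_matches(doc: &str, phrase: &str, slop: usize) -> bool {
    // Naive tokenization: lowercase and split on whitespace.
    let doc: Vec<String> = doc.split_whitespace().map(str::to_lowercase).collect();
    let terms: Vec<String> = phrase.split_whitespace().map(str::to_lowercase).collect();
    if terms.is_empty() {
        return false;
    }
    // Try every occurrence of the first term as the start of the span.
    (0..doc.len())
        .filter(|&start| doc[start] == terms[0])
        .any(|start| {
            let mut pos = start;
            // Greedily take the earliest later occurrence of each following term,
            // which yields the shortest span for this start position.
            for term in &terms[1..] {
                match doc[pos + 1..].iter().position(|t| t == term) {
                    Some(offset) => pos += offset + 1,
                    None => return false,
                }
            }
            // Tokens inside the span beyond the phrase terms themselves.
            (pos - start) - (terms.len() - 1) <= slop
        })
}
```

With slop 0 the phrase must be contiguous; with slop 1, as in the query above, "neural graph network" would also match in this model.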
Boolean Query
use lance_index::scalar::FullTextSearchQuery;
use lance_index::scalar::inverted::query::{FtsQuery, MatchQuery, BooleanQuery, Occur};
let bool_query = BooleanQuery::new(vec![
    (Occur::Must, MatchQuery::new("machine learning".to_owned()).into()),
    (Occur::Should, MatchQuery::new("deep learning".to_owned()).into()),
    (Occur::MustNot, MatchQuery::new("deprecated".to_owned()).into()),
]);
let query = FullTextSearchQuery::new_query(FtsQuery::Boolean(bool_query))
    .limit(Some(50));
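How the three clause kinds combine can be sketched over plain sets of row ids. The semantics below follow a common search-engine convention and are an assumption, not a statement about Lance's behavior: every `must` set has to contain the row, `must_not` sets remove rows, and `should` sets select rows only when there are no `must` clauses (otherwise they would influence scoring, which this sketch ignores). `evaluate_boolean` is a hypothetical helper.

```rust
use std::collections::BTreeSet;

/// Combines per-clause match sets into the set of matching row ids.
pub fn evaluate_boolean(
    must: &[BTreeSet<u64>],
    should: &[BTreeSet<u64>],
    must_not: &[BTreeSet<u64>],
) -> BTreeSet<u64> {
    let mut candidates: BTreeSet<u64> = match must.split_first() {
        // Intersect all `must` sets: keep ids from the first set present in every other.
        Some((first, rest)) => first
            .iter()
            .copied()
            .filter(|id| rest.iter().all(|set| set.contains(id)))
            .collect(),
        // No `must` clauses: a row qualifies by matching at least one `should` clause.
        None => should.iter().flatten().copied().collect(),
    };
    // Drop every row matched by any `must_not` clause.
    for excluded in must_not {
        candidates.retain(|id| !excluded.contains(id));
    }
    candidates
}
```

For the query above, rows matching "machine learning" are the candidates, rows matching "deprecated" are removed, and "deep learning" would only affect ranking under these assumed semantics.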
Search with Projection (Retrieving Document Content)
use lance::Dataset;
use lance_index::scalar::FullTextSearchQuery;
async fn search_with_content(dataset: &Dataset) -> lance_core::Result<()> {
    let query = FullTextSearchQuery::new("search terms".to_owned())
        .limit(Some(10));
    let results = dataset
        .scan()
        .project(&["doc", "title"])?
        .full_text_search(query)?
        .try_into_batch()
        .await?;
    // results contains doc, title, _rowid, and _score columns
    Ok(())
}