

Implementation:Lance format Lance Scanner Full Text Search

From Leeroopedia


Knowledge Sources
Domains: Information_Retrieval, Full_Text_Search
Last Updated: 2026-02-08 19:00 GMT

Overview

Scanner::full_text_search, a method on the Scanner type in the lance crate, executes full-text search queries against a Lance dataset's inverted index.

Description

The full_text_search method on Scanner takes a FullTextSearchQuery and configures the scan pipeline to run a full-text search against the inverted index. The query is compiled into a DataFusion execution plan that reads the inverted index, scores matching documents with BM25, and returns a stream of record batches containing _rowid and _score columns. When full_text_search is combined with other Scanner operations such as project, the Scanner automatically joins the FTS results with the underlying data to retrieve the requested columns.
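The _score column holds a BM25 relevance score. To show what such a score is made of, here is a self-contained sketch of the textbook Okapi BM25 formula for a single document. The constants k1 = 1.2 and b = 0.75, the IDF variant, and the `CorpusStats` type are common textbook choices assumed for illustration; this page does not say which parameters or IDF formula Lance uses.

```rust
/// Corpus-wide statistics BM25 needs (hypothetical helper type for this sketch).
struct CorpusStats {
    num_docs: usize,  // N: total number of documents
    avg_doc_len: f32, // average document length, in tokens
}

/// Textbook BM25 IDF; the "+ 1.0" inside the log keeps it positive.
fn idf(stats: &CorpusStats, doc_freq: usize) -> f32 {
    let n = stats.num_docs as f32;
    let df = doc_freq as f32;
    ((n - df + 0.5) / (df + 0.5) + 1.0).ln()
}

/// Scores one document against the query terms it contains.
/// Each entry of `term_stats` is (term frequency in this doc, docs containing the term).
fn bm25_score(stats: &CorpusStats, doc_len: usize, term_stats: &[(u32, usize)]) -> f32 {
    const K1: f32 = 1.2; // assumed term-frequency saturation constant
    const B: f32 = 0.75; // assumed document-length normalization strength
    // Longer-than-average documents get a larger denominator, lowering their score.
    let len_norm = 1.0 - B + B * (doc_len as f32 / stats.avg_doc_len);
    term_stats
        .iter()
        .map(|&(tf, df)| {
            let tf = tf as f32;
            idf(stats, df) * (tf * (K1 + 1.0)) / (tf + K1 * len_norm)
        })
        .sum()
}
```

Rare terms (small document frequency) contribute more than common ones, repeated occurrences help with diminishing returns, and short documents score higher than long ones for the same term counts.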

Usage

Call full_text_search on a Scanner obtained from Dataset::scan(). An inverted index must exist on the target column prior to calling this method.

Code Reference

Source Location

  • Repository: Lance
  • File: rust/lance/src/dataset/scanner.rs
  • Lines: 989-1004

Signature

impl Scanner {
    pub fn full_text_search(&mut self, query: FullTextSearchQuery) -> Result<&mut Self>
}

Import

use lance::Dataset;
use lance_index::scalar::FullTextSearchQuery;
use lance_index::scalar::inverted::query::{FtsQuery, MatchQuery, PhraseQuery, BoostQuery, BooleanQuery, MultiMatchQuery, Occur};

I/O Contract

Inputs

  • query (FullTextSearchQuery, required): The full-text search query, containing the FtsQuery expression, an optional limit, and an optional wand_factor.

The FullTextSearchQuery struct contains:

  • query (FtsQuery, required): The query expression (Match, Phrase, Boost, MultiMatch, or Boolean).
  • limit (Option<i64>, optional): Maximum number of results to return; None means no limit.
  • wand_factor (Option<f32>, optional): How aggressively WAND pruning discards candidates. Defaults to 1.0; higher values trade recall for speed.
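wand_factor tunes WAND-style pruning during top-k retrieval. The following is a simplified model, not Lance's implementation: each candidate carries a precomputed upper bound on its score, and once k results are held, a candidate is skipped without scoring if its bound is at most the current k-th best score multiplied by wand_factor. The `Candidate` type and this exact pruning test are assumptions. At 1.0 the pruning is lossless, since a skipped candidate cannot beat the current k-th result; larger factors skip more work but can drop true top-k results, which is the recall-for-speed trade-off described above.

```rust
use std::cmp::{Ordering, Reverse};
use std::collections::BinaryHeap;

/// Hypothetical candidate: its true score and an upper bound on that score
/// (e.g. the sum of each matched term's maximum possible contribution).
struct Candidate {
    row_id: u64,
    score: f32,
    upper_bound: f32,
}

/// (score, row_id) pair with a total order so it can live in a BinaryHeap.
#[derive(Clone, Copy)]
struct Scored {
    score: f32,
    row_id: u64,
}

impl Ord for Scored {
    fn cmp(&self, other: &Self) -> Ordering {
        self.score.total_cmp(&other.score).then(self.row_id.cmp(&other.row_id))
    }
}
impl PartialOrd for Scored {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl PartialEq for Scored {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}
impl Eq for Scored {}

/// Returns the top `k` (row_id, score) pairs by descending score, plus the
/// number of candidates that were actually scored (not pruned).
fn top_k_wand(candidates: &[Candidate], k: usize, wand_factor: f32) -> (Vec<(u64, f32)>, usize) {
    // Min-heap (via Reverse) of the best k results seen so far.
    let mut heap: BinaryHeap<Reverse<Scored>> = BinaryHeap::new();
    let mut evaluated = 0;
    for c in candidates {
        if k > 0 && heap.len() == k {
            // The heap minimum is the current k-th best score.
            let threshold = heap.peek().map(|r| r.0.score).unwrap_or(f32::MIN);
            if c.upper_bound <= threshold * wand_factor {
                continue; // pruned: its bound cannot clear the scaled threshold
            }
        }
        evaluated += 1; // in a real index, the expensive scoring happens here
        heap.push(Reverse(Scored { score: c.score, row_id: c.row_id }));
        if heap.len() > k {
            heap.pop(); // evict the current worst result
        }
    }
    let mut out: Vec<(u64, f32)> = heap.into_iter().map(|Reverse(s)| (s.row_id, s.score)).collect();
    out.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    (out, evaluated)
}
```

Production WAND derives the bounds from per-term maximum scores and skips through sorted posting lists; this sketch keeps only the threshold comparison that wand_factor scales.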

The FtsQuery enum variants and the key fields of each wrapped struct:

  • Match(MatchQuery): column: Option<String>, terms: String, boost: f32 (default 1.0), fuzziness: Option<u32>, max_expansions: usize (default 50), operator: Operator (default Or), prefix_length: u32
  • Phrase(PhraseQuery): column: Option<String>, terms: String, slop: u32 (default 0)
  • Boost(BoostQuery): positive: Box<FtsQuery>, negative: Box<FtsQuery>, negative_boost: f32 (default 0.5)
  • MultiMatch(MultiMatchQuery): match_queries: Vec<MatchQuery>, each targeting a different column
  • Boolean(BooleanQuery): must: Vec<FtsQuery>, should: Vec<FtsQuery>, must_not: Vec<FtsQuery>
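MatchQuery's operator field (default Or) decides whether a document needs any of the query's terms or all of them. The sketch below illustrates that difference on raw text. The `Operator` enum here is a hypothetical mirror of the field, not Lance's type, and the lowercase, split-on-non-alphanumeric tokenizer is an assumption; Lance's configurable tokenizers (stemming, stop words, and so on) are not modeled.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for MatchQuery's `operator` field.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Operator {
    Or,
    And,
}

/// Assumed tokenizer: split on non-alphanumeric characters and lowercase.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// True if `doc` satisfies the query terms under the given operator.
fn matches(doc: &str, query: &str, op: Operator) -> bool {
    let doc_terms: HashSet<String> = tokenize(doc).into_iter().collect();
    let query_terms = tokenize(query);
    if query_terms.is_empty() {
        return false; // an empty query matches nothing in this sketch
    }
    match op {
        Operator::Or => query_terms.iter().any(|t| doc_terms.contains(t)),
        Operator::And => query_terms.iter().all(|t| doc_terms.contains(t)),
    }
}
```

Under Or, a query like "machine learning" also matches documents that mention only one of the two terms; BM25 then ranks documents containing both higher.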

Outputs

  • _rowid (UInt64): The row identifier of each matching document.
  • _score (Float32): The BM25 relevance score of each matching document.

Additional columns from the dataset are included if project was called on the Scanner.
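When project is used, the Scanner joins the FTS output (_rowid, _score) back to the stored data. The in-memory sketch below illustrates that step with a hypothetical `titles` map keyed by row id standing in for a projected title column; it is not how Lance physically executes the join inside its DataFusion plan. Hits keep their incoming order, and this sketch simply skips row ids that have no stored row.

```rust
use std::collections::HashMap;

/// One FTS hit: a row id and its relevance score.
#[derive(Debug, Clone, PartialEq)]
struct Hit {
    row_id: u64,
    score: f32,
}

/// A projected output row: the requested column value plus _rowid and _score.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    row_id: u64,
    title: String,
    score: f32,
}

/// Attaches the `title` column to each hit, preserving hit order.
fn join_hits(hits: &[Hit], titles: &HashMap<u64, String>) -> Vec<Row> {
    hits.iter()
        .filter_map(|h| {
            // Look up the stored value for this row id; skip ids with no row.
            titles.get(&h.row_id).map(|t| Row {
                row_id: h.row_id,
                title: t.clone(),
                score: h.score,
            })
        })
        .collect()
}
```

Keeping the index output narrow (row ids and scores only) and fetching the wide columns afterwards means only the matching rows are ever read.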

Usage Examples

Simple Term Search

use lance::Dataset;
use lance_index::scalar::FullTextSearchQuery;

async fn search(dataset: &Dataset) -> lance_core::Result<()> {
    let query = FullTextSearchQuery::new("machine learning".to_owned())
        .limit(Some(10));

    let results = dataset
        .scan()
        .full_text_search(query)?
        .try_into_batch()
        .await?;
    // results contains _rowid and _score columns
    Ok(())
}

Fuzzy Search

use lance_index::scalar::FullTextSearchQuery;

// Fuzzy match with max edit distance of 2
let query = FullTextSearchQuery::new_fuzzy("machin lerning".to_owned(), Some(2))
    .limit(Some(20));
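new_fuzzy sets a maximum edit distance for each query term (2 in the example above). One common way to realize this, used by Lucene-style engines and assumed here rather than taken from Lance's source, is to expand each query term into the indexed terms within that Levenshtein distance, keep at most max_expansions of them (default 50), and search for the expanded terms. `levenshtein` and `expand_fuzzy` are hypothetical helpers, and prefix_length is not modeled.

```rust
/// Classic Levenshtein distance (insertions, deletions, substitutions),
/// computed with two rows of the dynamic-programming table.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for i in 1..=a.len() {
        let mut cur = vec![i; b.len() + 1]; // cur[0] = i deletions
        for j in 1..=b.len() {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            cur[j] = (prev[j] + 1).min(cur[j - 1] + 1).min(prev[j - 1] + cost);
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Expands one query term into vocabulary terms within `max_distance` edits,
/// closest first (ties alphabetical), keeping at most `max_expansions`.
fn expand_fuzzy<'a>(
    term: &str,
    vocabulary: &[&'a str],
    max_distance: usize,
    max_expansions: usize,
) -> Vec<&'a str> {
    let mut hits: Vec<(usize, &'a str)> = vocabulary
        .iter()
        .map(|&v| (levenshtein(term, v), v))
        .filter(|&(d, _)| d <= max_distance)
        .collect();
    hits.sort();
    hits.into_iter().take(max_expansions).map(|(_, v)| v).collect()
}
```

Both misspelled terms in "machin lerning" are one edit away from "machine" and "learning", so a maximum distance of 2 is generous here; larger distances admit more, and less related, expansions.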

Phrase Search with Slop

use lance_index::scalar::FullTextSearchQuery;
use lance_index::scalar::inverted::query::{FtsQuery, PhraseQuery};

let phrase = PhraseQuery::new("neural network".to_owned()).with_slop(1);
let query = FullTextSearchQuery::new_query(FtsQuery::Phrase(phrase))
    .limit(Some(10));
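PhraseQuery's slop (default 0) controls how tightly the phrase terms must sit together. The sketch below assumes an in-order definition: a phrase matches when its terms appear in order and the total number of extra positions between consecutive terms is at most slop, so slop 0 means exact adjacency. Lucene-style engines also charge reorderings against slop, and this page does not specify Lance's exact rule. `positions` and `phrase_matches` are hypothetical helpers using an assumed lowercase whitespace tokenizer.

```rust
use std::collections::HashMap;

/// Positional index for one document: term -> ascending list of token positions.
fn positions(doc: &str) -> HashMap<String, Vec<usize>> {
    let mut map: HashMap<String, Vec<usize>> = HashMap::new();
    for (pos, tok) in doc.split_whitespace().enumerate() {
        map.entry(tok.to_lowercase()).or_default().push(pos);
    }
    map
}

/// True if the phrase terms occur in order with at most `slop` extra
/// positions in total between consecutive terms.
fn phrase_matches(doc: &str, phrase: &str, slop: usize) -> bool {
    let index = positions(doc);
    let terms: Vec<String> = phrase.split_whitespace().map(|t| t.to_lowercase()).collect();
    let Some(first) = terms.first() else { return false };
    let Some(starts) = index.get(first) else { return false };
    // Try each occurrence of the first term as the phrase start.
    starts.iter().any(|&start| {
        let mut prev = start;
        let mut used = 0;
        for term in &terms[1..] {
            // Greedily take the nearest later occurrence; this minimizes the total gap.
            let Some(&next) = index.get(term).and_then(|ps| ps.iter().find(|&&p| p > prev)) else {
                return false;
            };
            used += next - prev - 1;
            if used > slop {
                return false;
            }
            prev = next;
        }
        true
    })
}
```

With slop 1, as in the example above, "neural network" also matches text such as "neural convolutional network".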

Boolean Query

use lance_index::scalar::FullTextSearchQuery;
use lance_index::scalar::inverted::query::{FtsQuery, MatchQuery, BooleanQuery, Occur};

let bool_query = BooleanQuery::new(vec![
    (Occur::Must, MatchQuery::new("machine learning".to_owned()).into()),
    (Occur::Should, MatchQuery::new("deep learning".to_owned()).into()),
    (Occur::MustNot, MatchQuery::new("deprecated".to_owned()).into()),
]);
let query = FullTextSearchQuery::new_query(FtsQuery::Boolean(bool_query))
    .limit(Some(50));
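The BooleanQuery above requires a match for "machine learning", rewards "deep learning", and excludes "deprecated". The sketch below shows how per-clause results might be combined, assuming Lucene-style semantics: every must clause has to match, no must_not clause may match, and should clauses only add score unless there are no must clauses, in which case at least one should clause has to match. Summing the matched clauses' scores, the `Hits` alias, and `combine` are illustrative assumptions, not Lance's documented behavior.

```rust
use std::collections::{HashMap, HashSet};

/// Per-clause results: row id -> score for every row the clause matched.
type Hits = HashMap<u64, f32>;

fn combine(must: &[Hits], should: &[Hits], must_not: &[Hits]) -> Vec<(u64, f32)> {
    // Rows matched by any must_not clause are removed outright.
    let excluded: HashSet<u64> = must_not.iter().flat_map(|h| h.keys().copied()).collect();
    // Candidates: rows matched by every must clause, or, when there are no
    // must clauses, rows matched by at least one should clause.
    let candidates: HashSet<u64> = match must.split_first() {
        Some((first, rest)) => first
            .keys()
            .copied()
            .filter(|id| rest.iter().all(|h| h.contains_key(id)))
            .collect(),
        None => should.iter().flat_map(|h| h.keys().copied()).collect(),
    };
    let mut out: Vec<(u64, f32)> = candidates
        .into_iter()
        .filter(|id| !excluded.contains(id))
        .map(|id| {
            // Sum the scores of all must and should clauses this row matched.
            let score: f32 = must.iter().chain(should).filter_map(|h| h.get(&id)).sum();
            (id, score)
        })
        .collect();
    // Highest score first; ties broken by row id for a deterministic order.
    out.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    out
}
```

A row that matches only a should clause is dropped here because a must clause is present, while rows that match both must and should clauses rank above those matching must alone.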

Search with Projection (Retrieving Document Content)

use lance::Dataset;
use lance_index::scalar::FullTextSearchQuery;

async fn search_with_content(dataset: &Dataset) -> lance_core::Result<()> {
    let query = FullTextSearchQuery::new("search terms".to_owned())
        .limit(Some(10));

    let results = dataset
        .scan()
        .project(&["doc", "title"])?
        .full_text_search(query)?
        .try_into_batch()
        .await?;
    // results contains doc, title, _rowid, and _score columns
    Ok(())
}

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
