Implementation:Lance format Lance Scanner Filter Query
| Knowledge Sources | |
|---|---|
| Domains | Information_Retrieval, Full_Text_Search |
| Last Updated | 2026-02-08 19:00 GMT |
Overview
Concrete pattern for composing full-text search with vector similarity search in a single Scanner query pipeline, provided by the filter_query method and the QueryFilter enum.
Description
This pattern describes how users combine two search modalities (FTS and vector) on a Lance Scanner. The filter_query method accepts a QueryFilter enum that wraps either a FullTextSearchQuery or a Query (vector query). When one modality is set as the primary search (via full_text_search or nearest), the other can be supplied as a filter via filter_query. The Scanner then selects an execution strategy based on which modality is primary and on the prefilter setting.
Usage
Apply this pattern whenever you need results that satisfy both keyword and semantic relevance. Set one modality as the primary search and the other as the filter query.
Interface Specification
Source Location
- Repository: Lance
- File: rust/lance/src/dataset/scanner.rs
- Lines:
  - 386-389 (QueryFilter enum definition)
  - 971-973 (filter_query method)
  - 2448-2481 (FTS primary + vector filter execution)
  - 2483-2529 (vector primary + FTS filter execution)
  - 3614-3651 (join-based flat FTS composition)
QueryFilter Enum
```rust
/// Query filter for filtering rows
#[derive(Debug, Clone)]
pub enum QueryFilter {
    Fts(FullTextSearchQuery),
    Vector(Query),
}
```
filter_query Method
```rust
impl Scanner {
    pub fn filter_query(&mut self, filter: QueryFilter) -> Result<&mut Self> {
        self.filter.query_filter = Some(filter);
        Ok(self)
    }
}
```
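Because filter_query assigns Some(filter) to a single Option field, a scanner holds at most one query filter, and a second call replaces the first. The std-only model below illustrates that behaviour and the Result<&mut Self> chaining shape. MiniScanner, FilterState, FtsQuery, and VectorQuery are hypothetical stand-ins for illustration, not Lance types.

```rust
/// Hypothetical stand-ins for Lance's query types; only their shape matters here.
#[derive(Debug, Clone, PartialEq)]
pub struct FtsQuery(pub String);

#[derive(Debug, Clone, PartialEq)]
pub struct VectorQuery {
    pub column: String,
    pub k: usize,
}

/// Mirrors the two-variant QueryFilter enum.
#[derive(Debug, Clone, PartialEq)]
pub enum QueryFilter {
    Fts(FtsQuery),
    Vector(VectorQuery),
}

/// Hypothetical filter state: one optional slot for the query filter.
#[derive(Debug, Default)]
pub struct FilterState {
    pub query_filter: Option<QueryFilter>,
}

/// Hypothetical scanner holding only the filter state.
#[derive(Debug, Default)]
pub struct MiniScanner {
    pub filter: FilterState,
}

impl MiniScanner {
    /// Same shape as Scanner::filter_query: overwrite the slot, return &mut Self.
    pub fn filter_query(&mut self, filter: QueryFilter) -> Result<&mut Self, String> {
        self.filter.query_filter = Some(filter);
        Ok(self)
    }
}
```

If an application needs both an FTS and a vector constraint, one of them therefore has to be the primary search (full_text_search or nearest) rather than a second filter_query call.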
Composition Patterns
Pattern A: FTS Primary with Vector Filter
Set full-text search as the primary and the vector query as the filter. With prefilter enabled, the Scanner runs the vector search first to identify candidate rows, then applies FTS scoring to those candidates.
```rust
use lance::dataset::scanner::QueryFilter;
use lance::Dataset;
use lance_index::scalar::FullTextSearchQuery;
use lance_index::vector::Query;

/// FTS is the primary search; vector_query (a Query built by the caller,
/// e.g. Query { column, key, k, ... }) restricts the candidate rows.
async fn fts_primary_vector_filter(
    dataset: &Dataset,
    vector_query: Query,
) -> lance_core::Result<()> {
    let fts_query = FullTextSearchQuery::new("machine learning".to_owned()).limit(Some(20));
    let mut scanner = dataset.scan();
    scanner
        .prefilter(true) // run the vector filter before FTS scoring
        .full_text_search(fts_query)?
        .filter_query(QueryFilter::Vector(vector_query))?;
    let batch = scanner.try_into_batch().await?;
    println!("{} rows", batch.num_rows());
    Ok(())
}
```
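In Pattern A the FTS stage only has to score the rows that the vector search kept. The sketch below approximates that step with plain BM25 (k1 = 1.2, b = 0.75) over the candidate texts; it is not Lance's FlatMatchQueryExec. Assumptions: whitespace tokenization with lowercasing, and corpus statistics (N, average length, document frequency) taken from the candidate set alone, which a real system may instead draw from the full index. flat_bm25 is a hypothetical name.

```rust
use std::collections::HashMap;

/// BM25-score candidate rows (rowid, text) that a vector search already selected.
/// Simplification: N, avgdl and document frequencies come from the candidates only.
pub fn flat_bm25(candidates: &[(u64, &str)], query: &str) -> Vec<(u64, f32)> {
    const K1: f32 = 1.2;
    const B: f32 = 0.75;
    let docs: Vec<(u64, Vec<String>)> = candidates
        .iter()
        .map(|(id, text)| (*id, text.split_whitespace().map(|t| t.to_lowercase()).collect()))
        .collect();
    if docs.is_empty() {
        return Vec::new();
    }
    let n = docs.len() as f32;
    let avgdl = docs.iter().map(|(_, toks)| toks.len() as f32).sum::<f32>() / n;
    let terms: Vec<String> = query.split_whitespace().map(|t| t.to_lowercase()).collect();

    // Document frequency of each query term within the candidate set.
    let mut df: HashMap<&str, f32> = HashMap::new();
    for t in &terms {
        let count = docs.iter().filter(|(_, toks)| toks.contains(t)).count() as f32;
        df.insert(t.as_str(), count);
    }

    let mut scored: Vec<(u64, f32)> = docs
        .iter()
        .filter_map(|(id, toks)| {
            let dl = toks.len() as f32;
            let mut score = 0.0f32;
            for t in &terms {
                let tf = toks.iter().filter(|x| *x == t).count() as f32;
                if tf == 0.0 {
                    continue;
                }
                let n_t = df[t.as_str()];
                let idf = (1.0 + (n - n_t + 0.5) / (n_t + 0.5)).ln();
                score += idf * tf * (K1 + 1.0) / (tf + K1 * (1.0 - B + B * dl / avgdl));
            }
            // Rows matching no query term are dropped, as in a keyword match.
            if score > 0.0 { Some((*id, score)) } else { None }
        })
        .collect();
    // Highest BM25 score first.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored
}
```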
Pattern B: Vector Primary with FTS Filter
Set vector search as the primary and FTS as the filter. With prefilter enabled, the Scanner runs FTS first to identify matching rows, fetches (Takes) their vectors, and performs a flat KNN over them.
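```rust
use arrow_array::Float32Array;
use lance::dataset::scanner::QueryFilter;
use lance::Dataset;
use lance_index::scalar::FullTextSearchQuery;

/// Vector search is the primary; the FTS query restricts which rows are ranked.
/// "vector_col" and k = 10 are placeholders for your own column name and limit.
async fn vector_primary_fts_filter(
    dataset: &Dataset,
    query_vector: &Float32Array,
) -> lance_core::Result<()> {
    let fts_filter = FullTextSearchQuery::new("neural network".to_owned());
    let mut scanner = dataset.scan();
    scanner
        .prefilter(true) // run FTS first, then flat KNN over the matching rows
        .nearest("vector_col", query_vector, 10)?
        .filter_query(QueryFilter::Fts(fts_filter))?;
    let batch = scanner.try_into_batch().await?;
    println!("{} rows", batch.num_rows());
    Ok(())
}
```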
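The flat KNN step in Pattern B is an exact, brute-force ranking of the rows that FTS returned, so no ANN index is consulted. A minimal std-only sketch follows; the squared Euclidean metric and the name flat_knn are assumptions for illustration, since the metric in practice depends on the vector query.

```rust
/// Exact k-nearest-neighbour ranking over FTS-selected (rowid, vector) pairs.
/// Assumed metric: squared Euclidean distance. Smaller is closer.
pub fn flat_knn(candidates: &[(u64, Vec<f32>)], query: &[f32], k: usize) -> Vec<(u64, f32)> {
    let mut scored: Vec<(u64, f32)> = candidates
        .iter()
        .filter(|(_, v)| v.len() == query.len()) // skip rows with mismatched dimensions
        .map(|(id, v)| {
            let d = v.iter().zip(query).map(|(a, b)| (a - b) * (a - b)).sum::<f32>();
            (*id, d)
        })
        .collect();
    // Closest first; total_cmp gives a total order over f32.
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    scored.truncate(k);
    scored
}
```

Because the candidate set has already been narrowed by FTS, a full scan of it is cheap, which is why an exact search is a reasonable choice at this stage.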
Pattern C: Join-Based Composition
When neither modality neatly filters the other, the Scanner falls back to a HashJoinExec with JoinType::Inner on _rowid. This executes both searches independently and intersects the results.
The join-based path is selected automatically by the Scanner when needed. The relevant execution plan is:
```text
ProjectionExec (deduplicate _rowid)
  -> HashJoinExec (Inner on _rowid, PartitionMode::CollectLeft)
     -> Left: input plan (e.g., vector search with _rowid, _distance)
     -> Right: FTS plan (with _rowid, _score)
```
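The plan above is an inner hash join keyed on _rowid. Under PartitionMode::CollectLeft, DataFusion collects the whole left input into one hash table and streams the right input through it. The std-only sketch below mimics that build/probe split on in-memory vectors; JoinedRow and inner_join_on_rowid are hypothetical names, and it assumes each _rowid appears at most once per side.

```rust
use std::collections::HashMap;

/// One joined row: a single _rowid plus the vector distance and FTS score.
#[derive(Debug, Clone, PartialEq)]
pub struct JoinedRow {
    pub rowid: u64,
    pub distance: f32,
    pub score: f32,
}

/// Inner join of (rowid, _distance) from the left with (rowid, _score) from the right.
pub fn inner_join_on_rowid(left: &[(u64, f32)], right: &[(u64, f32)]) -> Vec<JoinedRow> {
    // Build phase: collect the entire left side into a hash table keyed by rowid.
    let build: HashMap<u64, f32> = left.iter().copied().collect();
    // Probe phase: emit a row only when the rowid exists on both sides.
    right
        .iter()
        .filter_map(|&(rowid, score)| {
            build.get(&rowid).map(|&distance| JoinedRow { rowid, distance, score })
        })
        .collect()
}
```

Emitting a single rowid field per output row plays the role of the ProjectionExec step that drops the duplicate _rowid column.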
Execution Strategy Selection
| Primary | Filter | Prefilter | Strategy |
|---|---|---|---|
| FTS | Vector | true | Vector search first, then FlatMatchQueryExec or HashJoinExec for FTS |
| FTS | None | true | Standard FTS index search |
| FTS | any | false | FTS index search, then postfilter in memory |
| Vector | FTS | true | FTS index search first, then Take vector column, then flat_knn |
| Vector | None | true | Standard ANN/KNN vector search |
| Vector | any | false | Vector search, then postfilter in memory |
| Either | Either | -- | HashJoinExec (Inner) on _rowid as fallback |
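The strategy table can be read as a lookup on (primary, filter, prefilter). The sketch below encodes only the rows with explicit conditions; the HashJoinExec fallback row is left out because the table does not say when it triggers, and combinations the table does not list return None. Modality and select_strategy are hypothetical names, and this is a model of the table, not Lance's planner code.

```rust
/// Which search modality fills a role in the query.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Modality {
    Fts,
    Vector,
}

/// Returns the table's strategy for a combination, or None if the table has no row for it.
pub fn select_strategy(
    primary: Modality,
    filter: Option<Modality>,
    prefilter: bool,
) -> Option<&'static str> {
    use Modality::{Fts, Vector};
    match (primary, filter, prefilter) {
        // prefilter = false: run the primary search, then postfilter in memory.
        (Fts, _, false) => Some("FTS index search, then postfilter in memory"),
        (Vector, _, false) => Some("Vector search, then postfilter in memory"),
        // No query filter: the primary search runs on its own.
        (Fts, None, true) => Some("Standard FTS index search"),
        (Vector, None, true) => Some("Standard ANN/KNN vector search"),
        // Cross-modality prefiltering: the filter modality runs first.
        (Fts, Some(Vector), true) => {
            Some("Vector search first, then FlatMatchQueryExec or HashJoinExec for FTS")
        }
        (Vector, Some(Fts), true) => {
            Some("FTS index search first, then Take vector column, then flat_knn")
        }
        // Same modality as both primary and filter: no row in the table.
        (Fts, Some(Fts), true) | (Vector, Some(Vector), true) => None,
    }
}
```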
Output Schema
| Column | Type | Source | Description |
|---|---|---|---|
| _rowid | UInt64 | Both searches | Row identifier (deduplicated after join) |
| _score | Float32 | FTS search | BM25 relevance score (present when FTS is involved) |
| _distance | Float32 | Vector search | Vector distance metric (present when vector search is involved) |
| projected columns | varies | Dataset | Any columns requested via project() |
Both _score and _distance are preserved in the output, enabling application-level score fusion.
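Lance leaves the fusion step to the application. One widely used option is Reciprocal Rank Fusion (RRF), which combines ranks rather than raw values and so avoids putting BM25 scores and vector distances on a common scale. The sketch below applies RRF to (rowid, _score, _distance) tuples; the function name rrf and the constant k (60 is a common choice) are assumptions, and nothing here is part of Lance.

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion over rows of (rowid, _score, _distance).
/// Ranks by _score (higher is better) and by _distance (lower is better);
/// each ranking contributes 1 / (k + rank), with rank starting at 1.
pub fn rrf(rows: &[(u64, f32, f32)], k: f32) -> Vec<(u64, f32)> {
    let mut by_score: Vec<&(u64, f32, f32)> = rows.iter().collect();
    by_score.sort_by(|a, b| b.1.total_cmp(&a.1));
    let mut by_distance: Vec<&(u64, f32, f32)> = rows.iter().collect();
    by_distance.sort_by(|a, b| a.2.total_cmp(&b.2));

    let mut fused: HashMap<u64, f32> = HashMap::new();
    for ranked in [&by_score, &by_distance] {
        for (i, row) in ranked.iter().enumerate() {
            *fused.entry(row.0).or_insert(0.0) += 1.0 / (k + (i + 1) as f32);
        }
    }
    let mut out: Vec<(u64, f32)> = fused.into_iter().collect();
    // Highest fused score first; ties broken by rowid for a deterministic order.
    out.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    out
}
```

A weighted sum of normalized _score and _distance is another common choice; RRF is shown because it needs no per-query normalization.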