Implementation:Lance format Lance Scanner Nearest
| Knowledge Sources | |
|---|---|
| Domains | Vector_Search, Query_Execution |
| Last Updated | 2026-02-08 19:00 GMT |
Overview
Concrete tool for executing approximate nearest neighbor (ANN) vector search queries on a Lance dataset, provided by the Scanner API.
Description
Scanner::nearest configures the Scanner to perform a k-nearest-neighbor search on a specified vector column. It validates the query vector's type and dimensionality against the column schema, then stores the query parameters internally. No search runs at this point: the query is executed when the Scanner is converted to a stream via try_into_stream().
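The validation step described above can be pictured with a small standalone sketch. It is not the Lance implementation: `QueryError` and `validate_query` are hypothetical names, the query is a plain `f32` slice rather than an Arrow array, and `column_dim` stands in for the list size of the vector column.

```rust
/// Hypothetical error type for this sketch; Lance reports failures through its own error type.
#[derive(Debug, PartialEq)]
pub enum QueryError {
    /// k must be positive.
    ZeroK,
    /// The query length does not match the column's vector dimension.
    DimensionMismatch { expected: usize, actual: usize },
}

/// Check the query parameters up front, before any search work is scheduled.
/// `column_dim` is an assumed stand-in for the column's fixed list size.
pub fn validate_query(column_dim: usize, query: &[f32], k: usize) -> Result<(), QueryError> {
    if k == 0 {
        return Err(QueryError::ZeroK);
    }
    if query.len() != column_dim {
        return Err(QueryError::DimensionMismatch {
            expected: column_dim,
            actual: query.len(),
        });
    }
    Ok(())
}
```

Failing early like this means a mismatched query surfaces as an error from the configuration call instead of partway through a stream.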
Additional methods on the Scanner fine-tune the search behavior: nprobes controls IVF partition search breadth, ef controls HNSW beam width, refine enables post-search re-ranking with original vectors, and distance_metric overrides the default metric.
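To make the refine step concrete, here is a minimal sketch of re-ranking with original vectors. It assumes an approximate index has already returned a candidate list (for example, factor × k row ids) and that the original vectors are available in memory; `refine_rerank` and `l2_squared` are illustrative names, not Lance functions, and squared L2 is used only as an example metric.

```rust
/// Squared L2 distance between two equal-length vectors.
fn l2_squared(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Re-rank candidate row ids by exact distance and keep the best `k`.
/// `vectors` holds the original (uncompressed) vectors indexed by row id;
/// `candidates` holds row ids produced by an approximate lookup.
pub fn refine_rerank(
    vectors: &[Vec<f32>],
    candidates: &[usize],
    query: &[f32],
    k: usize,
) -> Vec<(usize, f32)> {
    // Recompute each candidate's distance from its original vector.
    let mut scored: Vec<(usize, f32)> = candidates
        .iter()
        .map(|&id| (id, l2_squared(&vectors[id], query)))
        .collect();
    // Sort ascending by exact distance, then keep only the top k.
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    scored.truncate(k);
    scored
}
```

A larger candidate list gives the exact re-ranking more chances to recover neighbors that the approximate distances mis-ordered, at the cost of reading more original vectors.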
Usage
Use this API when you need to:
- Find the k most similar vectors to a query in a Lance dataset.
- Combine vector search with column projections, filters, and limits.
- Tune search recall and latency with nprobes, ef, and refine parameters.
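As a rough illustration of what nprobes trades off, the sketch below picks which IVF partitions to scan by ranking partition centroids against the query. It is a simplified model under stated assumptions, not Lance's code: `select_partitions` and `centroid_distance` are hypothetical names, centroids are plain `Vec<f32>` values, and squared L2 is assumed as the metric.

```rust
/// Squared L2 distance from a centroid to the query.
fn centroid_distance(centroid: &[f32], query: &[f32]) -> f32 {
    centroid
        .iter()
        .zip(query)
        .map(|(c, q)| (c - q) * (c - q))
        .sum()
}

/// Return the indices of the `nprobes` partitions whose centroids are closest
/// to the query. Only vectors in these partitions would then be searched.
pub fn select_partitions(centroids: &[Vec<f32>], query: &[f32], nprobes: usize) -> Vec<usize> {
    let mut order: Vec<usize> = (0..centroids.len()).collect();
    // Rank partitions by centroid distance, nearest first.
    order.sort_by(|&a, &b| {
        centroid_distance(&centroids[a], query)
            .total_cmp(&centroid_distance(&centroids[b], query))
    });
    // An nprobes value larger than the partition count keeps every partition.
    order.truncate(nprobes);
    order
}
```

Scanning more partitions lowers the chance of missing a true neighbor that sits just across a partition boundary, which is why higher nprobes values raise recall and latency together.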
Code Reference
Source Location
- Repository: Lance
- File: rust/lance/src/dataset/scanner.rs
- Lines: L1147-L1263 (nearest), L1288-L1296 (nprobes), L1349-L1354 (ef), L1379-L1384 (refine), L1387-L1392 (distance_metric)
Signature
impl Scanner {
    /// Find k-nearest neighbor within the vector column.
    /// The query can be a Float16Array, Float32Array, Float64Array, UInt8Array,
    /// or a ListArray/FixedSizeListArray of the above types.
    pub fn nearest(
        &mut self,
        column: &str,
        q: &dyn Array,
        k: usize,
    ) -> Result<&mut Self>;

    /// Set the number of IVF partitions to search.
    pub fn nprobes(&mut self, n: usize) -> &mut Self;

    /// Set the HNSW search beam width (ef parameter).
    pub fn ef(&mut self, ef: usize) -> &mut Self;

    /// Apply a refine step to re-rank results with original vectors.
    pub fn refine(&mut self, factor: u32) -> &mut Self;

    /// Override the distance metric (L2, Cosine, Dot).
    pub fn distance_metric(&mut self, metric_type: MetricType) -> &mut Self;

    /// Set distance range bounds for filtering results.
    pub fn distance_range(
        &mut self,
        lower_bound: Option<f32>,
        upper_bound: Option<f32>,
    ) -> &mut Self;
}
Import
use lance::dataset::Dataset;
use lance::dataset::scanner::Scanner;
use arrow_array::Float32Array;
use lance_linalg::distance::MetricType;
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| column | &str | Yes | Name of the vector column to search. Must exist in the dataset schema and be a FixedSizeList or List type. |
| q | &dyn Array | Yes | Query vector. Supported types: Float16Array, Float32Array, Float64Array, UInt8Array, or ListArray/FixedSizeListArray for multi-vector queries. Dimensionality must match the column. |
| k | usize | Yes | Number of nearest neighbors to return. Must be positive (> 0). |
| nprobes | usize | No | Number of IVF partitions to search. Default: 1. Higher values increase recall. Sets both minimum and maximum nprobes. |
| ef | usize | No | HNSW search beam width. Default: not set (uses index default). Higher values increase recall for HNSW-based indices. |
| refine_factor | u32 | No | Factor for re-ranking with original vectors. E.g., factor=2 reads 2k candidates and re-ranks to return k. Factor=1 re-ranks without extra reads. Default: not set (no refinement). |
| metric_type | MetricType | No | Override distance metric: L2, Cosine, or Dot. Default: uses the metric from index training. |
| lower_bound | Option<f32> | No | Minimum distance threshold for results. Results closer than this are excluded. |
| upper_bound | Option<f32> | No | Maximum distance threshold for results. Results farther than this are excluded. |
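The effect of lower_bound and upper_bound can be sketched as a post-filter over (row id, distance) pairs. This is only an illustration of the semantics in the table: `filter_by_distance_range` is a hypothetical helper, and treating both bounds as inclusive is an assumption of this sketch, so check the Lance documentation for the exact boundary behavior.

```rust
/// Keep only results whose distance lies within the optional bounds.
/// `None` means the corresponding side is unbounded.
/// Assumption: both bounds are treated as inclusive in this sketch.
pub fn filter_by_distance_range(
    results: Vec<(u64, f32)>,
    lower_bound: Option<f32>,
    upper_bound: Option<f32>,
) -> Vec<(u64, f32)> {
    results
        .into_iter()
        .filter(|&(_, d)| {
            // Drop results closer than the lower bound.
            let above_lower = lower_bound.map_or(true, |lo| d >= lo);
            // Drop results farther than the upper bound.
            let below_upper = upper_bound.map_or(true, |hi| d <= hi);
            above_lower && below_upper
        })
        .collect()
}
```

With `(None, Some(0.5))`, for instance, a query for k neighbors may return fewer than k rows when not enough candidates fall within distance 0.5.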
Outputs
| Name | Type | Description |
|---|---|---|
| stream | DatasetRecordBatchStream | A stream of RecordBatch values containing the projected columns plus a _distance column (Float32) with the computed distance from each result row to the query vector. Results are ordered by ascending distance (most similar first). |
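Because results are ordered by ascending _distance, each metric has to be expressed so that smaller means more similar. The sketch below shows one way to do that for the three metric names used on this page. The `Metric` enum and `distance` function are hypothetical, and the specific conventions (squared Euclidean for L2, 1 minus cosine similarity, negated dot product) are illustrative assumptions that may not match the exact values Lance reports.

```rust
/// Hypothetical metric selector for this sketch (not Lance's MetricType).
#[derive(Clone, Copy, Debug)]
pub enum Metric {
    L2,
    Cosine,
    Dot,
}

/// Plain dot product of two equal-length vectors.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Compute a distance where a smaller value means "more similar".
pub fn distance(metric: Metric, a: &[f32], b: &[f32]) -> f32 {
    match metric {
        // Squared Euclidean distance (assumed convention for this sketch).
        Metric::L2 => a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum(),
        // 1 - cosine similarity; assumes neither vector is all zeros.
        Metric::Cosine => 1.0 - dot(a, b) / (dot(a, a).sqrt() * dot(b, b).sqrt()),
        // Negated dot product so that larger similarity sorts first.
        Metric::Dot => -dot(a, b),
    }
}
```

Sorting rows by this value in ascending order puts the best match first regardless of which metric is selected.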
Usage Examples
Basic k-NN search
use lance::dataset::Dataset;
use arrow_array::Float32Array;
use futures::TryStreamExt;

async fn search(dataset: &Dataset) -> lance::Result<()> {
    // Query vector; its length must match the dimension of the "vector" column.
    let query = Float32Array::from(vec![0.1f32; 128]);

    let mut scanner = dataset.scan();
    scanner
        .nearest("vector", &query, 10)?
        .nprobes(20)
        .refine(2);

    // The search runs here, when the scanner is turned into a stream.
    let batches: Vec<_> = scanner
        .try_into_stream()
        .await?
        .try_collect()
        .await?;

    for batch in &batches {
        println!("Got {} results", batch.num_rows());
    }
    Ok(())
}
ANN search with HNSW ef and distance range
use lance::dataset::Dataset;
use arrow_array::Float32Array;
use futures::TryStreamExt;

async fn hnsw_search(dataset: &Dataset) -> lance::Result<()> {
    let query = Float32Array::from(vec![0.5f32; 256]);

    let mut scanner = dataset.scan();
    scanner
        .nearest("embedding", &query, 50)?
        .nprobes(10)
        .ef(200)
        .distance_range(None, Some(0.5)); // only results within distance 0.5

    let batches: Vec<_> = scanner
        .try_into_stream()
        .await?
        .try_collect()
        .await?;
    Ok(())
}
Combining ANN search with column projection
use lance::dataset::Dataset;
use arrow_array::Float32Array;
use futures::TryStreamExt;

async fn projected_search(dataset: &Dataset) -> lance::Result<()> {
    let query = Float32Array::from(vec![1.0f32; 128]);

    let mut scanner = dataset.scan();
    scanner
        .project(&["id", "title"])?
        .nearest("vector", &query, 20)?
        .nprobes(5);

    let batches: Vec<_> = scanner
        .try_into_stream()
        .await?
        .try_collect()
        .await?;
    // Each batch has columns: id, title, _distance
    Ok(())
}
Related Pages
Implements Principle
Requires Environment
- Environment:Lance_format_Lance_Rust_Toolchain
- Environment:Lance_format_Lance_Python_Environment
- Environment:Lance_format_Lance_SIMD_And_Platform_Requirements