
Implementation:Lance format Lance Scanner Nearest

From Leeroopedia


Knowledge Sources
Domains: Vector_Search, Query_Execution
Last Updated: 2026-02-08 19:00 GMT

Overview

Concrete tool for executing approximate nearest neighbor (ANN) vector search queries on a Lance dataset, provided by the Scanner API.

Description

Scanner::nearest configures the Scanner to perform a k-nearest-neighbor search on a specified vector column. It validates the query vector's type and dimensionality against the column schema, then stores the query parameters internally; no distances are computed at this point. The search itself runs when the Scanner is converted to a stream via try_into_stream(). If the column has a vector index, the search is approximate; without one, Lance falls back to an exact (brute-force) scan.
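The configure-then-execute split described above can be illustrated without Lance at all. The std-only sketch below uses hypothetical names (ToyScanner, KnnParams, execute) and a single hard-coded f32 column. It mirrors only the pattern, not Lance's actual types or validation code: nearest() checks the column name, dimensionality, and k and stores them, and the brute-force search (squared L2 here, an assumption) runs only when execute() is called.

```rust
// Hypothetical sketch: `ToyScanner`, `KnnParams`, and `execute` are
// illustrative names, not Lance APIs. Distances are squared L2.

#[derive(Debug, Clone)]
struct KnnParams {
    query: Vec<f32>,
    k: usize,
}

struct ToyScanner {
    column: String,      // the only vector column in this toy "schema"
    dim: usize,          // its fixed dimensionality
    rows: Vec<Vec<f32>>, // stored vectors, one per row
    knn: Option<KnnParams>,
}

impl ToyScanner {
    /// Validate and record the query; no distances are computed here.
    fn nearest(&mut self, column: &str, q: &[f32], k: usize) -> Result<&mut Self, String> {
        if column != self.column {
            return Err(format!("column {column} not found"));
        }
        if q.len() != self.dim {
            return Err(format!("query has dim {}, column has dim {}", q.len(), self.dim));
        }
        if k == 0 {
            return Err("k must be > 0".to_string());
        }
        self.knn = Some(KnnParams { query: q.to_vec(), k });
        Ok(self)
    }

    /// Run the stored query: brute-force distances, sorted ascending, first k kept.
    fn execute(&self) -> Result<Vec<(usize, f32)>, String> {
        let p = self.knn.as_ref().ok_or("nearest() was not called")?;
        let mut scored: Vec<(usize, f32)> = self
            .rows
            .iter()
            .enumerate()
            .map(|(i, row)| {
                let d: f32 = row.iter().zip(&p.query).map(|(a, b)| (a - b) * (a - b)).sum();
                (i, d)
            })
            .collect();
        scored.sort_by(|a, b| a.1.total_cmp(&b.1));
        scored.truncate(p.k);
        Ok(scored)
    }
}

fn main() {
    let mut scanner = ToyScanner {
        column: "vector".to_string(),
        dim: 2,
        rows: vec![vec![0.0, 0.0], vec![1.0, 1.0], vec![5.0, 5.0]],
        knn: None,
    };
    // Rejected at configuration time: wrong dimensionality.
    assert!(scanner.nearest("vector", &[1.0, 2.0, 3.0], 2).is_err());
    // Accepted: parameters are stored, but the search has not run yet.
    scanner.nearest("vector", &[0.9, 1.1], 2).unwrap();
    println!("{:?}", scanner.execute().unwrap());
}
```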

Additional builder methods on the Scanner fine-tune the search: nprobes sets how many IVF partitions are searched, ef sets the HNSW search beam width, refine re-ranks candidates using the original vectors, distance_metric overrides the default metric, and distance_range restricts results to a band of distances.
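As a rough illustration of what a metric override changes, here are the three metrics named for distance_metric written as plain functions. This is a sketch under stated assumptions: L2 is taken as squared Euclidean distance, Cosine as 1 minus cosine similarity, and Dot as 1 minus the dot product, so that smaller always means more similar. The Metric enum is hypothetical and only stands in for MetricType; the exact formulas Lance uses are not given on this page.

```rust
// Hypothetical formulas: smaller always means "more similar" here.
// Lance's exact definitions (e.g. squared vs. rooted L2) are assumed, not confirmed.

#[derive(Clone, Copy, Debug)]
enum Metric {
    L2,
    Cosine,
    Dot,
}

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn distance(metric: Metric, a: &[f32], b: &[f32]) -> f32 {
    match metric {
        // Squared Euclidean distance (no square root).
        Metric::L2 => a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum(),
        // 1 - cosine similarity: 0 for the same direction, 2 for opposite.
        // Undefined (NaN) if either vector is all zeros.
        Metric::Cosine => 1.0 - dot(a, b) / (dot(a, a).sqrt() * dot(b, b).sqrt()),
        // 1 - dot product: a larger dot product ranks as closer.
        Metric::Dot => 1.0 - dot(a, b),
    }
}

fn main() {
    let q = [1.0f32, 0.0];
    let far_same_direction = [3.0f32, 0.0];
    let near_other_direction = [0.8f32, 0.6];
    for m in [Metric::L2, Metric::Cosine, Metric::Dot] {
        let d_far = distance(m, &q, &far_same_direction);
        let d_near = distance(m, &q, &near_other_direction);
        println!("{m:?}: far_same_direction={d_far:.3}, near_other_direction={d_near:.3}");
    }
}
```

Under L2 the nearby vector ranks first; under Cosine and Dot the vector pointing in the query's direction ranks first, even though it is farther away. That is why overriding the metric can reorder results.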

Usage

Use this API when you need to:

  • Find the k most similar vectors to a query in a Lance dataset.
  • Combine vector search with column projections, filters, and limits.
  • Tune search recall and latency with nprobes, ef, and refine parameters.
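To make the nprobes trade-off concrete, here is a toy inverted-file (IVF) index in plain Rust. It is a simplified sketch, not Lance's index: the hypothetical ToyIvf takes hand-picked centroids instead of training them with k-means, and it stores vectors uncompressed. Each vector is assigned to its nearest centroid, and a query scans only the nprobes partitions whose centroids are closest to it.

```rust
// Hypothetical toy IVF index: hand-picked centroids (no k-means training)
// and uncompressed vectors. Not Lance's implementation.

fn l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Indices of `points`, sorted by distance to `q` (closest first).
fn by_distance(points: &[Vec<f32>], q: &[f32]) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..points.len()).collect();
    idx.sort_by(|&i, &j| l2(&points[i], q).total_cmp(&l2(&points[j], q)));
    idx
}

struct ToyIvf {
    centroids: Vec<Vec<f32>>,
    vectors: Vec<Vec<f32>>,
    partitions: Vec<Vec<usize>>, // partitions[p] = row ids assigned to centroid p
}

impl ToyIvf {
    fn build(centroids: Vec<Vec<f32>>, vectors: Vec<Vec<f32>>) -> Self {
        let mut partitions = vec![Vec::new(); centroids.len()];
        for (row, v) in vectors.iter().enumerate() {
            // Assign each vector to its closest centroid.
            partitions[by_distance(&centroids, v)[0]].push(row);
        }
        ToyIvf { centroids, vectors, partitions }
    }

    /// Exact search restricted to the `nprobes` partitions closest to `q`.
    fn search(&self, q: &[f32], k: usize, nprobes: usize) -> Vec<usize> {
        let mut hits: Vec<usize> = by_distance(&self.centroids, q)
            .into_iter()
            .take(nprobes)
            .flat_map(|p| self.partitions[p].iter().copied())
            .collect();
        hits.sort_by(|&i, &j| l2(&self.vectors[i], q).total_cmp(&l2(&self.vectors[j], q)));
        hits.truncate(k);
        hits
    }
}

fn main() {
    let centroids = vec![vec![0.0f32, 0.0], vec![10.0, 0.0]];
    let vectors = vec![
        vec![4.0f32, 0.0], // row 0 -> partition 0
        vec![5.2, 0.0],    // row 1 -> partition 1
        vec![0.0, 0.0],    // row 2 -> partition 0
        vec![10.0, 0.0],   // row 3 -> partition 1
    ];
    let ivf = ToyIvf::build(centroids, vectors);
    let q = [4.9f32, 0.0];
    for nprobes in [1, 2] {
        println!("nprobes={nprobes}: top-1 = {:?}", ivf.search(&q, 1, nprobes));
    }
}
```

With nprobes = 1 the query at 4.9 probes only the partition around 0.0 and returns row 0, missing row 1 at 5.2, which was assigned to the other partition. With nprobes = 2 every partition is scanned and row 1 is found. Probing more partitions buys recall with extra distance computations.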

Code Reference

Source Location

  • Repository: Lance
  • File: rust/lance/src/dataset/scanner.rs
  • Lines: L1147-L1263 (nearest), L1288-L1296 (nprobes), L1349-L1354 (ef), L1379-L1384 (refine), L1387-L1392 (distance_metric)

Signature

impl Scanner {
    /// Find k-nearest neighbor within the vector column.
    /// The query can be a Float16Array, Float32Array, Float64Array, UInt8Array,
    /// or a ListArray/FixedSizeListArray of the above types.
    pub fn nearest(
        &mut self,
        column: &str,
        q: &dyn Array,
        k: usize,
    ) -> Result<&mut Self>;

    /// Set the number of IVF partitions to search.
    pub fn nprobes(&mut self, n: usize) -> &mut Self;

    /// Set the HNSW search beam width (ef parameter).
    pub fn ef(&mut self, ef: usize) -> &mut Self;

    /// Apply a refine step to re-rank results with original vectors.
    pub fn refine(&mut self, factor: u32) -> &mut Self;

    /// Override the distance metric (L2, Cosine, Dot).
    pub fn distance_metric(&mut self, metric_type: MetricType) -> &mut Self;

    /// Set distance range bounds for filtering results.
    pub fn distance_range(
        &mut self,
        lower_bound: Option<f32>,
        upper_bound: Option<f32>,
    ) -> &mut Self;
}

Import

use lance::dataset::Dataset;
use lance::dataset::scanner::Scanner;
use arrow_array::Float32Array;
use lance_linalg::distance::MetricType;

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| column | &str | Yes | Name of the vector column to search. Must exist in the dataset schema and be a FixedSizeList or List type. |
| q | &dyn Array | Yes | Query vector. Supported types: Float16Array, Float32Array, Float64Array, UInt8Array, or ListArray/FixedSizeListArray for multi-vector queries. Dimensionality must match the column. |
| k | usize | Yes | Number of nearest neighbors to return. Must be positive (> 0). |
| nprobes | usize | No | Number of IVF partitions to search, set via nprobes(). Default: 1. Higher values increase recall at the cost of latency. Sets both the minimum and maximum nprobes. |
| ef | usize | No | HNSW search beam width, set via ef(). Default: not set (uses the index default). Higher values increase recall for HNSW-based indices. |
| refine_factor | u32 | No | Re-ranking factor, set via refine(). For example, factor=2 fetches 2k candidates, re-ranks them by exact distance on the original vectors, and returns the top k. Factor=1 re-ranks the k candidates without fetching extra ones. Default: not set (no refinement). |
| metric_type | MetricType | No | Distance metric override (L2, Cosine, or Dot), set via distance_metric(). Default: the metric the index was trained with. |
| lower_bound | Option<f32> | No | Minimum distance threshold, set via distance_range(). Results closer than this are excluded. |
| upper_bound | Option<f32> | No | Maximum distance threshold, set via distance_range(). Results farther than this are excluded. |
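The two-stage behavior described for refine_factor can be sketched as follows. Assumptions: rounding each coordinate to the nearest integer stands in for the index's lossy vector codes (Lance's real compression, such as product quantization, is more involved), and search_with_refine is a hypothetical helper, not a Lance function.

```rust
// Hypothetical sketch of refine(factor). Rounding stands in for the index's
// lossy vector codes; `search_with_refine` is not a Lance function.

fn l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Lossy stand-in code: round each coordinate to the nearest integer.
fn quantize(v: &[f32]) -> Vec<f32> {
    v.iter().map(|x| x.round()).collect()
}

fn search_with_refine(
    originals: &[Vec<f32>],
    q: &[f32],
    k: usize,
    refine_factor: Option<u32>,
) -> Vec<usize> {
    // Stage 1: rank every row by approximate distance on its lossy code.
    let mut ids: Vec<usize> = (0..originals.len()).collect();
    ids.sort_by(|&i, &j| {
        l2(&quantize(&originals[i]), q).total_cmp(&l2(&quantize(&originals[j]), q))
    });
    if let Some(factor) = refine_factor {
        // Stage 2: keep k * factor candidates and re-rank them by exact distance.
        ids.truncate(k * factor as usize);
        ids.sort_by(|&i, &j| l2(&originals[i], q).total_cmp(&l2(&originals[j], q)));
    }
    ids.truncate(k);
    ids
}

fn main() {
    let rows = vec![vec![0.0f32], vec![0.6], vec![3.0]];
    let q = [0.45f32];
    for factor in [None, Some(1), Some(2)] {
        println!("refine_factor={factor:?}: {:?}", search_with_refine(&rows, &q, 1, factor));
    }
}
```

For the query 0.45, the rounded codes rank row 0 (0.0) ahead of row 1 (0.6, rounded to 1.0), although row 1 is closer in exact terms. With refine_factor = 1 only row 0 is re-ranked, so the error survives; with refine_factor = 2 both rows are re-ranked and row 1 is returned.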

Outputs

| Name | Type | Description |
|---|---|---|
| stream | DatasetRecordBatchStream | Returned by try_into_stream(). A stream of RecordBatch values containing the projected columns plus a _distance column (Float32) holding each result row's distance to the query vector. Results are ordered by ascending distance (most similar first). |
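The ordering guarantee above (at most k rows, ascending _distance) corresponds to a standard top-k selection. One common way to do it, sketched here with a bounded max-heap, keeps only the k closest hits while streaming over candidates. The Hit struct and top_k function are hypothetical, and this page does not say how Lance performs the selection internally.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

/// A result row: its id and its `_distance` to the query (hypothetical type).
#[derive(Debug, Clone, Copy)]
struct Hit {
    row_id: u64,
    distance: f32,
}

// Order hits by distance (ties broken by row id) so the max-heap keeps the
// farthest retained hit on top, ready to be evicted.
impl Ord for Hit {
    fn cmp(&self, other: &Self) -> Ordering {
        self.distance.total_cmp(&other.distance).then(self.row_id.cmp(&other.row_id))
    }
}
impl PartialOrd for Hit {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl PartialEq for Hit {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}
impl Eq for Hit {}

/// Keep the k closest hits and return them in ascending `_distance` order.
fn top_k(hits: impl IntoIterator<Item = Hit>, k: usize) -> Vec<Hit> {
    let mut heap = BinaryHeap::with_capacity(k + 1);
    for hit in hits {
        heap.push(hit);
        if heap.len() > k {
            heap.pop(); // evict the farthest retained hit
        }
    }
    heap.into_sorted_vec() // ascending order
}

fn main() {
    let distances = [0.9f32, 0.1, 0.5, 0.3, 0.7];
    let hits = distances
        .iter()
        .enumerate()
        .map(|(i, &d)| Hit { row_id: i as u64, distance: d });
    for hit in top_k(hits, 3) {
        println!("row {} _distance {}", hit.row_id, hit.distance);
    }
}
```

The heap never holds more than k + 1 entries, so memory stays bounded no matter how many candidates are scanned.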

Usage Examples

Basic k-NN search

use lance::dataset::Dataset;
use arrow_array::Float32Array;
use futures::TryStreamExt;

async fn search(dataset: &Dataset) -> lance::Result<()> {
    let query = Float32Array::from(vec![0.1f32; 128]);

    let mut scanner = dataset.scan();
    scanner
        .nearest("vector", &query, 10)?
        .nprobes(20)
        .refine(2);

    let batches: Vec<_> = scanner
        .try_into_stream()
        .await?
        .try_collect()
        .await?;

    for batch in &batches {
        println!("Got {} results", batch.num_rows());
    }
    Ok(())
}

ANN search with HNSW ef and distance range

use lance::dataset::Dataset;
use arrow_array::Float32Array;
use futures::TryStreamExt;

async fn hnsw_search(dataset: &Dataset) -> lance::Result<()> {
    let query = Float32Array::from(vec![0.5f32; 256]);

    let mut scanner = dataset.scan();
    scanner
        .nearest("embedding", &query, 50)?
        .nprobes(10)
        .ef(200)
        .distance_range(None, Some(0.5));  // only results within distance 0.5

    let batches: Vec<_> = scanner
        .try_into_stream()
        .await?
        .try_collect()
        .await?;

    for batch in &batches {
        println!("{} results within the distance range", batch.num_rows());
    }
    Ok(())
}
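The ef setting in the example above is the size of the candidate list kept during HNSW graph search. The sketch below implements the single-layer search routine described in the HNSW paper by Malkov and Yashunin over a hand-built adjacency list. The upper layers, graph construction, and Lance's own HNSW-based index code are all omitted, and search_layer is a hypothetical name.

```rust
use std::cmp::{Ordering, Reverse};
use std::collections::{BinaryHeap, HashSet};

fn l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// A graph node paired with its distance to the query, ordered by distance.
#[derive(Clone, Copy, Debug)]
struct Cand {
    dist: f32,
    node: usize,
}
impl Ord for Cand {
    fn cmp(&self, other: &Self) -> Ordering {
        self.dist.total_cmp(&other.dist).then(self.node.cmp(&other.node))
    }
}
impl PartialOrd for Cand {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl PartialEq for Cand {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}
impl Eq for Cand {}

/// Beam search on one graph layer; `ef` should be at least 1.
/// Returns up to `ef` node ids, closest first.
fn search_layer(vectors: &[Vec<f32>], neighbors: &[Vec<usize>], q: &[f32], entry: usize, ef: usize) -> Vec<usize> {
    let start = Cand { dist: l2(&vectors[entry], q), node: entry };
    let mut visited = HashSet::from([entry]);
    // Min-heap of nodes still to expand.
    let mut candidates = BinaryHeap::from([Reverse(start)]);
    // Max-heap of the best nodes found so far; the farthest sits on top.
    let mut results = BinaryHeap::from([start]);

    while let Some(Reverse(c)) = candidates.pop() {
        let farthest = results.peek().map_or(f32::INFINITY, |r| r.dist);
        // The closest unexpanded node is worse than every kept result: stop.
        if c.dist > farthest {
            break;
        }
        for &n in &neighbors[c.node] {
            if !visited.insert(n) {
                continue; // already seen
            }
            let cand = Cand { dist: l2(&vectors[n], q), node: n };
            let farthest = results.peek().map_or(f32::INFINITY, |r| r.dist);
            if results.len() < ef || cand.dist < farthest {
                candidates.push(Reverse(cand));
                results.push(cand);
                if results.len() > ef {
                    results.pop(); // keep at most `ef` results
                }
            }
        }
    }
    results.into_sorted_vec().into_iter().map(|c| c.node).collect()
}

fn main() {
    // 1-D points; node 3 (9.5) is reachable only through node 2 (-1.0).
    let vectors: Vec<Vec<f32>> = vec![vec![0.0], vec![4.0], vec![-1.0], vec![9.5]];
    let neighbors: Vec<Vec<usize>> = vec![vec![1, 2], vec![0], vec![0, 3], vec![2]];
    let q = [10.0f32];
    for ef in [1, 3] {
        println!("ef={ef}: {:?}", search_layer(&vectors, &neighbors, &q, 0, ef));
    }
}
```

In this graph the true nearest neighbor of 10.0 (node 3 at 9.5) can only be reached through node 2, which is farther from the query than the entry point. With ef = 1 the search settles on node 1 and stops; with ef = 3 node 2 stays in the list long enough to be expanded, and node 3 is found.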

Combining ANN search with column projection

use lance::dataset::Dataset;
use arrow_array::Float32Array;
use futures::TryStreamExt;

async fn projected_search(dataset: &Dataset) -> lance::Result<()> {
    let query = Float32Array::from(vec![1.0f32; 128]);

    let mut scanner = dataset.scan();
    scanner
        .project(&["id", "title"])?
        .nearest("vector", &query, 20)?
        .nprobes(5);

    let batches: Vec<_> = scanner
        .try_into_stream()
        .await?
        .try_collect()
        .await?;

    // Each batch has columns: id, title, _distance
    for batch in &batches {
        println!("{} rows, {} columns", batch.num_rows(), batch.num_columns());
    }
    Ok(())
}

