Implementation:Lance format Lance TakeBench

Knowledge Sources	Lance
Domains	Benchmarking, Performance
Last Updated	2026-02-08 19:33 GMT

Overview

Description

TakeBench is a Criterion-based benchmark that measures the performance of random row access ("take") operations across multiple levels of the Lance storage stack. It evaluates how efficiently Lance can retrieve random subsets of rows from datasets stored in memory, comparing different file format versions and access patterns.

The benchmark exercises five distinct access paths:

Random Take with Dataset — Uses Dataset::take_rows to retrieve random rows through the high-level dataset API. Tests both V2_0 and V2_1 file versions with two file sizes (1024 and 1,048,576 rows per file).
Random Single Take with FileReader — Reads individual rows one at a time through the low-level FileReader API.
Random Batch Take with FileReader — Reads all requested rows in a single FileReader call with the row indices sorted, so the reader can serve them in ascending order instead of handling one request per row.
Random Single Take with FileFragment — Reads individual rows through the FileFragment API (mid-level abstraction).
Random Batch Take with FileFragment — Reads sorted batches of rows through the FileFragment API.

Each access path is tested with take sizes of 1, 10, 100, and 1000 rows. The dataset schema includes an integer column, a float column, a binary column, and a fixed-size list (vector) column. Test datasets are created in memory (memory:// URI) to eliminate storage I/O variance.
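File size matters for the Dataset path. At 1024 rows per file, the 1,048,576-row dataset is split into 1024 fragments, so a 1000-row take of uniformly random rows lands in several hundred different fragments. At 1,048,576 rows per file, every row lives in one fragment. The sketch below makes that concrete under stated assumptions:

- Row addresses follow Lance's layout, with the fragment id in the upper 32 bits and the row offset in the lower 32 bits.
- With stable row IDs disabled, the ids passed to `Dataset::take_rows` equal these addresses.
- Fragment ids are assigned 0, 1, 2, ... in write order.

`locate` and `fragments_touched` are hypothetical helpers.

```rust
/// Row address in Lance's layout (assumption): fragment id in the upper
/// 32 bits, row offset within the fragment in the lower 32 bits.
pub fn row_address(fragment_id: u32, offset: u32) -> u64 {
    ((fragment_id as u64) << 32) | offset as u64
}

/// Map a global row position to (fragment_id, offset), assuming every
/// fragment holds exactly `rows_per_file` rows and fragment ids are
/// assigned 0, 1, 2, ... in write order (hypothetical helper).
pub fn locate(position: u64, rows_per_file: u64) -> (u32, u32) {
    ((position / rows_per_file) as u32, (position % rows_per_file) as u32)
}

/// Count how many distinct fragments a set of global positions touches.
pub fn fragments_touched(positions: &[u64], rows_per_file: u64) -> usize {
    let mut frags: Vec<u32> = positions
        .iter()
        .map(|&p| locate(p, rows_per_file).0)
        .collect();
    frags.sort_unstable();
    frags.dedup();
    frags.len()
}
```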

Usage

This benchmark is used to evaluate and optimize random access performance in Lance, which is critical for operations like vector search result retrieval, row-level updates, and selective data loading. It helps developers compare the overhead of different abstraction layers (Dataset vs FileFragment vs FileReader) and measure the impact of file format version changes.

Code Reference

Source Location

rust/lance/benches/take.rs (440 lines)

Signature

The benchmark defines five top-level functions registered with Criterion:

fn bench_random_take_with_dataset(c: &mut Criterion)
fn bench_random_single_take_with_file_reader(c: &mut Criterion)
fn bench_random_batch_take_with_file_reader(c: &mut Criterion)
fn bench_random_single_take_with_file_fragment(c: &mut Criterion)
fn bench_random_batch_take_with_file_fragment(c: &mut Criterion)

Import

use lance::dataset::ProjectionRequest;
use lance::dataset::{Dataset, WriteMode, WriteParams};
use lance_file::reader::{FileReader, FileReaderOptions};
use lance_file::version::LanceFileVersion;
use lance_io::scheduler::{ScanScheduler, SchedulerConfig};
use lance_io::ReadBatchParams;
use lance_encoding::decoder::{DecoderPlugins, FilterExpression};
use criterion::{criterion_group, criterion_main, Criterion};

I/O Contract

Inputs

Parameter	Type	Description
`BATCH_SIZE`	Constant	1024 rows per batch when creating the test dataset
`num_batches`	Constant	1024 batches per dataset
`file_size`	Variable	Number of rows per file; tested with 1024 and 1,048,576
`num_rows`	Variable	Number of rows per take operation; tested with 1, 10, 100, 1000
File versions	`LanceFileVersion`	V2_0 and V2_1
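The inputs combine into a fixed dataset size and a small parameter sweep. The sketch below restates the table's values as Rust constants. Only `BATCH_SIZE` and `num_batches` appear in the source; the other names are illustrative. It derives 1024 × 1024 = 1,048,576 rows, which gives 1024 files at `file_size` = 1024 and a single file at `file_size` = 1,048,576. It assumes the Dataset path runs every version × file size × take size combination, giving 16 cases. The FileReader and FileFragment paths may sweep fewer.

```rust
pub const BATCH_SIZE: u64 = 1024;
pub const NUM_BATCHES: u64 = 1024; // `num_batches` in the source
pub const FILE_SIZES: [u64; 2] = [1024, 1024 * 1024];
pub const TAKE_SIZES: [usize; 4] = [1, 10, 100, 1000];
pub const FILE_VERSIONS: [&str; 2] = ["V2_0", "V2_1"];

/// Total rows written to the test dataset.
pub fn total_rows() -> u64 {
    BATCH_SIZE * NUM_BATCHES
}

/// Number of data files for a given rows-per-file limit (ceiling division).
pub fn num_files(file_size: u64) -> u64 {
    (total_rows() + file_size - 1) / file_size
}

/// (version, file_size, take_size) cases, assuming a full cross product.
pub fn dataset_cases() -> Vec<(&'static str, u64, usize)> {
    let mut cases = Vec::new();
    for &v in FILE_VERSIONS.iter() {
        for &fs in FILE_SIZES.iter() {
            for &n in TAKE_SIZES.iter() {
                cases.push((v, fs, n));
            }
        }
    }
    cases
}
```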

Outputs

Output	Type	Description
Criterion report	HTML/JSON	Latency statistics for each combination of access path, file size, file version, and take size
Assertions	Runtime check	Each iteration asserts the returned batch has the expected row count

Usage Examples

Run the full take benchmark suite:

cargo bench -p lance --bench take

Run only the dataset-level take benchmarks:

cargo bench -p lance --bench take -- "Dataset"

Run only the FileReader benchmarks:

cargo bench -p lance --bench take -- "FileReader"
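The string after `--` is matched against each benchmark's ID. Criterion treats it as a regular expression, which behaves like a substring match for plain words such as "Dataset". The sketch below uses a plain substring test as a stand-in. The benchmark IDs in the usage check are invented for illustration; the real IDs in take.rs may be formatted differently. Whether a given filter selects exactly one access path therefore depends on the actual names.

```rust
/// Keep only benchmark IDs containing `filter`. This is a plain-substring
/// stand-in for Criterion's regex filter, equivalent for literal words.
pub fn select<'a>(ids: &[&'a str], filter: &str) -> Vec<&'a str> {
    ids.iter().copied().filter(|id| id.contains(filter)).collect()
}
```

With hypothetical IDs such as "Random Single Take FileReader(1)" and "Random Batch Take FileFragment(10)", a broader filter like "File" would select both the FileReader and FileFragment benchmarks.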

On Linux, the benchmark configures Criterion with a sample size of 10,000 (Criterion counts samples, each of which may run many iterations), a 3-second warm-up, a significance level of 0.01, and pprof profiling that produces flamegraphs. On non-Linux platforms, it uses a sample size of 10 and a significance level of 0.1.
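The platform-dependent settings can be summarized as a plain struct selected with `cfg!(target_os = "linux")`. `BenchSettings`, `settings_for`, and `current_settings` are hypothetical and are not Criterion types. The source does not state a non-Linux warm-up time, so that field is left as `None`, meaning "Criterion's default".

```rust
use std::time::Duration;

/// Hypothetical summary of the Criterion settings described above.
#[derive(Debug, Clone, PartialEq)]
pub struct BenchSettings {
    /// Criterion sample size: number of samples, not iterations.
    pub sample_size: usize,
    /// Warm-up time; `None` leaves Criterion's default in place.
    pub warm_up: Option<Duration>,
    pub significance_level: f64,
    /// Whether a pprof flamegraph profiler is attached.
    pub flamegraph: bool,
}

/// Settings for a Linux or non-Linux build, per the description above.
pub fn settings_for(is_linux: bool) -> BenchSettings {
    if is_linux {
        BenchSettings {
            sample_size: 10_000,
            warm_up: Some(Duration::from_secs(3)),
            significance_level: 0.01,
            flamegraph: true,
        }
    } else {
        BenchSettings {
            sample_size: 10,
            warm_up: None,
            significance_level: 0.1,
            flamegraph: false,
        }
    }
}

/// Settings for the platform this binary was compiled for.
pub fn current_settings() -> BenchSettings {
    settings_for(cfg!(target_os = "linux"))
}
```

In a real Criterion setup, values like these would presumably be passed to the `Criterion` builder methods `sample_size`, `warm_up_time`, `significance_level`, and `with_profiler`, using pprof's `PProfProfiler` for the flamegraph. The exact wiring in take.rs may differ.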

Related Pages

Lance_format_Lance_DecoderBench — Benchmark for Lance encoding/decoding throughput
Lance_format_Lance_VectorThroughputBench — Benchmark for IVF_PQ vector search throughput
