Implementation:Lance format Lance TakeBench
| Knowledge Sources | |
|---|---|
| Domains | Benchmarking, Performance |
| Last Updated | 2026-02-08 19:33 GMT |
Overview
Description
TakeBench is a Criterion-based benchmark that measures the performance of random row access ("take") operations across multiple levels of the Lance storage stack. It evaluates how efficiently Lance can retrieve random subsets of rows from datasets stored in memory, comparing different file format versions and access patterns.
The benchmark exercises five distinct access paths:
- Random Take with Dataset: Uses `Dataset::take_rows` to retrieve random rows through the high-level dataset API. Tests both V2_0 and V2_1 file versions with varying file sizes (1024 and 1M rows per file).
- Random Single Take with FileReader: Reads individual rows one at a time through the low-level `FileReader` API.
- Random Batch Take with FileReader: Reads sorted batches of rows through the `FileReader` API, enabling more efficient sequential I/O.
- Random Single Take with FileFragment: Reads individual rows through the `FileFragment` API (mid-level abstraction).
- Random Batch Take with FileFragment: Reads sorted batches of rows through the `FileFragment` API.
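The difference between the single and batch variants can be illustrated with a toy model. The sketch below is not the benchmark's code and does not call any Lance API: `ToyFile`, `take_single`, `take_batch`, `random_indices`, and the `Lcg` generator are hypothetical names, and a `Vec<u64>` stands in for a file. It draws random row indices, then either fetches them one at a time in request order or sorts them first and fetches them in one ascending pass.

```rust
/// Tiny linear congruential generator so the sketch needs no external crates.
/// (Hypothetical helper; it is not what the benchmark uses.)
struct Lcg(u64);

impl Lcg {
    fn next_below(&mut self, bound: u64) -> u64 {
        // Constants from Knuth's MMIX LCG; statistical quality is irrelevant here.
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (self.0 >> 33) % bound
    }
}

/// Stand-in for a file: row `i` holds the value `i * 10`.
struct ToyFile {
    rows: Vec<u64>,
}

impl ToyFile {
    fn new(num_rows: u64) -> Self {
        Self { rows: (0..num_rows).map(|i| i * 10).collect() }
    }

    /// "Single take": one lookup per requested row, in request order.
    fn take_single(&self, indices: &[u64]) -> Vec<u64> {
        indices.iter().map(|&i| self.rows[i as usize]).collect()
    }

    /// "Batch take": sort the indices first so rows are visited in
    /// ascending order, then fetch them in one pass.
    fn take_batch(&self, indices: &[u64]) -> Vec<u64> {
        let mut sorted = indices.to_vec();
        sorted.sort_unstable();
        sorted.iter().map(|&i| self.rows[i as usize]).collect()
    }
}

/// Draw `n` random row indices in `0..num_rows` (duplicates are possible).
fn random_indices(seed: u64, n: usize, num_rows: u64) -> Vec<u64> {
    let mut rng = Lcg(seed);
    (0..n).map(|_| rng.next_below(num_rows)).collect()
}
```

Both functions return one value per requested index, which mirrors the row-count check the benchmark performs on each iteration. In this toy model sorting buys nothing because memory access is uniform; against a real file layout, ascending offsets are what allow reads to be issued sequentially.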
Each access path is tested with take sizes of 1, 10, 100, and 1000 rows. The dataset schema includes an integer column, a float column, a binary column, and a fixed-size list (vector) column. Test datasets are created in memory (memory:// URI) to eliminate storage I/O variance.
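To make the size of the parameter space concrete, the sketch below enumerates the combinations described above for a single access path. The `FileVersion` enum, the `benchmark_cases` function, and the label format are illustrative assumptions rather than identifiers from `take.rs`, and the sketch assumes the path is run against both file sizes, which the text states explicitly only for the dataset-level benchmark.

```rust
/// Illustrative stand-in for the two file versions named in the text.
#[allow(non_camel_case_types)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum FileVersion {
    V2_0,
    V2_1,
}

/// Take sizes and rows-per-file values quoted in the description.
const TAKE_SIZES: [usize; 4] = [1, 10, 100, 1000];
const FILE_SIZES: [usize; 2] = [1024, 1024 * 1024];

/// Build one hypothetical label per (version, file size, take size) case.
fn benchmark_cases(path: &str) -> Vec<String> {
    let mut cases = Vec::new();
    for version in [FileVersion::V2_0, FileVersion::V2_1] {
        for file_size in FILE_SIZES {
            for take_size in TAKE_SIZES {
                cases.push(format!(
                    "{path}/{version:?}/file_size={file_size}/take={take_size}"
                ));
            }
        }
    }
    cases
}
```

Under these assumptions, one access path yields 2 × 2 × 4 = 16 measured cases.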
Usage
This benchmark is used to evaluate and optimize random access performance in Lance, which is critical for operations like vector search result retrieval, row-level updates, and selective data loading. It helps developers compare the overhead of different abstraction layers (Dataset vs FileFragment vs FileReader) and measure the impact of file format version changes.
Code Reference
Source Location
rust/lance/benches/take.rs (440 lines)
Signature
The benchmark defines five top-level functions registered with Criterion:
fn bench_random_take_with_dataset(c: &mut Criterion)
fn bench_random_single_take_with_file_reader(c: &mut Criterion)
fn bench_random_batch_take_with_file_reader(c: &mut Criterion)
fn bench_random_single_take_with_file_fragment(c: &mut Criterion)
fn bench_random_batch_take_with_file_fragment(c: &mut Criterion)
Import
use lance::dataset::ProjectionRequest;
use lance::dataset::{Dataset, WriteMode, WriteParams};
use lance_file::reader::{FileReader, FileReaderOptions};
use lance_file::version::LanceFileVersion;
use lance_io::scheduler::{ScanScheduler, SchedulerConfig};
use lance_io::ReadBatchParams;
use lance_encoding::decoder::{DecoderPlugins, FilterExpression};
use criterion::{criterion_group, criterion_main, Criterion};
I/O Contract
Inputs
| Parameter | Type | Description |
|---|---|---|
| `BATCH_SIZE` | Constant | 1024 rows per batch when creating the test dataset |
| `num_batches` | Constant | 1024 batches per dataset |
| `file_size` | Variable | Number of rows per file; tested with 1024 and 1,048,576 |
| `num_rows` | Variable | Number of rows per take operation; tested with 1, 10, 100, 1000 |
| File versions | `LanceFileVersion` | V2_0 and V2_1 |
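The constants above imply a fixed dataset size, and `file_size` then determines how many files a take may span. The arithmetic can be sketched as follows; `num_files` and `locate_row` are hypothetical helpers written for illustration, and the contiguous, in-order placement of rows into files is an assumption rather than something the table states.

```rust
const BATCH_SIZE: u64 = 1024; // rows per batch
const NUM_BATCHES: u64 = 1024; // batches per dataset
const TOTAL_ROWS: u64 = BATCH_SIZE * NUM_BATCHES; // 1,048,576 rows

/// Number of files needed when each file holds at most `file_size` rows.
fn num_files(file_size: u64) -> u64 {
    (TOTAL_ROWS + file_size - 1) / file_size // ceiling division
}

/// Map a dataset-wide row index to (file index, offset within that file),
/// assuming rows are written to files contiguously and in order.
fn locate_row(row: u64, file_size: u64) -> (u64, u64) {
    (row / file_size, row % file_size)
}
```

With 1024-row files the dataset splits into 1024 files, so a 1000-row random take can touch up to 1000 distinct files; with 1,048,576-row files every row lives in the same file.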
Outputs
| Output | Type | Description |
|---|---|---|
| Criterion report | HTML/JSON | Latency statistics for each combination of access path, file size, file version, and take size |
| Assertions | Runtime check | Each iteration asserts the returned batch has the expected row count |
Usage Examples
Run the full take benchmark suite:
cargo bench -p lance --bench take
Run only the dataset-level take benchmarks:
cargo bench -p lance --bench take -- "Dataset"
Run only the FileReader benchmarks:
cargo bench -p lance --bench take -- "FileReader"
On Linux, the benchmark is configured with a sample size of 10,000, a 3-second warm-up, and a significance level of 0.01, and it enables flamegraph profiling via pprof. On non-Linux platforms, it uses a sample size of 10 and a significance level of 0.1.
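The platform split can be pictured as a compile-time choice between two sets of settings. The sketch below mirrors the numbers quoted above in a plain struct; `BenchSettings` and its functions are hypothetical and are not Criterion types or the code in `take.rs`. Values the text does not state (the non-Linux warm-up, and whether profiling runs off Linux) are marked as assumptions in the comments.

```rust
use std::time::Duration;

/// Plain-data mirror of the settings quoted above (not Criterion's own type).
#[derive(Debug, PartialEq)]
struct BenchSettings {
    sample_size: usize,
    significance_level: f64,
    /// `None` means the text does not state a warm-up for this platform.
    warm_up: Option<Duration>,
    flamegraph_profiling: bool,
}

fn linux_settings() -> BenchSettings {
    BenchSettings {
        sample_size: 10_000,
        significance_level: 0.01,
        warm_up: Some(Duration::from_secs(3)),
        flamegraph_profiling: true,
    }
}

fn other_settings() -> BenchSettings {
    BenchSettings {
        sample_size: 10,
        significance_level: 0.1,
        warm_up: None,
        // Assumption: profiling is only mentioned for Linux, so it is off here.
        flamegraph_profiling: false,
    }
}

/// Pick the settings for the platform this code is compiled for.
fn settings_for_current_platform() -> BenchSettings {
    if cfg!(target_os = "linux") {
        linux_settings()
    } else {
        other_settings()
    }
}
```

The much smaller sample size off Linux trades statistical confidence for a shorter run, which is consistent with the looser significance level used there.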
Related Pages
- Lance_format_Lance_DecoderBench — Benchmark for Lance encoding/decoding throughput
- Lance_format_Lance_VectorThroughputBench — Benchmark for IVF_PQ vector search throughput