Implementation:Lance format Lance TakeBench


Knowledge Sources
Domains Benchmarking, Performance
Last Updated 2026-02-08 19:33 GMT

Overview

Description

TakeBench is a Criterion-based benchmark that measures the performance of random row access ("take") operations across multiple levels of the Lance storage stack. It evaluates how efficiently Lance can retrieve random subsets of rows from datasets stored in memory, comparing different file format versions and access patterns.

The benchmark exercises five distinct access paths:

  • Random Take with Dataset: Uses Dataset::take_rows to retrieve random rows through the high-level dataset API. Tests both the V2_0 and V2_1 file versions with two file sizes (1024 and 1,048,576 rows per file).
  • Random Single Take with FileReader: Reads individual rows one at a time through the low-level FileReader API.
  • Random Batch Take with FileReader: Reads sorted batches of row indices through the FileReader API, so that reads proceed in file order and I/O is closer to sequential.
  • Random Single Take with FileFragment: Reads individual rows one at a time through the FileFragment API (a mid-level abstraction between Dataset and FileReader).
  • Random Batch Take with FileFragment: Reads sorted batches of row indices through the FileFragment API.

Each access path is tested with take sizes of 1, 10, 100, and 1000 rows. The dataset schema includes an integer column, a float column, a binary column, and a fixed-size list (vector) column. Test datasets are created in memory (memory:// URI) to eliminate storage I/O variance.
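The difference between a single take and a sorted batch take comes down to how the row indices are prepared before the read. The following is a minimal sketch of that preparation step only, using the standard library. The SplitMix64 generator and the helper name sorted_random_row_ids are assumptions made for illustration and are not taken from take.rs; the real benchmark may draw and deduplicate its indices differently.

```rust
/// Small deterministic generator (SplitMix64), used here only so the
/// sketch needs no external crates. This is an illustrative stand-in,
/// not what the benchmark itself uses.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Hypothetical helper: draw `num_rows` row ids in `0..total_rows` and
/// sort them ascending, as a batch take would want. Duplicates are
/// possible in this sketch.
fn sorted_random_row_ids(rng: &mut SplitMix64, num_rows: usize, total_rows: u64) -> Vec<u64> {
    let mut ids: Vec<u64> = (0..num_rows).map(|_| rng.next_u64() % total_rows).collect();
    ids.sort_unstable();
    ids
}

fn main() {
    // 1024 batches of 1024 rows, matching the dataset size described on this page.
    const TOTAL_ROWS: u64 = 1024 * 1024;
    let mut rng = SplitMix64(42);

    // The four take sizes exercised by every access path.
    for num_rows in [1usize, 10, 100, 1000] {
        let ids = sorted_random_row_ids(&mut rng, num_rows, TOTAL_ROWS);
        println!(
            "take {:>4} rows: smallest id {}, largest id {}",
            num_rows,
            ids[0],
            ids[ids.len() - 1]
        );
    }
}
```

A single take would pass one such id per call, while a batch take would pass the whole sorted vector in one call.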

Usage

This benchmark is used to evaluate and optimize random access performance in Lance, which is critical for operations like vector search result retrieval, row-level updates, and selective data loading. It helps developers compare the overhead of different abstraction layers (Dataset vs FileFragment vs FileReader) and measure the impact of file format version changes.

Code Reference

Source Location

rust/lance/benches/take.rs (440 lines)

Signature

The benchmark defines five top-level functions registered with Criterion:

fn bench_random_take_with_dataset(c: &mut Criterion)
fn bench_random_single_take_with_file_reader(c: &mut Criterion)
fn bench_random_batch_take_with_file_reader(c: &mut Criterion)
fn bench_random_single_take_with_file_fragment(c: &mut Criterion)
fn bench_random_batch_take_with_file_fragment(c: &mut Criterion)

Import

use lance::dataset::ProjectionRequest;
use lance::dataset::{Dataset, WriteMode, WriteParams};
use lance_file::reader::{FileReader, FileReaderOptions};
use lance_file::version::LanceFileVersion;
use lance_io::scheduler::{ScanScheduler, SchedulerConfig};
use lance_io::ReadBatchParams;
use lance_encoding::decoder::{DecoderPlugins, FilterExpression};
use criterion::{criterion_group, criterion_main, Criterion};

I/O Contract

Inputs

Parameter | Type | Description
BATCH_SIZE | Constant | 1024 rows per batch when creating the test dataset
num_batches | Constant | 1024 batches per dataset
file_size | Variable | Number of rows per file; tested with 1024 and 1,048,576
num_rows | Variable | Number of rows per take operation; tested with 1, 10, 100, 1000
File versions | LanceFileVersion | V2_0 and V2_1
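The inputs above imply some simple arithmetic: 1024 batches of 1024 rows give 1,048,576 rows per dataset, and the dataset-level path is described as covering two file sizes, two file versions, and four take sizes. The sketch below enumerates that grid using only the standard library. The Case struct and dataset_cases function are hypothetical names, and the file count assumes every file is filled to file_size rows, which this page does not state explicitly.

```rust
const BATCH_SIZE: u64 = 1024;
const NUM_BATCHES: u64 = 1024;
const FILE_SIZES: [u64; 2] = [1024, 1024 * 1024];
const TAKE_SIZES: [u64; 4] = [1, 10, 100, 1000];
// Version labels as plain strings; the benchmark uses LanceFileVersion values.
const FILE_VERSIONS: [&str; 2] = ["V2_0", "V2_1"];

/// Hypothetical description of one dataset-level benchmark case.
#[derive(Debug, Clone, PartialEq)]
struct Case {
    file_size: u64,
    version: &'static str,
    num_rows: u64,
    /// Files needed to hold the dataset, assuming full files.
    num_files: u64,
}

/// Enumerate file size x file version x take size for the dataset path.
fn dataset_cases() -> Vec<Case> {
    let total_rows = BATCH_SIZE * NUM_BATCHES;
    let mut cases = Vec::new();
    for &file_size in &FILE_SIZES {
        // Ceiling division: number of files of at most `file_size` rows.
        let num_files = (total_rows + file_size - 1) / file_size;
        for &version in &FILE_VERSIONS {
            for &num_rows in &TAKE_SIZES {
                cases.push(Case { file_size, version, num_rows, num_files });
            }
        }
    }
    cases
}

fn main() {
    let cases = dataset_cases();
    println!("total rows per dataset: {}", BATCH_SIZE * NUM_BATCHES);
    println!("dataset-level cases: {}", cases.len());
    for case in &cases {
        println!("{:?}", case);
    }
}
```

Under these assumptions the small file size spreads the dataset over 1024 files and the large one keeps it in a single file, which is what makes the two settings useful for comparing per-file overhead.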

Outputs

Output | Type | Description
Criterion report | HTML/JSON | Latency statistics for each combination of access path, file size, file version, and take size
Assertions | Runtime check | Each iteration asserts the returned batch has the expected row count

Usage Examples

Run the full take benchmark suite:

cargo bench -p lance --bench take

Run only the dataset-level take benchmarks:

cargo bench -p lance --bench take -- "Dataset"

Run only the FileReader benchmarks:

cargo bench -p lance --bench take -- "FileReader"

On Linux, the benchmark is configured with a Criterion sample size of 10,000, a 3-second warm-up, and a significance level of 0.01, and it enables flamegraph profiling via pprof. On non-Linux platforms, it uses a sample size of 10 and a significance level of 0.1.
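The platform split can be pictured as a small settings table selected at compile time. The sketch below models it with a plain struct and the standard library only; it is not the Criterion API, and BenchSettings and settings_for are hypothetical names. In Criterion itself the corresponding knobs are the sample_size, significance_level, and warm_up_time builder methods. The non-Linux warm-up is left unset because this page does not give a value for it.

```rust
use std::time::Duration;

/// Hypothetical plain-data view of the benchmark configuration.
#[derive(Debug, Clone, PartialEq)]
struct BenchSettings {
    sample_size: usize,
    significance_level: f64,
    /// `None` means "not specified on this page" (the library default applies).
    warm_up: Option<Duration>,
    flamegraph_profiling: bool,
}

/// Pick the settings described above for the given platform.
fn settings_for(is_linux: bool) -> BenchSettings {
    if is_linux {
        BenchSettings {
            sample_size: 10_000,
            significance_level: 0.01,
            warm_up: Some(Duration::from_secs(3)),
            flamegraph_profiling: true,
        }
    } else {
        BenchSettings {
            sample_size: 10,
            significance_level: 0.1,
            warm_up: None,
            flamegraph_profiling: false,
        }
    }
}

fn main() {
    // `cfg!` evaluates at compile time, mirroring a per-platform benchmark group.
    let settings = settings_for(cfg!(target_os = "linux"));
    println!("{:?}", settings);
}
```

The much smaller sample size off Linux trades statistical precision for a run that finishes quickly, which suits local smoke testing more than performance tracking.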
