Implementation:Lance format Lance DecoderBench

Knowledge Sources	Lance
Domains	Benchmarking, Performance
Last Updated	2026-02-08 19:33 GMT

Overview

Description

DecoderBench is a Criterion-based benchmark suite in the lance-encoding crate that measures the throughput of decoding various Arrow data types from the Lance columnar encoding format. It encodes data using encode_batch and then repeatedly decodes via decode_batch, reporting throughput in bytes per second. The benchmark covers primitive types (integers, floats, dates, timestamps, decimals, durations), fixed-size lists (FSL), dictionary-encoded strings, packed structs, fixed-size binary-encoded strings, and compressed string columns using zstd and lz4. A parallel decoding benchmark measures contention on shared decompressor resources when multiple decode streams run concurrently.
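The encode-once, decode-repeatedly pattern described above can be sketched with the standard library alone. This is only an illustration under stated assumptions: `toy_encode` and `toy_decode` are hypothetical stand-ins for `encode_batch` and `decode_batch` (they are not Lance APIs), and `std::time::Instant` replaces Criterion's measurement loop.

```rust
use std::hint::black_box;
use std::time::Instant;

// Hypothetical stand-in for encode_batch: stores i32 values as little-endian bytes.
fn toy_encode(values: &[i32]) -> Vec<u8> {
    values.iter().flat_map(|v| v.to_le_bytes()).collect()
}

// Hypothetical stand-in for decode_batch: reads the little-endian bytes back.
fn toy_decode(bytes: &[u8]) -> Vec<i32> {
    bytes
        .chunks_exact(4)
        .map(|c| i32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}

// Throughput in bytes per second for a given amount of decoded data.
fn throughput_bytes_per_sec(total_bytes: u64, elapsed_secs: f64) -> f64 {
    total_bytes as f64 / elapsed_secs
}

fn main() {
    let input: Vec<i32> = (0..1_000_000).collect();

    // Encoding happens once, outside the timed region.
    let encoded = toy_encode(&input);

    // Only the repeated decode is timed.
    let iterations = 10u64;
    let start = Instant::now();
    for _ in 0..iterations {
        let decoded = black_box(toy_decode(black_box(&encoded)));
        // Mirrors the row-count check performed on each decoded batch.
        assert_eq!(decoded.len(), input.len());
    }
    let elapsed = start.elapsed().as_secs_f64();

    let total_bytes = encoded.len() as u64 * iterations;
    println!(
        "{:.0} bytes/sec",
        throughput_bytes_per_sec(total_bytes, elapsed)
    );
}
```

Keeping the encode step outside the timed loop means the reported figure reflects decode cost only, which is the quantity the suite tracks.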

The suite contains seven benchmark functions organized into four Criterion benchmark groups:

decode_primitive — Decodes 128 MiB of data for each of 18 primitive Arrow data types.
decode_fsl — Decodes fixed-size lists across multiple dimensions (4, 16, 32, 64, 128), file versions (V2_0, V2_1), and nullable/non-nullable variants.
decode_compressed — Decodes 5 million rows of 10 compressible string columns with zstd and lz4 compression.
decode_compressed_parallel — Parallel decoding of 1 million rows across 10 compressed columns using a streaming decoder with configurable batch size.
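To give a sense of the size of the decode_fsl parameter grid, the sketch below enumerates it, assuming the benchmark takes the full cross product of element types (Int8 and Float32, per PRIMITIVE_TYPES_FOR_FSL), dimensions, file versions, and nullability. The case-name format is hypothetical and the real benchmark's loops and IDs may differ.

```rust
// Assumed parameter values, taken from the group description above.
const TYPES: [&str; 2] = ["Int8", "Float32"];
const DIMENSIONS: [u32; 5] = [4, 16, 32, 64, 128];
const VERSIONS: [&str; 2] = ["V2_0", "V2_1"];

// Builds one hypothetical case name per combination of parameters.
fn fsl_cases() -> Vec<String> {
    let mut cases = Vec::new();
    for ty in TYPES {
        for dim in DIMENSIONS {
            for version in VERSIONS {
                for nullable in [false, true] {
                    cases.push(format!("{ty}_dim{dim}_{version}_nullable_{nullable}"));
                }
            }
        }
    }
    cases
}

fn main() {
    let cases = fsl_cases();
    println!("{} cases", cases.len());
    for case in &cases {
        println!("{case}");
    }
}
```

Under that cross-product assumption the grid has 2 × 5 × 2 × 2 = 40 cases, which is worth keeping in mind when filtering a run to a subset.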

Usage

This benchmark is used by Lance developers to track decoding performance regressions and to evaluate the impact of encoding strategy changes, new compression algorithms, or decoder optimizations. It is especially useful for comparing throughput across Lance file format versions (V2_0, V2_1, V2_2).

Code Reference

Source Location

rust/lance-encoding/benches/decoder.rs (576 lines)

Signature

The benchmark defines seven functions registered with Criterion:

fn bench_decode(c: &mut Criterion)
fn bench_decode_fsl(c: &mut Criterion)
fn bench_decode_str_with_dict_encoding(c: &mut Criterion)
fn bench_decode_packed_struct(c: &mut Criterion)
fn bench_decode_str_with_fixed_size_binary_encoding(c: &mut Criterion)
fn bench_decode_compressed(c: &mut Criterion)
fn bench_decode_compressed_parallel(c: &mut Criterion)

Import

This is a standalone benchmark binary. Key imports include:

use lance_encoding::{
    decoder::{create_decode_stream, DecodeBatchScheduler, DecoderConfig, DecoderPlugins, FilterExpression},
    encoder::{default_encoding_strategy, encode_batch, EncodingOptions},
    version::LanceFileVersion,
};
use lance_core::cache::LanceCache;
use lance_datagen::ArrayGeneratorExt;
use criterion::{criterion_group, criterion_main, Criterion};

I/O Contract

Inputs

Parameter	Type	Description
`PRIMITIVE_TYPES`	`&[DataType]`	18 Arrow primitive data types to benchmark (Date32, Date64, Int8..UInt64, Float16..Float64, Decimal128, Decimal256, Timestamp, Time32, Time64, Duration)
`PRIMITIVE_TYPES_FOR_FSL`	`&[DataType]`	Subset of primitive types for FSL benchmarks (Int8, Float32)
`NUM_BYTES`	`u64`	Data size per benchmark iteration (128 MiB for primitive and FSL groups)
`NUM_ROWS`	`u64`	Row counts vary per benchmark group (100,000 for dict strings, 10,000 for struct/fixed-utf8, 5,000,000 for compressed, 1,000,000 for compressed parallel)
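Because the primitive group is sized in bytes rather than rows, the number of values decoded depends on the element width. The arithmetic below is a sketch that assumes the row count is simply the byte budget divided by the fixed Arrow width of the type; `rows_for_width` is a hypothetical helper, not a function from the benchmark.

```rust
// 128 MiB, matching the NUM_BYTES value listed in the table above.
const NUM_BYTES: u64 = 128 * 1024 * 1024;

// Hypothetical helper: values that fit in NUM_BYTES at a fixed byte width.
fn rows_for_width(byte_width: u64) -> u64 {
    NUM_BYTES / byte_width
}

fn main() {
    // Fixed byte widths of a few Arrow primitive types.
    let widths = [
        ("Int8", 1u64),
        ("Int32", 4),
        ("Float64", 8),
        ("Decimal128", 16),
        ("Decimal256", 32),
    ];
    for (name, width) in widths {
        println!("{name}: {} rows", rows_for_width(width));
    }
}
```

For example, a 4-byte type such as Int32 would yield 33,554,432 values, while Decimal256 would yield 4,194,304.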

Outputs

Output	Type	Description
Criterion report	HTML/JSON	Throughput statistics in bytes/sec or elements/sec, with optional flamegraph profiling on Linux
Decoded `RecordBatch`	In-memory	Each iteration produces a decoded RecordBatch; row count is asserted to match input

Usage Examples

Run the full decoder benchmark suite:

cargo bench -p lance-encoding --bench decoder

Run with a filter for a specific benchmark group:

cargo bench -p lance-encoding --bench decoder -- decode_compressed

On Linux, flamegraph profiling is enabled automatically via pprof; on other platforms, profiling is omitted. The benchmark is configured with a Criterion sample size of 10 and a significance level of 0.1.

Related Pages

Lance_format_Lance_TakeBench — Benchmark for random row access (take) operations
Lance_format_Lance_VectorThroughputBench — Benchmark for IVF_PQ vector search throughput
