Implementation:Lance format Lance LegacyBitpackEncoding
| Knowledge Sources | |
|---|---|
| Domains | Encoding, Legacy_Format |
| Last Updated | 2026-02-08 19:33 GMT |
Overview
The legacy bitpack encoding compresses integer arrays in the Lance v2.0 file format by packing each value into fewer bits with the FastLanes bitpacking algorithm.
Description
⚠️ DEPRECATED: This is legacy code from the Lance v1/v2.0 format, retained only for backward compatibility. See Lance_format_Lance_Warning_Deprecated_Legacy_Encodings.
This module implements bitpacking for the legacy (v2.0) Lance file format, gated behind the bitpacking feature flag. It provides two encoder/scheduler pairs:
- BitpackedForNonNegArrayEncoder / BitpackedForNonNegScheduler for non-negative integers (unsigned types and signed types known to be non-negative)
- BitpackedArrayEncoder / BitpackedScheduler for general signed integers
The encoding works by computing the minimum number of bits needed to represent all values in a page (using compute_compressed_bit_width_for_non_neg), then packing values into chunks of 1024 elements using the FastLanes bitpacking algorithm. The last chunk is zero-padded if the input length is not a multiple of 1024.
Supported data types include Int8/UInt8 through Int64/UInt64. The encoding also handles nullable data blocks by separately encoding validity bitmaps. Decoders unpack the compressed data back to full-width values during read.
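The bit width computation and chunk padding described above can be sketched with the standard library alone. This is a minimal illustration, not the module's code: min_bit_width, padded_len, and packed_size_bytes are hypothetical helpers, and returning a width of at least 1 for an all-zero page is an assumption made for the sketch.

```rust
/// Elements per chunk, as stated in the description above.
const CHUNK_SIZE: usize = 1024;

/// Minimum number of bits needed to represent every value in `values`.
/// Hypothetical helper: returns at least 1 (an assumption of this sketch)
/// so that an all-zero or empty page still gets a usable width.
fn min_bit_width(values: &[u64]) -> u32 {
    let max = values.iter().copied().max().unwrap_or(0);
    (u64::BITS - max.leading_zeros()).max(1)
}

/// Number of elements after zero-padding the last chunk to a full 1024.
fn padded_len(num_values: usize) -> usize {
    num_values.div_ceil(CHUNK_SIZE) * CHUNK_SIZE
}

/// Bytes occupied by the packed values, padding included.
/// 1024 * bit_width is always a multiple of 8, so the division is exact.
fn packed_size_bytes(num_values: usize, bit_width: u32) -> usize {
    padded_len(num_values) * bit_width as usize / 8
}

fn main() {
    let values: Vec<u64> = vec![0, 5, 10, 15, 20];
    let width = min_bit_width(&values);
    println!("bit width: {width}"); // 5, since 20 = 0b10100
    println!("padded length: {}", padded_len(values.len())); // 1024
    println!("packed bytes: {}", packed_size_bytes(values.len(), width)); // 640
}
```

Because every chunk is padded to 1024 elements, a very short page such as this one can occupy more space packed (640 bytes) than unpacked (20 bytes as five UInt32 values); the savings only appear once pages are large relative to the chunk size.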
Usage
Use this encoding for integer columns where the actual value range is significantly smaller than the full data type range. The CoreArrayEncodingStrategy selects bitpacking when it determines the compressed bit width provides meaningful savings. This requires the bitpacking feature to be enabled at compile time.
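To make "meaningful savings" concrete, the sketch below compares the compressed and uncompressed bits per value. It is an illustration only: savings_ratio, worth_bitpacking, and the min_savings threshold are hypothetical, and the actual criterion used by CoreArrayEncodingStrategy is not stated on this page.

```rust
/// Fraction of space saved per value by packing into `compressed_bits`
/// instead of `uncompressed_bits`. Hypothetical helper for illustration.
fn savings_ratio(compressed_bits: u64, uncompressed_bits: u64) -> f64 {
    assert!(uncompressed_bits > 0, "uncompressed width must be non-zero");
    1.0 - compressed_bits as f64 / uncompressed_bits as f64
}

/// Hypothetical decision rule: pack only when the width actually shrinks
/// and the saving reaches `min_savings` (an assumed tuning parameter).
fn worth_bitpacking(compressed_bits: u64, uncompressed_bits: u64, min_savings: f64) -> bool {
    compressed_bits < uncompressed_bits
        && savings_ratio(compressed_bits, uncompressed_bits) >= min_savings
}

fn main() {
    // A UInt32 column whose values fit in 5 bits versus one that needs 30 bits.
    for compressed in [5u64, 30, 32] {
        println!(
            "{compressed} of 32 bits: saves {:.1}%, pack = {}",
            savings_ratio(compressed, 32) * 100.0,
            worth_bitpacking(compressed, 32, 0.25),
        );
    }
}
```

With the assumed 25% threshold, the 5-bit column qualifies while the 30-bit and 32-bit columns do not.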
Code Reference
Source Location
rust/lance-encoding/src/previous/encodings/physical/bitpack.rs
Signature
pub fn compute_compressed_bit_width_for_non_neg(arrays: &[ArrayRef]) -> u64;
pub struct BitpackedForNonNegArrayEncoder {
    pub compressed_bit_width: usize,
    pub original_data_type: DataType,
}
impl BitpackedForNonNegArrayEncoder {
    pub fn new(compressed_bit_width: usize, data_type: DataType) -> Self;
}
impl ArrayEncoder for BitpackedForNonNegArrayEncoder { /* ... */ }
pub struct BitpackedForNonNegScheduler { /* fields omitted */ }
impl BitpackedForNonNegScheduler {
    pub fn new(
        compressed_bits_per_value: u64,
        uncompressed_bits_per_value: u64,
        buffer_offset: u64,
    ) -> Self;
}
pub struct BitpackedScheduler { /* fields omitted */ }
impl BitpackedScheduler {
    pub fn new(
        compressed_bits_per_value: u64,
        uncompressed_bits_per_value: u64,
        buffer_offset: u64,
        signed: bool,
    ) -> Self;
}
Import
use lance_encoding::previous::encodings::physical::bitpack::{
    compute_compressed_bit_width_for_non_neg,
    BitpackedForNonNegArrayEncoder,
    BitpackedForNonNegScheduler,
    BitpackedScheduler,
};
I/O Contract
| Input | Type | Description |
|---|---|---|
| arrays | &[ArrayRef] | Integer arrays to analyze for bit width computation |
| data | DataBlock | Fixed-width or nullable data block to encode |
| compressed_bits_per_value | u64 | Target bits per value after compression |
| uncompressed_bits_per_value | u64 | Original bits per value of the data type |
| buffer_offset | u64 | Position of the bitpacked buffer in the file |
| Output | Type | Description |
|---|---|---|
| bit_width | u64 | Computed minimum bits needed to represent all values |
| encoded | EncodedArray | Bitpacked data with encoding descriptor |
| decoded | DataBlock | Unpacked fixed-width data block |
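The encode/decode contract in the tables above (full-width values in, narrow packed bytes out, full-width values back on read) can be illustrated with a simple sequential bit packer. This is a sketch under stated assumptions: pack and unpack are hypothetical functions that lay bits out one value after another, LSB first, whereas the module uses the FastLanes layout over 1024-element chunks, which this sketch does not reproduce.

```rust
/// Pack the low `bit_width` bits of each value into a byte buffer,
/// one value after another, least significant bit first.
/// Hypothetical helper; not the FastLanes layout.
fn pack(values: &[u32], bit_width: u32) -> Vec<u8> {
    assert!((1..=32).contains(&bit_width));
    let total_bits = values.len() * bit_width as usize;
    let mut out = vec![0u8; total_bits.div_ceil(8)];
    let mut bit_pos = 0usize;
    for &v in values {
        for i in 0..bit_width {
            if ((v >> i) & 1) == 1 {
                out[bit_pos / 8] |= 1u8 << (bit_pos % 8);
            }
            bit_pos += 1;
        }
    }
    out
}

/// Inverse of `pack`: read `len` values of `bit_width` bits each and
/// widen them back to full 32-bit integers.
fn unpack(packed: &[u8], bit_width: u32, len: usize) -> Vec<u32> {
    assert!((1..=32).contains(&bit_width));
    let mut out = Vec::with_capacity(len);
    let mut bit_pos = 0usize;
    for _ in 0..len {
        let mut v = 0u32;
        for i in 0..bit_width {
            let bit = (packed[bit_pos / 8] >> (bit_pos % 8)) & 1;
            v |= (bit as u32) << i;
            bit_pos += 1;
        }
        out.push(v);
    }
    out
}

fn main() {
    let values = vec![0u32, 5, 10, 15, 20];
    let packed = pack(&values, 5);
    // 5 values * 5 bits = 25 bits, rounded up to 4 bytes (vs. 20 bytes unpacked).
    assert_eq!(packed.len(), 4);
    // Decoding restores the original full-width values.
    assert_eq!(unpack(&packed, 5, values.len()), values);
    println!("round trip ok: {} packed bytes", packed.len());
}
```

The decoder needs the bit width and the value count to reverse the packing, which mirrors why the scheduler constructors take compressed_bits_per_value and uncompressed_bits_per_value as inputs.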
Usage Examples
use lance_encoding::previous::encodings::physical::bitpack::{
    compute_compressed_bit_width_for_non_neg,
    BitpackedForNonNegArrayEncoder,
};
use arrow_array::{ArrayRef, UInt32Array};
use arrow_schema::DataType;
use std::sync::Arc;
// Requires lance-encoding to be built with the bitpacking feature enabled.
fn main() {
    // Compute the compressed bit width for a page of non-negative values
    // (the largest value here, 20, fits in 5 bits).
    let array: ArrayRef = Arc::new(UInt32Array::from(vec![0, 5, 10, 15, 20]));
    let bit_width = compute_compressed_bit_width_for_non_neg(&[array.clone()]);
    // Create an encoder with the computed bit width (u64 converted to usize).
    let encoder = BitpackedForNonNegArrayEncoder::new(
        bit_width as usize,
        DataType::UInt32,
    );
}
Related Pages
- Lance_format_Lance_LegacyEncoder - Encoding strategy that selects bitpacking
- Lance_format_Lance_LegacyPhysicalDispatch - Creates bitpacked schedulers from protobuf
- Lance_format_Lance_LegacyBasicEncoding - Wraps bitpacked encoding with nullability
- Lance_format_Lance_LegacyValueEncoding - Alternative flat encoding for values
- Heuristic:Lance_format_Lance_Warning_Deprecated_Legacy_Encodings