Implementation:Lance format Lance LegacyBinaryEncoding
| Knowledge Sources | |
|---|---|
| Domains | Encoding, Legacy_Format |
| Last Updated | 2026-02-08 19:33 GMT |
Overview
The legacy binary encoding is a physical encoding that stores variable-length binary data using separate indices and bytes buffers in the Lance v2.0 format.
Description
⚠️ DEPRECATED: This is legacy code from the Lance v1/v2.0 format, retained only for backward compatibility. See Lance_format_Lance_Warning_Deprecated_Legacy_Encodings.
This module implements the physical binary page encoding for the legacy (v2.0) Lance file format. On the read side, BinaryPageScheduler coordinates decoding of variable-length binary data (strings, binary blobs) by scheduling I/O for two buffers: an indices buffer and a bytes buffer. The indices are cumulative byte offsets. Null values are tracked with a null adjustment threshold: offset values exceeding this threshold indicate null entries. During decoding, an IndicesNormalizer transforms the raw indices into normalized offsets and a validity bitmap. On the write side, BinaryEncoder converts Arrow arrays into indices and bytes buffers, with optional compression configured via CompressionConfig. The encoder supports both the regular and large Binary/Utf8 variants, which use 32-bit and 64-bit offsets respectively.
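The normalization step can be illustrated with a minimal, self-contained sketch. It is not the Lance implementation: the names `Normalized`, `normalize_indices`, and `value_at` are hypothetical, and the exact rule used here (a raw index at or above `null_adjustment` is treated as null, and subtracting the adjustment recovers the real end offset) is an assumption made for illustration.

```rust
/// Result of normalizing raw indices: Arrow-style offsets plus a validity mask.
/// (Hypothetical type, not part of lance-encoding.)
#[derive(Debug, PartialEq)]
pub struct Normalized {
    /// `len + 1` offsets into the bytes buffer, starting at 0.
    pub offsets: Vec<u64>,
    /// `true` where the value is present, `false` where it is null.
    pub validity: Vec<bool>,
}

/// Hypothetical normalizer. Each raw index is taken to be the cumulative end
/// offset of one value; null entries are ASSUMED to carry `null_adjustment`
/// added on top of that offset.
pub fn normalize_indices(raw: &[u64], null_adjustment: u64) -> Normalized {
    let mut offsets = Vec::with_capacity(raw.len() + 1);
    let mut validity = Vec::with_capacity(raw.len());
    offsets.push(0);
    for &index in raw {
        // Assumption: "at or above the threshold" marks a null entry.
        let is_null = index >= null_adjustment;
        // Strip the adjustment so the offset points into the bytes buffer again.
        let end = if is_null { index - null_adjustment } else { index };
        offsets.push(end);
        validity.push(!is_null);
    }
    Normalized { offsets, validity }
}

/// Slice value `i` out of the bytes buffer, or return `None` when it is null.
pub fn value_at<'a>(bytes: &'a [u8], n: &Normalized, i: usize) -> Option<&'a [u8]> {
    if !n.validity[i] {
        return None;
    }
    let start = n.offsets[i] as usize;
    let end = n.offsets[i + 1] as usize;
    Some(&bytes[start..end])
}
```

Under these assumptions, the values `"foo"`, null, `"quux"` stored as the bytes `fooquux` with a threshold of 8 would have raw indices `[3, 11, 7]`. A null entry contributes no bytes, so its normalized start and end offsets coincide.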
Usage
This encoding applies to variable-length binary and string data in files that use the v2.0 format. On the write side, the CoreArrayEncodingStrategy selects BinaryEncoder for the DataType::Binary, DataType::LargeBinary, DataType::Utf8, and DataType::LargeUtf8 types. On the read side, the physical dispatch creates a BinaryPageScheduler when it encounters a Binary protobuf encoding.
Code Reference
Source Location
rust/lance-encoding/src/previous/encodings/physical/binary.rs
Signature
pub struct BinaryPageScheduler {
    indices_scheduler: Arc<dyn PageScheduler>,
    bytes_scheduler: Arc<dyn PageScheduler>,
    offsets_type: DataType,
    null_adjustment: u64,
}

impl BinaryPageScheduler {
    pub fn new(
        indices_scheduler: Arc<dyn PageScheduler>,
        bytes_scheduler: Arc<dyn PageScheduler>,
        offsets_type: DataType,
        null_adjustment: u64,
    ) -> Self;
}

pub struct BinaryEncoder { /* fields omitted */ }

impl BinaryEncoder {
    pub fn try_new(
        indices_encoder: Box<dyn ArrayEncoder>,
        compression: Option<CompressionConfig>,
    ) -> Result<Self>;
}
Import
use lance_encoding::previous::encodings::physical::binary::{
    BinaryPageScheduler, BinaryEncoder,
};
I/O Contract
| Input | Type | Description |
|---|---|---|
| indices_scheduler | Arc<dyn PageScheduler> | Scheduler for the byte offset indices buffer |
| bytes_scheduler | Arc<dyn PageScheduler> | Scheduler for the raw bytes buffer |
| offsets_type | DataType | Int32 or Int64 depending on binary variant |
| null_adjustment | u64 | Threshold for null detection in offset values |
| data | DataBlock | Variable-width data block to encode |
| Output | Type | Description |
|---|---|---|
| decoded | DataBlock | Variable-width data block with offsets and bytes |
| encoded | EncodedArray | Encoded indices and bytes buffers with encoding descriptor |
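To make the shape of the encoded output concrete, the following sketch builds an indices buffer and a bytes buffer from a list of optional values. It is an illustration only: `EncodedBinary` and `encode_values` are hypothetical names, compression and the encoding descriptor are not modeled, and the choice of threshold (total byte length plus one, added to the running offset for null entries) is an assumption rather than a statement about what BinaryEncoder does.

```rust
/// Hypothetical encoded form: cumulative indices, concatenated bytes, and the
/// null threshold. This is not the EncodedArray type from lance-encoding.
#[derive(Debug, PartialEq)]
pub struct EncodedBinary {
    pub indices: Vec<u64>,
    pub bytes: Vec<u8>,
    pub null_adjustment: u64,
}

/// Encode optional byte strings into an indices buffer and a bytes buffer.
/// ASSUMPTION: nulls are marked by adding `total_len + 1` to the running
/// offset, so every null index is larger than any valid offset.
pub fn encode_values(values: &[Option<&[u8]>]) -> EncodedBinary {
    // Total size of all non-null values; `flatten` skips the `None` entries.
    let total: u64 = values.iter().flatten().map(|v| v.len() as u64).sum();
    let null_adjustment = total + 1;

    let mut bytes = Vec::with_capacity(total as usize);
    let mut indices = Vec::with_capacity(values.len());
    for value in values {
        match value {
            Some(v) => {
                bytes.extend_from_slice(v);
                // Cumulative end offset of this value.
                indices.push(bytes.len() as u64);
            }
            // A null adds no bytes; its index is the running offset plus the threshold.
            None => indices.push(bytes.len() as u64 + null_adjustment),
        }
    }
    EncodedBinary { indices, bytes, null_adjustment }
}
```

With this scheme a reader needs only the threshold and the two buffers to recover both the value boundaries and the null positions, which is consistent with null_adjustment being passed to the scheduler alongside the two inner schedulers.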
Usage Examples
use lance_encoding::previous::encodings::physical::binary::BinaryPageScheduler;
use lance_encoding::decoder::PageScheduler;
use lance_encoding::EncodingsIo;
use arrow_schema::DataType;
use std::sync::Arc;

// Create a binary page scheduler from inner schedulers
let indices_scheduler: Arc<dyn PageScheduler> = /* from dispatch */;
let bytes_scheduler: Arc<dyn PageScheduler> = /* from dispatch */;
let scheduler = BinaryPageScheduler::new(
    indices_scheduler,
    bytes_scheduler,
    DataType::Int32, // offsets_type: Int32 for regular Binary/Utf8, Int64 for the Large variants
    0,               // null_adjustment
);

// Schedule ranges for decoding
let ranges = vec![0..100];
let io: Arc<dyn EncodingsIo> = /* from context */;
// decoder_fut is a future; await it to obtain the page decoder
let decoder_fut = scheduler.schedule_ranges(&ranges, &io, 0);
Related Pages
- Lance_format_Lance_LegacyPhysicalDispatch - Creates BinaryPageScheduler from protobuf
- Lance_format_Lance_LegacyEncoder - Encoding strategy that selects BinaryEncoder
- Lance_format_Lance_LegacyValueEncoding - Used to encode the indices buffer
- Lance_format_Lance_LegacyFsstEncoding - FSST compression wraps binary encoding
- Lance_format_Lance_LegacyLogicalBinaryEncoding - Logical layer above physical binary
- Lance_format_Lance_LegacyFixedSizeBinaryEncoding - Alternative for fixed-width binary data
- Heuristic:Lance_format_Lance_Warning_Deprecated_Legacy_Encodings