Implementation:Lance format Lance LegacyLogicalBinaryEncoding
| Knowledge Sources | |
|---|---|
| Domains | Encoding, Legacy_Format |
| Last Updated | 2026-02-08 19:33 GMT |
Overview
The legacy logical binary encoding is a logical-level wrapper that casts decoded list-of-bytes data into the appropriate binary or string Arrow type in the Lance v2.0 format.
Description
⚠️ DEPRECATED: This is legacy code from the Lance v1/v2.0 format, retained only for backward compatibility. See Lance_format_Lance_Warning_Deprecated_Legacy_Encodings.
This module implements logical binary field scheduling and decoding for the legacy (v2.0) Lance file format. In v2.0, binary and string data (Utf8, LargeUtf8, Binary, LargeBinary) is stored internally as List<u8>. Three types cooperate to turn that list form back into the requested Arrow type:
- BinaryFieldScheduler wraps an inner variable-binary field scheduler (typically a list scheduler).
- BinaryPageDecoder wraps the inner logical page decoder and casts its output from list form to the target binary or string Arrow type.
- BinaryArrayDecoder performs the final type conversion during the decode task: it extracts the inner byte list array and reconstructs the appropriate GenericByteArray with the matching offset type.
The 2.0 schedulers do not require initialization.
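The list-to-string cast described above can be illustrated with a minimal, std-only sketch. It does not use Arrow or Lance types: `ByteListColumn` and `StringColumn` are hypothetical stand-ins for a List<u8> array and a string array, and the UTF-8 check is an assumption about what a string target needs, not a statement about where the real decoder validates. The point is that the offsets and the flat byte buffer are reused as-is; only the logical type changes.

```rust
use std::str::Utf8Error;

/// Hypothetical stand-in for a decoded List<u8> column:
/// row i is values[offsets[i]..offsets[i + 1]].
#[derive(Debug, Clone, PartialEq)]
pub struct ByteListColumn {
    pub offsets: Vec<usize>,
    pub values: Vec<u8>,
}

/// Hypothetical stand-in for a string column built from the same two buffers.
#[derive(Debug, PartialEq)]
pub struct StringColumn {
    offsets: Vec<usize>,
    values: Vec<u8>,
}

impl StringColumn {
    /// Reinterpret a list-of-bytes column as strings.
    /// The buffers are moved, not copied; each row is only checked for valid UTF-8.
    pub fn from_list(list: ByteListColumn) -> Result<Self, Utf8Error> {
        for w in list.offsets.windows(2) {
            std::str::from_utf8(&list.values[w[0]..w[1]])?;
        }
        Ok(Self { offsets: list.offsets, values: list.values })
    }

    /// Number of rows (one fewer than the number of offsets).
    pub fn len(&self) -> usize {
        self.offsets.len().saturating_sub(1)
    }

    /// Borrow row `row` as a string slice.
    pub fn get(&self, row: usize) -> &str {
        let bytes = &self.values[self.offsets[row]..self.offsets[row + 1]];
        std::str::from_utf8(bytes).expect("validated in from_list")
    }
}
```

A binary target would skip the validation loop and keep the same two buffers, which is why the conversion is cheap.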
Usage
Use this encoding for reading binary and string columns from Lance v2.0 files. BinaryFieldScheduler is created during file reading when the schema indicates a binary or string field. It delegates scheduling to the underlying list-of-bytes scheduler and wraps the resulting decoders to produce the correct output type.
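The delegate-and-wrap relationship described here can be sketched with simplified, synchronous traits. Everything prefixed `Sketch` is hypothetical and exists only for illustration; the real FieldScheduler and LogicalPageDecoder traits have different, richer signatures, and the lossy UTF-8 conversion below is a shortcut for the sketch rather than the library's behavior.

```rust
use std::ops::Range;
use std::sync::Arc;

#[derive(Debug, Clone, Copy, PartialEq)]
enum SketchTarget { Utf8, Binary }

#[derive(Debug, PartialEq)]
enum SketchColumn {
    ByteLists(Vec<Vec<u8>>), // what the inner list decoder yields
    Strings(Vec<String>),
    Binaries(Vec<Vec<u8>>),
}

trait SketchDecoder {
    fn decode(&self) -> SketchColumn;
}

trait SketchScheduler {
    fn schedule(&self, ranges: &[Range<u64>]) -> Box<dyn SketchDecoder>;
}

/// Inner scheduler: knows how to fetch rows as lists of bytes.
struct SketchListScheduler { rows: Vec<Vec<u8>> }
struct SketchListDecoder { rows: Vec<Vec<u8>> }

impl SketchDecoder for SketchListDecoder {
    fn decode(&self) -> SketchColumn { SketchColumn::ByteLists(self.rows.clone()) }
}

impl SketchScheduler for SketchListScheduler {
    fn schedule(&self, ranges: &[Range<u64>]) -> Box<dyn SketchDecoder> {
        let rows = ranges
            .iter()
            .flat_map(|r| r.clone())
            .map(|i| self.rows[i as usize].clone())
            .collect();
        Box::new(SketchListDecoder { rows })
    }
}

/// Outer scheduler: does no I/O planning of its own, only wraps the inner decoder.
struct SketchBinaryScheduler { inner: Arc<dyn SketchScheduler>, target: SketchTarget }
struct SketchBinaryDecoder { inner: Box<dyn SketchDecoder>, target: SketchTarget }

impl SketchScheduler for SketchBinaryScheduler {
    fn schedule(&self, ranges: &[Range<u64>]) -> Box<dyn SketchDecoder> {
        let inner = self.inner.schedule(ranges); // delegate scheduling
        Box::new(SketchBinaryDecoder { inner, target: self.target })
    }
}

impl SketchDecoder for SketchBinaryDecoder {
    fn decode(&self) -> SketchColumn {
        // Cast the list-form output to the requested target type.
        match (self.inner.decode(), self.target) {
            (SketchColumn::ByteLists(rows), SketchTarget::Utf8) => SketchColumn::Strings(
                rows.iter().map(|r| String::from_utf8_lossy(r).into_owned()).collect(),
            ),
            (SketchColumn::ByteLists(rows), SketchTarget::Binary) => SketchColumn::Binaries(rows),
            (other, _) => other,
        }
    }
}
```

Keeping the outer layer this thin means all range handling stays in the list scheduler, and the binary layer only has to know the target type.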
Code Reference
Source Location
rust/lance-encoding/src/previous/encodings/logical/binary.rs
Signature
pub struct BinaryFieldScheduler {
    varbin_scheduler: Arc<dyn FieldScheduler>,
    data_type: DataType,
}

impl BinaryFieldScheduler {
    pub fn new(varbin_scheduler: Arc<dyn FieldScheduler>, data_type: DataType) -> Self;
}

impl FieldScheduler for BinaryFieldScheduler { /* ... */ }

pub struct BinaryPageDecoder {
    inner: Box<dyn LogicalPageDecoder>,
    data_type: DataType,
}

impl LogicalPageDecoder for BinaryPageDecoder { /* ... */ }
Import
use lance_encoding::previous::encodings::logical::binary::{
    BinaryFieldScheduler, BinaryPageDecoder,
};
I/O Contract
| Input | Type | Description |
|---|---|---|
| varbin_scheduler | Arc<dyn FieldScheduler> | Inner scheduler producing List<u8> data |
| data_type | DataType | Target type: Utf8, LargeUtf8, Binary, or LargeBinary |
| ranges | &[Range<u64>] | Row ranges to schedule for reading |

| Output | Type | Description |
|---|---|---|
| decoded | ArrayRef | Arrow StringArray, LargeStringArray, BinaryArray, or LargeBinaryArray |
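As a compact restatement of the contract, the mapping from target type to output array can be written as a lookup. This is an illustrative sketch with hypothetical names (`TargetType`, `OutputKind`, `output_kind`), not code from the module; the offset widths and the UTF-8 requirement follow the Arrow definitions of these four types.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TargetType { Utf8, LargeUtf8, Binary, LargeBinary }

#[derive(Debug, PartialEq, Eq)]
struct OutputKind {
    array_name: &'static str, // Arrow array type produced for this target
    offset_bits: u8,          // width of each offset entry
    must_be_utf8: bool,       // string types require valid UTF-8 contents
}

fn output_kind(target: TargetType) -> OutputKind {
    match target {
        TargetType::Utf8 => OutputKind { array_name: "StringArray", offset_bits: 32, must_be_utf8: true },
        TargetType::LargeUtf8 => OutputKind { array_name: "LargeStringArray", offset_bits: 64, must_be_utf8: true },
        TargetType::Binary => OutputKind { array_name: "BinaryArray", offset_bits: 32, must_be_utf8: false },
        TargetType::LargeBinary => OutputKind { array_name: "LargeBinaryArray", offset_bits: 64, must_be_utf8: false },
    }
}
```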
Usage Examples
use lance_encoding::previous::encodings::logical::binary::BinaryFieldScheduler;
use lance_encoding::previous::decoder::FieldScheduler;
// Assumed import path for FilterExpression; the original example used it without importing it
use lance_encoding::decoder::FilterExpression;
use arrow_schema::DataType;
use std::sync::Arc;

// Create a binary field scheduler wrapping a list-of-bytes scheduler
let varbin_scheduler: Arc<dyn FieldScheduler> = /* from column metadata */;
let scheduler = BinaryFieldScheduler::new(
    varbin_scheduler,
    DataType::Utf8,
);

// Schedule row ranges (the `?` assumes an enclosing function that returns a Result)
let ranges = vec![0..100u64];
let filter = FilterExpression::no_filter();
let mut job = scheduler.schedule_ranges(&ranges, &filter)?;
Related Pages
- Lance_format_Lance_LegacyListEncoding - Inner list scheduler that produces List<u8> data
- Lance_format_Lance_LegacyDecoder - Base decoder traits
- Lance_format_Lance_LegacyBinaryEncoding - Physical binary page encoding
- Lance_format_Lance_LegacyFsstEncoding - FSST compression for string data
- Heuristic:Lance_format_Lance_Warning_Deprecated_Legacy_Encodings