Implementation:Lance format Lance LegacyLogicalBinaryEncoding

From Leeroopedia


Knowledge Sources
Domains Encoding, Legacy_Format
Last Updated 2026-02-08 19:33 GMT

Overview

The legacy logical binary encoding is a logical-level wrapper that casts decoded list-of-bytes data into the appropriate binary or string Arrow type in the Lance v2.0 format.

Description

⚠️ DEPRECATED: This is legacy code from the Lance v1/v2.0 format, retained only for backward compatibility. See Lance_format_Lance_Warning_Deprecated_Legacy_Encodings.

This module implements logical binary field scheduling and decoding for the legacy (v2.0) Lance file format. In v2.0, all four binary-like types (Binary, LargeBinary, Utf8, and LargeUtf8) are stored internally as List<u8>.

The module has three parts. BinaryFieldScheduler wraps an inner variable-binary field scheduler (typically a list scheduler). BinaryPageDecoder wraps the inner logical page decoder so that its output is cast from list form to the target binary or string Arrow type. BinaryArrayDecoder performs the final type conversion during the decode task: it extracts the inner byte list array and reconstructs the appropriate GenericByteArray with the offset type the target requires. The 2.0 schedulers do not require initialization.
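The cast from list form to a string or binary type can be illustrated with a minimal, std-only sketch. It does not use Arrow or Lance types: ByteList, widen_offsets, as_strings, and as_binary are hypothetical names introduced here, and the sketch only assumes that a list of bytes is a flat values buffer plus an offsets array. How the actual decoder obtains offsets of the right width is not described on this page.

```rust
use std::str::Utf8Error;

/// Toy stand-in for a List<u8> column: one flat byte buffer plus offsets.
/// Row i covers values[offsets[i]..offsets[i + 1]].
/// (Hypothetical type for illustration; not part of lance-encoding.)
pub struct ByteList {
    pub offsets: Vec<i32>,
    pub values: Vec<u8>,
}

/// Widen 32-bit offsets to 64-bit, the width a "Large" type would use.
pub fn widen_offsets(offsets: &[i32]) -> Vec<i64> {
    offsets.iter().map(|&o| i64::from(o)).collect()
}

/// View each row as raw bytes (the Binary-like interpretation).
pub fn as_binary(list: &ByteList) -> Vec<&[u8]> {
    list.offsets
        .windows(2)
        .map(|w| &list.values[w[0] as usize..w[1] as usize])
        .collect()
}

/// View each row as a string (the Utf8-like interpretation).
/// Fails on the first row that is not valid UTF-8.
pub fn as_strings(list: &ByteList) -> Result<Vec<&str>, Utf8Error> {
    as_binary(list)
        .into_iter()
        .map(std::str::from_utf8)
        .collect()
}
```

The values buffer is never copied or rewritten in this sketch; only the interpretation of each row (bytes or text) and the offset width change, which is the sense in which the conversion is a cast rather than a re-encoding.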

Usage

Use this encoding only when reading binary and string columns from existing Lance v2.0 files. A BinaryFieldScheduler is created during file reading when the schema indicates a binary or string field. It delegates scheduling to the underlying list-of-bytes scheduler and wraps the resulting decoders so that they produce the correct output type.
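The delegate-and-wrap pattern described above can be sketched with toy traits. Everything in this block (ToyScheduler, ToyDecoder, Decoded, ListScheduler, StringScheduler) is a hypothetical simplification for illustration and does not reflect the real FieldScheduler or LogicalPageDecoder signatures.

```rust
use std::ops::Range;
use std::sync::Arc;

/// Toy decode result: either raw byte lists or converted strings.
#[derive(Debug, PartialEq)]
pub enum Decoded {
    ByteLists(Vec<Vec<u8>>),
    Strings(Vec<String>),
}

/// Hypothetical, simplified stand-ins for the decoder and scheduler traits.
pub trait ToyDecoder {
    fn decode(&self) -> Decoded;
}
pub trait ToyScheduler {
    fn schedule(&self, range: Range<usize>) -> Box<dyn ToyDecoder>;
}

/// Inner scheduler: hands out rows as lists of bytes.
pub struct ListScheduler {
    pub rows: Vec<Vec<u8>>,
}
struct ListDecoder {
    rows: Vec<Vec<u8>>,
}
impl ToyDecoder for ListDecoder {
    fn decode(&self) -> Decoded {
        Decoded::ByteLists(self.rows.clone())
    }
}
impl ToyScheduler for ListScheduler {
    fn schedule(&self, range: Range<usize>) -> Box<dyn ToyDecoder> {
        Box::new(ListDecoder { rows: self.rows[range].to_vec() })
    }
}

/// Outer scheduler: delegates scheduling, then wraps the inner decoder.
pub struct StringScheduler {
    pub inner: Arc<dyn ToyScheduler>,
}
struct StringDecoder {
    inner: Box<dyn ToyDecoder>,
}
impl ToyDecoder for StringDecoder {
    fn decode(&self) -> Decoded {
        match self.inner.decode() {
            // Cast list-of-bytes output to strings (lossy, to keep the toy simple).
            Decoded::ByteLists(rows) => Decoded::Strings(
                rows.iter()
                    .map(|r| String::from_utf8_lossy(r).into_owned())
                    .collect(),
            ),
            other => other,
        }
    }
}
impl ToyScheduler for StringScheduler {
    fn schedule(&self, range: Range<usize>) -> Box<dyn ToyDecoder> {
        // No scheduling logic of its own: only the decoder is wrapped.
        Box::new(StringDecoder { inner: self.inner.schedule(range) })
    }
}
```

The outer scheduler holds no I/O or range logic, so any change to how bytes are located stays in the inner scheduler, and the wrapper only decides the output type.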

Code Reference

Source Location

rust/lance-encoding/src/previous/encodings/logical/binary.rs

Signature

pub struct BinaryFieldScheduler {
    varbin_scheduler: Arc<dyn FieldScheduler>,
    data_type: DataType,
}

impl BinaryFieldScheduler {
    pub fn new(varbin_scheduler: Arc<dyn FieldScheduler>, data_type: DataType) -> Self;
}

impl FieldScheduler for BinaryFieldScheduler { /* ... */ }

pub struct BinaryPageDecoder {
    inner: Box<dyn LogicalPageDecoder>,
    data_type: DataType,
}

impl LogicalPageDecoder for BinaryPageDecoder { /* ... */ }

Import

use lance_encoding::previous::encodings::logical::binary::{
    BinaryFieldScheduler, BinaryPageDecoder,
};

I/O Contract

Input            | Type                    | Description
varbin_scheduler | Arc<dyn FieldScheduler> | Inner scheduler producing List<u8> data
data_type        | DataType                | Target type: Utf8, LargeUtf8, Binary, or LargeBinary
ranges           | &[Range<u64>]           | Row ranges to schedule for reading

Output           | Type                    | Description
decoded          | ArrayRef                | Arrow StringArray, LargeStringArray, BinaryArray, or LargeBinaryArray
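The correspondence between the data_type input and the decoded output can be written down as a small lookup. This is a sketch only: TargetType and output_shape are hypothetical names, and the function merely restates the standard Arrow pairing of each type with its array class and offset width.

```rust
/// Hypothetical mirror of the four target types accepted as data_type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TargetType {
    Utf8,
    LargeUtf8,
    Binary,
    LargeBinary,
}

/// Returns (Arrow array name, offset width in bits, requires valid UTF-8).
pub fn output_shape(t: TargetType) -> (&'static str, u32, bool) {
    match t {
        TargetType::Utf8 => ("StringArray", 32, true),
        TargetType::LargeUtf8 => ("LargeStringArray", 64, true),
        TargetType::Binary => ("BinaryArray", 32, false),
        TargetType::LargeBinary => ("LargeBinaryArray", 64, false),
    }
}
```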

Usage Examples

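use lance_encoding::previous::encodings::logical::binary::BinaryFieldScheduler;
use lance_encoding::previous::decoder::FieldScheduler;
// Assumed import path for FilterExpression; verify against your lance-encoding version.
use lance_encoding::decoder::FilterExpression;
use arrow_schema::DataType;
use std::sync::Arc;

// Create a binary field scheduler wrapping a list-of-bytes scheduler
let varbin_scheduler: Arc<dyn FieldScheduler> = /* from column metadata */;
let scheduler = BinaryFieldScheduler::new(
    varbin_scheduler,
    DataType::Utf8, // target type the decoded List<u8> data is cast to
);

// Schedule row ranges (u64 row indices, matching the &[Range<u64>] input)
let ranges = vec![0..100u64];
let filter = FilterExpression::no_filter();
// `?` assumes this code runs inside a function that returns a Result
let mut job = scheduler.schedule_ranges(&ranges, &filter)?;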
