Implementation:Lance format Lance LegacyBinaryEncoding

From Leeroopedia


Knowledge Sources
Domains Encoding, Legacy_Format
Last Updated 2026-02-08 19:33 GMT

Overview

The legacy binary encoding is a physical encoding that stores variable-length binary data using separate indices and bytes buffers in the Lance v2.0 format.

Description

⚠️ DEPRECATED: This is legacy code from the Lance v1/v2.0 format, retained only for backward compatibility. See Lance_format_Lance_Warning_Deprecated_Legacy_Encodings.

This module implements the physical binary page encoding for the legacy (v2.0) Lance file format. BinaryPageScheduler coordinates the decoding of variable-length binary data (strings, binary blobs) by scheduling I/O for two buffers: an indices buffer and a bytes buffer. The indices are cumulative byte offsets into the bytes buffer. Null values are tracked with a null adjustment threshold: an offset value that exceeds the threshold marks a null entry. During decoding, an IndicesNormalizer transforms the raw indices into normalized offsets and a validity bitmap. BinaryEncoder handles encoding by converting Arrow arrays into indices and bytes buffers, with optional compression configured via CompressionConfig. The encoder supports both the regular and the large binary/utf8 types, which use 32-bit and 64-bit offsets respectively.
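The normalization step described above can be sketched in plain Rust. This is a minimal illustration, not the crate's IndicesNormalizer: the function name `normalize_indices` is hypothetical, and the sketch assumes that the indices buffer holds one cumulative end offset per value, that a null entry is stored as its real offset plus the null adjustment, and that values at or above the threshold count as null.

```rust
/// Hypothetical helper (not part of lance-encoding): turns raw indices into
/// normalized offsets plus a validity vector.
/// Assumption: a null entry is stored as `real_offset + null_adjustment`.
fn normalize_indices(raw: &[u64], null_adjustment: u64) -> (Vec<u64>, Vec<bool>) {
    let mut offsets = Vec::with_capacity(raw.len() + 1);
    let mut validity = Vec::with_capacity(raw.len());
    // Arrow-style offsets start with a leading zero.
    offsets.push(0);
    for &value in raw {
        if value >= null_adjustment {
            // Null entry: strip the adjustment to recover the real offset.
            validity.push(false);
            offsets.push(value - null_adjustment);
        } else {
            validity.push(true);
            offsets.push(value);
        }
    }
    (offsets, validity)
}

fn main() {
    // Values "ab", null, "cde" with a threshold of 6 (assumed for this example).
    let raw = [2u64, 2 + 6, 5];
    let (offsets, validity) = normalize_indices(&raw, 6);
    assert_eq!(offsets, vec![0, 2, 2, 5]);
    assert_eq!(validity, vec![true, false, true]);
}
```

A null entry contributes no bytes, so its normalized offset repeats the previous one and the bytes buffer stays contiguous.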

Usage

This encoding applies to variable-length binary and string data in files that use the legacy v2.0 format. On the write path, CoreArrayEncodingStrategy selects BinaryEncoder for the DataType::Binary, DataType::LargeBinary, DataType::Utf8, and DataType::LargeUtf8 types. On the read path, the physical dispatch creates a BinaryPageScheduler when it encounters a Binary protobuf encoding.
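The type selection above, combined with the 32-bit versus 64-bit offset distinction, might be summarized as follows. This is only a sketch: `ColumnType` and `OffsetsType` are hypothetical stand-ins for Arrow's DataType so that the example compiles without external crates, and `binary_offsets_type` is not a function in lance-encoding.

```rust
/// Hypothetical stand-in for the relevant `arrow_schema::DataType` variants.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ColumnType {
    Binary,
    LargeBinary,
    Utf8,
    LargeUtf8,
    Int32, // an example of a type that is not handled by the binary encoding
}

/// Hypothetical stand-in for the offsets type passed to the scheduler.
#[derive(Debug, PartialEq)]
enum OffsetsType {
    Int32,
    Int64,
}

/// Returns the offsets width for types handled by the binary encoding,
/// or `None` for any other type.
fn binary_offsets_type(column_type: ColumnType) -> Option<OffsetsType> {
    match column_type {
        ColumnType::Binary | ColumnType::Utf8 => Some(OffsetsType::Int32),
        ColumnType::LargeBinary | ColumnType::LargeUtf8 => Some(OffsetsType::Int64),
        _ => None,
    }
}

fn main() {
    assert_eq!(binary_offsets_type(ColumnType::Utf8), Some(OffsetsType::Int32));
    assert_eq!(binary_offsets_type(ColumnType::LargeBinary), Some(OffsetsType::Int64));
    assert_eq!(binary_offsets_type(ColumnType::Int32), None);
}
```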

Code Reference

Source Location

rust/lance-encoding/src/previous/encodings/physical/binary.rs

Signature

pub struct BinaryPageScheduler {
    indices_scheduler: Arc<dyn PageScheduler>,
    bytes_scheduler: Arc<dyn PageScheduler>,
    offsets_type: DataType,
    null_adjustment: u64,
}

impl BinaryPageScheduler {
    pub fn new(
        indices_scheduler: Arc<dyn PageScheduler>,
        bytes_scheduler: Arc<dyn PageScheduler>,
        offsets_type: DataType,
        null_adjustment: u64,
    ) -> Self;
}

pub struct BinaryEncoder { /* fields omitted */ }

impl BinaryEncoder {
    pub fn try_new(
        indices_encoder: Box<dyn ArrayEncoder>,
        compression: Option<CompressionConfig>,
    ) -> Result<Self>;
}

Import

use lance_encoding::previous::encodings::physical::binary::{
    BinaryPageScheduler, BinaryEncoder,
};

I/O Contract

Input | Type | Description
indices_scheduler | Arc<dyn PageScheduler> | Scheduler for the byte offset indices buffer
bytes_scheduler | Arc<dyn PageScheduler> | Scheduler for the raw bytes buffer
offsets_type | DataType | Int32 or Int64 depending on binary variant
null_adjustment | u64 | Threshold for null detection in offset values
data | DataBlock | Variable-width data block to encode

Output | Type | Description
decoded | DataBlock | Variable-width data block with offsets and bytes
encoded | EncodedArray | Encoded indices and bytes buffers with encoding descriptor
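To make the encoding side of this contract concrete, the following sketch splits a list of optional byte strings into an indices buffer and a bytes buffer. It is an illustration under stated assumptions rather than BinaryEncoder itself: the `EncodedBinary` struct and `encode_binary` function are hypothetical, compression is omitted, and choosing "total byte length plus one" as the null adjustment is an assumption made so that every null entry lands at or above the threshold.

```rust
/// Hypothetical stand-in for the encoder output (not the crate's EncodedArray).
struct EncodedBinary {
    indices: Vec<u64>,
    bytes: Vec<u8>,
    null_adjustment: u64,
}

/// Hypothetical sketch: one cumulative end offset per value, with nulls
/// marked by adding `null_adjustment` to the current offset.
fn encode_binary(values: &[Option<&[u8]>]) -> EncodedBinary {
    let total: u64 = values.iter().flatten().map(|v| v.len() as u64).sum();
    // Assumption: one past the largest real offset, so real offsets never reach it.
    let null_adjustment = total + 1;
    let mut indices = Vec::with_capacity(values.len());
    let mut bytes = Vec::with_capacity(total as usize);
    for value in values {
        match value {
            Some(v) => {
                bytes.extend_from_slice(v);
                indices.push(bytes.len() as u64);
            }
            // A null adds no bytes; its index is the current offset plus the adjustment.
            None => indices.push(bytes.len() as u64 + null_adjustment),
        }
    }
    EncodedBinary { indices, bytes, null_adjustment }
}

fn main() {
    let values: [Option<&[u8]>; 3] = [Some(&b"ab"[..]), None, Some(&b"cde"[..])];
    let encoded = encode_binary(&values);
    assert_eq!(encoded.indices, vec![2, 8, 5]);
    assert_eq!(encoded.bytes, b"abcde".to_vec());
    assert_eq!(encoded.null_adjustment, 6);
}
```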

Usage Examples

use lance_encoding::previous::encodings::physical::binary::BinaryPageScheduler;
use lance_encoding::decoder::PageScheduler;
use lance_encoding::EncodingsIo; // assumed to be exported at the crate root
use arrow_schema::DataType;
use std::sync::Arc;

// Create a binary page scheduler from inner schedulers.
// The `/* ... */` placeholders stand for values supplied by the caller.
let indices_scheduler: Arc<dyn PageScheduler> = /* from dispatch */;
let bytes_scheduler: Arc<dyn PageScheduler> = /* from dispatch */;
let scheduler = BinaryPageScheduler::new(
    indices_scheduler,
    bytes_scheduler,
    DataType::Int32,   // 32-bit offsets for regular Binary/Utf8; Int64 for the Large variants
    0,                 // null_adjustment (placeholder value)
);

// Schedule ranges for decoding
let ranges = vec![0..100];
let io: Arc<dyn EncodingsIo> = /* from context */;
// The call returns a future; await it in an async context to get the decoder.
let decoder_fut = scheduler.schedule_ranges(&ranges, &io, 0);
