Implementation:Lance format Lance LegacyDecoder

From Leeroopedia


Knowledge Sources
Domains: Encoding, Legacy_Format
Last Updated: 2026-02-08 19:33 GMT

Overview

The legacy decoder module defines the core traits for scheduling I/O and decoding data written in the Lance v2.0 file format.

Description

⚠️ DEPRECATED: This is legacy code from the Lance v1/v2.0 format, retained only for backward compatibility. See Lance_format_Lance_Warning_Deprecated_Legacy_Encodings.

This module provides the foundational decoding infrastructure for the legacy (v2.0) Lance file format. It defines three primary traits: FieldScheduler, which schedules I/O at the field level; SchedulingJob, which tracks the progress of scheduled I/O operations; and LogicalPageDecoder, which decodes loaded data into Arrow arrays. A FieldScheduler is stateless and must be Send + Sync because it may be shared across tasks (e.g., list pages share item schedulers). A LogicalPageDecoder is stateful and only needs to be Send; it manages the lifecycle of loaded data, from waiting for I/O to complete through draining decoded rows. The module also defines DecoderReady, which pairs a decoder with a path describing its position in the schema hierarchy.
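The split between a shared, stateless scheduler and an owned, stateful decoder can be illustrated with a small standard-library sketch. None of the names below come from lance-encoding: ToyScheduler, ToyDecoder, CountingScheduler, CountingDecoder, and drain_in_parallel are hypothetical stand-ins that only mirror the Send + Sync versus Send bounds described above.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for a stateless scheduler. It is shared through an
/// `Arc`, so it needs `Send + Sync`.
trait ToyScheduler: Send + Sync {
    /// Creates a fresh, independently owned decoder for `num_rows` rows.
    fn make_decoder(&self, num_rows: u64) -> Box<dyn ToyDecoder>;
}

/// Hypothetical stand-in for a stateful decoder. It is owned by one task at a
/// time, so `Send` is enough.
trait ToyDecoder: Send {
    /// Drains up to `n` rows and returns how many were actually drained.
    fn drain(&mut self, n: u64) -> u64;
    /// Total rows drained so far.
    fn rows_drained(&self) -> u64;
}

struct CountingScheduler;

struct CountingDecoder {
    num_rows: u64,
    drained: u64,
}

impl ToyScheduler for CountingScheduler {
    fn make_decoder(&self, num_rows: u64) -> Box<dyn ToyDecoder> {
        Box::new(CountingDecoder { num_rows, drained: 0 })
    }
}

impl ToyDecoder for CountingDecoder {
    fn drain(&mut self, n: u64) -> u64 {
        // Never drain past the end of the page.
        let take = n.min(self.num_rows - self.drained);
        self.drained += take;
        take
    }

    fn rows_drained(&self) -> u64 {
        self.drained
    }
}

/// Shares one scheduler across `workers` threads. Each thread creates and
/// drains its own decoder; the function returns the total rows drained.
fn drain_in_parallel(scheduler: Arc<dyn ToyScheduler>, workers: u64, rows_each: u64) -> u64 {
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let scheduler = Arc::clone(&scheduler);
            thread::spawn(move || {
                let mut decoder = scheduler.make_decoder(rows_each);
                decoder.drain(rows_each);
                decoder.rows_drained()
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```

Because the scheduler holds no mutable state, cloning the Arc is all that is needed to share it, while each decoder is moved into exactly one thread and mutated there.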

Usage

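Use this module when reading Lance files written in the v2.0 format. A FieldScheduler is created per output field and calculates the necessary I/O. Calling schedule_ranges returns a SchedulingJob, and each call to schedule_next yields a ScheduledScanLine whose messages carry LogicalPageDecoder instances, emitted in row-major order. Consumers call wait_for_loaded to ensure data is available, then call drain to obtain a NextDecodeTask, which is decoded into an Arrow array.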
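The load-then-drain ordering can be pictured with a synchronous toy model of a decoder's row counters. This is only a sketch under stated assumptions: PageProgress and mark_loaded are hypothetical names, the real trait waits asynchronously through wait_for_loaded, and the rule that a drain may not reach past the loaded rows is an illustration of the lifecycle rather than the library's actual checks.

```rust
use std::ops::Range;

/// Hypothetical, synchronous stand-in for a page decoder's row accounting.
#[derive(Debug)]
struct PageProgress {
    num_rows: u64,
    rows_loaded: u64,
    rows_drained: u64,
}

impl PageProgress {
    fn new(num_rows: u64) -> Self {
        Self { num_rows, rows_loaded: 0, rows_drained: 0 }
    }

    /// Simulates I/O completing for `rows` more rows (capped at `num_rows`).
    fn mark_loaded(&mut self, rows: u64) {
        self.rows_loaded = (self.rows_loaded + rows).min(self.num_rows);
    }

    /// Drains `rows` rows and returns the drained row range, or an error if
    /// the request reaches past the rows loaded so far.
    fn drain(&mut self, rows: u64) -> Result<Range<u64>, String> {
        let end = self.rows_drained + rows;
        if end > self.rows_loaded {
            return Err(format!(
                "need {} rows loaded but only {} are available",
                end, self.rows_loaded
            ));
        }
        let start = self.rows_drained;
        self.rows_drained = end;
        Ok(start..end)
    }
}
```

In this model the counters always satisfy rows_drained <= rows_loaded <= num_rows, which is the ordering the wait-then-drain sequence is meant to preserve.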

Code Reference

Source Location

rust/lance-encoding/src/previous/decoder.rs

Signature

pub trait SchedulingJob: std::fmt::Debug {
    fn schedule_next(
        &mut self,
        context: &mut SchedulerContext,
        priority: &dyn PriorityRange,
    ) -> Result<ScheduledScanLine>;
    fn num_rows(&self) -> u64;
}

pub trait FieldScheduler: Send + Sync + std::fmt::Debug {
    fn initialize<'a>(
        &'a self,
        filter: &'a FilterExpression,
        context: &'a SchedulerContext,
    ) -> BoxFuture<'a, Result<()>>;
    fn schedule_ranges<'a>(
        &'a self,
        ranges: &[Range<u64>],
        filter: &FilterExpression,
    ) -> Result<Box<dyn SchedulingJob + 'a>>;
    fn num_rows(&self) -> u64;
}

pub trait LogicalPageDecoder: std::fmt::Debug + Send {
    fn accept_child(&mut self, child: DecoderReady) -> Result<()>;
    fn wait_for_loaded(&mut self, loaded_need: u64) -> BoxFuture<'_, Result<()>>;
    fn rows_loaded(&self) -> u64;
    fn num_rows(&self) -> u64;
    fn rows_drained(&self) -> u64;
    fn drain(&mut self, num_rows: u64) -> Result<NextDecodeTask>;
    fn data_type(&self) -> &DataType;
}

Import

use lance_encoding::previous::decoder::{
    FieldScheduler, LogicalPageDecoder, SchedulingJob, DecoderReady,
};

I/O Contract

Input           Type                    Description
ranges          &[Range<u64>]           Row ranges to schedule for reading (ordered, non-overlapping)
filter          &FilterExpression       Filter expression for predicate pushdown
loaded_need     u64                     Minimum number of rows that must be loaded before decoding
num_rows        u64                     Number of rows to drain from the decoder

Output          Type                    Description
scheduling_job  Box<dyn SchedulingJob>  Job that emits decoders as I/O completes
scan_line       ScheduledScanLine       Collection of decoder-ready messages with row counts
decode_task     NextDecodeTask          Task containing the decode operation and row count
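The contract asks for ranges that are ordered and non-overlapping. A caller could check that precondition before scheduling with a helper along these lines. This is a minimal sketch: total_rows is a hypothetical function, not part of lance-encoding, and treating adjacent ranges such as 0..100 followed by 100..200 as valid is an assumption.

```rust
use std::ops::Range;

/// Hypothetical helper: verifies that `ranges` are well-formed, sorted, and
/// non-overlapping, and returns the total number of rows they cover.
fn total_rows(ranges: &[Range<u64>]) -> Result<u64, String> {
    let mut prev_end = 0u64;
    let mut total = 0u64;
    for (i, r) in ranges.iter().enumerate() {
        if r.start > r.end {
            return Err(format!("range {} is inverted: {:?}", i, r));
        }
        // Each range must start at or after the end of the previous one.
        if i > 0 && r.start < prev_end {
            return Err(format!(
                "range {} ({:?}) overlaps or precedes the previous range",
                i, r
            ));
        }
        prev_end = r.end;
        total += r.end - r.start;
    }
    Ok(total)
}
```

For the ranges used in the usage example on this page, 0..100 and 200..300, the helper reports 200 rows.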

Usage Examples

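use std::sync::Arc;

// Module path for FilterExpression is an assumption; adjust to your lance-encoding version.
use lance_encoding::decoder::FilterExpression;
use lance_encoding::previous::decoder::{FieldScheduler, LogicalPageDecoder, SchedulingJob};

// This fragment is assumed to run inside an async fn that returns Result,
// with a SchedulerContext (`context`) and a PriorityRange (`priority`) already in scope.

// Given a field scheduler for a column
let scheduler: Arc<dyn FieldScheduler> = /* obtained from file reader */;

// Schedule ranges for reading (ordered, non-overlapping)
let ranges = vec![0..100, 200..300];
let filter = FilterExpression::no_filter();
let mut job = scheduler.schedule_ranges(&ranges, &filter)?;

// Process the next scheduled scan line
let scan_line = job.schedule_next(&mut context, &priority)?;
for message in scan_line.decoders {
    let decoder_ready = message.into_legacy();
    let mut decoder = decoder_ready.decoder;
    // Read the row count once so the decoder is not borrowed twice in one call
    let num_rows = decoder.num_rows();

    // Wait for data to load
    decoder.wait_for_loaded(num_rows).await?;

    // Drain the rows into a decode task, then decode it into an Arrow array
    let task = decoder.drain(num_rows)?;
    let array = task.task.decode()?;
}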
