
Implementation:Lance format Lance LegacyListEncoding



Knowledge Sources
Domains: Encoding, Legacy_Format
Last Updated: 2026-02-08 19:33 GMT

Overview

The legacy list encoding handles variable-length list columns (List and LargeList) by storing offsets and items separately in the Lance v2.0 format.

Description

⚠️ DEPRECATED: This is legacy code from the Lance v1/v2.0 format, retained only for backward compatibility. See Lance_format_Lance_Warning_Deprecated_Legacy_Encodings.

This module implements list encoding for the legacy (v2.0) Lance file format and supports both the List and LargeList Arrow data types. List offsets are stored in one column and the item data in child columns.

Scheduling is the complex part. A requested row range can span several offset pages, so the scheduler has to handle null offset adjustments, read extra offsets, and compute the item offsets for each range. ListFieldScheduler keeps per-page metadata (OffsetPageInfo) so it can map row ranges to item ranges correctly.

On the write side, ListFieldEncoder accumulates arrays and produces encoded pages containing the offsets and the items.

Nulls are tracked through a null offset adjustment: stored offsets that exceed the adjustment threshold indicate null entries.
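To make the offsets/items split and the null offset adjustment concrete, here is a minimal, self-contained Rust sketch. It is a model of the idea, not Lance's implementation: it assumes each row stores only its end offset (the leading 0 is implied) and that a page's null_offset_adjustment is num_items + 1, so a null list stored as prev_end + adjustment can never be confused with a valid offset. The function names are hypothetical.

```rust
/// Hypothetical sketch: encodes `rows` into (stored end offsets, flattened items, null_offset_adjustment).
/// Assumption: only end offsets are stored and adjustment = num_items + 1.
pub fn encode_lists_sketch(rows: &[Option<Vec<i32>>]) -> (Vec<u64>, Vec<i32>, u64) {
    // Total item count across non-null lists; the largest valid offset equals this.
    let num_items: usize = rows.iter().flatten().map(|v| v.len()).sum();
    // Assumed rule: adjustment = num_items + 1, so null offsets never collide with valid ones.
    let adjustment = num_items as u64 + 1;
    let mut offsets: Vec<u64> = Vec::with_capacity(rows.len());
    let mut items: Vec<i32> = Vec::with_capacity(num_items);
    for row in rows {
        match row {
            Some(values) => {
                items.extend_from_slice(values);
                offsets.push(items.len() as u64); // end offset of this list
            }
            // A null list adds no items; store the previous end shifted by the adjustment.
            None => offsets.push(items.len() as u64 + adjustment),
        }
    }
    (offsets, items, adjustment)
}

/// Hypothetical sketch: rebuilds Arrow-style offsets (with the leading 0) and per-row validity.
pub fn decode_lists_sketch(stored: &[u64], adjustment: u64) -> (Vec<u64>, Vec<bool>) {
    let mut offsets: Vec<u64> = Vec::with_capacity(stored.len() + 1);
    let mut validity: Vec<bool> = Vec::with_capacity(stored.len());
    offsets.push(0);
    for &raw in stored {
        // Any offset at or above the adjustment was shifted by the encoder, so the row is null.
        let is_valid = raw < adjustment;
        validity.push(is_valid);
        // Null entries are shifted back so the offsets stay monotonic.
        offsets.push(if is_valid { raw } else { raw - adjustment });
    }
    (offsets, validity)
}
```

For rows [1, 2], null, [], [3], the encoder in this model stores end offsets [2, 6, 2, 3] with an adjustment of 4, and decoding recovers offsets [0, 2, 2, 2, 3] with the second row marked null.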

Usage

Use this encoding for DataType::List and DataType::LargeList fields. The CoreFieldEncodingStrategy automatically creates a ListFieldEncoder for list-typed fields, pairing it with an inner encoder for the items column. During reading, ListFieldScheduler wraps an offset scheduler and items scheduler, performing indirect I/O to decode items after offsets are loaded.
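The indirect I/O happens in two phases: read the offsets for the requested rows, then use them to decide which items to read. The Rust sketch below models that for a single offset page. It assumes that only end offsets are stored and that null entries are shifted up by null_offset_adjustment; the helper names are hypothetical and the code does not call lance-encoding.

```rust
use std::ops::Range;

/// Hypothetical helper: which stored offsets must be fetched to decode `rows`.
/// With end-offsets-only storage, a range that does not start at row 0 needs one
/// extra offset (the previous row's end) to know where its first item begins.
pub fn offsets_to_read(rows: &Range<u64>) -> Range<u64> {
    if rows.start == 0 {
        0..rows.end
    } else {
        rows.start - 1..rows.end
    }
}

/// Hypothetical helper: given the offsets fetched for `offsets_to_read(rows)`,
/// compute the item range to request in the second I/O phase.
/// `fetched` must be non-empty whenever `rows.start > 0`.
pub fn items_to_read(rows: &Range<u64>, fetched: &[u64], adjustment: u64) -> Range<u64> {
    // Undo the null shift so null rows report their real (unchanged) end offset.
    let strip = |raw: u64| if raw >= adjustment { raw - adjustment } else { raw };
    let begin = if rows.start == 0 { 0 } else { strip(fetched[0]) };
    let end = fetched.last().map_or(begin, |&raw| strip(raw));
    begin..end
}
```

With stored offsets [2, 6, 2, 3] and an adjustment of 4, rows 2..4 require offsets 1..4 and resolve to items 2..3. The extra leading offset is one reading of the "extra offset reads" mentioned in the description; across page boundaries the real scheduler must also combine per-page metadata.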

Code Reference

Source Location

rust/lance-encoding/src/previous/encodings/logical/list.rs

Signature

pub struct ListFieldScheduler {
    offsets_column: Arc<dyn FieldScheduler>,
    items_column: Arc<dyn FieldScheduler>,
    offset_page_infos: Vec<OffsetPageInfo>,
    offsets_type: DataType,
    num_rows: u64,
}

impl ListFieldScheduler {
    pub fn new(
        offsets_column: Arc<dyn FieldScheduler>,
        items_column: Arc<dyn FieldScheduler>,
        offset_page_infos: Vec<OffsetPageInfo>,
        offsets_type: DataType,
        num_rows: u64,
    ) -> Self;
}

pub struct ListFieldEncoder { /* fields omitted */ }

impl ListFieldEncoder {
    pub fn new(
        items_encoder: Box<dyn FieldEncoder>,
        offsets_encoder: Arc<dyn ArrayEncoder>,
        cache_bytes_per_column: u64,
        keep_original_array: bool,
        column_index: u32,
    ) -> Self;
}
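ListFieldEncoder takes a cache_bytes_per_column budget and accumulates arrays before producing pages. The following Rust sketch shows one plausible way such a byte budget can drive flushing. PageAccumulator, its methods, and the "flush once buffered bytes reach the budget" rule are hypothetical simplifications over plain Vec<i32> inputs, not the encoder's actual logic.

```rust
/// Hypothetical byte-budgeted accumulator (not Lance's code).
pub struct PageAccumulator {
    cache_bytes_per_column: u64,
    buffered: Vec<Vec<i32>>,
    buffered_bytes: u64,
}

impl PageAccumulator {
    pub fn new(cache_bytes_per_column: u64) -> Self {
        Self { cache_bytes_per_column, buffered: Vec::new(), buffered_bytes: 0 }
    }

    /// Buffers `array`; returns the buffered arrays as one page once the budget is reached.
    pub fn maybe_encode(&mut self, array: Vec<i32>) -> Option<Vec<Vec<i32>>> {
        self.buffered_bytes += (array.len() * std::mem::size_of::<i32>()) as u64;
        self.buffered.push(array);
        if self.buffered_bytes >= self.cache_bytes_per_column {
            self.flush()
        } else {
            None
        }
    }

    /// Drains whatever is buffered (e.g. when the writer finishes); `None` if nothing is buffered.
    pub fn flush(&mut self) -> Option<Vec<Vec<i32>>> {
        if self.buffered.is_empty() {
            return None;
        }
        self.buffered_bytes = 0;
        Some(std::mem::take(&mut self.buffered))
    }
}
```

Buffering trades memory for larger, more efficient pages; a final flush is still needed so the tail of the column is not lost.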

Import

use lance_encoding::previous::encodings::logical::list::{
    ListFieldScheduler, ListFieldEncoder, OffsetPageInfo,
};

I/O Contract

Input | Type | Description
offsets_column | Arc<dyn FieldScheduler> | Scheduler for the list offsets column
items_column | Arc<dyn FieldScheduler> | Scheduler for the items column
offset_page_infos | Vec<OffsetPageInfo> | Per-page metadata, including the null offset adjustment and item counts
ranges | &[Range<u64>] | Row ranges to schedule (must be ordered and non-overlapping)

Output | Type | Description
decoded | ListArray / LargeListArray | Reconstructed Arrow list array
encoded_pages | Vec<EncodedPage> | Encoded offset pages
encoded_items | EncodedColumn | Encoded item column data
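The ranges input must be ordered and non-overlapping, and a single range may cross offset-page boundaries. This Rust sketch checks that precondition and splits a row range into page-local pieces using each page's offsets_in_page. It assumes one stored offset per row, and both helper names are hypothetical.

```rust
use std::ops::Range;

/// Hypothetical check: every range is well-formed and each starts at or after the previous end.
pub fn ranges_are_valid(ranges: &[Range<u64>]) -> bool {
    ranges.iter().all(|r| r.start <= r.end)
        && ranges.windows(2).all(|w| w[0].end <= w[1].start)
}

/// Hypothetical helper: splits `rows` into (page index, page-local row range) pieces,
/// assuming page i holds `offsets_in_page[i]` consecutive rows.
pub fn split_by_page(rows: &Range<u64>, offsets_in_page: &[u64]) -> Vec<(usize, Range<u64>)> {
    let mut pieces = Vec::new();
    let mut page_start = 0u64;
    for (page, &count) in offsets_in_page.iter().enumerate() {
        let page_end = page_start + count;
        // Intersect the requested rows with this page's global row span.
        let lo = rows.start.max(page_start);
        let hi = rows.end.min(page_end);
        if lo < hi {
            pieces.push((page, lo - page_start..hi - page_start));
        }
        page_start = page_end;
    }
    pieces
}
```

For two pages of 2000 rows each, rows 1500..2500 become rows 1500..2000 of page 0 and rows 0..500 of page 1.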

Usage Examples

use lance_encoding::previous::encodings::logical::list::{ListFieldScheduler, OffsetPageInfo};
use lance_encoding::previous::decoder::FieldScheduler;
use arrow_schema::DataType;
use std::sync::Arc;

// The offsets and items schedulers come from the file's column metadata,
// so they are passed in here rather than constructed.
fn build_list_scheduler(
    offsets_scheduler: Arc<dyn FieldScheduler>,
    items_scheduler: Arc<dyn FieldScheduler>,
) -> ListFieldScheduler {
    // One OffsetPageInfo per offsets page, built from the page metadata
    let page_infos = vec![OffsetPageInfo {
        offsets_in_page: 2000,
        null_offset_adjustment: 0,
        num_items_referenced_by_page: 5000,
    }];
    // Keep num_rows consistent with the pages: each row has one stored offset
    let num_rows: u64 = page_infos.iter().map(|p| p.offsets_in_page).sum();
    ListFieldScheduler::new(
        offsets_scheduler,
        items_scheduler,
        page_infos,
        DataType::Int32, // offsets type: Int32 for List, Int64 for LargeList
        num_rows,
    )
}
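To show how the OffsetPageInfo fields relate to the underlying data, this sketch derives per-page metadata from a column of list lengths (None marks a null list) split into fixed-size pages. PageInfoSketch is a hypothetical local stand-in for OffsetPageInfo so the code compiles without lance-encoding, and the rules it applies (one offset per row, adjustment = items + 1) are assumptions, not the encoder's documented behavior.

```rust
/// Hypothetical local mirror of OffsetPageInfo's three fields.
#[derive(Debug, PartialEq)]
pub struct PageInfoSketch {
    pub offsets_in_page: u64,
    pub null_offset_adjustment: u64,
    pub num_items_referenced_by_page: u64,
}

/// Derives per-page metadata for `list_lens` split into pages of `rows_per_page` rows.
/// `rows_per_page` must be greater than 0 (`chunks` panics on 0).
pub fn page_infos_sketch(list_lens: &[Option<u64>], rows_per_page: usize) -> Vec<PageInfoSketch> {
    list_lens
        .chunks(rows_per_page)
        .map(|page| {
            // Null lists contribute no items.
            let items: u64 = page.iter().flatten().sum();
            PageInfoSketch {
                offsets_in_page: page.len() as u64, // assumed: one offset per row
                null_offset_adjustment: items + 1,  // assumed adjustment rule
                num_items_referenced_by_page: items,
            }
        })
        .collect()
}
```

In this model, the num_rows argument to ListFieldScheduler::new is the sum of offsets_in_page over all pages.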
