Implementation:Lance format Lance LegacyListEncoding
| Knowledge Sources | |
|---|---|
| Domains | Encoding, Legacy_Format |
| Last Updated | 2026-02-08 19:33 GMT |
Overview
The legacy list encoding handles variable-length list columns (List and LargeList) by storing offsets and items separately in the Lance v2.0 format.
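For reference, Arrow's in-memory List layout, which is what the encoder receives, pairs an offsets buffer of length n + 1 (starting at 0) with one flat child array of items. The sketch below builds that pair from plain Rust vectors. `split_list` is a hypothetical helper, nulls are ignored, and the sketch does not show how Lance writes the offsets to disk.

```rust
/// Hypothetical helper: flattens list rows into Arrow-style offsets
/// (rows.len() + 1 entries, starting at 0) plus one contiguous items buffer.
fn split_list(rows: &[Vec<i32>]) -> (Vec<i32>, Vec<i32>) {
    let mut offsets = Vec::with_capacity(rows.len() + 1);
    let mut items = Vec::new();
    offsets.push(0);
    for row in rows {
        // Row i occupies items[offsets[i]..offsets[i + 1]].
        items.extend_from_slice(row);
        offsets.push(items.len() as i32);
    }
    (offsets, items)
}

fn main() {
    let rows = vec![vec![1, 2], vec![], vec![3, 4, 5]];
    let (offsets, items) = split_list(&rows);
    println!("{:?} {:?}", offsets, items); // prints "[0, 2, 2, 5] [1, 2, 3, 4, 5]"
}
```

An empty list simply repeats the previous offset, which is why row 1 above spans `2..2`.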
Description
⚠️ DEPRECATED: This is legacy code from the Lance v1/v2.0 format, retained only for backward compatibility. See Lance_format_Lance_Warning_Deprecated_Legacy_Encodings.
This module implements list encoding for the legacy (v2.0) Lance file format. It supports both List and LargeList Arrow data types. The encoding stores list offsets in one column and item data in child columns. The scheduling logic is complex because row ranges can span multiple offset pages, requiring careful handling of null offset adjustments, extra offset reads, and items offset calculations. The ListFieldScheduler manages offset page metadata (OffsetPageInfo) to correctly map row ranges to item ranges. The ListFieldEncoder accumulates arrays and produces encoded pages with offsets and items. Null values are tracked through a null offset adjustment mechanism where offsets exceeding the adjustment threshold indicate null entries.
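The null offset adjustment idea can be sketched as follows. This is a minimal illustration under stated assumptions, not Lance's code: it assumes one *end* offset is stored per row, that a null row stores its end offset plus `adjustment`, and that `adjustment` is one more than the largest real end offset (total items + 1), so any stored value at or above it must belong to a null row. The function names are hypothetical.

```rust
/// Hypothetical sketch: store one end offset per row and mark null rows by
/// adding `adjustment` to their end offset.
fn encode_with_nulls(rows: &[Option<Vec<i32>>]) -> (Vec<u64>, u64, Vec<i32>) {
    let mut items = Vec::new();
    let mut ends = Vec::with_capacity(rows.len());
    for row in rows {
        // A null row contributes no items, so its end equals the previous end.
        if let Some(values) = row {
            items.extend_from_slice(values);
        }
        ends.push(items.len() as u64);
    }
    // Assumed choice: larger than every real end offset.
    let adjustment = items.len() as u64 + 1;
    let stored: Vec<u64> = ends
        .iter()
        .zip(rows)
        .map(|(&end, row)| if row.is_none() { end + adjustment } else { end })
        .collect();
    (stored, adjustment, items)
}

/// Recovers (end_offset, is_valid) from a stored offset value.
fn decode_offset(stored: u64, adjustment: u64) -> (u64, bool) {
    if stored >= adjustment {
        (stored - adjustment, false)
    } else {
        (stored, true)
    }
}

fn main() {
    let rows = vec![Some(vec![1, 2]), None, Some(vec![3])];
    let (stored, adjustment, _items) = encode_with_nulls(&rows);
    // stored = [2, 6, 3], adjustment = 4
    for s in &stored {
        println!("{:?}", decode_offset(*s, adjustment));
    }
}
```

Because the decoded end of a null row equals the previous row's end, a null list also decodes to an empty item span, so item ranges stay contiguous.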
Usage
Use this encoding for DataType::List and DataType::LargeList fields. The CoreFieldEncodingStrategy automatically creates a ListFieldEncoder for list-typed fields, pairing it with an inner encoder for the items column. During reading, ListFieldScheduler wraps an offset scheduler and items scheduler, performing indirect I/O to decode items after offsets are loaded.
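The two-phase read (offsets first, then items) hinges on turning a row range into an item range. The sketch below shows that step under assumptions that are not taken from the Lance source: `PageInfo` is a hypothetical, trimmed mirror of `OffsetPageInfo`; each page's end offsets are assumed to be already de-adjusted for nulls and relative to that page's first item; and a page's items start after all items referenced by earlier pages. It also shows why an extra offset read is needed: a row starts where the previous row ends, and that previous offset may sit on the previous page.

```rust
use std::ops::Range;

/// Hypothetical, trimmed mirror of OffsetPageInfo (null adjustment omitted).
struct PageInfo {
    offsets_in_page: u64,
    num_items_referenced_by_page: u64,
}

/// Absolute end offset (in items) of `row`, walking the pages in order.
fn absolute_end(pages: &[PageInfo], page_ends: &[Vec<u64>], row: u64) -> u64 {
    let (mut first_row, mut first_item) = (0u64, 0u64);
    for (info, ends) in pages.iter().zip(page_ends) {
        if row < first_row + info.offsets_in_page {
            return first_item + ends[(row - first_row) as usize];
        }
        first_row += info.offsets_in_page;
        first_item += info.num_items_referenced_by_page;
    }
    panic!("row {row} is past the last offsets page");
}

/// Maps a row range to the item range the items scheduler must read.
fn items_range(pages: &[PageInfo], page_ends: &[Vec<u64>], rows: Range<u64>) -> Range<u64> {
    if rows.start >= rows.end {
        return 0..0; // an empty row range needs no items
    }
    // Extra offset read: the start of row `rows.start` is the end of the row before it.
    let start = if rows.start == 0 { 0 } else { absolute_end(pages, page_ends, rows.start - 1) };
    let end = absolute_end(pages, page_ends, rows.end - 1);
    start..end
}

fn main() {
    // Page 0: rows 0..3 with lengths 2, 0, 3. Page 1: rows 3..5 with lengths 1, 3.
    let pages = [
        PageInfo { offsets_in_page: 3, num_items_referenced_by_page: 5 },
        PageInfo { offsets_in_page: 2, num_items_referenced_by_page: 4 },
    ];
    let page_ends = [vec![2, 2, 5], vec![1, 4]];
    println!("{:?}", items_range(&pages, &page_ends, 2..4)); // prints "2..6"
}
```

Rows 2..4 cross the page boundary: the start comes from page 0 and the end from page 1, shifted by page 0's five items.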
Code Reference
Source Location
rust/lance-encoding/src/previous/encodings/logical/list.rs
Signature
pub struct ListFieldScheduler {
    offsets_column: Arc<dyn FieldScheduler>,
    items_column: Arc<dyn FieldScheduler>,
    offset_page_infos: Vec<OffsetPageInfo>,
    offsets_type: DataType,
    num_rows: u64,
}
impl ListFieldScheduler {
    pub fn new(
        offsets_column: Arc<dyn FieldScheduler>,
        items_column: Arc<dyn FieldScheduler>,
        offset_page_infos: Vec<OffsetPageInfo>,
        offsets_type: DataType,
        num_rows: u64,
    ) -> Self;
}
pub struct ListFieldEncoder { /* fields omitted */ }
impl ListFieldEncoder {
    pub fn new(
        items_encoder: Box<dyn FieldEncoder>,
        offsets_encoder: Arc<dyn ArrayEncoder>,
        cache_bytes_per_column: u64,
        keep_original_array: bool,
        column_index: u32,
    ) -> Self;
}
Import
use lance_encoding::previous::encodings::logical::list::{
    ListFieldScheduler, ListFieldEncoder, OffsetPageInfo,
};
I/O Contract
| Input | Type | Description |
|---|---|---|
| offsets_column | Arc<dyn FieldScheduler> | Scheduler for the list offsets column |
| items_column | Arc<dyn FieldScheduler> | Scheduler for the items column |
| offset_page_infos | Vec<OffsetPageInfo> | Per-page metadata including null adjustment and item counts |
| ranges | &[Range<u64>] | Row ranges to schedule (must be ordered and non-overlapping) |
| Output | Type | Description |
|---|---|---|
| decoded | ListArray / LargeListArray | Reconstructed Arrow list array |
| encoded_pages | Vec<EncodedPage> | Encoded offset pages |
| encoded_items | EncodedColumn | Encoded item column data |
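The precondition on `ranges` (ordered and non-overlapping) can be stated as a small check. The sketch below is a hypothetical helper, not part of the Lance API; it treats ranges as half-open, so adjacent ranges such as `0..10` and `10..20` are allowed.

```rust
use std::ops::Range;

/// Hypothetical check: every range is well-formed (start <= end) and the
/// ranges appear in ascending order without overlapping.
fn ranges_ordered_and_disjoint(ranges: &[Range<u64>]) -> bool {
    ranges.iter().all(|r| r.start <= r.end)
        && ranges.windows(2).all(|pair| pair[0].end <= pair[1].start)
}

fn main() {
    println!("{}", ranges_ordered_and_disjoint(&[0..10, 10..20])); // prints "true"
    println!("{}", ranges_ordered_and_disjoint(&[0..10, 5..20])); // prints "false"
}
```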
Usage Examples
use lance_encoding::previous::encodings::logical::list::{ListFieldScheduler, OffsetPageInfo};
use lance_encoding::previous::decoder::FieldScheduler;
use arrow_schema::DataType;
use std::sync::Arc;
// Build page info from metadata
let page_infos = vec![OffsetPageInfo {
    offsets_in_page: 2000,
    null_offset_adjustment: 0,
    num_items_referenced_by_page: 5000,
}];
// Create a list field scheduler. Both child schedulers are built from the
// file metadata; constructing them is outside the scope of this example.
let offsets_scheduler: Arc<dyn FieldScheduler> = todo!("offsets scheduler from file metadata");
let items_scheduler: Arc<dyn FieldScheduler> = todo!("items scheduler from file metadata");
let list_scheduler = ListFieldScheduler::new(
    offsets_scheduler,
    items_scheduler,
    page_infos,
    DataType::Int32, // offsets_type
    10000,           // num_rows
);
Related Pages
- Lance_format_Lance_LegacyEncoder - Strategy that creates list encoders
- Lance_format_Lance_LegacyDecoder - Base decoder traits
- Lance_format_Lance_LegacyPrimitiveEncoding - Used for encoding offsets
- Lance_format_Lance_LegacyLogicalBinaryEncoding - Binary data built on top of list encoding
- Lance_format_Lance_LegacyBasicEncoding - Encodes the offsets buffers
- Heuristic:Lance_format_Lance_Warning_Deprecated_Legacy_Encodings