Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Lance format Lance LegacyStructEncoding

From Leeroopedia
Revision as of 15:28, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Lance_format_Lance_LegacyStructEncoding.md)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)


Knowledge Sources
Domains Encoding, Legacy_Format
Last Updated 2026-02-08 19:33 GMT

Overview

The legacy struct encoding handles struct-typed fields by scheduling and decoding child fields in row-major order in the Lance v2.0 format.

Description

⚠️ DEPRECATED: This is legacy code from the Lance v1/v2.0 format, retained only for backward compatibility. See Lance_format_Lance_Warning_Deprecated_Legacy_Encodings.

This module implements struct field scheduling and decoding for the legacy (v2.0) Lance file format. SimpleStructScheduler coordinates the scheduling of child field schedulers using a min-heap (BinaryHeap) to prioritize children with the least data scheduled, enabling row-major ordering that allows decoding of complete rows as quickly as possible. This scheduler serves as the starting point for all decoding since the top-level record batch is treated as a non-nullable struct. SimpleStructDecoder collects child decoders and assembles them into a StructArray when drained. The module also provides EmptyStructDecoder and EmptyStructSchedulerJob for handling structs with no child fields. The StructFieldEncoder on the encoding side coordinates child encoders and produces a header column for each struct.

Usage

Use this encoding for DataType::Struct fields that are not marked for packed struct encoding. The CoreFieldEncodingStrategy selects StructFieldEncoder for regular struct fields. During reading, SimpleStructScheduler is the top-level scheduler that drives the entire decoding pipeline. It is also used for nested struct fields within the schema.

Code Reference

Source Location

rust/lance-encoding/src/previous/encodings/logical/struct.rs

Signature

pub struct SimpleStructScheduler {
    children: Vec<Arc<dyn FieldScheduler>>,
    child_fields: Fields,
    num_rows: u64,
}

impl SimpleStructScheduler {
    pub fn new(
        children: Vec<Arc<dyn FieldScheduler>>,
        child_fields: Fields,
        num_rows: u64,
    ) -> Self;
}

pub struct SimpleStructDecoder { /* fields omitted */ }

impl SimpleStructDecoder {
    pub fn new(child_fields: Fields, num_rows: u64) -> Self;
}

pub struct StructFieldEncoder { /* fields omitted */ }

impl StructFieldEncoder {
    pub fn new(
        children: Vec<Box<dyn FieldEncoder>>,
        header_idx: u32,
    ) -> Self;
}

Import

use lance_encoding::previous::encodings::logical::r#struct::{
    SimpleStructScheduler, SimpleStructDecoder, StructFieldEncoder,
};

I/O Contract

Input Type Description
children Vec<Arc<dyn FieldScheduler>> Child field schedulers for each struct field
child_fields Fields Arrow field descriptors for children
num_rows u64 Total number of rows in the struct
ranges &[Range<u64>] Row ranges to schedule
Output Type Description
decoded StructArray Reconstructed Arrow struct array from child arrays
scan_line ScheduledScanLine Decoders for children, emitted in row-major order

Usage Examples

use lance_encoding::previous::encodings::logical::r#struct::SimpleStructScheduler;
use lance_encoding::previous::decoder::FieldScheduler;
use arrow_schema::{DataType, Field, Fields};
use std::sync::Arc;

// Build child schedulers from file metadata
let child_schedulers: Vec<Arc<dyn FieldScheduler>> = /* from columns */;
let child_fields = Fields::from(vec![
    Field::new("x", DataType::Float64, false),
    Field::new("y", DataType::Float64, false),
]);

// Create the top-level struct scheduler
let scheduler = SimpleStructScheduler::new(
    child_schedulers,
    child_fields,
    10000,
);

// Schedule ranges for reading
let ranges = vec![0..1000];
let filter = FilterExpression::no_filter();
let mut job = scheduler.schedule_ranges(&ranges, &filter)?;

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment