

Implementation:Lance format Lance LegacyBlobEncoding



Knowledge Sources
Domains Encoding, Legacy_Format
Last Updated 2026-02-08 19:33 GMT

Overview

The legacy blob encoding stores large binary values (1 MiB+) out-of-line so that page metadata stays compact in the Lance v2.0 file format.
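The metadata argument can be illustrated with a small std-only Rust sketch. It assumes a hypothetical 1 MiB page-size target (`PAGE_LIMIT_BYTES`, not a Lance setting) and a simple greedy packer (`pages_needed`, also hypothetical). Under those assumptions, 1,000 blobs of 1 MiB each need 1,000 pages when stored inline, while their 16-byte descriptions (two `u64` values per row) fit in a single page.

```rust
/// Hypothetical page-size target for this illustration; not Lance's actual limit.
const PAGE_LIMIT_BYTES: u64 = 1024 * 1024;

/// Greedily packs values (given by byte size) into pages, starting a new page
/// whenever the next value would push the current page past `limit`.
/// A value larger than `limit` simply gets a page of its own.
fn pages_needed(value_sizes: &[u64], limit: u64) -> u64 {
    let mut pages = 0;
    let mut current = 0;
    for &size in value_sizes {
        if pages == 0 || current + size > limit {
            // Open a new page for this value.
            pages += 1;
            current = 0;
        }
        current += size;
    }
    pages
}
```

Calling `pages_needed(&vec![PAGE_LIMIT_BYTES; 1000], PAGE_LIMIT_BYTES)` yields 1000, whereas `pages_needed(&vec![16; 1000], PAGE_LIMIT_BYTES)` yields 1; the gap is the per-page metadata the blob encoding avoids.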

Description

⚠️ DEPRECATED: This is legacy code from the Lance v1/v2.0 format, retained only for backward compatibility. See Lance_format_Lance_Warning_Deprecated_Legacy_Encodings.

This module implements the blob encoding for the legacy (v2.0) Lance file format. Large binary data is inefficient to store as regular primitives because it results in approximately one page per row, creating significant metadata overhead.

The BlobFieldScheduler and BlobFieldDecoder read blob data by first decoding a descriptions column (positions and sizes stored as a struct of two UInt64 fields), then performing indirect I/O to fetch the actual binary data from out-of-line storage.

The BlobFieldEncoder writes blob data by extracting position and size metadata into the descriptions column and storing the actual bytes as out-of-line buffers. This encoding trades random access capability for reduced metadata overhead.
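The write path described above can be sketched in plain Rust without the Lance crates. This is a minimal illustration, not BlobFieldEncoder itself: `BlobDescription` and `encode_blobs` are hypothetical names, positions are assumed to be absolute offsets computed from an assumed `base_offset` where the out-of-line buffer will land, and nulls are kept as `None` because Lance's own null representation in the descriptions column is not covered here.

```rust
/// One row of the descriptions column: where the blob's bytes live and how
/// many there are. Mirrors the "struct of two UInt64 fields" described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct BlobDescription {
    position: u64,
    size: u64,
}

/// Splits a batch of optional binary values into a descriptions column and a
/// single out-of-line byte buffer. `base_offset` is the (assumed) file offset
/// at which the buffer will be written, so each position is absolute.
fn encode_blobs(
    values: &[Option<&[u8]>],
    base_offset: u64,
) -> (Vec<Option<BlobDescription>>, Vec<u8>) {
    let mut descriptions = Vec::with_capacity(values.len());
    let mut buffer = Vec::new();
    for value in values.iter().copied() {
        match value {
            Some(bytes) => {
                // The blob starts wherever the buffer currently ends.
                let position = base_offset + buffer.len() as u64;
                buffer.extend_from_slice(bytes);
                descriptions.push(Some(BlobDescription {
                    position,
                    size: bytes.len() as u64,
                }));
            }
            // Null rows contribute no bytes to the out-of-line buffer.
            None => descriptions.push(None),
        }
    }
    (descriptions, buffer)
}
```

Only the small descriptions column goes through regular page encoding; the concatenated bytes are written once as an out-of-line buffer.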

Usage

Use this encoding for fields marked with the blob metadata key (BLOB_META_KEY) when the data contains large binary values. The CoreFieldEncodingStrategy automatically selects this encoding for primitive fields with blob metadata. During reading, BlobFieldScheduler wraps an inner descriptions scheduler and performs follow-up I/O based on the decoded positions and sizes.
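The selection rule can be pictured as a check on the field's data type and metadata. The sketch below is hypothetical and std-only: `DataType` and `ChosenEncoder` are stand-ins rather than Arrow or Lance types, the value `"lance-encoding:blob"` for `BLOB_META_KEY` is an assumption to verify against the Lance source, and treating both `Binary` and `LargeBinary` as eligible is likewise assumed.

```rust
use std::collections::HashMap;

/// Assumed value of Lance's BLOB_META_KEY; verify against the Lance source.
const BLOB_META_KEY: &str = "lance-encoding:blob";

/// Simplified stand-in for an Arrow data type (hypothetical).
#[derive(Debug, PartialEq)]
enum DataType {
    Binary,
    LargeBinary,
    Utf8,
}

/// Which legacy encoder a field would receive in this sketch.
#[derive(Debug, PartialEq)]
enum ChosenEncoder {
    Blob,
    Primitive,
}

/// Chooses the blob encoder only for binary fields that carry the blob metadata key.
fn choose_encoder(data_type: &DataType, metadata: &HashMap<String, String>) -> ChosenEncoder {
    let is_binary = matches!(data_type, DataType::Binary | DataType::LargeBinary);
    if is_binary && metadata.contains_key(BLOB_META_KEY) {
        ChosenEncoder::Blob
    } else {
        ChosenEncoder::Primitive
    }
}
```

A LargeBinary field whose metadata contains the key maps to `ChosenEncoder::Blob`; the same field without the key, or a non-binary field with it, falls back to `ChosenEncoder::Primitive`.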

Code Reference

Source Location

rust/lance-encoding/src/previous/encodings/logical/blob.rs

Signature

pub struct BlobFieldScheduler {
    descriptions_scheduler: Arc<dyn FieldScheduler>,
}

impl BlobFieldScheduler {
    pub fn new(descriptions_scheduler: Arc<dyn FieldScheduler>) -> Self;
}

impl FieldScheduler for BlobFieldScheduler { /* ... */ }

pub struct BlobFieldEncoder { /* fields omitted */ }

impl BlobFieldEncoder {
    pub fn new(descriptions_encoder: Box<dyn FieldEncoder>) -> Self;
}

impl FieldEncoder for BlobFieldEncoder { /* ... */ }
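The wrapping relationship in these signatures (BlobFieldScheduler holding an `Arc<dyn FieldScheduler>` for the descriptions) can be sketched with a toy trait. Everything here is hypothetical and std-only: `DescriptionSource` is not Lance's `FieldScheduler`, `InMemoryDescriptions` stands in for a real descriptions column, and `BlobReadPlanner` only shows the two-phase idea of decoding descriptions first and then deriving the byte ranges to fetch.

```rust
use std::ops::Range;
use std::sync::Arc;

/// Hypothetical stand-in for the inner descriptions scheduler: yields
/// (position, size) pairs for a range of rows.
trait DescriptionSource {
    fn descriptions(&self, rows: Range<u64>) -> Vec<(u64, u64)>;
}

/// In-memory descriptions column used for the example.
struct InMemoryDescriptions(Vec<(u64, u64)>);

impl DescriptionSource for InMemoryDescriptions {
    fn descriptions(&self, rows: Range<u64>) -> Vec<(u64, u64)> {
        self.0[rows.start as usize..rows.end as usize].to_vec()
    }
}

/// Wrapper in the spirit of BlobFieldScheduler: owns the inner scheduler and
/// turns decoded descriptions into follow-up byte-range reads.
struct BlobReadPlanner {
    descriptions: Arc<dyn DescriptionSource>,
}

impl BlobReadPlanner {
    fn new(descriptions: Arc<dyn DescriptionSource>) -> Self {
        Self { descriptions }
    }

    /// Phase 1: fetch descriptions for each requested row range.
    /// Phase 2: map every non-empty blob to the byte range that must be read.
    fn plan_reads(&self, ranges: &[Range<u64>]) -> Vec<Range<u64>> {
        ranges
            .iter()
            .flat_map(|rows| self.descriptions.descriptions(rows.clone()))
            .filter(|&(_, size)| size > 0) // zero-size entries need no follow-up read
            .map(|(position, size)| position..position + size)
            .collect()
    }
}
```

With descriptions `[(0, 10), (10, 0), (10, 5), (15, 20)]` and row ranges `[0..2, 3..4]`, the planner requests bytes `0..10` and `15..35`.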

Import

use lance_encoding::previous::encodings::logical::blob::{
    BlobFieldScheduler, BlobFieldEncoder,
};

I/O Contract

| Input | Type | Description |
|---|---|---|
| descriptions_scheduler | Arc<dyn FieldScheduler> | Scheduler for the blob description column (position + size) |
| ranges | &[Range<u64>] | Row ranges to read |
| arrays | ArrayRef | Large binary arrays to encode |

| Output | Type | Description |
|---|---|---|
| decoded | LargeBinaryArray | Reconstructed large binary array with null support |
| encoded_pages | EncodedPage | Encoded description column pages |
| out_of_line_buffers | OutOfLineBuffers | Binary data stored out-of-line |
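The `decoded` output is an Arrow LargeBinary array, which uses 64-bit offsets into a values buffer plus a validity bitmap. The following std-only sketch shows how decoded descriptions and fetched bytes could be assembled into that layout. `LargeBinaryParts` and `decode_blobs` are hypothetical names, the validity bitmap is simplified to a `Vec<bool>`, a `None` description is assumed to mark a null row, and slicing an in-memory `file` stands in for the indirect I/O.

```rust
/// Arrow-style LargeBinary layout: i64 offsets, a values buffer, and validity
/// (a Vec<bool> here for readability). Hypothetical; not arrow-rs's LargeBinaryArray.
#[derive(Debug, PartialEq)]
struct LargeBinaryParts {
    offsets: Vec<i64>,
    values: Vec<u8>,
    validity: Vec<bool>,
}

/// Rebuilds a LargeBinary-style array from (position, size) descriptions.
/// `file` stands in for the bytes reachable by the follow-up reads.
fn decode_blobs(descriptions: &[Option<(u64, u64)>], file: &[u8]) -> LargeBinaryParts {
    let mut offsets = Vec::with_capacity(descriptions.len() + 1);
    let mut values = Vec::new();
    let mut validity = Vec::with_capacity(descriptions.len());
    offsets.push(0i64);
    for desc in descriptions {
        match *desc {
            Some((position, size)) => {
                // Stand-in for the indirect read of the out-of-line bytes.
                let start = position as usize;
                values.extend_from_slice(&file[start..start + size as usize]);
                validity.push(true);
            }
            // A null row adds no bytes, so its end offset repeats the previous one.
            None => validity.push(false),
        }
        offsets.push(values.len() as i64);
    }
    LargeBinaryParts { offsets, values, validity }
}
```

For a file `b"HEADERabchello"` and descriptions `[Some((6, 3)), None, Some((9, 5))]`, the result has offsets `[0, 3, 3, 8]`, values `abchello`, and validity `[true, false, true]`.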

Usage Examples

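use std::ops::Range;
use std::sync::Arc;

use lance_core::Result; // assumed: lance-encoding's fallible APIs return lance_core::Result
use lance_encoding::decoder::FilterExpression; // import path assumed
use lance_encoding::previous::decoder::FieldScheduler;
use lance_encoding::previous::encodings::logical::blob::BlobFieldScheduler;

// `descriptions_scheduler` comes from the column metadata of the blob
// field's descriptions column; building it is not shown here.
fn schedule_blob_rows(descriptions_scheduler: Arc<dyn FieldScheduler>) -> Result<()> {
    // Wrap the descriptions scheduler so that decoded (position, size) pairs
    // drive follow-up reads of the out-of-line blob bytes.
    let blob_scheduler = BlobFieldScheduler::new(descriptions_scheduler);

    // Schedule rows 0..50 with no filter applied.
    let ranges: Vec<Range<u64>> = vec![0..50];
    let filter = FilterExpression::no_filter();
    let _job = blob_scheduler.schedule_ranges(&ranges, &filter)?;
    Ok(())
}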
