Implementation:Lance format Lance LegacyBlobEncoding
| Knowledge Sources | |
|---|---|
| Domains | Encoding, Legacy_Format |
| Last Updated | 2026-02-08 19:33 GMT |
Overview
In the Lance v2.0 format, the legacy blob encoding stores large binary values (1 MiB+) out-of-line so that metadata stays compact.
Description
⚠️ DEPRECATED: This is legacy code from the Lance v1/v2.0 format, retained only for backward compatibility. See Lance_format_Lance_Warning_Deprecated_Legacy_Encodings.
This module implements the blob encoding for the legacy (v2.0) Lance file format. Storing large binary data as regular primitives is inefficient: it yields roughly one page per row, which creates significant metadata overhead. On the read path, BlobFieldScheduler and BlobFieldDecoder first decode a descriptions column (a struct of two UInt64 fields holding each value's position and size), then perform indirect I/O to fetch the actual bytes from out-of-line storage. On the write path, BlobFieldEncoder records each value's position and size in the descriptions column and stores the actual bytes in out-of-line buffers. The encoding trades random access capability for reduced metadata overhead.
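The read and write paths described above can be illustrated with a minimal, standard-library-only sketch. It is not the Lance implementation: `BlobDescription`, `encode_blobs`, and `decode_blobs` are hypothetical names, an in-memory `Vec<u8>` stands in for out-of-line storage, and null handling is omitted.

```rust
/// Hypothetical stand-in for one row of the descriptions column
/// (the real column is a struct of two UInt64 fields).
#[derive(Debug, Clone, Copy, PartialEq)]
struct BlobDescription {
    position: u64,
    size: u64,
}

/// Write side: append each value to the out-of-line buffer and record
/// where it landed and how long it is.
fn encode_blobs(values: &[Vec<u8>], out_of_line: &mut Vec<u8>) -> Vec<BlobDescription> {
    values
        .iter()
        .map(|value| {
            let position = out_of_line.len() as u64;
            out_of_line.extend_from_slice(value);
            BlobDescription { position, size: value.len() as u64 }
        })
        .collect()
}

/// Read side: the descriptions are decoded first, then each value is
/// fetched by position and size (the "indirect I/O" step).
/// Returns None if any description points outside the buffer.
fn decode_blobs(descriptions: &[BlobDescription], out_of_line: &[u8]) -> Option<Vec<Vec<u8>>> {
    descriptions
        .iter()
        .map(|d| {
            let start = usize::try_from(d.position).ok()?;
            let end = start.checked_add(usize::try_from(d.size).ok()?)?;
            out_of_line.get(start..end).map(|bytes| bytes.to_vec())
        })
        .collect()
}

fn main() {
    let values = vec![b"first blob".to_vec(), Vec::new(), b"third".to_vec()];
    let mut storage = Vec::new();

    let descriptions = encode_blobs(&values, &mut storage);
    let decoded = decode_blobs(&descriptions, &storage).expect("descriptions are in bounds");

    assert_eq!(decoded, values);
    println!("{} descriptions, {} out-of-line bytes", descriptions.len(), storage.len());
}
```

The descriptions stay small and fixed-width regardless of value size, which is why the metadata remains compact; the cost is the extra fetch per value on the read path.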
Usage
Use this encoding for fields marked with the blob metadata key (BLOB_META_KEY) when the data contains large binary values. The CoreFieldEncodingStrategy automatically selects this encoding for primitive fields with blob metadata. During reading, BlobFieldScheduler wraps an inner descriptions scheduler and performs follow-up I/O based on the decoded positions and sizes.
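The selection rule can be sketched as follows. This is an illustration under stated assumptions, not the logic of CoreFieldEncodingStrategy: `FieldInfo`, `ChosenEncoder`, and `choose_encoder` are hypothetical, the key string is a placeholder rather than the real value of BLOB_META_KEY, and the sketch only checks that the key is present (the actual check may also inspect the value).

```rust
use std::collections::HashMap;

// Placeholder only: the real key is the BLOB_META_KEY constant in Lance.
const HYPOTHETICAL_BLOB_META_KEY: &str = "example:blob";

/// Hypothetical summary of the facts the strategy needs about a field.
struct FieldInfo {
    is_primitive: bool,
    metadata: HashMap<String, String>,
}

#[derive(Debug, PartialEq)]
enum ChosenEncoder {
    Blob,
    Primitive,
}

/// Primitive fields carrying the blob key get the blob encoder; other
/// primitive fields get the regular primitive encoder. Non-primitive
/// fields are out of scope for this sketch and yield None.
fn choose_encoder(field: &FieldInfo) -> Option<ChosenEncoder> {
    if !field.is_primitive {
        return None;
    }
    if field.metadata.contains_key(HYPOTHETICAL_BLOB_META_KEY) {
        Some(ChosenEncoder::Blob)
    } else {
        Some(ChosenEncoder::Primitive)
    }
}

fn main() {
    let mut metadata = HashMap::new();
    metadata.insert(HYPOTHETICAL_BLOB_META_KEY.to_string(), "true".to_string());
    let blob_field = FieldInfo { is_primitive: true, metadata };
    let plain_field = FieldInfo { is_primitive: true, metadata: HashMap::new() };

    assert_eq!(choose_encoder(&blob_field), Some(ChosenEncoder::Blob));
    assert_eq!(choose_encoder(&plain_field), Some(ChosenEncoder::Primitive));
}
```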
Code Reference
Source Location
rust/lance-encoding/src/previous/encodings/logical/blob.rs
Signature
pub struct BlobFieldScheduler {
    descriptions_scheduler: Arc<dyn FieldScheduler>,
}
impl BlobFieldScheduler {
    pub fn new(descriptions_scheduler: Arc<dyn FieldScheduler>) -> Self;
}
impl FieldScheduler for BlobFieldScheduler { /* ... */ }
pub struct BlobFieldEncoder { /* fields omitted */ }
impl BlobFieldEncoder {
    pub fn new(descriptions_encoder: Box<dyn FieldEncoder>) -> Self;
}
impl FieldEncoder for BlobFieldEncoder { /* ... */ }
Import
use lance_encoding::previous::encodings::logical::blob::{
    BlobFieldScheduler, BlobFieldEncoder,
};
I/O Contract
| Input | Type | Description |
|---|---|---|
| descriptions_scheduler | Arc<dyn FieldScheduler> | Scheduler for the blob description column (position + size) |
| ranges | &[Range<u64>] | Row ranges to read |
| arrays | ArrayRef | Large binary arrays to encode |

| Output | Type | Description |
|---|---|---|
| decoded | LargeBinaryArray | Reconstructed large binary array with null support |
| encoded_pages | EncodedPage | Encoded description column pages |
| out_of_line_buffers | OutOfLineBuffers | Binary data stored out-of-line |
Usage Examples
use lance_encoding::decoder::FilterExpression;
use lance_encoding::previous::decoder::FieldScheduler;
use lance_encoding::previous::encodings::logical::blob::BlobFieldScheduler;
use std::sync::Arc;
// Create a blob field scheduler wrapping the descriptions scheduler
let descriptions_scheduler: Arc<dyn FieldScheduler> = /* from column metadata */;
let blob_scheduler = BlobFieldScheduler::new(descriptions_scheduler);
// Schedule row ranges for reading (row ranges are Range<u64>)
let ranges = vec![0u64..50];
let filter = FilterExpression::no_filter();
// `?` assumes the enclosing function returns a Result
let mut job = blob_scheduler.schedule_ranges(&ranges, &filter)?;
Related Pages
- Lance_format_Lance_LegacyEncoder - Encoding strategy that selects blob encoding
- Lance_format_Lance_LegacyPrimitiveEncoding - Encoder used for the descriptions column
- Lance_format_Lance_LegacyDecoder - Base decoder traits
- Lance_format_Lance_LegacyStructEncoding - Struct encoding used for description fields
- Heuristic:Lance_format_Lance_Warning_Deprecated_Legacy_Encodings