Implementation:Lance format Lance LegacyFsstEncoding
| Knowledge Sources | |
|---|---|
| Domains | Encoding, Legacy_Format |
| Last Updated | 2026-02-08 19:33 GMT |
Overview
The legacy FSST encoding applies Fast Static Symbol Table compression to string and binary data in the Lance v2.0 format.
Description
⚠️ DEPRECATED: This is legacy code from the Lance v1/v2.0 format, retained only for backward compatibility. See Lance_format_Lance_Warning_Deprecated_Legacy_Encodings.
This module implements FSST (Fast Static Symbol Table) page encoding for the legacy (v2.0) Lance file format.

FsstPageScheduler wraps an inner page scheduler (typically a binary scheduler) and holds the symbol_table learned during encoding. During decoding, FsstPageDecoder first decodes the page through the inner decoder, which yields the FSST-compressed variable-width data. It then decompresses those values with the symbol table via fsst::fsst::decompress, which produces new offsets and bytes buffers for the expanded values. When nulls are present, the result is wrapped in a NullableDataBlock so that null information survives decompression.

FsstArrayEncoder handles encoding: it trains a symbol table on the input data, compresses the bytes with that table, and then encodes the compressed result with an inner encoder.

FSST is selected automatically for Utf8 and Binary data larger than 4 MiB in v2.1+ files, and can also be requested explicitly through field metadata.
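The decompression step described above can be illustrated with a self-contained toy. This sketch is not the `fsst` crate's API or its on-disk symbol-table format; it assumes a simplified table in which codes 0 to 254 index symbols of 1 to 8 bytes and code 255 escapes one literal byte (mirroring FSST's escape convention). `VarWidth` and `decompress_block` are hypothetical names. It shows how a compressed variable-width block (offsets plus bytes) is expanded into fresh offsets and bytes buffers, one value at a time.

```rust
/// Escape code (assumed, as in FSST): the byte that follows is a literal.
const ESCAPE: u8 = 255;

/// Hypothetical variable-width block: value `i` is `data[offsets[i]..offsets[i + 1]]`.
#[derive(Debug, PartialEq)]
struct VarWidth {
    offsets: Vec<u32>,
    data: Vec<u8>,
}

/// Expand every compressed value using `symbols`, rebuilding the offsets
/// from scratch because each value grows by a different amount.
/// Assumes well-formed input (an escape code is always followed by a byte).
fn decompress_block(symbols: &[Vec<u8>], compressed: &VarWidth) -> VarWidth {
    let mut offsets = Vec::with_capacity(compressed.offsets.len());
    let mut data = Vec::new();
    offsets.push(0u32);
    for w in compressed.offsets.windows(2) {
        let codes = &compressed.data[w[0] as usize..w[1] as usize];
        let mut i = 0;
        while i < codes.len() {
            if codes[i] == ESCAPE {
                // Escaped literal: copy the next byte unchanged.
                data.push(codes[i + 1]);
                i += 2;
            } else {
                // Regular code: expand it to its multi-byte symbol.
                data.extend_from_slice(&symbols[codes[i] as usize]);
                i += 1;
            }
        }
        // End offset of the value just expanded.
        offsets.push(data.len() as u32);
    }
    VarWidth { offsets, data }
}
```

For example, with symbols `["ab", "cde"]`, the compressed values `[0, 1]` and `[0, 255, b'z']` expand to `"abcde"` and `"abz"`, so the output offsets become `[0, 5, 8]`.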
Usage
Use this encoding for string and binary columns with repetitive patterns, where FSST compression can substantially reduce size. CoreArrayEncodingStrategy selects FSST automatically for Utf8/Binary columns larger than 4 MiB in v2.1+ files; it can also be enabled explicitly by setting the compression field metadata key to fsst. On the read side, the physical encoding dispatch creates an FsstPageScheduler from the Fsst protobuf encoding.
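The selection rule above can be written as a small predicate. This is a hypothetical sketch: `DataType`, `FileVersion`, and `should_use_fsst` are simplified stand-ins rather than Lance types, the metadata key is the literal `compression` named in the prose (the real constant may be spelled differently), and it assumes an explicit metadata request is honored regardless of file version.

```rust
use std::collections::HashMap;

/// Hypothetical stand-ins for the Arrow data type and Lance file version.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Utf8,
    Binary,
    Int32,
}

#[derive(Debug, Clone, Copy, PartialEq, PartialOrd)]
enum FileVersion {
    V2_0,
    V2_1,
}

/// Automatic FSST kicks in only above this size (4 MiB).
const FSST_SIZE_THRESHOLD: u64 = 4 * 1024 * 1024;

fn should_use_fsst(
    data_type: DataType,
    data_size: u64,
    version: FileVersion,
    metadata: &HashMap<String, String>,
) -> bool {
    // FSST only applies to string-like data.
    let string_like = matches!(data_type, DataType::Utf8 | DataType::Binary);
    // Explicit request via field metadata (key name taken from the prose).
    let requested = metadata.get("compression").map_or(false, |v| v == "fsst");
    // Automatic selection: v2.1+ files and strictly more than 4 MiB of data.
    let automatic = version >= FileVersion::V2_1 && data_size > FSST_SIZE_THRESHOLD;
    string_like && (requested || automatic)
}
```

Keeping the size check strict (`>`) matches the "larger than 4 MiB" wording; a column of exactly 4 MiB would not be compressed automatically.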
Code Reference
Source Location
rust/lance-encoding/src/previous/encodings/physical/fsst.rs
Signature
```rust
// Signatures only; method bodies are elided.
pub struct FsstPageScheduler {
    inner_scheduler: Box<dyn PageScheduler>,
    symbol_table: LanceBuffer,
}

impl FsstPageScheduler {
    pub fn new(inner_scheduler: Box<dyn PageScheduler>, symbol_table: LanceBuffer) -> Self;
}

impl PageScheduler for FsstPageScheduler { /* ... */ }

pub struct FsstArrayEncoder {
    inner_encoder: Box<dyn ArrayEncoder>,
}

impl FsstArrayEncoder {
    pub fn new(inner_encoder: Box<dyn ArrayEncoder>) -> Self;
}

impl ArrayEncoder for FsstArrayEncoder { /* ... */ }
```
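The wrapping pattern in these signatures, where the FSST encoder compresses values and then hands the compressed bytes to an inner encoder, can be sketched with toy types. Everything here is hypothetical: `ToyArrayEncoder`, `ToyBinaryEncoder`, and `ToyFsstEncoder` are simplified stand-ins, and `train` simply picks the most frequent 2-byte substrings, which is far cruder than real FSST training. The symbol table is attached to the result, mirroring how the encoded array carries it in its encoding descriptor.

```rust
use std::collections::HashMap;

/// Escape code (assumed, as in FSST): the next byte is a literal.
const ESCAPE: u8 = 255;

/// Hypothetical, simplified encoder result.
#[derive(Debug)]
struct ToyEncodedArray {
    offsets: Vec<u32>,
    data: Vec<u8>,
    /// Stand-in for the encoding descriptor; FSST stores its symbol table here.
    symbol_table: Option<Vec<Vec<u8>>>,
}

/// Hypothetical, simplified encoder trait over byte-string values.
trait ToyArrayEncoder {
    fn encode(&self, values: &[Vec<u8>]) -> ToyEncodedArray;
}

/// Plain binary encoder: concatenates values and records end offsets.
struct ToyBinaryEncoder;

impl ToyArrayEncoder for ToyBinaryEncoder {
    fn encode(&self, values: &[Vec<u8>]) -> ToyEncodedArray {
        let mut offsets = vec![0u32];
        let mut data = Vec::new();
        for v in values {
            data.extend_from_slice(v);
            offsets.push(data.len() as u32);
        }
        ToyEncodedArray { offsets, data, symbol_table: None }
    }
}

/// Toy training: the most frequent 2-byte substrings, capped at 255 symbols
/// so that every code stays below the escape code.
fn train(values: &[Vec<u8>], max_symbols: usize) -> Vec<Vec<u8>> {
    let mut counts: HashMap<[u8; 2], usize> = HashMap::new();
    for v in values {
        for w in v.windows(2) {
            *counts.entry([w[0], w[1]]).or_insert(0) += 1;
        }
    }
    let mut ranked: Vec<([u8; 2], usize)> = counts.into_iter().collect();
    // Descending count, ties broken by bytes, so training is deterministic.
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
    ranked.into_iter().take(max_symbols.min(255)).map(|(s, _)| s.to_vec()).collect()
}

/// Greedy compression: emit a code when the next two bytes match a symbol,
/// otherwise escape the single literal byte.
fn compress(symbols: &[Vec<u8>], value: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < value.len() {
        let hit = if i + 1 < value.len() {
            symbols.iter().position(|s| s.as_slice() == &value[i..i + 2])
        } else {
            None
        };
        match hit {
            Some(code) => {
                out.push(code as u8);
                i += 2;
            }
            None => {
                out.push(ESCAPE);
                out.push(value[i]);
                i += 1;
            }
        }
    }
    out
}

/// FSST-style wrapper: train, compress each value, delegate to the inner
/// encoder, then attach the symbol table to the result.
struct ToyFsstEncoder {
    inner: Box<dyn ToyArrayEncoder>,
}

impl ToyArrayEncoder for ToyFsstEncoder {
    fn encode(&self, values: &[Vec<u8>]) -> ToyEncodedArray {
        let symbols = train(values, 255);
        let compressed: Vec<Vec<u8>> = values.iter().map(|v| compress(&symbols, v)).collect();
        let mut encoded = self.inner.encode(&compressed);
        encoded.symbol_table = Some(symbols);
        encoded
    }
}
```

Because the inner encoder only sees opaque compressed bytes, any variable-width encoder can sit underneath; decoding reverses the order, running the inner decoder first and decompressing second.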
Import
```rust
use lance_encoding::previous::encodings::physical::fsst::{
    FsstPageScheduler, FsstArrayEncoder,
};
```
I/O Contract
| Input | Type | Description |
|---|---|---|
| inner_scheduler | `Box<dyn PageScheduler>` | Scheduler for the FSST-compressed binary data |
| symbol_table | `LanceBuffer` | Learned FSST symbol table used for decompression |
| data | `DataBlock` | Variable-width or nullable data block to compress |

| Output | Type | Description |
|---|---|---|
| decoded | `DataBlock` | Decompressed variable-width data block (with optional nulls) |
| encoded | `EncodedArray` | FSST-compressed data, with the symbol table stored in the encoding descriptor |
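The decoded-output contract (a decompressed block that keeps its optional nulls) amounts to transforming the values while passing the null layer through untouched. In this hypothetical sketch, `Block` stands in for `DataBlock`, its `Nullable` variant for `NullableDataBlock`, and validity is a `Vec<bool>` rather than a packed bitmap; the transform `f` would be FSST decompression in practice.

```rust
/// Hypothetical variable-width values: value `i` is `data[offsets[i]..offsets[i + 1]]`.
#[derive(Debug, Clone, PartialEq)]
struct VarWidth {
    offsets: Vec<u32>,
    data: Vec<u8>,
}

/// Hypothetical stand-in for a data block with or without a null layer.
#[derive(Debug, Clone, PartialEq)]
enum Block {
    VariableWidth(VarWidth),
    /// Values plus one validity flag per row (`true` = valid).
    Nullable { values: VarWidth, validity: Vec<bool> },
}

/// Apply a variable-width transform to the values only; a nullable input
/// yields a nullable output with the same validity.
fn map_preserving_nulls(block: Block, f: impl Fn(VarWidth) -> VarWidth) -> Block {
    match block {
        Block::VariableWidth(values) => Block::VariableWidth(f(values)),
        Block::Nullable { values, validity } => Block::Nullable {
            values: f(values),
            validity,
        },
    }
}
```

Keeping nulls outside the transform means the decompressor only ever handles plain variable-width data, and the wrapper decides whether the result needs a null layer.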
Usage Examples
```rust
use lance_encoding::buffer::LanceBuffer;
use lance_encoding::decoder::PageScheduler;
use lance_encoding::encoder::ArrayEncoder; // module path assumed; it may differ between releases
use lance_encoding::previous::encodings::physical::fsst::{FsstArrayEncoder, FsstPageScheduler};

// Read side: wrap a binary page scheduler with the symbol table
// taken from the Fsst protobuf encoding metadata.
fn fsst_scheduler(
    inner_scheduler: Box<dyn PageScheduler>, // typically a binary page scheduler
    symbol_table: LanceBuffer,
) -> FsstPageScheduler {
    FsstPageScheduler::new(inner_scheduler, symbol_table)
}

// Write side: wrap a binary encoder; values are FSST-compressed
// before the inner encoder sees them.
fn fsst_encoder(inner_encoder: Box<dyn ArrayEncoder>) -> FsstArrayEncoder {
    FsstArrayEncoder::new(inner_encoder)
}
```
Related Pages
- Lance_format_Lance_LegacyPhysicalDispatch - Creates FsstPageScheduler from protobuf
- Lance_format_Lance_LegacyBinaryEncoding - Inner encoding wrapped by FSST
- Lance_format_Lance_LegacyEncoder - Strategy that selects FSST for large string data
- Lance_format_Lance_LegacyLogicalBinaryEncoding - Logical layer above FSST-compressed binary
- Heuristic:Lance_format_Lance_Warning_Deprecated_Legacy_Encodings