Implementation:Lance format Lance LegacyFsstEncoding

From Leeroopedia


Knowledge Sources
Domains: Encoding, Legacy_Format
Last Updated: 2026-02-08 19:33 GMT

Overview

The legacy FSST encoding applies Fast Static Symbol Table compression to string and binary data in the Lance v2.0 format.

Description

⚠️ DEPRECATED: This is legacy code from the Lance v1/v2.0 format, retained only for backward compatibility. See Lance_format_Lance_Warning_Deprecated_Legacy_Encodings.

This module implements FSST (Fast Static Symbol Table) page encoding for the legacy (v2.0) Lance file format.

Decoding: FsstPageScheduler wraps an inner page scheduler (typically a binary scheduler) and holds the symbol_table learned during encoding. FsstPageDecoder first decodes the compressed data through the inner decoder, then expands the variable-width data with the symbol table by calling fsst::fsst::decompress. The decompressed output consists of expanded offsets and bytes buffers. When nulls are present, the result is wrapped in a NullableDataBlock so that null values survive decompression.

Encoding: FsstArrayEncoder trains a symbol table on the input data, compresses the bytes, and passes the compressed result to an inner encoder.

Selection: FSST is chosen automatically for Utf8 and Binary data larger than 4 MiB in v2.1+ files, and can also be requested explicitly through field metadata.
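The decode step described above (walk each compressed value, expand codes through the symbol table, and rebuild the offsets and bytes buffers) can be sketched with a toy decoder. This is an illustration only, not the fsst crate or the Lance implementation: ToySymbolTable, decompress_values, and the one-byte-code layout with 255 as an escape marker are all assumptions made for this example.

```rust
/// Toy escape marker (assumption for this sketch): the byte that follows it
/// is copied to the output literally instead of being looked up.
const ESCAPE: u8 = 255;

/// Hypothetical stand-in for a learned symbol table:
/// code `i` expands to `symbols[i]`.
struct ToySymbolTable {
    symbols: Vec<Vec<u8>>,
}

/// Expand compressed variable-width values.
/// `offsets` holds n + 1 entries delimiting n compressed values in `bytes`.
/// Returns the expanded (offsets, bytes) pair. Input is assumed well-formed:
/// every code is in range and every escape is followed by a literal byte.
fn decompress_values(
    table: &ToySymbolTable,
    offsets: &[u32],
    bytes: &[u8],
) -> (Vec<u32>, Vec<u8>) {
    let mut out_offsets = Vec::with_capacity(offsets.len().max(1));
    let mut out_bytes = Vec::new();
    out_offsets.push(0u32);

    for window in offsets.windows(2) {
        let value = &bytes[window[0] as usize..window[1] as usize];
        let mut i = 0;
        while i < value.len() {
            let code = value[i];
            if code == ESCAPE {
                // Literal byte follows the escape marker.
                out_bytes.push(value[i + 1]);
                i += 2;
            } else {
                // Replace the one-byte code with its multi-byte symbol.
                out_bytes.extend_from_slice(&table.symbols[code as usize]);
                i += 1;
            }
        }
        // An empty compressed value yields an empty expanded value.
        out_offsets.push(out_bytes.len() as u32);
    }
    (out_offsets, out_bytes)
}
```

Because each value is expanded independently, the output offsets grow with the expanded lengths rather than the compressed ones, which is why the decoder must produce a fresh offsets buffer instead of reusing the compressed one.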

Usage

Use this encoding for string and binary columns with repetitive patterns, where FSST compression gives a significant size reduction. The CoreArrayEncodingStrategy selects FSST automatically for Utf8/Binary columns larger than 4 MiB in v2.1+ files. It can also be enabled explicitly by setting the compression field metadata key to fsst. On the read path, the physical decoder dispatch creates an FsstPageScheduler when it encounters an Fsst protobuf encoding.
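The selection rule above can be sketched as a small predicate. This is a simplified reading of the rule as stated on this page, not the CoreArrayEncodingStrategy code: ToyFileVersion, ToyColumnKind, and should_use_fsst are hypothetical names, and the literal metadata key string is an assumption, since the page only refers to it as the compression key.

```rust
use std::collections::HashMap;

/// Hypothetical file-version enum; declaration order gives V2_0 < V2_1.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum ToyFileVersion {
    V2_0,
    V2_1,
}

/// Hypothetical column classification for this sketch.
#[derive(Clone, Copy, PartialEq, Eq)]
enum ToyColumnKind {
    Utf8,
    Binary,
    Other,
}

/// 4 MiB threshold taken from the text above.
const FSST_SIZE_THRESHOLD_BYTES: u64 = 4 * 1024 * 1024;

/// Assumed key string for the "compression" field metadata entry.
const COMPRESSION_KEY: &str = "lance-encoding:compression";

/// Returns true when the rule described above would pick FSST.
fn should_use_fsst(
    kind: ToyColumnKind,
    data_size_bytes: u64,
    version: ToyFileVersion,
    field_metadata: &HashMap<String, String>,
) -> bool {
    // FSST only applies to string and binary columns.
    if !matches!(kind, ToyColumnKind::Utf8 | ToyColumnKind::Binary) {
        return false;
    }
    // Explicit opt-in through field metadata.
    let requested = field_metadata
        .get(COMPRESSION_KEY)
        .map(|v| v.as_str() == "fsst")
        .unwrap_or(false);
    // Automatic selection: strictly larger than 4 MiB, v2.1 or newer.
    let automatic = version >= ToyFileVersion::V2_1
        && data_size_bytes > FSST_SIZE_THRESHOLD_BYTES;
    requested || automatic
}
```

The sketch treats "larger than 4 MiB" as a strict comparison and lets an explicit request apply regardless of file version; both are interpretations of the prose and should be checked against the source file before being relied on.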

Code Reference

Source Location

rust/lance-encoding/src/previous/encodings/physical/fsst.rs

Signature

pub struct FsstPageScheduler {
    inner_scheduler: Box<dyn PageScheduler>,
    symbol_table: LanceBuffer,
}

impl FsstPageScheduler {
    pub fn new(inner_scheduler: Box<dyn PageScheduler>, symbol_table: LanceBuffer) -> Self;
}

impl PageScheduler for FsstPageScheduler { /* ... */ }

pub struct FsstArrayEncoder {
    inner_encoder: Box<dyn ArrayEncoder>,
}

impl FsstArrayEncoder {
    pub fn new(inner_encoder: Box<dyn ArrayEncoder>) -> Self;
}

impl ArrayEncoder for FsstArrayEncoder { /* ... */ }
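Both structs above follow a wrapper pattern: the FSST layer transforms the data and delegates the rest to a boxed inner component. The following sketch shows that shape on the encode side with toy types. ToyEncoder, LengthPrefixedEncoder, and ToyFsstEncoder are hypothetical, the symbol table is fixed rather than trained, and the greedy longest-match substitution is a simplification, not the real FSST algorithm.

```rust
/// Toy escape marker (assumption): the next byte is a literal.
const ESCAPE: u8 = 255;

/// Hypothetical stand-in for the inner encoder trait.
trait ToyEncoder {
    fn encode(&self, values: &[Vec<u8>]) -> Vec<u8>;
}

/// Toy inner encoder: little-endian u32 length, then the value bytes.
struct LengthPrefixedEncoder;

impl ToyEncoder for LengthPrefixedEncoder {
    fn encode(&self, values: &[Vec<u8>]) -> Vec<u8> {
        let mut out = Vec::new();
        for v in values {
            out.extend_from_slice(&(v.len() as u32).to_le_bytes());
            out.extend_from_slice(v);
        }
        out
    }
}

/// Wrapper that compresses each value, then delegates to `inner`.
/// Assumes fewer than 255 symbols so every code fits below ESCAPE.
struct ToyFsstEncoder {
    inner: Box<dyn ToyEncoder>,
    symbols: Vec<Vec<u8>>,
}

impl ToyFsstEncoder {
    fn compress_value(&self, value: &[u8]) -> Vec<u8> {
        let mut out = Vec::new();
        let mut i = 0;
        while i < value.len() {
            // Pick the longest symbol that matches at position i.
            let best = self
                .symbols
                .iter()
                .enumerate()
                .filter(|(_, s)| !s.is_empty() && value[i..].starts_with(s.as_slice()))
                .max_by_key(|(_, s)| s.len());
            match best {
                Some((code, sym)) => {
                    out.push(code as u8);
                    i += sym.len();
                }
                None => {
                    // No symbol matches: emit the byte behind an escape.
                    out.push(ESCAPE);
                    out.push(value[i]);
                    i += 1;
                }
            }
        }
        out
    }
}

impl ToyEncoder for ToyFsstEncoder {
    fn encode(&self, values: &[Vec<u8>]) -> Vec<u8> {
        let compressed: Vec<Vec<u8>> =
            values.iter().map(|v| self.compress_value(v)).collect();
        // The wrapper knows nothing about the final layout; the inner encoder owns it.
        self.inner.encode(&compressed)
    }
}
```

Keeping the inner encoder behind a trait object means the compression layer stays independent of how the compressed bytes are finally laid out, which matches the Box<dyn ArrayEncoder> and Box<dyn PageScheduler> fields in the signatures.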

Import

use lance_encoding::previous::encodings::physical::fsst::{
    FsstPageScheduler, FsstArrayEncoder,
};

I/O Contract

Input            Type                    Description
inner_scheduler  Box<dyn PageScheduler>  Scheduler for the FSST-compressed binary data
symbol_table     LanceBuffer             Learned FSST symbol table for decompression
data             DataBlock               Variable-width or nullable data block to compress

Output   Type          Description
decoded  DataBlock     Decompressed variable-width data block (with optional nulls)
encoded  EncodedArray  FSST-compressed data with symbol table in encoding descriptor

Usage Examples

use lance_encoding::previous::encodings::physical::fsst::{FsstPageScheduler, FsstArrayEncoder};
use lance_encoding::buffer::LanceBuffer;
use lance_encoding::decoder::PageScheduler;
// ArrayEncoder is used below but was not imported in the original example.
// This path is assumed from the module layout and may differ between releases.
use lance_encoding::previous::encoder::ArrayEncoder;

// Create an FSST page scheduler from a binary page scheduler and the symbol
// table read from the protobuf encoding metadata.
fn make_fsst_scheduler(
    inner_scheduler: Box<dyn PageScheduler>,
    symbol_table: LanceBuffer,
) -> FsstPageScheduler {
    FsstPageScheduler::new(inner_scheduler, symbol_table)
}

// Create an FSST encoder wrapping a binary encoder.
fn make_fsst_encoder(inner_encoder: Box<dyn ArrayEncoder>) -> FsstArrayEncoder {
    FsstArrayEncoder::new(inner_encoder)
}
