Implementation:Lance format Lance CoreEncoder


Knowledge Sources
Domains: Encoding, Columnar_Data
Last Updated: 2026-02-08 19:33 GMT

Overview

The CoreEncoder module defines the top-level encoding traits and strategies (FieldEncoder, FieldEncodingStrategy, StructuralEncodingStrategy) for converting Arrow arrays into Lance's encoded page format, supporting both legacy (2.0) and structural (2.1+) encoding approaches.

Description

Lance file encoding is driven by a FieldEncodingStrategy, which chooses the encoder to use for each field in the schema. The current default for 2.1+ files is StructuralEncodingStrategy, which builds a tree of encoders that mirrors the structure of the data:

  • Struct encoders strip off validity and delegate to child encoders
  • List encoders strip off offsets and delegate to child encoders
  • Primitive leaf encoders accumulate validity, offsets, and values, then use miniblock or fullzip encoding to create pages
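
The tree-building idea above can be illustrated with a small standalone model. This is only a sketch: ToyType, ToyEncoderNode, build_encoder, and leaf_count are hypothetical names invented for illustration and are not part of lance-encoding. It also assumes that only primitive leaves are assigned output columns, while struct and list nodes simply delegate to their children.

```rust
// Toy model only: these types are hypothetical and do not exist in lance-encoding.
#[derive(Debug, Clone, PartialEq)]
enum ToyType {
    Primitive,
    Struct(Vec<ToyType>),
    List(Box<ToyType>),
}

// One encoder node per node in the data type, so the encoder tree mirrors the data.
#[derive(Debug, PartialEq)]
enum ToyEncoderNode {
    // Assumption: only leaves own an output column.
    PrimitiveLeaf { column_index: u32 },
    // A struct node would strip validity and delegate to its children.
    Struct { children: Vec<ToyEncoderNode> },
    // A list node would strip offsets and delegate to its child.
    List { child: Box<ToyEncoderNode> },
}

// Recursively build the encoder tree, handing out column indices in visit order.
fn build_encoder(ty: &ToyType, next_column: &mut u32) -> ToyEncoderNode {
    match ty {
        ToyType::Primitive => {
            let column_index = *next_column;
            *next_column += 1;
            ToyEncoderNode::PrimitiveLeaf { column_index }
        }
        ToyType::Struct(fields) => ToyEncoderNode::Struct {
            children: fields
                .iter()
                .map(|field| build_encoder(field, next_column))
                .collect(),
        },
        ToyType::List(item) => ToyEncoderNode::List {
            child: Box::new(build_encoder(item, next_column)),
        },
    }
}

// Count the primitive leaves (and therefore, in this model, the output columns).
fn leaf_count(node: &ToyEncoderNode) -> u32 {
    match node {
        ToyEncoderNode::PrimitiveLeaf { .. } => 1,
        ToyEncoderNode::Struct { children } => children.iter().map(leaf_count).sum(),
        ToyEncoderNode::List { child } => leaf_count(child),
    }
}
```

In this model, a struct with one primitive child and one list-of-primitive child yields two leaves with column indices 0 and 1, which is one way a single field can map to several output columns.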

Key Types:

  • FieldEncoder (trait) -- Buffers incoming Arrow arrays and emits EncodeTask futures when enough data accumulates for a page. A single field may map to multiple output columns (e.g., struct fields).
  • FieldEncodingStrategy (trait) -- Factory for creating FieldEncoder instances based on field type and metadata.
  • StructuralEncodingStrategy -- The default strategy for 2.1+ files. Delegates compression to a CompressionStrategy.
  • EncodedPage -- A page of encoded data containing buffers, encoding description, row count, and column index.
  • EncodedColumn -- Column-level result containing column buffers, encoding metadata, and all pages.
  • OutOfLineBuffers -- Tracks buffer positions for data stored outside of pages (e.g., large binary encoding).
  • EncodingOptions -- Controls cache size per column (default 8 MiB), max page size (default 32 MiB), buffer alignment (default 64 bytes), and file version.

Encoding Flow:

  1. FieldEncodingStrategy::create_field_encoder creates an encoder for a schema field
  2. For each batch of data, FieldEncoder::maybe_encode buffers the data and may return encode tasks
  3. When enough data is buffered, tasks are spawned to produce EncodedPage instances
  4. FieldEncoder::flush emits remaining pages
  5. FieldEncoder::finish returns final column metadata
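
The buffering behaviour in steps 2 to 4 can be sketched with a simplified standalone model. ToyPage and ToyFieldEncoder are hypothetical names, not the real lance-encoding types. The model tracks only row and byte counts rather than Arrow arrays, and it assumes a page is emitted as soon as the buffered bytes reach a threshold that plays the role of cache_bytes_per_column.

```rust
// Hypothetical stand-in for a page; the real EncodedPage also carries buffers
// and an encoding description.
#[derive(Debug, PartialEq)]
struct ToyPage {
    num_rows: u64,
    row_number: u64, // row number of the first row in the page
}

// Hypothetical encoder that only counts rows and bytes.
struct ToyFieldEncoder {
    cache_bytes: u64, // plays the role of cache_bytes_per_column
    buffered_rows: u64,
    buffered_bytes: u64,
    next_row_number: u64,
}

impl ToyFieldEncoder {
    fn new(cache_bytes: u64) -> Self {
        Self { cache_bytes, buffered_rows: 0, buffered_bytes: 0, next_row_number: 0 }
    }

    // Buffer a batch; emit a page only once enough bytes have accumulated.
    fn maybe_encode(&mut self, num_rows: u64, num_bytes: u64) -> Vec<ToyPage> {
        self.buffered_rows += num_rows;
        self.buffered_bytes += num_bytes;
        if self.buffered_bytes >= self.cache_bytes {
            vec![self.take_page()]
        } else {
            Vec::new()
        }
    }

    // Emit whatever is still buffered, if anything.
    fn flush(&mut self) -> Vec<ToyPage> {
        if self.buffered_rows > 0 {
            vec![self.take_page()]
        } else {
            Vec::new()
        }
    }

    // Turn the current buffer into a page and reset the counters.
    fn take_page(&mut self) -> ToyPage {
        let page = ToyPage {
            num_rows: self.buffered_rows,
            row_number: self.next_row_number,
        };
        self.next_row_number += self.buffered_rows;
        self.buffered_rows = 0;
        self.buffered_bytes = 0;
        page
    }
}
```

With a 100-byte threshold, a 40-byte batch of 10 rows is buffered silently, a following 60-byte batch of 10 rows triggers one 20-row page, and a trailing 5-row batch is emitted only by flush. The real encoder returns asynchronous encode tasks rather than finished pages, which this sketch does not model.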

Usage

Use this module when:

  • Writing Lance files (the writer calls into the encoding strategy to encode each field)
  • Implementing a custom encoding strategy for specialized data types
  • Configuring encoding parameters (page size, compression, buffer alignment)

Code Reference

Source Location: rust/lance-encoding/src/encoder.rs
Key Traits: FieldEncoder, FieldEncodingStrategy
Key Structs: StructuralEncodingStrategy, EncodedPage, EncodedColumn, EncodingOptions, OutOfLineBuffers
Key Functions: default_encoding_strategy(version), default_encoding_strategy_with_params(version, params)
Import: use lance_encoding::encoder::{FieldEncoder, EncodingOptions, default_encoding_strategy};

I/O Contract

FieldEncoder Trait Methods:

Method | Input | Output | Description
maybe_encode | ArrayRef, &mut OutOfLineBuffers, RepDefBuilder, u64, u64 | Result<Vec<EncodeTask>> | Buffer data and optionally produce page encode tasks
flush | &mut OutOfLineBuffers | Result<Vec<EncodeTask>> | Flush remaining buffered data into pages
finish | &mut OutOfLineBuffers | BoxFuture<Result<Vec<EncodedColumn>>> | Finalize and return column metadata
num_columns | -- | u32 | Number of output columns this field produces

EncodedPage Fields:

Field | Type | Description
data | Vec<LanceBuffer> | The encoded page buffers
description | PageEncoding | Encoding metadata for decoding
num_rows | u64 | Number of rows in the page
row_number | u64 | Top-level row number of the first row
column_idx | u32 | Column index in the file

EncodingOptions Fields:

Field | Type | Default | Description
cache_bytes_per_column | u64 | 8 MiB | Bytes to buffer before writing a page
max_page_bytes | u64 | 32 MiB | Maximum page size before splitting
keep_original_array | bool | true | If true, cached arrays are kept as-is; if false, arrays are deep-copied before caching
buffer_alignment | u64 | 64 | Alignment for page buffers, in bytes
version | LanceFileVersion | default | Target file format version
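
The size and alignment defaults above lend themselves to a small arithmetic sketch. The helpers padding_for and pages_needed are hypothetical and are not part of lance-encoding. They only show the arithmetic implied by a 64-byte alignment and a 32 MiB page limit, under the assumption that oversized data is split into pages no larger than the limit. The library's actual splitting logic may differ.

```rust
const MIB: u64 = 1024 * 1024;

// Defaults taken from the table above.
const DEFAULT_MAX_PAGE_BYTES: u64 = 32 * MIB;
const DEFAULT_BUFFER_ALIGNMENT: u64 = 64;

// Padding bytes required so that a buffer of `len` bytes ends on an
// `alignment`-byte boundary (hypothetical helper).
fn padding_for(len: u64, alignment: u64) -> u64 {
    (alignment - len % alignment) % alignment
}

// Number of pages needed if `total_bytes` is split into chunks of at most
// `max_page_bytes` (hypothetical helper; ceiling division).
fn pages_needed(total_bytes: u64, max_page_bytes: u64) -> u64 {
    (total_bytes + max_page_bytes - 1) / max_page_bytes
}
```

For example, a 100-byte buffer needs 28 bytes of padding to reach the next 64-byte boundary, and 80 MiB of data spans three pages under a 32 MiB limit.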

Usage Examples

use arrow_schema::{DataType, Field as ArrowField};
use lance_core::datatypes::Field;
use lance_encoding::encoder::{
    default_encoding_strategy, ColumnIndexSequence, EncodingOptions,
};
use lance_encoding::version::LanceFileVersion;

// Create the encoding strategy for the default file version
let version = LanceFileVersion::default();
let strategy = default_encoding_strategy(version);

// Example Arrow field to encode (illustrative; substitute your own schema field)
let arrow_field = ArrowField::new("value", DataType::Int32, true);

// Convert it to a Lance field and set up the encoder inputs
let lance_field = Field::try_from(&arrow_field).unwrap();
let mut col_idx = ColumnIndexSequence::default();
let options = EncodingOptions::default();

// The strategy is also passed as the root strategy for creating child encoders
let encoder = strategy.create_field_encoder(
    strategy.as_ref(),
    &lance_field,
    &mut col_idx,
    &options,
).unwrap();
