

Implementation:Lance format Lance BytePack

From Leeroopedia


Knowledge Sources
Domains: Encoding, Infrastructure
Last Updated: 2026-02-08 19:33 GMT

Overview

Description

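The BytePack module provides byte-level integer packing utilities for the Lance encoding pipeline. Unlike bit-packing, which can use any number of bits per value, byte-packing stores every value in the smallest standard integer width (1, 2, 4, or 8 bytes) that can hold a caller-supplied maximum value, or stores nothing at all when that maximum is zero. Because values stay byte-aligned, encoding and decoding are simple fixed-width copies; the cost is weaker compression than bit-packing, since each value is rounded up to a whole number of standard-width bytes.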
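The width rule can be sketched as a small function. This is a hypothetical standalone reimplementation (the helper names byte_width and bit_width are illustrative, not part of the Lance API) that follows the thresholds listed for the BytepackedIntegerEncoder variants and also computes the bit width a bit-packer would need, to show the compression given up by rounding to standard widths.

```rust
/// Hypothetical helper: bytes per value a byte-packer would choose for `max_value`.
/// 0 corresponds to the Zero case (nothing stored).
fn byte_width(max_value: u64) -> usize {
    if max_value == 0 {
        0
    } else if max_value <= u8::MAX as u64 {
        1
    } else if max_value <= u16::MAX as u64 {
        2
    } else if max_value <= u32::MAX as u64 {
        4
    } else {
        8
    }
}

/// Hypothetical helper: minimum bits per value a bit-packer would need.
fn bit_width(max_value: u64) -> u32 {
    64 - max_value.leading_zeros()
}

fn main() {
    for max in [0u64, 200, 1_000, 70_000, u64::MAX] {
        println!(
            "max={max}: byte-packed {} bits/value, bit-packed {} bits/value",
            byte_width(max) * 8,
            bit_width(max)
        );
    }
}
```

For a maximum of 1000, byte-packing spends 16 bits per value where 10 bits would suffice; that gap is the price paid for byte-aligned reads and writes.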

The module provides two main types:

BytepackedIntegerEncoder -- An encoder that automatically selects the narrowest integer width:

  • Zero -- max_value is 0, so every value is zero; no data is stored
  • U8 -- Values fit in 1 byte (max <= 255)
  • U16 -- Values fit in 2 bytes, little-endian (max <= 65535)
  • U32 -- Values fit in 4 bytes, little-endian (max <= 4294967295)
  • U64 -- Full 8-byte little-endian encoding
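The variant selection and per-variant appends can be modeled with a simplified enum. SketchEncoder below is a hypothetical stand-in, not the Lance type: each non-zero variant owns a plain Vec<u8> instead of a packer struct, and append is left safe in this sketch even though the real method is marked unsafe.

```rust
/// Hypothetical, simplified stand-in for BytepackedIntegerEncoder.
#[derive(Debug)]
enum SketchEncoder {
    Zero,
    U8(Vec<u8>),
    U16(Vec<u8>),
    U32(Vec<u8>),
    U64(Vec<u8>),
}

impl SketchEncoder {
    /// Pick the narrowest variant that can hold `max_value` and
    /// pre-allocate `capacity` values' worth of bytes.
    fn with_capacity(capacity: usize, max_value: u64) -> Self {
        if max_value == 0 {
            Self::Zero
        } else if max_value <= u8::MAX as u64 {
            Self::U8(Vec::with_capacity(capacity))
        } else if max_value <= u16::MAX as u64 {
            Self::U16(Vec::with_capacity(capacity * 2))
        } else if max_value <= u32::MAX as u64 {
            Self::U32(Vec::with_capacity(capacity * 4))
        } else {
            Self::U64(Vec::with_capacity(capacity * 8))
        }
    }

    /// No overflow check: `as` casts keep only the low-order bytes.
    fn append(&mut self, value: u64) {
        match self {
            Self::Zero => {} // nothing to store
            Self::U8(buf) => buf.push(value as u8),
            Self::U16(buf) => buf.extend_from_slice(&(value as u16).to_le_bytes()),
            Self::U32(buf) => buf.extend_from_slice(&(value as u32).to_le_bytes()),
            Self::U64(buf) => buf.extend_from_slice(&value.to_le_bytes()),
        }
    }

    /// Return the packed bytes; the Zero case yields an empty buffer.
    fn into_data(self) -> Vec<u8> {
        match self {
            Self::Zero => Vec::new(),
            Self::U8(buf) | Self::U16(buf) | Self::U32(buf) | Self::U64(buf) => buf,
        }
    }
}

fn main() {
    let mut enc = SketchEncoder::with_capacity(3, 1_000);
    for v in [500, 200, 300] {
        enc.append(v);
    }
    println!("{:?}", enc.into_data());
}
```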

The encoder's append method is marked unsafe because it does not check for overflow; values exceeding the chosen width are silently truncated.
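A short illustration of what silent truncation means for a 2-byte slot, next to a checked conversion a caller could use when the maximum is not trusted. truncate_to_u16 and checked_u16_bytes are hypothetical helpers, not Lance functions.

```rust
/// What a U16 slot keeps: only the low 16 bits of the value.
fn truncate_to_u16(value: u64) -> u64 {
    (value as u16) as u64
}

/// Hypothetical checked alternative: refuse values that do not fit
/// instead of truncating them.
fn checked_u16_bytes(value: u64) -> Option<[u8; 2]> {
    u16::try_from(value).ok().map(u16::to_le_bytes)
}

fn main() {
    // 70_000 = 0x1_1170; the high bits are dropped, leaving 0x1170 = 4464
    println!("{}", truncate_to_u16(70_000));
    println!("{:?}", checked_u16_bytes(70_000));
    println!("{:?}", checked_u16_bytes(500));
}
```

The truncated value decodes without any error, which is why every appended value must really be bounded by the max_value given to with_capacity.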

ByteUnpacker -- A generic iterator that decodes byte-packed data back into u64 values. It is generic over the underlying byte iterator; the byte width (1, 2, 4, or 8) passed to new() selects the variant and therefore how many little-endian bytes are read for each value.
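The decoding side can be sketched as an iterator that reads a fixed number of little-endian bytes per value. SketchUnpacker is a hypothetical simplification: it stores the width as a field instead of using one enum variant per width, and it panics if the input ends in the middle of a value, which is an assumption of this sketch.

```rust
/// Hypothetical simplified decoder: reads `width` little-endian bytes per
/// value from any byte iterator and yields them widened to u64.
struct SketchUnpacker<I: Iterator<Item = u8>> {
    bytes: I,
    width: usize,
}

impl<I: Iterator<Item = u8>> SketchUnpacker<I> {
    fn new<T: IntoIterator<IntoIter = I>>(data: T, width: usize) -> Self {
        assert!(matches!(width, 1 | 2 | 4 | 8), "unsupported byte width: {width}");
        Self { bytes: data.into_iter(), width }
    }
}

impl<I: Iterator<Item = u8>> Iterator for SketchUnpacker<I> {
    type Item = u64;

    fn next(&mut self) -> Option<u64> {
        // Zero-filled 8-byte buffer: filling the low `width` bytes and reading
        // it as little-endian zero-extends the value to u64.
        let mut buf = [0u8; 8];
        // Running out of bytes at a value boundary ends iteration.
        buf[0] = self.bytes.next()?;
        for slot in buf.iter_mut().take(self.width).skip(1) {
            *slot = self.bytes.next().expect("input ended in the middle of a value");
        }
        Some(u64::from_le_bytes(buf))
    }
}

fn main() {
    let values: Vec<u64> = SketchUnpacker::new(vec![244u8, 1, 200, 0, 44, 1], 2).collect();
    println!("{values:?}");
}
```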

Internal packer structs (U8BytePacker, U16BytePacker, U32BytePacker, U64BytePacker) handle the per-width encoding logic.
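One way to picture the per-width packers is a single struct that is generic over the byte count. The const-generic WidthPacker below is a hypothetical illustration, not how Lance defines its four structs; it shows that each width amounts to keeping the low-order little-endian bytes of the value.

```rust
/// Hypothetical const-generic packer; W is the number of bytes per value.
struct WidthPacker<const W: usize> {
    data: Vec<u8>,
}

impl<const W: usize> WidthPacker<W> {
    fn with_capacity(capacity: usize) -> Self {
        // Pre-allocate W bytes for each expected value.
        Self { data: Vec::with_capacity(capacity * W) }
    }

    fn append(&mut self, value: u64) {
        // Keeping the first W little-endian bytes matches an `as` cast to a
        // W-byte integer: the higher bytes are dropped.
        self.data.extend_from_slice(&value.to_le_bytes()[..W]);
    }

    fn into_data(self) -> Vec<u8> {
        self.data
    }
}

fn main() {
    let mut p = WidthPacker::<2>::with_capacity(2);
    p.append(500);
    p.append(300);
    println!("{:?}", p.into_data());
}
```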

Usage

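BytePack is used in the Lance encoding pipeline to store offset arrays, indices, and other integer sequences compactly, where values are usually small but the encoder must still handle the full u64 range when needed. Because a single width applies to the whole sequence, one value that needs 8 bytes forces 8 bytes for every element.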
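To make the size trade-off concrete, the following sketch compares packed size against plain u64 storage for a small offsets array, and shows how one large value widens every element. packed_size is a hypothetical helper that applies the width thresholds of the encoder variants; it does not call Lance.

```rust
/// Hypothetical helper: size in bytes of `values` when byte-packed with the
/// width chosen from their maximum (0 bytes when every value is zero).
fn packed_size(values: &[u64]) -> usize {
    let max = values.iter().copied().max().unwrap_or(0);
    let width = if max == 0 {
        0
    } else if max <= u8::MAX as u64 {
        1
    } else if max <= u16::MAX as u64 {
        2
    } else if max <= u32::MAX as u64 {
        4
    } else {
        8
    };
    values.len() * width
}

fn main() {
    // Offsets for the strings "a", "bc", "def": all fit in one byte.
    let offsets = [0u64, 1, 3, 6];
    println!("offsets: {} bytes packed vs {} as u64", packed_size(&offsets), offsets.len() * 8);

    // A single large value forces the 8-byte width on every element.
    let mixed = [1u64, 2, u64::MAX];
    println!("mixed: {} bytes packed vs {} as u64", packed_size(&mixed), mixed.len() * 8);
}
```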

Code Reference

Source Location

rust/lance-encoding/src/utils/bytepack.rs

Signature

pub enum BytepackedIntegerEncoder {
    U8(U8BytePacker),
    U16(U16BytePacker),
    U32(U32BytePacker),
    U64(U64BytePacker),
    Zero,
}

impl BytepackedIntegerEncoder {
    pub fn with_capacity(capacity: usize, max_value: u64) -> Self;
    pub unsafe fn append(&mut self, value: u64);
    pub fn into_data(self) -> Vec<u8>;
}

pub enum ByteUnpacker<I: Iterator<Item = u8>> {
    U8(I),
    U16(I),
    U32(I),
    U64(I),
}

impl<T: Iterator<Item = u8>> ByteUnpacker<T> {
    pub fn new<I: IntoIterator<IntoIter = T>>(data: I, size: usize) -> impl Iterator<Item = u64>;
}

Import

use lance_encoding::utils::bytepack::{BytepackedIntegerEncoder, ByteUnpacker};

I/O Contract

Inputs

Parameter   Function            Type                            Description
capacity    with_capacity       usize                           Expected number of values to encode (pre-allocates the buffer)
max_value   with_capacity       u64                             Maximum value that will be encoded; determines the byte width
value       append              u64                             Value to append (must not exceed max_value; not checked)
data        ByteUnpacker::new   I: IntoIterator<IntoIter = T>   Byte source for decoding
size        ByteUnpacker::new   usize                           Byte width per value (1, 2, 4, or 8)

Outputs

Type                        Description
Vec<u8>                     Packed byte buffer returned by into_data()
impl Iterator<Item = u64>   Iterator of decoded u64 values returned by ByteUnpacker::new()
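Judging from the signatures, into_data() returns only raw bytes and ByteUnpacker::new() takes the width as a separate size argument, so the packed buffer does not appear to record its own width. A caller would then need to keep the width, and for the Zero case the value count, alongside the buffer. The sketch below models that with a hypothetical PackedColumn struct (not a Lance type) and decodes with chunks_exact so it runs without the Lance crate.

```rust
/// Hypothetical container: the raw packed bytes plus the metadata needed to
/// decode them.
struct PackedColumn {
    width: usize,      // bytes per value: 0, 1, 2, 4 or 8
    num_values: usize, // needed because the zero-width case stores no bytes
    data: Vec<u8>,
}

impl PackedColumn {
    fn decode(&self) -> Vec<u64> {
        if self.width == 0 {
            // Zero case: every value is 0 and nothing was stored.
            return vec![0; self.num_values];
        }
        self.data
            .chunks_exact(self.width)
            .map(|chunk| {
                // Zero-extend the little-endian chunk to 8 bytes.
                let mut buf = [0u8; 8];
                buf[..self.width].copy_from_slice(chunk);
                u64::from_le_bytes(buf)
            })
            .collect()
    }
}

fn main() {
    let col = PackedColumn { width: 2, num_values: 3, data: vec![244, 1, 200, 0, 44, 1] };
    println!("{:?}", col.decode());
    let zeros = PackedColumn { width: 0, num_values: 4, data: Vec::new() };
    println!("{:?}", zeros.decode());
}
```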

Usage Examples

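use lance_encoding::utils::bytepack::{BytepackedIntegerEncoder, ByteUnpacker};

fn main() {
    // max_value = 1000 exceeds u8::MAX, so the encoder selects the U16 variant
    let mut encoder = BytepackedIntegerEncoder::with_capacity(3, 1000);
    // SAFETY: every appended value is <= 1000, so it fits in a u16
    unsafe {
        encoder.append(500);
        encoder.append(200);
        encoder.append(300);
    }
    let data = encoder.into_data();
    // Two little-endian bytes per value: 500 = 0x01F4, 200 = 0x00C8, 300 = 0x012C
    assert_eq!(data, vec![244, 1, 200, 0, 44, 1]);

    // Decode back to u64; the width (2 bytes) must be supplied by the caller
    let values: Vec<u64> = ByteUnpacker::new(data, 2).collect();
    assert_eq!(values, vec![500, 200, 300]);
}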
