Implementation:Lance format Lance BytePack
| Knowledge Sources | |
|---|---|
| Domains | Encoding, Infrastructure |
| Last Updated | 2026-02-08 19:33 GMT |
Overview
Description
The BytePack module provides byte-level integer packing utilities for the Lance encoding pipeline. Unlike bit-packing, which can use an arbitrary number of bits per value, byte-packing stores every value in the smallest standard integer width (1, 2, 4, or 8 bytes) that can hold the maximum value. The caller supplies that maximum when the encoder is created. This approach trades some compression for simplicity and speed.
The module provides two main types:
BytepackedIntegerEncoder -- An encoder that selects the narrowest integer width able to hold max_value:
- Zero -- All values are zero (max_value == 0); no data is stored
- U8 -- Values fit in 1 byte (max <= 255)
- U16 -- Values fit in 2 bytes, little-endian (max <= 65535)
- U32 -- Values fit in 4 bytes, little-endian (max <= 4294967295)
- U64 -- Full 8-byte little-endian encoding (max > 4294967295)
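The selection rule in the list above can be sketched as a standalone function. This is an illustration only: byte_width_for is a hypothetical name, not part of the Lance API, and its thresholds are taken from the list rather than from the Lance source.

```rust
/// Hypothetical helper: the number of bytes per value a byte-packing
/// encoder would use for `max_value` (0 stands for the Zero variant).
fn byte_width_for(max_value: u64) -> usize {
    if max_value == 0 {
        0 // Zero: every value is 0, so nothing needs to be stored
    } else if max_value <= u8::MAX as u64 {
        1
    } else if max_value <= u16::MAX as u64 {
        2
    } else if max_value <= u32::MAX as u64 {
        4
    } else {
        8
    }
}

fn main() {
    // Probe each boundary and the first value past it.
    for max in [0u64, 255, 256, 65_535, 65_536, 4_294_967_295, 4_294_967_296] {
        println!("max_value {max} -> {} byte(s)", byte_width_for(max));
    }
}
```

Checking the boundary values on both sides is the quickest way to confirm that each threshold is inclusive.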
The encoder's append method is marked unsafe because it does not check for overflow. The caller must ensure that every value is at most max_value; values exceeding the chosen width are silently truncated.
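This rough illustration shows what silent truncation means in practice. It assumes the narrower packers keep only the low-order bytes of each value, as a Rust as-cast to u8, u16, or u32 would. The read_back helper is hypothetical and does not exist in Lance.

```rust
/// Hypothetical helper: the value that reads back after an unchecked append
/// into a slot `width` bytes wide, ASSUMING the packer keeps the low bits
/// of the value (the same result as `value as u8` / `as u16` / `as u32`).
fn read_back(value: u64, width: u32) -> u64 {
    assert!(matches!(width, 1 | 2 | 4 | 8), "width must be 1, 2, 4, or 8");
    if width == 8 {
        value
    } else {
        // Keep only the low 8 * width bits.
        value & ((1u64 << (8 * width)) - 1)
    }
}

fn main() {
    // max_value = 1000 selects 2-byte slots, but 70_000 is appended anyway.
    println!("{}", read_back(70_000, 2)); // prints 4464 (70_000 - 65_536)
    // A 1-byte slot receiving 300 keeps 300 - 256.
    println!("{}", read_back(300, 1)); // prints 44
}
```

No error is raised at any point. The wrong value only becomes visible when the data is decoded.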
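ByteUnpacker -- A generic iterator that decodes byte-packed data back into u64 values. It is generic over the underlying byte iterator. The byte width (1, 2, 4, or 8) is passed to ByteUnpacker::new at run time and selects the matching variant. Each step reads that many bytes from the underlying iterator and assembles them in little-endian order.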
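The decoding that ByteUnpacker performs can be approximated with a lazy iterator over the input bytes. This is a simplified sketch, not the Lance implementation. unpack_le is a hypothetical function, and the way it handles a trailing partial value (silently dropping it) is an assumption.

```rust
/// Hypothetical stand-in for ByteUnpacker: lazily decodes `size`-byte
/// little-endian unsigned integers from a byte iterator into u64 values.
fn unpack_le<I: IntoIterator<Item = u8>>(data: I, size: usize) -> impl Iterator<Item = u64> {
    assert!(matches!(size, 1 | 2 | 4 | 8), "size must be 1, 2, 4, or 8");
    let mut bytes = data.into_iter();
    std::iter::from_fn(move || {
        let mut value = 0u64;
        for i in 0..size {
            // Running out of bytes ends the stream; in this sketch a
            // trailing partial value is dropped (an assumption).
            let b = bytes.next()?;
            // Little-endian: byte i supplies bits 8*i through 8*i + 7.
            value |= u64::from(b) << (8 * i);
        }
        Some(value)
    })
}

fn main() {
    // Bytes from the Usage Examples section, decoded with a 2-byte width.
    let values: Vec<u64> = unpack_le(vec![244u8, 1, 200, 0, 44, 1], 2).collect();
    println!("{values:?}"); // prints [500, 200, 300]
}
```

Returning an iterator rather than a Vec mirrors the signature of ByteUnpacker::new. Values are produced on demand, and no intermediate buffer is needed.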
Internal packer structs (U8BytePacker, U16BytePacker, U32BytePacker, U64BytePacker) handle the per-width encoding logic.
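Here is a minimal sketch of what a per-width packer might look like, using the 2-byte case. U16Packer and its methods are hypothetical stand-ins modeled on the description above. The real U16BytePacker may be structured differently.

```rust
/// Hypothetical 2-byte packer in the style of U16BytePacker
/// (names and layout are illustrative, not Lance's).
struct U16Packer {
    data: Vec<u8>,
}

impl U16Packer {
    fn with_capacity(capacity: usize) -> Self {
        // Two bytes per value, so reserve 2 * capacity bytes up front.
        Self { data: Vec::with_capacity(capacity * 2) }
    }

    fn append(&mut self, value: u64) {
        // Keep the low 16 bits and write them low byte first.
        self.data.extend_from_slice(&(value as u16).to_le_bytes());
    }

    fn into_data(self) -> Vec<u8> {
        self.data
    }
}

fn main() {
    let mut packer = U16Packer::with_capacity(3);
    for v in [500u64, 200, 300] {
        packer.append(v);
    }
    println!("{:?}", packer.into_data()); // prints [244, 1, 200, 0, 44, 1]
}
```

Keeping one struct per width means that each append is a fixed-size copy with no per-value branching on the width.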
Usage
BytePack is used in the Lance encoding pipeline to store offset arrays, indices, and other integer sequences whose values are usually small. The width is chosen once for the whole sequence from its maximum value, so a single large value widens every entry, up to the full 8-byte u64 encoding.
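A single width is chosen for the whole sequence, so the packed size is the number of values times the width implied by the maximum. The sketch below shows how one outlier changes the total. The helpers width_for and packed_len are hypothetical, and their thresholds follow the variant list above.

```rust
/// Hypothetical helper: bytes per value for a given maximum
/// (0 for the Zero variant), following the thresholds in the variant list.
fn width_for(max_value: u64) -> usize {
    match max_value {
        0 => 0,
        1..=0xFF => 1,
        0x100..=0xFFFF => 2,
        0x1_0000..=0xFFFF_FFFF => 4,
        _ => 8,
    }
}

/// Hypothetical helper: packed size in bytes when every value in the
/// sequence uses the width chosen for the sequence's maximum.
fn packed_len(values: &[u64]) -> usize {
    let max = values.iter().copied().max().unwrap_or(0);
    values.len() * width_for(max)
}

fn main() {
    let small = [0u64, 10, 25, 40, 60];
    let mut with_outlier = small.to_vec();
    with_outlier.push(5_000_000_000); // larger than u32::MAX

    println!("{}", packed_len(&small)); // prints 5 (5 values x 1 byte)
    println!("{}", packed_len(&with_outlier)); // prints 48 (6 values x 8 bytes)
}
```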
Code Reference
Source Location
rust/lance-encoding/src/utils/bytepack.rs
Signature
```rust
pub enum BytepackedIntegerEncoder {
    U8(U8BytePacker),
    U16(U16BytePacker),
    U32(U32BytePacker),
    U64(U64BytePacker),
    Zero,
}

impl BytepackedIntegerEncoder {
    pub fn with_capacity(capacity: usize, max_value: u64) -> Self;
    pub unsafe fn append(&mut self, value: u64);
    pub fn into_data(self) -> Vec<u8>;
}

pub enum ByteUnpacker<I: Iterator<Item = u8>> {
    U8(I),
    U16(I),
    U32(I),
    U64(I),
}

impl<T: Iterator<Item = u8>> ByteUnpacker<T> {
    pub fn new<I: IntoIterator<IntoIter = T>>(data: I, size: usize) -> impl Iterator<Item = u64>;
}
```
Import
```rust
use lance_encoding::utils::bytepack::{BytepackedIntegerEncoder, ByteUnpacker};
```
I/O Contract
Inputs
| Parameter | Type | Description |
|---|---|---|
| capacity | usize | Expected number of values to encode (pre-allocates the buffer) |
| max_value | u64 | Maximum value that will be encoded; determines the byte width |
| value | u64 | A value to append (must not exceed max_value; unchecked) |
| data | I: IntoIterator<IntoIter = T> | Byte source for decoding |
| size | usize | Byte width per value (1, 2, 4, or 8) |
Outputs
| Type | Description |
|---|---|
| Vec<u8> | Packed byte buffer returned by into_data() |
| impl Iterator<Item = u64> | Iterator yielding decoded u64 values, returned by ByteUnpacker::new() |
Usage Examples
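```rust
use lance_encoding::utils::bytepack::{BytepackedIntegerEncoder, ByteUnpacker};

fn main() {
    // max_value = 1000 exceeds u8::MAX, so the encoder selects the U16 variant.
    let mut encoder = BytepackedIntegerEncoder::with_capacity(3, 1000);
    // SAFETY: every appended value is <= max_value (1000), so nothing is truncated.
    unsafe {
        encoder.append(500);
        encoder.append(200);
        encoder.append(300);
    }
    let data = encoder.into_data();
    // Two little-endian bytes per value: 500 = 0x01F4, 200 = 0x00C8, 300 = 0x012C.
    assert_eq!(data, vec![244, 1, 200, 0, 44, 1]);

    // Decode with the same byte width (2) that the encoder chose.
    let values: Vec<u64> = ByteUnpacker::new(data, 2).collect();
    assert_eq!(values, vec![500, 200, 300]);
}
```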
Related Pages
- Lance_format_Lance_AccumulationQueue -- Another encoding utility in the same module