Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Principle:Duckdb Duckdb Integer Bit Packing

From Leeroopedia


Knowledge Sources
Domains Compression, Columnar_Storage, Integer_Encoding
Last Updated 2026-02-07 12:00 GMT

Overview

A columnar integer compression technique that stores values using only the minimum number of bits required, packing multiple values tightly into machine words to reduce storage and improve cache utilization.

Description

Integer bit packing is a compression technique used in columnar databases where integers are stored using exactly the number of bits needed to represent the largest value in a block, rather than a fixed 32-bit or 64-bit representation. For example, if the maximum value in a block of 128 integers is 15, each value can be stored using only 4 bits, achieving 8x compression over 32-bit storage.

The technique is particularly effective for columnar data because values in a column often have similar magnitudes. In analytical workloads, columns like status codes, small foreign keys, dates encoded as offsets, and dictionary-encoded indices frequently use far fewer bits than their declared type would suggest. By analyzing the actual value range within each block, the compressor can select the optimal bit width.

Modern bit-packing implementations are designed to exploit SIMD (Single Instruction, Multiple Data) instructions for parallel packing and unpacking. By processing 128 or 256 bits at once, SIMD-accelerated bit packing can decode billions of integers per second, making the decompression cost negligible compared to memory bandwidth. The FastPFor family of codecs extends basic bit packing with patched coding, where a small number of outlier values that would inflate the bit width are stored separately.

Usage

Integer bit packing is used in DuckDB's storage layer for compressing integer columns. When DuckDB writes columnar data to persistent storage, it analyzes each block of integer values, determines the minimum bit width, and packs the values accordingly. During query execution, the packed integers are unpacked efficiently using vectorized operations that align with DuckDB's vector-at-a-time execution model.

Theoretical Basis

Basic Bit Packing: Pack N integers using B bits each:

// Determine bit width for a block
B = ceil(log2(max_value + 1))  // bits needed
// B=0 for constant blocks, B=32 for full-range int32

// Pack 32 integers into B 32-bit words
function pack(input[32], B) -> output[B]:
    for i in 0..31:
        bit_position = i * B
        word_index = bit_position / 32
        bit_offset = bit_position % 32
        output[word_index] |= (input[i] << bit_offset)
        if bit_offset + B > 32:  // spans word boundary
            output[word_index + 1] |= (input[i] >> (32 - bit_offset))

SIMD-Accelerated Unpacking: Process 128 bits at a time:

// Unpack with SIMD (SSE2/AVX2 pseudo-code)
function unpack_simd(packed, B, count) -> output:
    mask = (1 << B) - 1                    // B-bit mask
    reg = load_128bit(packed)
    for i in 0..count-1:
        value = (reg >> (i * B)) & mask    // extract B bits
        output[i] = value
    // In practice, uses shuffle + shift + AND instructions

Frame of Reference (FOR): Subtract a base value to reduce bit width:

// FOR encoding
base = min(values)
for each value v:
    encoded = v - base              // reduce range
B = ceil(log2(max(encoded) + 1))    // often much smaller
// Store: base (32 bits) + N values at B bits each

// FOR + Bit Packing combined:
// Original: [1000, 1001, 1003, 1002] -> 10 bits each
// After FOR: [0, 1, 3, 2] -> 2 bits each (base=1000)

Patched Coding: Handle outliers without inflating bit width:

// PFor (Patched Frame of Reference)
B = select_bit_width(values, allow_exceptions=10%)
for each value v:
    if v fits in B bits:
        pack normally
    else:
        store in exceptions list (position, value)
// Decode: unpack B-bit values, then patch exceptions

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment