Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Duckdb Duckdb FastPForLib

From Leeroopedia


Knowledge Sources
Domains Compression, Integer_Encoding, Third_Party
Last Updated 2026-02-07 12:00 GMT

Overview

FastPForLib is a high-performance bit-packing library by Daniel Lemire that provides optimized functions for packing and unpacking arrays of integers using a specified bit width, enabling compact storage of integer data in DuckDB's columnar storage engine.

Description

The library lives in the duckdb_fastpforlib namespace and provides two layers of API:

  • Low-level internal functions (bitpacking.h / bitpacking.cpp) -- Individual __fastunpackN and __fastpackN functions for each bit width N (0 through the type's maximum), operating on fixed-size batches of integers. These are generated for uint8_t (8 values), uint16_t (16 values), uint32_t (32 values), and uint64_t (32 values packed from uint32_t storage).
  • Helper dispatch functions (bitpackinghelpers.h) -- Inline functions that dispatch to the correct bit-width-specific internal function via a switch statement:
    • fastunpack_quarter / fastpack_quarter -- Pack/unpack 8 uint8_t values
    • fastunpack_half / fastpack_half -- Pack/unpack 16 uint16_t values
    • fastunpack / fastpack -- Pack/unpack 32 uint32_t values
    • fastunpack / fastpack (64-bit overloads) -- Pack/unpack 32 uint64_t values from uint32_t packed storage

Each function reads or writes a fixed number of values from a tightly packed bit stream where each integer occupies exactly the specified number of bits.

Usage

DuckDB uses FastPForLib extensively in its storage layer for compressing integer columns. When a column segment contains values that fit within fewer bits than the native integer width, FastPForLib packs them tightly to reduce I/O and memory footprint. This is a core component of DuckDB's lightweight compression scheme for integer and boolean data stored in columnar format.

Code Reference

Source Location

Signature

namespace duckdb_fastpforlib {
namespace internal {

// Low-level: unpack/pack for each bit width N (0..8 for uint8, 0..16 for uint16, 0..32 for uint32, 0..64 for uint64)
void __fastunpackN(const uint8_t  *__restrict in, uint8_t  *__restrict out);   // 8 values
void __fastpackN (const uint8_t  *__restrict in, uint8_t  *__restrict out);
void __fastunpackN(const uint16_t *__restrict in, uint16_t *__restrict out);   // 16 values
void __fastpackN (const uint16_t *__restrict in, uint16_t *__restrict out);
void __fastunpackN(const uint32_t *__restrict in, uint32_t *__restrict out);   // 32 values
void __fastpackN (const uint32_t *__restrict in, uint32_t *__restrict out);
void __fastunpackN(const uint32_t *__restrict in, uint64_t *__restrict out);   // 32 values (64-bit output)
void __fastpackN (const uint64_t *__restrict in, uint32_t *__restrict out);

// Helper dispatch functions (select bit width at runtime)
inline void fastunpack_quarter(const uint8_t  *in, uint8_t  *out, const uint32_t bit);
inline void fastpack_quarter  (const uint8_t  *in, uint8_t  *out, const uint32_t bit);
inline void fastunpack_half   (const uint16_t *in, uint16_t *out, const uint32_t bit);
inline void fastpack_half     (const uint16_t *in, uint16_t *out, const uint32_t bit);
inline void fastunpack        (const uint32_t *in, uint32_t *out, const uint32_t bit);
inline void fastpack          (const uint32_t *in, uint32_t *out, const uint32_t bit);
inline void fastunpack        (const uint32_t *in, uint64_t *out, const uint32_t bit);
inline void fastpack          (const uint64_t *in, uint32_t *out, const uint32_t bit);

} // namespace internal
} // namespace duckdb_fastpforlib

Import

#include "fastpforlib/bitpackinghelpers.h"

I/O Contract

Inputs

Name Type Required Description
in const uintN_t* __restrict Yes Pointer to the packed (for unpack) or unpacked (for pack) input buffer
bit uint32_t Yes Bit width per value (0 to max bits for the type); determines packing density

Outputs

Name Type Description
out uintN_t* __restrict Destination buffer: unpacked values (for unpack) or packed bit stream (for pack)

The number of values processed per call is fixed by type:

Type Values per call Max bit width
uint8_t 8 8
uint16_t 16 16
uint32_t 32 32
uint64_t 32 64

Usage Examples

#include "fastpforlib/bitpackinghelpers.h"

// Pack 32 uint32_t values using 12 bits each
uint32_t unpacked[32] = { /* values that fit in 12 bits */ };
uint32_t packed[12];  // 32 * 12 / 32 = 12 uint32_t words needed
duckdb_fastpforlib::internal::fastpack(unpacked, packed, 12);

// Unpack them back
uint32_t restored[32];
duckdb_fastpforlib::internal::fastunpack(packed, restored, 12);

// Pack 8 uint8_t values using 5 bits each
uint8_t vals8[8] = {1, 2, 3, 4, 5, 6, 7, 8};
uint8_t packed8[5];  // 8 * 5 / 8 = 5 bytes
duckdb_fastpforlib::internal::fastpack_quarter(vals8, packed8, 5);

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment