Implementation:Duckdb Duckdb FastPForLib
| Knowledge Sources | |
|---|---|
| Domains | Compression, Integer_Encoding, Third_Party |
| Last Updated | 2026-02-07 12:00 GMT |
Overview
FastPForLib is a high-performance bit-packing library by Daniel Lemire that provides optimized functions for packing and unpacking arrays of integers using a specified bit width, enabling compact storage of integer data in DuckDB's columnar storage engine.
Description
The library lives in the duckdb_fastpforlib namespace and provides two layers of API:
- Low-level internal functions (
bitpacking.h/bitpacking.cpp) -- Individual__fastunpackNand__fastpackNfunctions for each bit width N (0 through the type's maximum), operating on fixed-size batches of integers. These are generated foruint8_t(8 values),uint16_t(16 values),uint32_t(32 values), anduint64_t(32 values packed fromuint32_tstorage).
- Helper dispatch functions (
bitpackinghelpers.h) -- Inline functions that dispatch to the correct bit-width-specific internal function via a switch statement:fastunpack_quarter/fastpack_quarter-- Pack/unpack 8uint8_tvaluesfastunpack_half/fastpack_half-- Pack/unpack 16uint16_tvaluesfastunpack/fastpack-- Pack/unpack 32uint32_tvaluesfastunpack/fastpack(64-bit overloads) -- Pack/unpack 32uint64_tvalues fromuint32_tpacked storage
Each function reads or writes a fixed number of values from a tightly packed bit stream where each integer occupies exactly the specified number of bits.
Usage
DuckDB uses FastPForLib extensively in its storage layer for compressing integer columns. When a column segment contains values that fit within fewer bits than the native integer width, FastPForLib packs them tightly to reduce I/O and memory footprint. This is a core component of DuckDB's lightweight compression scheme for integer and boolean data stored in columnar format.
Code Reference
Source Location
- Repository: Duckdb_Duckdb
- Files:
Signature
namespace duckdb_fastpforlib {
namespace internal {
// Low-level: unpack/pack for each bit width N (0..8 for uint8, 0..16 for uint16, 0..32 for uint32, 0..64 for uint64)
void __fastunpackN(const uint8_t *__restrict in, uint8_t *__restrict out); // 8 values
void __fastpackN (const uint8_t *__restrict in, uint8_t *__restrict out);
void __fastunpackN(const uint16_t *__restrict in, uint16_t *__restrict out); // 16 values
void __fastpackN (const uint16_t *__restrict in, uint16_t *__restrict out);
void __fastunpackN(const uint32_t *__restrict in, uint32_t *__restrict out); // 32 values
void __fastpackN (const uint32_t *__restrict in, uint32_t *__restrict out);
void __fastunpackN(const uint32_t *__restrict in, uint64_t *__restrict out); // 32 values (64-bit output)
void __fastpackN (const uint64_t *__restrict in, uint32_t *__restrict out);
// Helper dispatch functions (select bit width at runtime)
inline void fastunpack_quarter(const uint8_t *in, uint8_t *out, const uint32_t bit);
inline void fastpack_quarter (const uint8_t *in, uint8_t *out, const uint32_t bit);
inline void fastunpack_half (const uint16_t *in, uint16_t *out, const uint32_t bit);
inline void fastpack_half (const uint16_t *in, uint16_t *out, const uint32_t bit);
inline void fastunpack (const uint32_t *in, uint32_t *out, const uint32_t bit);
inline void fastpack (const uint32_t *in, uint32_t *out, const uint32_t bit);
inline void fastunpack (const uint32_t *in, uint64_t *out, const uint32_t bit);
inline void fastpack (const uint64_t *in, uint32_t *out, const uint32_t bit);
} // namespace internal
} // namespace duckdb_fastpforlib
Import
#include "fastpforlib/bitpackinghelpers.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| in | const uintN_t* __restrict | Yes | Pointer to the packed (for unpack) or unpacked (for pack) input buffer |
| bit | uint32_t | Yes | Bit width per value (0 to max bits for the type); determines packing density |
Outputs
| Name | Type | Description |
|---|---|---|
| out | uintN_t* __restrict | Destination buffer: unpacked values (for unpack) or packed bit stream (for pack) |
The number of values processed per call is fixed by type:
| Type | Values per call | Max bit width |
|---|---|---|
| uint8_t | 8 | 8 |
| uint16_t | 16 | 16 |
| uint32_t | 32 | 32 |
| uint64_t | 32 | 64 |
Usage Examples
#include "fastpforlib/bitpackinghelpers.h"
// Pack 32 uint32_t values using 12 bits each
uint32_t unpacked[32] = { /* values that fit in 12 bits */ };
uint32_t packed[12]; // 32 * 12 / 32 = 12 uint32_t words needed
duckdb_fastpforlib::internal::fastpack(unpacked, packed, 12);
// Unpack them back
uint32_t restored[32];
duckdb_fastpforlib::internal::fastunpack(packed, restored, 12);
// Pack 8 uint8_t values using 5 bits each
uint8_t vals8[8] = {1, 2, 3, 4, 5, 6, 7, 8};
uint8_t packed8[5]; // 8 * 5 / 8 = 5 bytes
duckdb_fastpforlib::internal::fastpack_quarter(vals8, packed8, 5);