Implementation:Duckdb Duckdb LZ4 Compression
| Knowledge Sources | |
|---|---|
| Domains | Compression, Third_Party |
| Last Updated | 2026-02-07 12:00 GMT |
Overview
LZ4 is an extremely fast lossless compression algorithm integrated into DuckDB as a third-party library, providing compression speeds exceeding 500 MB/s per core and decompression speeds reaching multiple GB/s per core.
Description
LZ4 is a byte-oriented compression algorithm. The API described here operates on raw blocks of data rather than on self-describing frames. DuckDB's vendored copy (version 1.9.4) lives in the duckdb_lz4 namespace and provides both simple one-shot functions and advanced streaming/context-reuse APIs. Memory usage is configurable at compile time through LZ4_MEMORY_USAGE (default 14, meaning a 2^14-byte = 16 KB hash table), and an "acceleration" parameter trades compression ratio for speed. Compression is guaranteed to succeed when the destination buffer holds at least LZ4_compressBound(srcSize) bytes. The implementation also selects platform-specific memory access strategies, including packed access and direct unaligned access, for ARM and Intel architectures.
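The worst-case guarantee can be illustrated without linking the library. The sketch below assumes the bound follows the LZ4_COMPRESSBOUND macro from upstream lz4.h (input size plus one byte per 255 input bytes plus 16); the names compress_bound_sketch and destination_is_large_enough are hypothetical and not part of the duckdb_lz4 API.

```cpp
#include <cassert>

// Assumption: mirrors the LZ4_COMPRESSBOUND macro from upstream lz4.h.
// All names here are illustrative and not part of the duckdb_lz4 API.
constexpr int kMaxInputSize = 0x7E000000;  // LZ4_MAX_INPUT_SIZE

// Worst-case compressed size for `input_size` bytes, or 0 when the input
// is negative or too large to be compressed as a single block.
constexpr int compress_bound_sketch(int input_size) {
    return (static_cast<unsigned>(input_size) > static_cast<unsigned>(kMaxInputSize))
               ? 0
               : input_size + (input_size / 255) + 16;
}

// Hypothetical pre-flight check a caller might run before compressing:
// true when `dst_capacity` is enough for compression to be guaranteed.
inline bool destination_is_large_enough(int src_size, int dst_capacity) {
    const int bound = compress_bound_sketch(src_size);
    assert(bound >= 0);  // the formula never yields a negative bound
    return bound != 0 && dst_capacity >= bound;
}
```

For a 1000-byte input this sketch gives 1000 + 3 + 16 = 1019 bytes, so a 1019-byte destination passes the check and a 1018-byte one does not. In real code the value should come from LZ4_compressBound() itself.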
Key characteristics:
- Block-level compression -- does not produce self-contained frames (see lz4frame.h for frames)
- Maximum input size of approximately 2.1 GB (LZ4_MAX_INPUT_SIZE = 0x7E000000)
- Stack-based state allocation by default (LZ4_HEAPMODE = 0)
- Acceleration factor from 1 (default, best ratio) to 65537 (fastest)
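To make "block-level" concrete, here is a minimal sketch of a decoder for one raw LZ4 block, written from the publicly documented block format (token byte, literal run, 2-byte little-endian offset, match length with a minimum of 4). It is not DuckDB's or upstream's implementation, it skips the speed optimizations and some of the validation that LZ4_decompress_safe performs, and every name in it is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// A length nibble of 15 means extra bytes follow; each one is added to the
// total, and a byte below 255 ends the run. Returns false if input runs out.
inline bool read_length(const std::vector<std::uint8_t>& in, std::size_t& ip,
                        std::size_t& len) {
    if (len != 15) return true;
    std::uint8_t b = 0;
    do {
        if (ip >= in.size()) return false;
        b = in[ip++];
        len += b;
    } while (b == 255);
    return true;
}

// Illustrative decoder for a single raw block (no frame header, no checksum).
// Returns false on truncated or malformed input.
inline bool toy_block_decode(const std::vector<std::uint8_t>& in, std::string& out) {
    out.clear();
    std::size_t ip = 0;
    while (ip < in.size()) {
        const std::uint8_t token = in[ip++];
        std::size_t lit_len = token >> 4;  // high nibble: literal run length
        if (!read_length(in, ip, lit_len)) return false;
        if (lit_len > in.size() - ip) return false;
        out.append(reinterpret_cast<const char*>(in.data()) + ip, lit_len);
        ip += lit_len;
        if (ip == in.size()) return true;  // the last sequence holds literals only

        if (in.size() - ip < 2) return false;
        const std::size_t offset = static_cast<std::size_t>(in[ip]) |
                                   (static_cast<std::size_t>(in[ip + 1]) << 8);
        ip += 2;
        if (offset == 0 || offset > out.size()) return false;

        std::size_t match_len = token & 0x0F;  // low nibble: match length minus 4
        if (!read_length(in, ip, match_len)) return false;
        match_len += 4;
        // Copy byte by byte: the match may overlap the bytes being written.
        const std::size_t start = out.size() - offset;
        for (std::size_t i = 0; i < match_len; ++i) {
            const char c = out[start + i];
            out.push_back(c);
        }
    }
    return false;  // empty input, or a block that ends with a match
}

// Hand-built block used as a usage example.
inline std::string decode_demo() {
    // Token 0x22: 2 literals "ab", then a match of 2 + 4 = 6 bytes at offset 2.
    // Token 0x60: 6 trailing literals "-tail!" and no match.
    const std::vector<std::uint8_t> block = {0x22, 'a', 'b', 0x02, 0x00,
                                             0x60, '-', 't', 'a', 'i', 'l', '!'};
    std::string out;
    const bool ok = toy_block_decode(block, out);
    assert(ok);
    (void)ok;
    return out;
}
```

The block carries no record of its own decompressed size, which is why callers of the real API must pass dstCapacity themselves. The frame format in lz4frame.h adds that kind of metadata on top.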
Usage
DuckDB uses LZ4 for block-level compression within its storage layer and for compressing data in Parquet file I/O. The library is invoked through the duckdb_lz4 namespace wrapper, which isolates it from other LZ4 installations that may be linked into the same process. LZ4 is selected when fast compression and decompression throughput is prioritized over compression ratio.
Code Reference
Source Location
- Repository: Duckdb_Duckdb
- Files:
  - third_party/lz4/lz4.hpp (843 lines) -- LZ4 API header
  - third_party/lz4/lz4.cpp (2605 lines) -- LZ4 compression/decompression implementation
Signature
namespace duckdb_lz4 {

// --- Simple Functions ---

// Compress srcSize bytes from src into dst (max dstCapacity bytes).
// Returns number of bytes written, or 0 on failure.
LZ4LIB_API int LZ4_compress_default(const char* src, char* dst,
                                    int srcSize, int dstCapacity);

// Decompress compressedSize bytes from src into dst (max dstCapacity bytes).
// Returns number of bytes decompressed, or negative on error.
LZ4LIB_API int LZ4_decompress_safe(const char* src, char* dst,
                                   int compressedSize, int dstCapacity);

// --- Advanced Functions ---

// Returns worst-case compressed size for a given input size.
LZ4LIB_API int LZ4_compressBound(int inputSize);

// Compress with selectable acceleration factor (1 = default, higher = faster).
LZ4LIB_API int LZ4_compress_fast(const char* src, char* dst,
                                 int srcSize, int dstCapacity,
                                 int acceleration);

// Compress using an externally allocated state buffer.
LZ4LIB_API int LZ4_sizeofState(void);
LZ4LIB_API int LZ4_compress_fast_extState(void* state, const char* src,
                                          char* dst, int srcSize,
                                          int dstCapacity, int acceleration);

// --- Version ---
LZ4LIB_API int LZ4_versionNumber(void);
LZ4LIB_API const char* LZ4_versionString(void);

} // namespace duckdb_lz4
Import
#include "lz4.hpp"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| src | const char* | Yes | Pointer to source data buffer to compress or compressed data to decompress |
| srcSize / compressedSize | int | Yes | Number of bytes to read from the source buffer |
| dst | char* | Yes | Pointer to pre-allocated destination buffer |
| dstCapacity | int | Yes | Size of the destination buffer in bytes |
| acceleration | int | No | Acceleration factor for LZ4_compress_fast(); 1 = default, higher = faster but lower ratio |
| state | void* | No | Externally allocated state for LZ4_compress_fast_extState(); must be 8-byte aligned and at least LZ4_sizeofState() bytes |
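Out-of-range acceleration values are not treated as errors. The sketch below assumes the clamping described in upstream lz4.h: values below 1 fall back to the default of 1, and values above 65537 are capped at 65537. The function name effective_acceleration is hypothetical and only illustrates which factor the compressor would end up using.

```cpp
#include <cassert>

// Assumption: these mirror LZ4_ACCELERATION_DEFAULT and LZ4_ACCELERATION_MAX
// from upstream lz4.c. The names below are illustrative only.
constexpr int kAccelerationDefault = 1;
constexpr int kAccelerationMax = 65537;

// Returns the acceleration factor that would actually be applied
// for a caller-supplied value.
inline int effective_acceleration(int requested) {
    int a = requested;
    if (a < kAccelerationDefault) a = kAccelerationDefault;  // 0 or negative: default
    if (a > kAccelerationMax) a = kAccelerationMax;          // too large: capped
    assert(a >= kAccelerationDefault && a <= kAccelerationMax);  // postcondition
    return a;
}
```

Under this assumption, passing 0 behaves like LZ4_compress_default(), and any value above 65537 behaves like 65537.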
Outputs
| Name | Type | Description |
|---|---|---|
| (return) | int | Number of bytes written into dst on success; 0 on compression failure; negative value on decompression failure |
| dst buffer | char* | Contains the compressed or decompressed output data |
Usage Examples
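#include <cstring>   // std::strlen
#include "lz4.hpp"

int main() {
    // --- Compression Example ---
    const char* source = "Hello, DuckDB LZ4 compression!";
    // LZ4 sizes are int, so convert the size_t returned by strlen explicitly.
    int sourceSize = static_cast<int>(std::strlen(source));
    int maxDestSize = duckdb_lz4::LZ4_compressBound(sourceSize);
    char* compressed = new char[maxDestSize];
    int compressedSize = duckdb_lz4::LZ4_compress_default(
        source, compressed, sourceSize, maxDestSize);
    // compressedSize now holds the number of compressed bytes; 0 signals failure

    // --- Decompression Example ---
    // A raw block does not record the original size, so the caller supplies it.
    char* decompressed = new char[sourceSize];
    int decompressedSize = duckdb_lz4::LZ4_decompress_safe(
        compressed, decompressed, compressedSize, sourceSize);
    // decompressedSize == sourceSize on success; a negative value signals an error
    bool ok = compressedSize > 0 && decompressedSize == sourceSize;

    delete[] compressed;
    delete[] decompressed;
    return ok ? 0 : 1;
}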