

Implementation:Duckdb Duckdb LZ4 Compression

From Leeroopedia


Knowledge Sources
Domains Compression, Third_Party
Last Updated 2026-02-07 12:00 GMT

Overview

LZ4 is an extremely fast lossless compression algorithm integrated into DuckDB as a third-party library, providing compression speeds exceeding 500 MB/s per core and decompression speeds reaching multiple GB/s per core.

Description

LZ4 is a byte-oriented compression algorithm that operates on blocks of data rather than frames. The DuckDB vendored copy lives in the duckdb_lz4 namespace (version 1.9.4) and provides both simple one-shot functions and advanced streaming/context-reuse APIs. Memory usage is configurable through LZ4_MEMORY_USAGE (default 14, i.e. 2^14 bytes, a 16 KB hash table), and an "acceleration" parameter trades compression ratio for speed. Compression is guaranteed to succeed when the destination buffer holds at least LZ4_compressBound(srcSize) bytes. The implementation also includes platform-specific memory access optimizations, such as packed access and direct unaligned access on ARM and Intel architectures.
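To make the LZ4_compressBound guarantee concrete, the sketch below reproduces the arithmetic of the upstream LZ4_COMPRESSBOUND macro in a standalone helper. The names compress_bound_sketch and kMaxInputSize are hypothetical and exist only for this illustration; real code should call duckdb_lz4::LZ4_compressBound() instead.

```cpp
#include <cassert>
#include <cstdint>

// LZ4_MAX_INPUT_SIZE as listed in this page (about 2.1 GB).
constexpr std::int32_t kMaxInputSize = 0x7E000000;

// Hypothetical helper, not part of the library: mirrors the arithmetic of the
// upstream LZ4_COMPRESSBOUND macro so the worst case can be inspected.
inline std::int32_t compress_bound_sketch(std::int32_t input_size) {
    // Sizes outside [0, kMaxInputSize] are unsupported and yield 0.
    if (input_size < 0 || input_size > kMaxInputSize) {
        return 0;
    }
    // Worst case: the input itself, one extra byte per 255 input bytes, plus 16.
    const std::int32_t bound = input_size + input_size / 255 + 16;
    assert(bound > input_size);  // the bound always leaves headroom
    return bound;
}
```

Under this formula the overhead stays below 0.4% for large inputs, and even an empty input needs a 16-byte destination buffer.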

Key characteristics:

  • Block-level compression -- does not produce self-contained frames (see lz4frame.h for frames)
  • Maximum input size of approximately 2.1 GB (LZ4_MAX_INPUT_SIZE = 0x7E000000)
  • Stack-based state allocation by default (LZ4_HEAPMODE = 0)
  • Acceleration factor from 1 (default, best ratio) to 65537 (fastest)
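The limits in the list above can be sketched as two small helpers. This is an illustration under stated assumptions, not the library's code: the helper names are hypothetical, and the normalization of out-of-range acceleration values (anything below 1 falls back to the default, anything above the maximum is capped) reflects upstream LZ4 behaviour as I understand it and should be verified against the vendored source.

```cpp
#include <cassert>
#include <cstddef>

// Values taken from the characteristics listed above.
constexpr int kAccelerationDefault = 1;      // best ratio
constexpr int kAccelerationMax     = 65537;  // fastest
constexpr int kMemoryUsageDefault  = 14;     // exponent: 2^14 bytes

// Hypothetical helper: normalizes a caller-supplied acceleration factor.
// Assumption: values below 1 fall back to the default, values above the
// maximum are capped.
inline int clamp_acceleration_sketch(int acceleration) {
    if (acceleration < kAccelerationDefault) return kAccelerationDefault;
    if (acceleration > kAccelerationMax) return kAccelerationMax;
    return acceleration;
}

// Hypothetical helper: hash table size in bytes for a memory-usage exponent.
inline std::size_t hash_table_bytes_sketch(int memory_usage_exponent) {
    assert(memory_usage_exponent >= 0 && memory_usage_exponent < 31);
    return std::size_t{1} << memory_usage_exponent;
}
```

With the default exponent, hash_table_bytes_sketch(kMemoryUsageDefault) evaluates to 16384 bytes, which is the 16 KB figure quoted in the Description.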

Usage

DuckDB uses LZ4 for block-level compression within its storage layer and for compressing data in Parquet file I/O. The library is invoked through the duckdb_lz4 namespace wrapper, which isolates it from other LZ4 installations that may be linked into the same process. LZ4 is selected when fast compression and decompression throughput is prioritized over compression ratio.

Code Reference

Source Location

Signature

namespace duckdb_lz4 {

// --- Simple Functions ---

// Compress srcSize bytes from src into dst (max dstCapacity bytes).
// Returns number of bytes written, or 0 on failure.
LZ4LIB_API int LZ4_compress_default(const char* src, char* dst,
                                     int srcSize, int dstCapacity);

// Decompress compressedSize bytes from src into dst (max dstCapacity bytes).
// Returns number of bytes decompressed, or negative on error.
LZ4LIB_API int LZ4_decompress_safe(const char* src, char* dst,
                                    int compressedSize, int dstCapacity);

// --- Advanced Functions ---

// Returns worst-case compressed size for a given input size.
LZ4LIB_API int LZ4_compressBound(int inputSize);

// Compress with selectable acceleration factor (1 = default, higher = faster).
LZ4LIB_API int LZ4_compress_fast(const char* src, char* dst,
                                  int srcSize, int dstCapacity,
                                  int acceleration);

// Size in bytes of the state buffer required by LZ4_compress_fast_extState().
LZ4LIB_API int LZ4_sizeofState(void);

// Same as LZ4_compress_fast(), but uses a caller-allocated state buffer.
LZ4LIB_API int LZ4_compress_fast_extState(void* state, const char* src,
                                           char* dst, int srcSize,
                                           int dstCapacity, int acceleration);

// --- Version ---
LZ4LIB_API int LZ4_versionNumber(void);
LZ4LIB_API const char* LZ4_versionString(void);

} // namespace duckdb_lz4

Import

#include "lz4.hpp"

I/O Contract

Inputs

Name | Type | Required | Description
src | const char* | Yes | Pointer to source data buffer to compress or compressed data to decompress
srcSize / compressedSize | int | Yes | Number of bytes to read from the source buffer
dst | char* | Yes | Pointer to pre-allocated destination buffer
dstCapacity | int | Yes | Size of the destination buffer in bytes
acceleration | int | No | Acceleration factor for LZ4_compress_fast(); 1 = default, higher = faster but lower ratio
state | void* | No | Externally allocated state for LZ4_compress_fast_extState(); must be 8-byte aligned and at least LZ4_sizeofState() bytes
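The alignment and size requirements on the state buffer can be checked before calling LZ4_compress_fast_extState(). The helper below is a minimal sketch with a hypothetical name (state_buffer_ok_sketch); the required size is expected to come from LZ4_sizeofState() and is passed in as a plain parameter so the example stays independent of the library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: validates a caller-allocated state buffer against the
// contract described above (non-null, large enough, 8-byte aligned).
inline bool state_buffer_ok_sketch(const void* state, std::size_t size,
                                   std::size_t required) {
    assert(required > 0);  // the required size is expected to be positive
    if (state == nullptr || size < required) {
        return false;
    }
    // 8-byte alignment: the address must be a multiple of 8.
    return reinterpret_cast<std::uintptr_t>(state) % 8 == 0;
}
```

A buffer declared as alignas(8) unsigned char buf[N], or one obtained from operator new, would normally satisfy the alignment part of the check.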

Outputs

Name | Type | Description
(return) | int | Number of bytes written into dst on success; 0 on compression failure; negative value on decompression failure
dst buffer | char* | Contains the compressed or decompressed output data
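Because compression and decompression signal failure differently, it can help to route return values through small classifiers. The sketch below uses hypothetical names (Lz4Status, classify_compress_sketch, classify_decompress_sketch) and assumes, in line with upstream LZ4 as I understand it, that a decompression result of 0 is legitimate when the block encodes an empty payload.

```cpp
#include <cassert>

// Hypothetical names, introduced only for this illustration.
enum class Lz4Status { Ok, Failed };

// Compression: a positive value is the number of bytes written; 0 is failure.
inline Lz4Status classify_compress_sketch(int ret) {
    return ret > 0 ? Lz4Status::Ok : Lz4Status::Failed;
}

// Decompression: a negative value is failure. Zero is treated as success here
// (assumption: a block that encodes an empty payload decompresses to 0 bytes).
inline Lz4Status classify_decompress_sketch(int ret, int dst_capacity) {
    assert(dst_capacity >= 0);
    if (ret < 0 || ret > dst_capacity) {
        return Lz4Status::Failed;  // error code, or an impossible byte count
    }
    return Lz4Status::Ok;
}
```

Keeping the two checks separate avoids the common mistake of testing a decompression result with "== 0" or a compression result with "< 0".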

Usage Examples

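#include "lz4.hpp"

#include <cstring>  // std::strlen
#include <vector>

int main() {
    // --- Compression Example ---
    const char* source = "Hello, DuckDB LZ4 compression!";
    const int sourceSize = static_cast<int>(std::strlen(source));
    const int maxDestSize = duckdb_lz4::LZ4_compressBound(sourceSize);

    // std::vector releases the buffer automatically, even on early return.
    std::vector<char> compressed(maxDestSize);
    const int compressedSize = duckdb_lz4::LZ4_compress_default(
        source, compressed.data(), sourceSize, maxDestSize);
    if (compressedSize <= 0) {
        return 1;  // compression failed
    }
    // compressedSize now holds the number of compressed bytes

    // --- Decompression Example ---
    // The caller must track the original size; a raw LZ4 block does not store it.
    std::vector<char> decompressed(sourceSize);
    const int decompressedSize = duckdb_lz4::LZ4_decompress_safe(
        compressed.data(), decompressed.data(), compressedSize, sourceSize);
    // decompressedSize == sourceSize on success; a negative value means failure
    return decompressedSize == sourceSize ? 0 : 1;
}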
