Implementation:Duckdb Duckdb Zstd Compressor
| Knowledge Sources | |
|---|---|
| Domains | Compression, Third_Party |
| Last Updated | 2026-02-07 12:00 GMT |
Overview
Zstd Compressor is the compression side of the Zstandard (zstd) library (version 1.5.6), integrated into DuckDB under the duckdb_zstd namespace, providing configurable compression levels from fast (negative levels) to ultra (level 22).
Description
The Zstd compressor in DuckDB is split across several source files, each covering one stage of the compression pipeline:
- zstd_compress.cpp (7147 lines) -- main compressor entry point implementing ZSTD_compress(), ZSTD_compressCCtx(), context management, and the one-pass compression API
- zstd_fast.cpp (972 lines) -- fast match finder for lower compression levels
- zstd_double_fast.cpp (774 lines) -- double-hash match finder for medium compression levels
- zstd_lazy.cpp (2203 lines) -- lazy/greedy/btlazy2 match finders for higher compression levels
- zstd_opt.cpp (1580 lines) -- optimal parser for maximum compression ratio (levels 16+)
- zstd_ldm.cpp (735 lines) -- long distance matching for improved compression on large windows
- fse_compress.cpp (628 lines) -- Finite State Entropy encoder for sequence encoding
- huf_compress.cpp (1467 lines) -- Huffman encoder for literal compression
- zstd_compress_sequences.cpp (446 lines) -- sequence encoding for match/literal lengths and offsets
- zstd_compress_superblock.cpp (692 lines) -- superblock compression, which splits a block into smaller sub-blocks when a target compressed block size is requested
- zstdmt_compress.cpp (1885 lines) -- multi-threaded compression with thread pool and job queue
The compressor uses stack-based state allocation by default (ZSTD_COMPRESS_HEAPMODE = 0) and supports a 3-byte hash table with configurable maximum log size (ZSTD_HASHLOG3_MAX = 17, i.e., 128Ki positions).
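To make the 3-byte hash table sizing concrete, here is a minimal sketch of how a log size of 17 translates into 128Ki slots and how three input bytes could be mapped to a slot index. The names `pack3` and `hash3_sketch` are hypothetical, and the multiplier is an assumption based on my recollection of upstream zstd; the vendored source is the authority on the actual hash.

```cpp
#include <cstddef>
#include <cstdint>

// Mirrors the constant quoted above: log2 of the largest 3-byte hash table.
constexpr unsigned kHashLog3Max = 17;
constexpr std::size_t kHash3MaxEntries = std::size_t{1} << kHashLog3Max;
static_assert(kHash3MaxEntries == 128 * 1024, "2^17 slots is 128Ki positions");

// Hypothetical helper: pack three bytes into the low 24 bits (little-endian order).
constexpr std::uint32_t pack3(unsigned char b0, unsigned char b1, unsigned char b2) {
    return static_cast<std::uint32_t>(b0) |
           (static_cast<std::uint32_t>(b1) << 8) |
           (static_cast<std::uint32_t>(b2) << 16);
}

// Assumed multiplier, for illustration only; check the vendored source.
constexpr std::uint32_t kHash3Multiplier = 506832829u;

// Multiplicative hash: move the 24 payload bits to the top of the word,
// multiply (wrapping modulo 2^32), then keep the top hash_log bits.
// The caller must pass 1 <= hash_log <= kHashLog3Max.
constexpr std::uint32_t hash3_sketch(std::uint32_t bytes24, unsigned hash_log) {
    return ((bytes24 << 8) * kHash3Multiplier) >> (32u - hash_log);
}
```

Because only the top `hash_log` bits survive the final shift, every result is a valid index into a table of `1 << hash_log` entries, which is why capping the log at 17 caps the table at 128Ki positions.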
Usage
DuckDB uses the Zstd compressor for storage-layer compression when a higher compression ratio than LZ4 provides is needed, and for Parquet file output with the Zstd codec. A compression context (ZSTD_CCtx) can be reused across multiple compression operations to reduce allocation overhead.
Code Reference
Source Location
- Repository: Duckdb_Duckdb
- Files:
  - third_party/zstd/compress/zstd_compress.cpp (7147 lines) -- main compressor
  - third_party/zstd/compress/zstd_fast.cpp (972 lines) -- fast match finder
  - third_party/zstd/compress/zstd_double_fast.cpp (774 lines) -- double-hash match finder
  - third_party/zstd/compress/zstd_lazy.cpp (2203 lines) -- lazy/greedy match finder
  - third_party/zstd/compress/zstd_opt.cpp (1580 lines) -- optimal parser
  - third_party/zstd/compress/zstd_ldm.cpp (735 lines) -- long distance matcher
  - third_party/zstd/compress/fse_compress.cpp (628 lines) -- FSE encoder
  - third_party/zstd/compress/huf_compress.cpp (1467 lines) -- Huffman encoder
  - third_party/zstd/compress/zstd_compress_sequences.cpp (446 lines) -- sequence encoder
  - third_party/zstd/compress/zstd_compress_superblock.cpp (692 lines) -- superblock compression
  - third_party/zstd/compress/zstdmt_compress.cpp (1885 lines) -- multi-threaded compressor
Signature
namespace duckdb_zstd {

// --- Simple Compression API ---

// Compress src into dst at the given compression level.
// Returns compressed size, or an error code testable with ZSTD_isError().
ZSTDLIB_API size_t ZSTD_compress(void* dst, size_t dstCapacity,
                                 const void* src, size_t srcSize,
                                 int compressionLevel);

// Worst-case compressed size for one-pass compression.
size_t ZSTD_compressBound(size_t srcSize);

// --- Explicit Context API ---

typedef struct ZSTD_CCtx_s ZSTD_CCtx;
ZSTDLIB_API ZSTD_CCtx* ZSTD_createCCtx(void);
ZSTDLIB_API size_t ZSTD_freeCCtx(ZSTD_CCtx* cctx);

// Compress using an explicit context (reusable across calls).
ZSTDLIB_API size_t ZSTD_compressCCtx(ZSTD_CCtx* cctx,
                                     void* dst, size_t dstCapacity,
                                     const void* src, size_t srcSize,
                                     int compressionLevel);

// --- Helper Functions ---

ZSTDLIB_API unsigned ZSTD_isError(size_t code);
ZSTDLIB_API const char* ZSTD_getErrorName(size_t code);
ZSTDLIB_API int ZSTD_minCLevel(void);
ZSTDLIB_API int ZSTD_maxCLevel(void);     // currently 22
ZSTDLIB_API int ZSTD_defaultCLevel(void); // currently 3

} // namespace duckdb_zstd
Import
#include "zstd.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| src | const void* | Yes | Pointer to source data buffer to compress |
| srcSize | size_t | Yes | Number of bytes to compress |
| dst | void* | Yes | Pointer to pre-allocated destination buffer |
| dstCapacity | size_t | Yes | Size of destination buffer; should be >= ZSTD_compressBound(srcSize) |
| compressionLevel | int | Yes | Compression level; negative values for fast mode, 1-22 for standard/ultra; default is 3 |
| cctx | ZSTD_CCtx* | No | Reusable compression context for ZSTD_compressCCtx() |
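To give a feel for how much headroom dstCapacity needs, the sketch below reproduces the worst-case formula as I recall it from the upstream ZSTD_COMPRESSBOUND macro. Treat the formula as an assumption: `compress_bound_sketch` and `kSmallInputLimit` are hypothetical names, overflow for extremely large inputs is not handled, and real code should call ZSTD_compressBound() instead.

```cpp
#include <cstddef>

// Inputs below 128 KiB get a small extra margin in this formula.
constexpr std::size_t kSmallInputLimit = std::size_t{128} << 10;

// Assumed shape of the bound: the input size, plus 1/256 of it,
// plus a margin that shrinks to zero as the input approaches 128 KiB.
constexpr std::size_t compress_bound_sketch(std::size_t src_size) {
    return src_size + (src_size >> 8) +
           (src_size < kSmallInputLimit ? (kSmallInputLimit - src_size) >> 11 : 0);
}

// Worked values: an empty input still needs 64 bytes of room, and the
// 36-byte string from the usage example needs 99.
static_assert(compress_bound_sketch(0) == 64, "empty input");
static_assert(compress_bound_sketch(36) == 99, "36-byte input");
static_assert(compress_bound_sketch(kSmallInputLimit) == 131584, "128 KiB input");
```

Under this formula the overhead stays below about 0.4% of the input plus 64 bytes, so allocating to the bound is cheap relative to the risk of a destination-too-small error.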
Outputs
| Name | Type | Description |
|---|---|---|
| (return) | size_t | Compressed size written into dst, or an error code testable via ZSTD_isError() |
| dst buffer | void* | Contains the compressed zstd frame |
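Because a single size_t return value carries either a size or an error, it can help to see how such a convention may be encoded. The sketch below assumes, based on my understanding of upstream zstd, that errors are represented as values at the very top of the size_t range; `make_error_sketch`, `is_error_sketch`, and the ceiling of 120 are illustrative assumptions, not the library's definitions. Always use ZSTD_isError() in real code.

```cpp
#include <cstddef>

// Assumed ceiling on error enum values, for illustration only.
constexpr std::size_t kMaxErrorCodeSketch = 120;

// Encode an error enum value as a size_t near the top of the range
// (unsigned arithmetic wraps, so 0 - n is well defined).
constexpr std::size_t make_error_sketch(std::size_t error_enum) {
    return static_cast<std::size_t>(0) - error_enum;
}

// A return value counts as an error if it lies above the encoded ceiling,
// which no realistic compressed size can reach.
constexpr bool is_error_sketch(std::size_t code) {
    return code > make_error_sketch(kMaxErrorCodeSketch);
}
```

With this encoding, ordinary compressed sizes such as 0 or 99 are never mistaken for errors, while any value within the last 119 slots of the size_t range is.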
Usage Examples
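#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "zstd.h"
using namespace duckdb_zstd;

int main() {
    // --- One-shot Compression ---
    const char* data = "DuckDB Zstd compression example data";
    size_t dataSize = strlen(data);
    // Size the destination for the worst case so compression cannot run out of room.
    size_t bound = ZSTD_compressBound(dataSize);
    void* compressed = malloc(bound);
    if (compressed == NULL) {
        return 1;
    }
    size_t compressedSize = ZSTD_compress(compressed, bound,
                                          data, dataSize,
                                          3 /* default level */);
    if (ZSTD_isError(compressedSize)) {
        fprintf(stderr, "Error: %s\n", ZSTD_getErrorName(compressedSize));
        free(compressed);
        return 1;
    }

    // --- Context-based Compression (reusable) ---
    ZSTD_CCtx* cctx = ZSTD_createCCtx();
    if (cctx == NULL) {
        free(compressed);
        return 1;
    }
    // The same cctx can serve many calls; this one overwrites the buffer above.
    size_t result = ZSTD_compressCCtx(cctx, compressed, bound,
                                      data, dataSize, 3);
    if (ZSTD_isError(result)) {
        fprintf(stderr, "Error: %s\n", ZSTD_getErrorName(result));
    }
    ZSTD_freeCCtx(cctx);
    free(compressed);
    return ZSTD_isError(result) ? 1 : 0;
}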