Implementation:Duckdb Duckdb Zstd Compressor

From Leeroopedia


Knowledge Sources
Domains Compression, Third_Party
Last Updated 2026-02-07 12:00 GMT

Overview

Zstd Compressor is the compression side of the Zstandard (zstd) library (version 1.5.6), integrated into DuckDB under the duckdb_zstd namespace, providing configurable compression levels from fast (negative levels) to ultra (level 22).

Description

The Zstd compressor in DuckDB is a comprehensive implementation consisting of multiple specialized modules that together form a high-performance compression pipeline:

  • zstd_compress.cpp (7147 lines) -- main compressor entry point implementing ZSTD_compress(), ZSTD_compressCCtx(), context management, and the one-pass compression API
  • zstd_fast.cpp (972 lines) -- fast match finder for lower compression levels
  • zstd_double_fast.cpp (774 lines) -- double-hash match finder for medium compression levels
  • zstd_lazy.cpp (2203 lines) -- lazy/greedy/btlazy2 match finders for higher compression levels
  • zstd_opt.cpp (1580 lines) -- optimal parser for maximum compression ratio (levels 16+)
  • zstd_ldm.cpp (735 lines) -- long distance matching for improved compression on large windows
  • fse_compress.cpp (628 lines) -- Finite State Entropy encoder for sequence encoding
  • huf_compress.cpp (1467 lines) -- Huffman encoder for literal compression
  • zstd_compress_sequences.cpp (446 lines) -- sequence encoding for match/literal lengths and offsets
  • zstd_compress_superblock.cpp (692 lines) -- superblock compression for streaming mode
  • zstdmt_compress.cpp (1885 lines) -- multi-threaded compression with thread pool and job queue

The compressor uses stack-based state allocation by default (ZSTD_COMPRESS_HEAPMODE = 0) and supports a 3-byte hash table with configurable maximum log size (ZSTD_HASHLOG3_MAX = 17, i.e., 128Ki positions).
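
The arithmetic behind the 128Ki figure can be sketched as follows. The names kHashLog3Max, hash_table_entries, and hash_table_bytes are illustrative and not part of the zstd API; the value 17 comes from the text above, and the 4-byte slot size is an assumption based on upstream zstd storing 32-bit positions in its hash tables.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative constant mirroring ZSTD_HASHLOG3_MAX as quoted above.
constexpr unsigned kHashLog3Max = 17;

// A table indexed by a hashLog-bit hash has 2^hashLog slots.
constexpr std::size_t hash_table_entries(unsigned hashLog) {
    return static_cast<std::size_t>(1) << hashLog;
}

// Assumption: each slot holds one 32-bit position.
constexpr std::size_t hash_table_bytes(unsigned hashLog) {
    return hash_table_entries(hashLog) * sizeof(std::uint32_t);
}

// 2^17 = 131072 = 128 * 1024 positions.
static_assert(hash_table_entries(kHashLog3Max) == 128u * 1024u,
              "17-bit hash log addresses 128Ki positions");
```

Under these assumptions, a 3-byte hash table at the maximum log size would occupy 512 KiB.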

Usage

DuckDB uses the Zstd compressor for storage-layer compression when a higher compression ratio than LZ4 is needed, and for writing Parquet files with the Zstd codec. A compression context (ZSTD_CCtx) can be reused across multiple compression operations to reduce allocation overhead.

Code Reference

Source Location

Signature

namespace duckdb_zstd {

// --- Simple Compression API ---

// Compress src into dst at the given compression level.
// Returns compressed size, or an error code testable with ZSTD_isError().
ZSTDLIB_API size_t ZSTD_compress(void* dst, size_t dstCapacity,
                                 const void* src, size_t srcSize,
                                 int compressionLevel);

// Worst-case compressed size for one-pass compression.
ZSTDLIB_API size_t ZSTD_compressBound(size_t srcSize);

// --- Explicit Context API ---

typedef struct ZSTD_CCtx_s ZSTD_CCtx;
ZSTDLIB_API ZSTD_CCtx* ZSTD_createCCtx(void);
ZSTDLIB_API size_t      ZSTD_freeCCtx(ZSTD_CCtx* cctx);

// Compress using an explicit context (reusable across calls).
ZSTDLIB_API size_t ZSTD_compressCCtx(ZSTD_CCtx* cctx,
                                     void* dst, size_t dstCapacity,
                                     const void* src, size_t srcSize,
                                     int compressionLevel);

// --- Helper Functions ---
ZSTDLIB_API unsigned    ZSTD_isError(size_t code);
ZSTDLIB_API const char* ZSTD_getErrorName(size_t code);
ZSTDLIB_API int         ZSTD_minCLevel(void);
ZSTDLIB_API int         ZSTD_maxCLevel(void);       // currently 22
ZSTDLIB_API int         ZSTD_defaultCLevel(void);    // currently 3

} // namespace duckdb_zstd

Import

#include "zstd.h"

I/O Contract

Inputs

Name              Type         Required  Description
src               const void*  Yes       Pointer to source data buffer to compress
srcSize           size_t       Yes       Number of bytes to compress
dst               void*        Yes       Pointer to pre-allocated destination buffer
dstCapacity       size_t       Yes       Size of destination buffer; should be >= ZSTD_compressBound(srcSize)
compressionLevel  int          Yes       Compression level; negative values for fast mode, 1-22 for standard/ultra; default is 3
cctx              ZSTD_CCtx*   No        Reusable compression context for ZSTD_compressCCtx()
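
To give a feel for how large dstCapacity needs to be, the sketch below restates the worst-case bound as it appears in the ZSTD_COMPRESSBOUND macro of upstream zstd.h. It is an illustration only: compress_bound_sketch is a hypothetical name, it omits the upstream guard that returns 0 for inputs above the maximum supported size, and real code should call ZSTD_compressBound() instead.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative restatement of the upstream bound; not DuckDB code.
inline std::size_t compress_bound_sketch(std::size_t srcSize) {
    const std::size_t k128KiB = static_cast<std::size_t>(128) << 10;
    // Base cost: the input itself plus 1/256 of it as overhead.
    std::size_t bound = srcSize + (srcSize >> 8);
    // Inputs below 128 KiB get an extra margin that shrinks from 64 bytes to 0.
    if (srcSize < k128KiB) {
        bound += (k128KiB - srcSize) >> 11;
    }
    return bound;
}
```

For the 36-byte string used in the usage example on this page, this formula yields 99 bytes, so small inputs need a destination buffer noticeably larger than the source.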

Outputs

Name        Type    Description
(return)    size_t  Compressed size written into dst, or an error code testable via ZSTD_isError()
dst buffer  void*   Contains the compressed zstd frame
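
Because sizes and errors share one size_t return value, it may help to see how such an encoding can work. The sketch below follows the convention upstream zstd is understood to use, where error numbers are negated into the top of the size_t range. All names are hypothetical, and kMaxErrorCodeSketch is an assumed stand-in for an internal value that may change between versions; callers should rely on ZSTD_isError() rather than on this layout.

```cpp
#include <cassert>
#include <cstddef>

// Assumed stand-in for the library's largest error number (internal value).
constexpr std::size_t kMaxErrorCodeSketch = 120;

// Encode an error number as a size_t just below SIZE_MAX via unsigned negation.
inline std::size_t make_error_sketch(std::size_t errorNumber) {
    return static_cast<std::size_t>(0) - errorNumber;
}

// A return value counts as an error if it lies in the reserved top range.
inline bool is_error_sketch(std::size_t code) {
    return code > make_error_sketch(kMaxErrorCodeSketch);
}
```

With this scheme any realistic compressed size is far below the reserved range, which is why a single comparison is enough to tell a size from an error.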

Usage Examples

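#include <cstdio>
#include <cstdlib>
#include <cstring>

#include "zstd.h"

using namespace duckdb_zstd;

int main() {
    // --- One-shot Compression ---
    const char* data = "DuckDB Zstd compression example data";
    size_t dataSize = strlen(data);
    size_t bound = ZSTD_compressBound(dataSize);  // worst-case output size
    void* compressed = malloc(bound);
    if (compressed == nullptr) {
        return 1;
    }

    size_t compressedSize = ZSTD_compress(compressed, bound,
                                          data, dataSize,
                                          3 /* default level */);
    if (ZSTD_isError(compressedSize)) {
        fprintf(stderr, "Error: %s\n", ZSTD_getErrorName(compressedSize));
        free(compressed);
        return 1;
    }

    // --- Context-based Compression (reusable) ---
    ZSTD_CCtx* cctx = ZSTD_createCCtx();
    if (cctx == nullptr) {
        free(compressed);
        return 1;
    }
    // The same context can serve further ZSTD_compressCCtx() calls.
    size_t result = ZSTD_compressCCtx(cctx, compressed, bound,
                                      data, dataSize, 3);
    if (ZSTD_isError(result)) {
        fprintf(stderr, "Error: %s\n", ZSTD_getErrorName(result));
    }
    ZSTD_freeCCtx(cctx);  // release the context once it is no longer needed
    free(compressed);
    return 0;
}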
