Implementation:Duckdb Duckdb Miniz Compression
| Knowledge Sources | |
|---|---|
| Domains | Compression, Third_Party |
| Last Updated | 2026-02-07 12:00 GMT |
Overview
Miniz is a single-source-file, public-domain zlib-compatible compression library (version 10.0.3) integrated into DuckDB for deflate/inflate, CRC-32, Adler-32, and ZIP archive support.
Description
The DuckDB-vendored Miniz library provides a drop-in replacement for the zlib compression API subset, operating within the duckdb_miniz namespace. It implements RFC 1950 (zlib format) and RFC 1951 (deflate format) and exposes three API tiers:
Low-level APIs (tdefl/tinfl):
- tdefl -- deflate compressor supporting raw, static, and dynamic blocks with lazy/greedy parsing, RLE-only, and Huffman-only modes
- tinfl -- single-function coroutine decompressor supporting 32 KB wrapping buffers or full-file decompression
- No dynamic memory allocation required
zlib-style API (mz_* functions):
- mz_deflateInit/mz_deflateInit2/mz_deflate/mz_deflateEnd -- streaming compression
- mz_inflateInit/mz_inflateInit2/mz_inflate/mz_inflateEnd -- streaming decompression
- mz_compress/mz_compress2/mz_uncompress -- single-call convenience functions
- mz_crc32/mz_adler32 -- checksum functions
- Compression levels 0-9 (standard zlib) plus level 10 ("uber" compression)
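To make the two checksums concrete, the following is a minimal, self-contained sketch of what CRC-32 and Adler-32 compute. It is not miniz's code: the function names crc32_sketch and adler32_sketch are hypothetical, and the bit-by-bit CRC loop is chosen for readability rather than speed. In real code, call mz_crc32 and mz_adler32 instead.

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative CRC-32 (reflected polynomial 0xEDB88320), the checksum
// stored in GZIP footers. Hypothetical helper, processes one bit at a time.
std::uint32_t crc32_sketch(const unsigned char *data, std::size_t len) {
    std::uint32_t crc = 0xFFFFFFFFu; // initial value, inverted again at the end
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            // Shift right; fold in the polynomial when the dropped bit was 1.
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

// Illustrative Adler-32 as defined in RFC 1950, the checksum that
// trails a zlib-wrapped stream. Hypothetical helper.
std::uint32_t adler32_sketch(const unsigned char *data, std::size_t len) {
    const std::uint32_t kModulus = 65521u; // largest prime below 2^16
    std::uint32_t a = 1u;                  // running byte sum, starts at 1
    std::uint32_t b = 0u;                  // running sum of the a values
    for (std::size_t i = 0; i < len; ++i) {
        a = (a + data[i]) % kModulus;
        b = (b + a) % kModulus;
    }
    return (b << 16) | a;
}
```

For the nine ASCII bytes "123456789", the CRC-32 sketch yields the standard check value 0xCBF43926. An empty input gives 0 for CRC-32 and 1 for Adler-32, which are the usual starting values when a checksum is accumulated over several calls.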
DuckDB MiniZStream Wrapper:
DuckDB provides a MiniZStream wrapper class (in the duckdb namespace) that adds GZIP header/footer handling on top of raw deflate. On decompression, the wrapper validates the GZIP magic bytes (0x1F 0x8B) before inflating. It initializes the stream through mz_inflateInit2/mz_deflateInit2 in raw deflate mode (negated window bits) and computes the CRC-32 footer automatically on compression.
The DuckDB build disables stdio (MINIZ_NO_STDIO), time functions (MINIZ_NO_TIME), and the zlib-compatible name aliases (MINIZ_NO_ZLIB_COMPATIBLE_NAMES). Dropping the aliases avoids name conflicts with a system zlib that may be linked into the same binary.
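The GZIP framing that the MiniZStream wrapper places around the raw deflate data can be sketched as follows. This is a self-contained illustration of the RFC 1952 layout, not the wrapper's code: every name here (make_gzip_header, make_gzip_footer, looks_like_gzip, write_le32) is hypothetical, and the OS byte of 0xFF ("unknown") is an assumption that may differ from what DuckDB writes.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kGzipHeaderSize = 10; // fixed part of the header
constexpr std::size_t kGzipFooterSize = 8;  // CRC-32 + uncompressed size

// Minimal 10-byte GZIP header with no optional fields (RFC 1952).
std::array<unsigned char, kGzipHeaderSize> make_gzip_header() {
    return {{0x1F, 0x8B,             // magic bytes
             0x08,                   // compression method: deflate
             0x00,                   // flags: no optional fields
             0x00, 0x00, 0x00, 0x00, // modification time: not set
             0x00,                   // extra flags
             0xFF}};                 // OS: unknown (assumption)
}

// Store a 32-bit value in little-endian byte order.
void write_le32(unsigned char *out, std::uint32_t value) {
    for (int i = 0; i < 4; ++i) {
        out[i] = static_cast<unsigned char>((value >> (8 * i)) & 0xFFu);
    }
}

// Footer: CRC-32 of the uncompressed data, then its size modulo 2^32.
std::array<unsigned char, kGzipFooterSize> make_gzip_footer(std::uint32_t crc,
                                                            std::uint64_t uncompressed_size) {
    std::array<unsigned char, kGzipFooterSize> footer{};
    write_le32(footer.data(), crc);
    write_le32(footer.data() + 4,
               static_cast<std::uint32_t>(uncompressed_size & 0xFFFFFFFFu));
    return footer;
}

// Cheap pre-check before inflating: enough bytes, magic, deflate method.
bool looks_like_gzip(const unsigned char *data, std::size_t size) {
    return size >= kGzipHeaderSize && data[0] == 0x1F && data[1] == 0x8B &&
           data[2] == 0x08;
}
```

A complete GZIP member is then header, raw deflate bytes, footer, in that order, which is why an output buffer for GZIP compression needs room for 18 bytes beyond the deflate bound.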
Usage
DuckDB uses Miniz primarily for GZIP compression/decompression in HTTP transport (e.g., downloading CSV files or Parquet files over HTTP with gzip content-encoding) and for ZIP archive reading (e.g., reading zipped CSV or Parquet files). The MiniZStream wrapper is the primary DuckDB-level interface.
Code Reference
Source Location
- Repository: Duckdb_Duckdb
- Files:
  - third_party/miniz/miniz.hpp (1288 lines) -- Miniz API header
  - third_party/miniz/miniz.cpp (7544 lines) -- Miniz zlib-compatible implementation
  - third_party/miniz/miniz_wrapper.hpp (167 lines) -- DuckDB MiniZStream GZIP wrapper
Signature
namespace duckdb_miniz {

// --- Single-call Compression/Decompression ---
int mz_compress(unsigned char *pDest, mz_ulong *pDest_len,
                const unsigned char *pSource, mz_ulong source_len);
int mz_compress2(unsigned char *pDest, mz_ulong *pDest_len,
                 const unsigned char *pSource, mz_ulong source_len,
                 int level);
int mz_uncompress(unsigned char *pDest, mz_ulong *pDest_len,
                  const unsigned char *pSource, mz_ulong source_len);
mz_ulong mz_compressBound(mz_ulong source_len);

// --- Streaming Compression ---
int mz_deflateInit(mz_streamp pStream, int level);
int mz_deflateInit2(mz_streamp pStream, int level, int method,
                    int window_bits, int mem_level, int strategy);
int mz_deflate(mz_streamp pStream, int flush);
int mz_deflateEnd(mz_streamp pStream);
mz_ulong mz_deflateBound(mz_streamp pStream, mz_ulong source_len);

// --- Streaming Decompression ---
int mz_inflateInit(mz_streamp pStream);
int mz_inflateInit2(mz_streamp pStream, int window_bits);
int mz_inflate(mz_streamp pStream, int flush);
int mz_inflateEnd(mz_streamp pStream);

// --- Checksums ---
mz_ulong mz_crc32(mz_ulong crc, const unsigned char *ptr, size_t buf_len);
mz_ulong mz_adler32(mz_ulong adler, const unsigned char *ptr, size_t buf_len);

// --- Error ---
const char *mz_error(int err);

} // namespace duckdb_miniz

namespace duckdb {

// --- DuckDB GZIP Wrapper ---
enum class MiniZStreamType { MINIZ_TYPE_NONE, MINIZ_TYPE_INFLATE, MINIZ_TYPE_DEFLATE };

struct MiniZStream {
    void Decompress(const char *compressed_data, size_t compressed_size,
                    char *out_data, size_t out_size);
    void Compress(const char *uncompressed_data, size_t uncompressed_size,
                  char *out_data, size_t *out_size);
    static size_t MaxCompressedLength(size_t input_size);
    static void InitializeGZIPHeader(unsigned char *gzip_header);
    static void InitializeGZIPFooter(unsigned char *gzip_footer,
                                     duckdb_miniz::mz_ulong crc,
                                     idx_t uncompressed_size);
};

} // namespace duckdb
Import
#include "miniz.hpp" // low-level miniz API
#include "miniz_wrapper.hpp" // DuckDB GZIP wrapper
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| pSource / compressed_data | const unsigned char* / const char* | Yes | Pointer to input data (raw for compression, compressed for decompression) |
| source_len / compressed_size | mz_ulong / size_t | Yes | Number of input bytes |
| pDest / out_data | unsigned char* / char* | Yes | Pointer to pre-allocated output buffer |
| pDest_len / out_size | mz_ulong* / size_t* | Yes | On input: buffer capacity; on output: bytes written (for mz_compress/mz_uncompress) |
| level | int | No | Compression level: 0 (none) to 9 (best), 10 (uber); default is 6 |
| window_bits | int | No | MZ_DEFAULT_WINDOW_BITS (15) for zlib-wrapped, negated for raw deflate |
| flush | int | Yes (for streaming) | MZ_NO_FLUSH, MZ_SYNC_FLUSH, MZ_FINISH, etc. |
Outputs
| Name | Type | Description |
|---|---|---|
| (return) from mz_compress/mz_uncompress | int | MZ_OK (0) on success; negative error codes (MZ_STREAM_ERROR, MZ_DATA_ERROR, MZ_BUF_ERROR, etc.) on failure |
| (return) from mz_deflate/mz_inflate | int | MZ_OK or MZ_STREAM_END on success; negative on error |
| pDest_len (out param) | mz_ulong* | Actual number of bytes written to the output buffer |
| out_size (out param for MiniZStream) | size_t* | Total GZIP output size including header and footer |
Usage Examples
#include <cstring> // strlen
#include "miniz_wrapper.hpp"
// --- GZIP Compression via DuckDB Wrapper ---
const char* data = "Hello DuckDB Miniz GZIP compression!";
size_t data_len = strlen(data);
size_t max_out = duckdb::MiniZStream::MaxCompressedLength(data_len);
char* compressed = new char[max_out];
size_t out_size = max_out;
duckdb::MiniZStream stream;
stream.Compress(data, data_len, compressed, &out_size);
// out_size now contains the GZIP-compressed size
// --- GZIP Decompression via DuckDB Wrapper ---
char* decompressed = new char[data_len];
duckdb::MiniZStream decomp_stream;
decomp_stream.Decompress(compressed, out_size, decompressed, data_len);
delete[] compressed;
delete[] decompressed;
// --- Low-level Single-call API ---
duckdb_miniz::mz_ulong comp_len = duckdb_miniz::mz_compressBound(data_len);
unsigned char* comp_buf = new unsigned char[comp_len];
int ret = duckdb_miniz::mz_compress(comp_buf, &comp_len,
                                    reinterpret_cast<const unsigned char*>(data), data_len);
// ret == duckdb_miniz::MZ_OK on success; comp_len then holds the number of bytes written
delete[] comp_buf;