Implementation:Duckdb Duckdb Snappy Compression
| Knowledge Sources | |
|---|---|
| Domains | Compression, Third_Party |
| Last Updated | 2026-02-07 12:00 GMT |
Overview
Snappy is a compression library developed by Google that prioritizes compression and decompression speed over compression ratio. DuckDB vendors a copy of it for fast data compression where a moderate ratio is acceptable.
Description
The DuckDB-vendored Snappy library lives in the duckdb_snappy namespace and provides three API tiers: high-level std::string-based routines, lower-level raw buffer routines, and a generic Source/Sink I/O abstraction for streaming use. It supports two compression levels: level 1 (the default and the fastest) and level 2 (experimental; slightly slower but with a better ratio, roughly comparable to LZ4 level 2 and to zstd levels -3 to -2).
Key architectural components:
- snappy.h -- public API with string-based and raw buffer compression/decompression
- snappy-sinksource.h -- abstract Sink and Source classes for I/O abstraction
- snappy-internal.h -- internal helpers, including SIMD vector operations (SSSE3/NEON) for byte shuffling
- snappy-stubs-internal.h -- platform compatibility stubs (builtin expect, CTZ, prefetch, endian handling)
The compressor processes its input in independent 64 KB blocks, so every back-reference offset fits in 16 bits. When the CPU supports them, the library uses BMI2 instructions and SSE4.2 or NEON CRC32 instructions for hardware acceleration.
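The block constraint can be illustrated with a standalone sketch that assumes only the 64 KB block size (1 << 16 bytes, exposed as kBlockSize in snappy.h). SplitIntoBlocks and NarrowOffset are hypothetical helpers written for this page, not Snappy functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Snappy's block size: 1 << 16 = 65536 bytes.
constexpr std::size_t kBlockSize = std::size_t{1} << 16;

// Hypothetical helper: cut the input into the independent chunks that a
// Snappy-style compressor processes one at a time.
std::vector<std::string_view> SplitIntoBlocks(std::string_view input) {
  std::vector<std::string_view> blocks;
  for (std::size_t pos = 0; pos < input.size(); pos += kBlockSize) {
    // substr clamps the count, so the last block may be shorter.
    blocks.push_back(input.substr(pos, kBlockSize));
  }
  return blocks;
}

// Hypothetical helper: matches never cross a block boundary, so a
// back-reference distance is at most kBlockSize - 1 and can be narrowed
// to 16 bits without loss.
uint16_t NarrowOffset(std::size_t offset) {
  assert(offset > 0 && offset < kBlockSize);
  return static_cast<uint16_t>(offset);
}
```

For a 150000-byte input this yields blocks of 65536, 65536 and 18928 bytes. Because each block is compressed on its own, the largest possible offset (65535) is exactly the largest uint16_t value.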
Usage
DuckDB uses Snappy for Parquet file compression/decompression, as Snappy is one of the standard compression codecs in the Parquet format. The duckdb_snappy namespace ensures isolation from any system-installed Snappy library.
Code Reference
Source Location
- Repository: Duckdb_Duckdb
- Files:
  - third_party/snappy/snappy.h (412 lines) -- Snappy public API
  - third_party/snappy/snappy.cc (4288 lines) -- Snappy compression/decompression implementation
  - third_party/snappy/snappy-internal.h (647 lines) -- internal helpers and SIMD operations
  - third_party/snappy/snappy-sinksource.h (340 lines) -- I/O abstraction (Sink/Source classes)
  - third_party/snappy/snappy-stubs-internal.h (1026 lines) -- platform compatibility stubs
Signature
```cpp
namespace duckdb_snappy {

// --- Compression Options ---
struct CompressionOptions {
  int level = DefaultCompressionLevel();
  static constexpr int MinCompressionLevel() { return 1; }
  static constexpr int MaxCompressionLevel() { return 2; }
  static constexpr int DefaultCompressionLevel() { return 1; }
};

// --- Generic Source/Sink API ---
size_t Compress(Source* reader, Sink* writer);
size_t Compress(Source* reader, Sink* writer, CompressionOptions options);
bool GetUncompressedLength(Source* source, uint32_t* result);

// --- High-level String API ---
size_t Compress(const char* input, size_t input_length,
                std::string* compressed);
size_t Compress(const char* input, size_t input_length,
                std::string* compressed, CompressionOptions options);
bool Uncompress(const char* compressed, size_t compressed_length,
                std::string* uncompressed);

// --- Low-level Raw Buffer API ---
void RawCompress(const char* input, size_t input_length,
                 char* compressed, size_t* compressed_length);
bool RawUncompress(const char* compressed, size_t compressed_length,
                   char* uncompressed);
bool RawUncompress(Source* compressed, char* uncompressed);

// --- Utility Functions ---
size_t MaxCompressedLength(size_t source_bytes);
bool GetUncompressedLength(const char* compressed,
                           size_t compressed_length, size_t* result);
bool IsValidCompressedBuffer(const char* compressed,
                             size_t compressed_length);
bool IsValidCompressed(Source* compressed);

// --- I/O Abstraction Classes ---
class Sink {
 public:
  virtual void Append(const char* bytes, size_t n) = 0;
  virtual char* GetAppendBuffer(size_t length, char* scratch);
  virtual void AppendAndTakeOwnership(
      char* bytes, size_t n,
      void (*deleter)(void*, const char*, size_t), void* deleter_arg);
};

class Source {
 public:
  virtual size_t Available() const = 0;
  virtual const char* Peek(size_t* len) = 0;
  virtual void Skip(size_t n) = 0;
};

}  // namespace duckdb_snappy
```
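Streaming callers either implement the Source/Sink interfaces or reuse the adapters that snappy-sinksource.h already ships, such as ByteArraySource. The sketch below shows the consumer side of the Source contract: Peek at a fragment, consume it, then Skip past it. So that it compiles without the library, sketch::Source and sketch::Sink are hypothetical stand-ins that copy only the pure-virtual methods listed above; ChunkedArraySource, StringSink and Drain are also illustrative names. With DuckDB you would derive from duckdb_snappy::Source and duckdb_snappy::Sink instead.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

namespace sketch {  // hypothetical stand-ins for duckdb_snappy::Source/Sink

class Source {
 public:
  virtual ~Source() = default;
  virtual std::size_t Available() const = 0;
  virtual const char* Peek(std::size_t* len) = 0;
  virtual void Skip(std::size_t n) = 0;
};

class Sink {
 public:
  virtual ~Sink() = default;
  virtual void Append(const char* bytes, std::size_t n) = 0;
};

// Source over one contiguous buffer that exposes at most `chunk` bytes per
// Peek(), to mimic input that arrives in fragments.
class ChunkedArraySource : public Source {
 public:
  ChunkedArraySource(const char* data, std::size_t n, std::size_t chunk)
      : data_(data), left_(n), chunk_(chunk) {
    assert(chunk > 0);  // a zero-sized fragment would never make progress
  }
  std::size_t Available() const override { return left_; }
  const char* Peek(std::size_t* len) override {
    *len = std::min(left_, chunk_);
    return data_;
  }
  void Skip(std::size_t n) override {
    assert(n <= left_);
    data_ += n;
    left_ -= n;
  }

 private:
  const char* data_;
  std::size_t left_;
  std::size_t chunk_;
};

// Sink that accumulates everything appended to it in a std::string.
class StringSink : public Sink {
 public:
  void Append(const char* bytes, std::size_t n) override { out.append(bytes, n); }
  std::string out;
};

// Consumer loop: Peek a fragment, hand it to the sink, Skip past it.
std::size_t Drain(Source* src, Sink* dst) {
  std::size_t total = 0;
  while (src->Available() > 0) {
    std::size_t len = 0;
    const char* p = src->Peek(&len);
    dst->Append(p, len);
    src->Skip(len);
    total += len;
  }
  return total;
}

}  // namespace sketch
```

The point of the abstraction is that the consumer never assumes the input is contiguous: it only ever sees whatever fragment Peek() returns, which is why a compressor written against Source can read from chunked buffers as easily as from a single array.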
Import
```cpp
#include "snappy.h"
#include "snappy-sinksource.h"
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| input | const char* | Yes (for compression) | Pointer to the raw, uncompressed data |
| input_length | size_t | Yes (for compression) | Number of bytes in the uncompressed input |
| compressed | const char* | Yes (for decompression) | Pointer to Snappy-compressed data |
| compressed_length | size_t | Yes (for decompression) | Number of compressed bytes |
| options | CompressionOptions | No | Compression level (1 = fast default, 2 = experimental, higher ratio) |
| reader / source | Source* | Only for the Source/Sink API | Abstract byte source for the streaming API |
| writer / uncompressed | Sink* | Only for the Source/Sink API | Abstract byte sink for the streaming API |
Outputs
| Name | Type | Description |
|---|---|---|
| (return) from Compress | size_t | Number of compressed bytes written |
| (return) from Uncompress | bool | true if decompression succeeded; false if the data was corrupted |
| compressed_length (out param) | size_t* | Length of the compressed output written by RawCompress() |
| result (out param) | size_t* (uint32_t* in the Source* overload) | Uncompressed length retrieved by GetUncompressedLength() |
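GetUncompressedLength() can answer without decompressing anything because a raw Snappy buffer begins with the uncompressed length, encoded as a little-endian base-128 varint. The sketch below encodes and decodes that preamble and reproduces the worst-case output bound behind MaxCompressedLength(). PutVarint32, GetVarint32 and MaxCompressedLengthSketch are hypothetical helpers; only the varint preamble and the 32 + n + n/6 formula are taken from upstream Snappy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: append `v` as a little-endian base-128 varint
// (7 payload bits per byte, high bit set on every byte except the last).
void PutVarint32(std::string* dst, uint32_t v) {
  while (v >= 0x80) {
    dst->push_back(static_cast<char>((v & 0x7F) | 0x80));
    v >>= 7;
  }
  dst->push_back(static_cast<char>(v));
}

// Hypothetical helper: read the preamble back. Returns false if the varint
// is truncated or does not fit in 32 bits (at most 5 bytes).
bool GetVarint32(const char* p, std::size_t n, uint32_t* result) {
  uint32_t value = 0;
  for (std::size_t i = 0; i < n && i < 5; ++i) {
    uint8_t byte = static_cast<uint8_t>(p[i]);
    if (i == 4 && byte > 0x0F) return false;  // would overflow 32 bits
    value |= static_cast<uint32_t>(byte & 0x7F) << (7 * i);
    if ((byte & 0x80) == 0) {
      *result = value;
      return true;
    }
  }
  return false;
}

// Worst-case compressed size; same formula as upstream MaxCompressedLength().
std::size_t MaxCompressedLengthSketch(std::size_t source_bytes) {
  return 32 + source_bytes + source_bytes / 6;
}
```

For example, a length of 300 is stored as the two bytes 0xAC 0x02, and a 600-byte input needs an output buffer of at most 32 + 600 + 100 = 732 bytes.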
Usage Examples
```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

#include "snappy.h"

int main() {
  // --- String-based compression ---
  const char* input = "Repeating data repeating data repeating data";
  size_t input_len = std::strlen(input);
  std::string compressed;
  duckdb_snappy::Compress(input, input_len, &compressed);

  // --- String-based decompression: returns false on corrupted input ---
  std::string uncompressed;
  bool ok = duckdb_snappy::Uncompress(compressed.data(), compressed.size(),
                                      &uncompressed);
  assert(ok && uncompressed == input);

  // --- Raw buffer compression: size the buffer with MaxCompressedLength ---
  std::vector<char> out_buf(duckdb_snappy::MaxCompressedLength(input_len));
  size_t out_len = 0;
  duckdb_snappy::RawCompress(input, input_len, out_buf.data(), &out_len);

  // --- Validation without decompressing ---
  bool valid = duckdb_snappy::IsValidCompressedBuffer(out_buf.data(), out_len);
  assert(valid);
  return 0;
}
```
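To make the byte format that Uncompress() and RawUncompress() consume more concrete, here is a deliberately simplified decoder written from the published Snappy format description: a varint length preamble followed by literal and copy elements. DecodeSnappyRaw is a hypothetical name, and this is not DuckDB's implementation, which is heavily optimized (including the SIMD byte-shuffling paths mentioned above).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical, simplified decoder for the raw (unframed) Snappy format.
bool DecodeSnappyRaw(const std::string& in, std::string* out) {
  std::size_t pos = 0;
  auto byte_at = [&in](std::size_t i) {
    return static_cast<std::size_t>(static_cast<uint8_t>(in[i]));
  };

  // Preamble: uncompressed length as a little-endian base-128 varint.
  std::size_t expected = 0;
  for (int shift = 0;; shift += 7) {
    if (pos >= in.size() || shift > 28) return false;
    std::size_t b = byte_at(pos++);
    expected |= (b & 0x7F) << shift;
    if ((b & 0x80) == 0) break;
  }

  out->clear();
  while (pos < in.size()) {
    std::size_t tag = byte_at(pos++);
    std::size_t len = 0;
    std::size_t offset = 0;
    switch (tag & 0x03) {
      case 0: {  // Literal: length-1 in the upper 6 bits, or in 1-4 extra bytes.
        len = (tag >> 2) + 1;
        if (len > 60) {
          std::size_t extra = len - 60;
          if (pos + extra > in.size()) return false;
          len = 0;
          for (std::size_t i = 0; i < extra; ++i) len |= byte_at(pos + i) << (8 * i);
          len += 1;
          pos += extra;
        }
        if (pos + len > in.size()) return false;
        out->append(in, pos, len);
        pos += len;
        continue;
      }
      case 1:  // Copy with 1-byte offset: length 4-11, 11-bit offset.
        if (pos + 1 > in.size()) return false;
        len = ((tag >> 2) & 0x07) + 4;
        offset = ((tag >> 5) << 8) | byte_at(pos);
        pos += 1;
        break;
      case 2:  // Copy with 2-byte little-endian offset: length 1-64.
        if (pos + 2 > in.size()) return false;
        len = (tag >> 2) + 1;
        offset = byte_at(pos) | (byte_at(pos + 1) << 8);
        pos += 2;
        break;
      default:  // Copy with 4-byte little-endian offset: length 1-64.
        if (pos + 4 > in.size()) return false;
        len = (tag >> 2) + 1;
        for (std::size_t i = 0; i < 4; ++i) offset |= byte_at(pos + i) << (8 * i);
        pos += 4;
        break;
    }
    // A copy repeats `len` bytes starting `offset` bytes back. Copying one
    // byte at a time lets an overlapping copy (offset < len) repeat a pattern.
    if (offset == 0 || offset > out->size()) return false;
    std::size_t from = out->size() - offset;
    for (std::size_t i = 0; i < len; ++i) out->push_back((*out)[from + i]);
  }
  return out->size() == expected;
}
```

Fed the seven bytes 09 08 61 62 63 09 03, this sketch produces "abcabcabc": the preamble declares 9 bytes, tag 0x08 is a 3-byte literal "abc", and tag 0x09 with offset byte 0x03 is a 6-byte copy from 3 bytes back, whose overlap repeats the pattern. Changing the final offset byte to 0x04 points before the start of the output, so the sketch returns false, the same kind of corruption that makes Uncompress() fail.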