

Implementation:Duckdb Duckdb Snappy Compression

From Leeroopedia


Knowledge Sources
Domains Compression, Third_Party
Last Updated 2026-02-07 12:00 GMT

Overview

Snappy is a fast compression/decompression library developed by Google, integrated into DuckDB for speed-oriented data compression with reasonable compression ratios.

Description

The DuckDB-vendored Snappy library lives in the duckdb_snappy namespace and provides three API tiers: high-level std::string-based routines, lower-level raw buffer routines, and a generic Source/Sink I/O abstraction for streaming use. It supports two compression levels: level 1 (the default and fastest) and level 2 (experimental; slightly slower but with a better compression ratio, roughly comparable to LZ4's level 2 mode and to zstd levels -3 to -2).
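A minimal sketch of how a caller might keep a requested level inside the supported range. It copies the CompressionOptions bounds from the signature into a local struct (it does not include the real snappy.h). IsSupportedLevel and MakeOptions are hypothetical helpers, not DuckDB or Snappy functions, and falling back to level 1 is an illustrative choice, since this page does not say what the library does with out-of-range levels.

```cpp
#include <cassert>

// Local mirror of the bounds declared by duckdb_snappy::CompressionOptions.
// Assumption: illustration only; this is not the real snappy.h declaration.
struct CompressionOptionsSketch {
    int level = DefaultCompressionLevel();
    static constexpr int MinCompressionLevel() { return 1; }
    static constexpr int MaxCompressionLevel() { return 2; }
    static constexpr int DefaultCompressionLevel() { return 1; }
};

// Hypothetical helper: true if `level` lies in the supported range [1, 2].
constexpr bool IsSupportedLevel(int level) {
    return level >= CompressionOptionsSketch::MinCompressionLevel() &&
           level <= CompressionOptionsSketch::MaxCompressionLevel();
}

// Hypothetical helper: build options for `requested_level`, keeping the
// default (level 1) when the request is outside the supported range.
inline CompressionOptionsSketch MakeOptions(int requested_level) {
    CompressionOptionsSketch options;
    if (IsSupportedLevel(requested_level)) {
        options.level = requested_level;
    }
    return options;
}
```

With this sketch, MakeOptions(2) selects the experimental level, while MakeOptions(9) silently stays at level 1.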

Key architectural components:

  • snappy.h -- public API with string-based and raw buffer compression/decompression
  • snappy-sinksource.h -- abstract Sink and Source classes for I/O abstraction
  • snappy-internal.h -- internal helpers, including SIMD vector operations (SSSE3/NEON) for byte shuffling
  • snappy-stubs-internal.h -- platform compatibility stubs (builtin expect, CTZ, prefetch, endian handling)

The library uses a 64 KB block size with 16-bit offsets and supports hardware acceleration through BMI2, SSE4.2 CRC32, and NEON CRC32 instructions when available.
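The block structure can be sketched independently of Snappy itself. Assuming only the 64 KB block size stated above, the hypothetical SplitIntoBlocks helper below cuts an input into fragments of at most 65,536 bytes. Because every position inside a fragment is at most 65,535, it fits in a uint16_t, which is why 16-bit offsets are sufficient within a block. These helpers are illustrative only and do not model the hashing or the BMI2/CRC32 acceleration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Block size described above: 1 << 16 = 65,536 bytes.
constexpr std::size_t kBlockSize = std::size_t{1} << 16;

// Hypothetical helper: returns (offset, length) pairs that cover
// `input_length` bytes in consecutive fragments of at most kBlockSize bytes.
std::vector<std::pair<std::size_t, std::size_t>> SplitIntoBlocks(
        std::size_t input_length) {
    std::vector<std::pair<std::size_t, std::size_t>> blocks;
    for (std::size_t start = 0; start < input_length; start += kBlockSize) {
        std::size_t remaining = input_length - start;
        std::size_t length = remaining < kBlockSize ? remaining : kBlockSize;
        blocks.emplace_back(start, length);
    }
    return blocks;
}

// Hypothetical helper: position of an absolute byte index inside its
// fragment. The result is at most 65,535, so it fits in 16 bits.
std::uint16_t PositionInBlock(std::size_t absolute_pos) {
    return static_cast<std::uint16_t>(absolute_pos % kBlockSize);
}
```

For a 150,000-byte input this yields fragments of 65,536, 65,536, and 18,928 bytes.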

Usage

DuckDB uses Snappy for Parquet file compression/decompression, as Snappy is one of the standard compression codecs in the Parquet format. The duckdb_snappy namespace ensures isolation from any system-installed Snappy library.

Code Reference

Source Location

Signature

namespace duckdb_snappy {

class Source;  // declared in snappy-sinksource.h; shown in full below
class Sink;

// --- Compression Options ---
struct CompressionOptions {
    int level = DefaultCompressionLevel();
    static constexpr int MinCompressionLevel() { return 1; }
    static constexpr int MaxCompressionLevel() { return 2; }
    static constexpr int DefaultCompressionLevel() { return 1; }
};

// --- Generic Source/Sink API ---
size_t Compress(Source* reader, Sink* writer);
size_t Compress(Source* reader, Sink* writer, CompressionOptions options);
bool GetUncompressedLength(Source* source, uint32_t* result);

// --- High-level String API ---
size_t Compress(const char* input, size_t input_length,
                std::string* compressed);
size_t Compress(const char* input, size_t input_length,
                std::string* compressed, CompressionOptions options);
bool Uncompress(const char* compressed, size_t compressed_length,
                std::string* uncompressed);

// --- Low-level Raw Buffer API ---
void RawCompress(const char* input, size_t input_length,
                 char* compressed, size_t* compressed_length);
bool RawUncompress(const char* compressed, size_t compressed_length,
                   char* uncompressed);
bool RawUncompress(Source* compressed, char* uncompressed);

// --- Utility Functions ---
size_t MaxCompressedLength(size_t source_bytes);
bool GetUncompressedLength(const char* compressed,
                           size_t compressed_length, size_t* result);
bool IsValidCompressedBuffer(const char* compressed,
                             size_t compressed_length);
bool IsValidCompressed(Source* compressed);

// --- I/O Abstraction Classes ---
class Sink {
public:
    virtual void Append(const char* bytes, size_t n) = 0;
    virtual char* GetAppendBuffer(size_t length, char* scratch);
    virtual void AppendAndTakeOwnership(
        char* bytes, size_t n,
        void (*deleter)(void*, const char*, size_t), void* deleter_arg);
};

class Source {
public:
    virtual size_t Available() const = 0;
    virtual const char* Peek(size_t* len) = 0;
    virtual void Skip(size_t n) = 0;
};

} // namespace duckdb_snappy
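To make the Peek/Skip/Append contract concrete, here is a standalone sketch. It re-declares minimal Source and Sink interfaces with the same pure-virtual methods listed above (the real classes live in snappy-sinksource.h and also provide default implementations of the non-pure methods), plus hypothetical ArraySource, StringSink, and Drain helpers that are not part of the Snappy API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Local re-declarations mirroring the pure-virtual methods shown above.
class Source {
public:
    virtual ~Source() = default;
    virtual std::size_t Available() const = 0;
    virtual const char* Peek(std::size_t* len) = 0;
    virtual void Skip(std::size_t n) = 0;
};

class Sink {
public:
    virtual ~Sink() = default;
    virtual void Append(const char* bytes, std::size_t n) = 0;
};

// Hypothetical Source over a byte array that exposes at most `chunk` bytes
// per Peek(), mimicking a producer that delivers data in pieces.
class ArraySource : public Source {
public:
    ArraySource(const char* data, std::size_t n, std::size_t chunk)
        : data_(data), left_(n), chunk_(chunk) {}
    std::size_t Available() const override { return left_; }
    const char* Peek(std::size_t* len) override {
        *len = left_ < chunk_ ? left_ : chunk_;
        return data_;
    }
    void Skip(std::size_t n) override {
        assert(n <= left_);  // callers may only skip bytes that remain
        data_ += n;
        left_ -= n;
    }
private:
    const char* data_;
    std::size_t left_;
    std::size_t chunk_;
};

// Hypothetical Sink that appends everything it receives to a std::string.
class StringSink : public Sink {
public:
    explicit StringSink(std::string* out) : out_(out) {}
    void Append(const char* bytes, std::size_t n) override {
        out_->append(bytes, n);
    }
private:
    std::string* out_;
};

// Copies every byte from `source` to `sink` using only the abstract interface.
std::size_t Drain(Source* source, Sink* sink) {
    std::size_t total = 0;
    while (source->Available() > 0) {
        std::size_t len = 0;
        const char* chunk = source->Peek(&len);
        if (len == 0) break;  // defensive: avoid spinning on an empty fragment
        sink->Append(chunk, len);
        source->Skip(len);
        total += len;
    }
    return total;
}
```

The loop never assumes the whole input is contiguous: it only consumes what Peek() currently exposes, which is the property that lets the streaming Compress(Source*, Sink*) overload work over fragmented inputs.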

Import

#include "snappy.h"
#include "snappy-sinksource.h"

I/O Contract

Inputs

Name | Type | Required | Description
input | const char* | Yes (for compression) | Pointer to the raw, uncompressed data
input_length | size_t | Yes (for compression) | Number of bytes in the uncompressed input
compressed | const char* | Yes (for decompression) | Pointer to Snappy-compressed data
compressed_length | size_t | Yes (for decompression) | Number of compressed bytes
options | CompressionOptions | No | Compression level (1 = fast default, 2 = experimental, higher ratio)
reader / source | Source* | No | Abstract byte source for the streaming API
writer / uncompressed | Sink* | No | Abstract byte sink for the streaming API

Outputs

Name | Type | Description
(return) from Compress() | size_t | Number of compressed bytes written
(return) from Uncompress() | bool | true if decompression succeeded; false if the input was corrupted
compressed_length (out param) | size_t* | Length of the compressed output written by RawCompress()
result (out param) | size_t* (uint32_t* for the Source* overload) | Uncompressed length reported by GetUncompressedLength()
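GetUncompressedLength() can answer without decompressing because a Snappy raw buffer starts with the uncompressed length encoded as a little-endian base-128 varint, followed by literal and copy elements. The sketch below is a simplified reference decoder for that public format (as described in Snappy's format_description.txt), not DuckDB's optimized implementation; like the bool Uncompress(), it reports malformed input by returning false. ReadUncompressedLength, ReadLE, and SimpleUncompress are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Parses the varint preamble: 7 bits per byte, least significant group first,
// high bit set on every byte except the last (at most 5 bytes for 32 bits).
bool ReadUncompressedLength(const std::string& in, std::size_t* pos,
                            std::uint32_t* result) {
    std::uint64_t value = 0;
    for (int shift = 0; shift < 35; shift += 7) {
        if (*pos >= in.size()) return false;
        std::uint8_t byte = static_cast<std::uint8_t>(in[(*pos)++]);
        value |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) {
            if (value > UINT32_MAX) return false;
            *result = static_cast<std::uint32_t>(value);
            return true;
        }
    }
    return false;  // a sixth byte would be needed: malformed preamble
}

// Reads `n` (1..4) little-endian bytes at *pos into *out.
static bool ReadLE(const std::string& in, std::size_t* pos, int n,
                   std::uint32_t* out) {
    if (in.size() - *pos < static_cast<std::size_t>(n)) return false;
    std::uint32_t v = 0;
    for (int i = 0; i < n; ++i) {
        v |= static_cast<std::uint32_t>(
                 static_cast<std::uint8_t>(in[*pos + i])) << (8 * i);
    }
    *pos += n;
    *out = v;
    return true;
}

// Simplified decoder for the Snappy raw format; false on malformed input.
bool SimpleUncompress(const std::string& in, std::string* out) {
    std::size_t pos = 0;
    std::uint32_t expected = 0;
    if (!ReadUncompressedLength(in, &pos, &expected)) return false;
    out->clear();
    while (pos < in.size()) {
        const std::uint8_t tag = static_cast<std::uint8_t>(in[pos++]);
        std::size_t length = 0;
        std::uint32_t offset = 0;
        switch (tag & 3) {
        case 0: {  // literal: length-1 in the upper 6 bits, or in 1..4 extra bytes
            length = static_cast<std::size_t>(tag >> 2) + 1;
            if ((tag >> 2) >= 60) {
                std::uint32_t extra = 0;
                if (!ReadLE(in, &pos, (tag >> 2) - 59, &extra)) return false;
                length = static_cast<std::size_t>(extra) + 1;
            }
            if (in.size() - pos < length) return false;
            if (expected - out->size() < length) return false;
            out->append(in, pos, length);
            pos += length;
            continue;
        }
        case 1:  // copy: 3-bit length (4..11), 11-bit offset
            length = ((tag >> 2) & 7) + 4;
            if (!ReadLE(in, &pos, 1, &offset)) return false;
            offset |= static_cast<std::uint32_t>(tag >> 5) << 8;
            break;
        case 2:  // copy: 6-bit length (1..64), 2-byte offset
            length = static_cast<std::size_t>(tag >> 2) + 1;
            if (!ReadLE(in, &pos, 2, &offset)) return false;
            break;
        default:  // copy: 6-bit length (1..64), 4-byte offset
            length = static_cast<std::size_t>(tag >> 2) + 1;
            if (!ReadLE(in, &pos, 4, &offset)) return false;
            break;
        }
        if (offset == 0 || offset > out->size()) return false;
        if (expected - out->size() < length) return false;
        // Source and destination may overlap (e.g. offset 1 repeats one
        // byte), so copy one byte at a time.
        const std::size_t from = out->size() - offset;
        for (std::size_t i = 0; i < length; ++i) {
            const char c = (*out)[from + i];
            out->push_back(c);
        }
    }
    return out->size() == expected;
}
```

For example, the 7 bytes 09 08 61 62 63 09 03 decode to "abcabcabc": 09 is the length preamble (9), 08 is a literal tag for the 3 bytes "abc", and 09 03 is a copy of 6 bytes starting 3 bytes back.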

Usage Examples

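#include <cstring>
#include <string>
#include <vector>

#include "snappy.h"

int main() {
    // --- String-based Compression ---
    const char* input = "Repeating data repeating data repeating data";
    size_t input_len = std::strlen(input);
    std::string compressed;
    duckdb_snappy::Compress(input, input_len, &compressed);

    // --- String-based Decompression ---
    // ok is false if the compressed bytes are corrupted.
    std::string uncompressed;
    bool ok = duckdb_snappy::Uncompress(
        compressed.data(), compressed.size(), &uncompressed);

    // --- Raw Buffer Compression ---
    // RawCompress() needs a caller-owned buffer of at least
    // MaxCompressedLength(input_len) bytes; std::vector releases it automatically.
    std::vector<char> out_buf(duckdb_snappy::MaxCompressedLength(input_len));
    size_t out_len = 0;
    duckdb_snappy::RawCompress(input, input_len, out_buf.data(), &out_len);

    // --- Validation ---
    bool valid = duckdb_snappy::IsValidCompressedBuffer(out_buf.data(), out_len);

    // Exit code 0 only if the round trip and the validation both succeeded.
    return (ok && uncompressed == input && valid) ? 0 : 1;
}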
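The raw-buffer example sizes out_buf with MaxCompressedLength() because RawCompress() writes into caller-owned memory without growing it. In upstream Snappy (snappy.cc) that bound is 32 + n + n/6 bytes. Assuming DuckDB's vendored copy keeps the same formula, the arithmetic can be checked in isolation with a hypothetical WorstCaseCompressedLength function:

```cpp
#include <cassert>
#include <cstddef>

// Mirrors the upstream Snappy worst-case bound (assumed unchanged in DuckDB's
// vendored copy): incompressible input can grow by about 1/6 plus 32 bytes.
constexpr std::size_t WorstCaseCompressedLength(std::size_t source_bytes) {
    return 32 + source_bytes + source_bytes / 6;
}

// Worked values: the 44-byte example input -> 32 + 44 + 7 = 83 bytes;
// one full 64 KB block -> 32 + 65536 + 10922 = 76490 bytes.
static_assert(WorstCaseCompressedLength(44) == 83, "example input bound");
static_assert(WorstCaseCompressedLength(65536) == 76490, "one-block bound");
```

The bound is deliberately pessimistic: the repetitive example string compresses far below 83 bytes, but allocating the worst case means RawCompress() never needs to check for space.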
