Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Duckdb Duckdb Brotli Decoder

From Leeroopedia


Knowledge Sources
Domains Compression, Third_Party
Last Updated 2026-02-07 12:00 GMT

Overview

The Brotli Decoder module implements the complete Brotli decompression algorithm as specified in RFC 7932, providing both one-shot and streaming decompression interfaces used by DuckDB to read Brotli-compressed Parquet data.

Description

This module consists of six files that together form the full Brotli decoding pipeline:

bit_reader.h (419 lines) provides the bit-level reading infrastructure. It defines:

  • BrotliBitReader struct -- holds the pre-fetched bit accumulator (val_), current bit position (bit_pos_), input pointer (next_in), guard position (guard_in), and end pointer (last_in).
  • BrotliBitReaderState -- a checkpoint struct for save/restore operations during speculative parsing.
  • Inline functions for initialization (BrotliInitBitReader), warmup (BrotliWarmupBitReader), state save/restore (BrotliBitReaderSaveState, BrotliBitReaderRestoreState), and input management (BrotliBitReaderSetInput, BrotliBitReaderGetAvailIn).
  • Fast and safe bit reading paths, with a 28-byte "fast input slack" zone for optimized bulk reads.

decode.cpp (2758 lines) is the core decompression engine. It implements:

  • BrotliDecoderCreateInstance / BrotliDecoderDestroyInstance -- lifecycle management with custom allocator support.
  • BrotliDecoderSetParameter -- configures ring buffer allocation strategy and large window mode.
  • BrotliDecoderDecompress -- one-shot memory-to-memory decompression.
  • BrotliDecoderDecompressStream -- streaming decompression with incremental input/output, the primary API used by DuckDB. Returns result codes indicating success, need for more input, need for more output, or error.
  • The internal state machine processes metablocks through states: UNINITED, INITIALIZE, METABLOCK_BEGIN, METABLOCK_HEADER, HUFFMAN_CODE decoding, CONTEXT_MAP decoding, TREE_GROUP building, and the hot command loop (CMD_BEGIN, CMD_INNER, CMD_POST_DECODE_LITERALS, CMD_POST_WRAP_COPY).
  • BrotliDecoderHasMoreOutput, BrotliDecoderTakeOutput -- zero-copy output access.
  • BrotliDecoderGetErrorCode, BrotliDecoderErrorString -- error reporting.

huffman.cpp (338 lines) builds Huffman decoding tables:

  • BrotliBuildCodeLengthsHuffmanTable -- builds the table for code length symbols.
  • BrotliBuildHuffmanTable -- builds general Huffman decoding tables from code lengths.
  • BrotliBuildSimpleHuffmanTable -- builds tables for simple prefix codes (1-4 symbols).
  • Uses the HuffmanCode struct (bits + value) with platform-optimized fast-load macros for ARM targets.

prefix.h (733 lines) contains static lookup tables:

  • CmdLutElement struct -- maps command symbols to insert length extra bits, copy length extra bits, distance code, context, and offset values.
  • kCmdLut[BROTLI_NUM_COMMAND_SYMBOLS] -- the full command lookup table used to decode insert-and-copy length pairs from command symbols.

state.h (386 lines) defines the decoder state machine:

  • BrotliRunningState enum -- 26 states from BROTLI_STATE_UNINITED to BROTLI_STATE_DONE.
  • Sub-state enums for metablock headers, Huffman decoding, context map decoding, and block length reading.
  • BrotliDecoderStateStruct -- the main decoder state containing bit reader, ring buffer, Huffman tree groups (literal, insert-copy, distance), block type/length tracking, distance cache, context maps, and memory management.
  • BrotliDecoderCompoundDictionary -- addon for compound dictionary support.
  • BrotliMetablockHeaderArena and BrotliMetablockBodyArena -- working memory for metablock processing.

decode.h (405 lines) is the public API header:

  • BrotliDecoderResult enum -- ERROR, SUCCESS, NEEDS_MORE_INPUT, NEEDS_MORE_OUTPUT.
  • BrotliDecoderErrorCode -- detailed error codes for format errors (-1 to -16), allocation errors (-21 to -30), and impossible states (-31).
  • BrotliDecoderParameter -- DISABLE_RING_BUFFER_REALLOCATION and LARGE_WINDOW options.
  • All public function declarations.

Usage

DuckDB uses the Brotli decoder to decompress Brotli-compressed column chunks when reading Parquet files. The streaming API (BrotliDecoderDecompressStream) is typically used, allowing DuckDB to feed compressed data incrementally and receive decompressed output in chunks. The decoder operates within the duckdb_brotli namespace to avoid symbol collisions with any system Brotli installation. Custom memory allocators can be provided through the create-instance API if needed.

Code Reference

Source Location

  • Repository: Duckdb_Duckdb
  • File: third_party/brotli/dec/bit_reader.h (419 lines) -- bit-level reading
  • File: third_party/brotli/dec/decode.cpp (2758 lines) -- core decompression
  • File: third_party/brotli/dec/huffman.cpp (338 lines) -- Huffman decoding
  • File: third_party/brotli/dec/prefix.h (733 lines) -- prefix code tables
  • File: third_party/brotli/dec/state.h (386 lines) -- decoder state management
  • File: third_party/brotli/include/brotli/decode.h (405 lines) -- public decode API

Signature

namespace duckdb_brotli {

// Result codes
typedef enum {
    BROTLI_DECODER_RESULT_ERROR = 0,
    BROTLI_DECODER_RESULT_SUCCESS = 1,
    BROTLI_DECODER_RESULT_NEEDS_MORE_INPUT = 2,
    BROTLI_DECODER_RESULT_NEEDS_MORE_OUTPUT = 3
} BrotliDecoderResult;

// Opaque decoder state
typedef struct BrotliDecoderStateStruct BrotliDecoderState;

// Lifecycle
BROTLI_DEC_API BrotliDecoderState* BrotliDecoderCreateInstance(
    brotli_alloc_func alloc_func,
    brotli_free_func free_func,
    void* opaque);
BROTLI_DEC_API void BrotliDecoderDestroyInstance(
    BrotliDecoderState* state);

// Configuration
BROTLI_DEC_API BROTLI_BOOL BrotliDecoderSetParameter(
    BrotliDecoderState* state,
    BrotliDecoderParameter param,
    uint32_t value);

// One-shot decompression
BROTLI_DEC_API BrotliDecoderResult BrotliDecoderDecompress(
    size_t encoded_size,
    const uint8_t encoded_buffer[],
    size_t* decoded_size,
    uint8_t decoded_buffer[]);

// Streaming decompression
BROTLI_DEC_API BrotliDecoderResult BrotliDecoderDecompressStream(
    BrotliDecoderState* state,
    size_t* available_in, const uint8_t** next_in,
    size_t* available_out, uint8_t** next_out,
    size_t* total_out);

// Output access
BROTLI_DEC_API BROTLI_BOOL BrotliDecoderHasMoreOutput(
    const BrotliDecoderState* state);
BROTLI_DEC_API const uint8_t* BrotliDecoderTakeOutput(
    BrotliDecoderState* state, size_t* size);

// Dictionary attachment
BROTLI_DEC_API BROTLI_BOOL BrotliDecoderAttachDictionary(
    BrotliDecoderState* state,
    BrotliSharedDictionaryType type,
    size_t data_size,
    const uint8_t data[]);

// Error reporting
BROTLI_DEC_API BrotliDecoderErrorCode BrotliDecoderGetErrorCode(
    const BrotliDecoderState* state);
BROTLI_DEC_API const char* BrotliDecoderErrorString(
    BrotliDecoderErrorCode c);

// Bit reader (from bit_reader.h)
typedef struct {
    brotli_reg_t val_;
    brotli_reg_t bit_pos_;
    const uint8_t* next_in;
    const uint8_t* guard_in;
    const uint8_t* last_in;
} BrotliBitReader;

BROTLI_INTERNAL void BrotliInitBitReader(BrotliBitReader* br);
BROTLI_INTERNAL BROTLI_BOOL BrotliWarmupBitReader(BrotliBitReader* br);

// Huffman code (from huffman.h)
typedef struct {
    uint8_t bits;
    uint16_t value;
} HuffmanCode;

// Command lookup (from prefix.h)
typedef struct CmdLutElement {
    uint8_t insert_len_extra_bits;
    uint8_t copy_len_extra_bits;
    int8_t distance_code;
    uint8_t context;
    uint16_t insert_len_offset;
    uint16_t copy_len_offset;
} CmdLutElement;

} // namespace duckdb_brotli

Import

#include <brotli/decode.h>

I/O Contract

Inputs

Name Type Required Description
alloc_func brotli_alloc_func No Custom memory allocation function (NULL for default malloc)
free_func brotli_free_func No Custom memory free function (NULL for default free)
opaque void* No Opaque pointer passed to custom allocator functions
encoded_buffer const uint8_t* Yes Compressed input data buffer
encoded_size size_t Yes Size of compressed input data in bytes
available_in size_t* Yes Pointer to available input byte count (streaming API, in/out)
next_in const uint8_t** Yes Pointer to next input byte pointer (streaming API, in/out)
available_out size_t* Yes Pointer to available output space (streaming API, in/out)
next_out uint8_t** Yes Pointer to output buffer cursor (streaming API, in/out)
param BrotliDecoderParameter Yes Parameter to set (DISABLE_RING_BUFFER_REALLOCATION or LARGE_WINDOW)
value uint32_t Yes Parameter value

Outputs

Name Type Description
(return) BrotliDecoderResult Result code: SUCCESS, NEEDS_MORE_INPUT, NEEDS_MORE_OUTPUT, or ERROR
decoded_buffer uint8_t* Buffer filled with decompressed data (one-shot API)
decoded_size size_t* Number of decompressed bytes written (one-shot API)
total_out size_t* Cumulative bytes decompressed since initialization (streaming API)
next_out uint8_t** Updated output cursor after decompression (streaming API)
available_out size_t* Remaining output buffer space (streaming API)

Usage Examples

// One-shot decompression in DuckDB context
#include <brotli/decode.h>
using namespace duckdb_brotli;

// Decompress a Brotli-compressed Parquet column chunk
size_t decoded_size = expected_uncompressed_size;
uint8_t* decoded_buffer = (uint8_t*)malloc(decoded_size);

BrotliDecoderResult result = BrotliDecoderDecompress(
    compressed_size, compressed_data,
    &decoded_size, decoded_buffer);

if (result != BROTLI_DECODER_RESULT_SUCCESS) {
    // Handle decompression error
}

// Streaming decompression example
BrotliDecoderState* state = BrotliDecoderCreateInstance(NULL, NULL, NULL);

size_t avail_in = compressed_size;
const uint8_t* next_in_ptr = compressed_data;
size_t avail_out = output_buffer_size;
uint8_t* next_out_ptr = output_buffer;
size_t total = 0;

BrotliDecoderResult res;
do {
    res = BrotliDecoderDecompressStream(
        state, &avail_in, &next_in_ptr,
        &avail_out, &next_out_ptr, &total);
    if (res == BROTLI_DECODER_RESULT_NEEDS_MORE_OUTPUT) {
        // Flush output buffer, reset avail_out and next_out_ptr
    }
} while (res == BROTLI_DECODER_RESULT_NEEDS_MORE_INPUT ||
         res == BROTLI_DECODER_RESULT_NEEDS_MORE_OUTPUT);

BrotliDecoderDestroyInstance(state);

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment