Principle:Duckdb Duckdb Brotli Compression

Knowledge Sources	Brotli Compressed Data Format Brotli
Domains	Compression, Data_Format
Last Updated	2026-02-07 12:00 GMT

Overview

A general-purpose lossless compression algorithm that combines LZ77, Huffman coding, and second-order context modeling to achieve high compression ratios.

Description

Brotli is a compression algorithm developed by Google that achieves higher compression ratios than gzip while maintaining comparable decompression speeds. It uses a combination of a modern variant of the LZ77 algorithm, Huffman coding, and second-order context modeling. Brotli includes a pre-defined dictionary of common words and phrases from web content, which contributes to its superior compression of HTML, CSS, JavaScript, and other web formats.

The algorithm operates in three main stages:

LZ77 backward reference search — finds repeated byte sequences and encodes them as distance/length pairs
Context modeling — uses the context of previously seen bytes to improve entropy coding
Entropy coding — uses Huffman codes (and optionally static prefix codes) to encode the compressed data

Usage

Use Brotli compression when handling HTTP content encoding in DuckDB's httpfs extension, or when reading data from sources that use Brotli compression. It provides better compression ratios than gzip/zlib for web-oriented content.

Theoretical Basis

The core of Brotli combines three compression techniques:

LZ77 Sliding Window: Finds matches in a sliding window of up to 16 MiB:

// Pseudo-code
for each position in input:
    find longest match in window (up to 16 MiB back)
    if match found: emit (distance, length) pair
    else: emit literal byte

Huffman Coding: Encodes symbols using variable-length prefix codes based on frequency:

// More frequent symbols get shorter codes
frequency_table = count_symbols(data)
huffman_tree = build_tree(frequency_table)
encoded = encode_with_tree(data, huffman_tree)

Context Modeling: Uses previous bytes to select among multiple Huffman trees:

context = compute_context(prev_bytes)
tree = select_huffman_tree(context)
symbol = decode_with_tree(tree, bitstream)

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment