Principle:Duckdb Duckdb Parquet Serialization
| Knowledge Sources | |
|---|---|
| Domains | Columnar_Storage, Serialization, Data_Format |
| Last Updated | 2026-02-07 12:00 GMT |
Overview
A columnar file format type system and metadata serialization scheme based on Apache Thrift, defining how structured data schemas, statistics, and encoding metadata are stored within Parquet files.
Description
Apache Parquet is a columnar storage file format designed for efficient analytical processing of large datasets. The Parquet format uses Apache Thrift as its serialization framework for encoding file metadata, schema information, column statistics, and other structural elements. Understanding the Parquet type system and serialization is essential for reading and writing Parquet files correctly.
The Parquet type system defines a set of primitive physical types (BOOLEAN, INT32, INT64, INT96, FLOAT, DOUBLE, BYTE_ARRAY, FIXED_LEN_BYTE_ARRAY) and logical types that layer additional semantics on top (STRING, DATE, TIMESTAMP, DECIMAL, UUID, etc.). The schema is represented as a tree of schema elements, where leaf nodes represent physical columns and intermediate nodes represent nested structures (groups). This schema representation is directly derived from Google's Dremel paper and supports complex nested data through repetition and definition levels.
Thrift serialization provides a compact binary encoding for the metadata structures. Each Parquet file contains a file-level metadata footer that describes the schema, row groups, column chunks, encoding types, compression codecs, and column statistics (min/max values, null counts). This metadata is serialized using Thrift's compact binary protocol, which uses variable-length integer encoding and field delta encoding for space efficiency.
Usage
Parquet serialization is used in DuckDB whenever reading from or writing to Parquet files. When reading, DuckDB deserializes the Thrift-encoded metadata footer to understand the file schema, locate column chunks, and use column statistics for predicate pushdown. When writing, DuckDB serializes schema information and column statistics into the Thrift format for the file footer. The type mapping between DuckDB's internal types and Parquet's type system is a critical component of this process.
Theoretical Basis
Parquet File Structure:
Parquet File Layout:
4 bytes: magic number "PAR1"
Row Group 1:
Column Chunk 1: [page header + data] [page header + data] ...
Column Chunk 2: [page header + data] [page header + data] ...
...
Row Group 2:
...
File Metadata (Thrift-encoded):
version, schema, row_groups[], key_value_metadata[]
4 bytes: length of file metadata
4 bytes: magic number "PAR1"
Thrift Compact Protocol Encoding:
// Field encoding: [field_delta + type (1 byte)] [value]
// Types: boolean, i8, i16, i32, i64, double, binary, struct, list, map
// Variable-length integer (zigzag + varint):
function encode_varint(n):
zigzag = (n << 1) XOR (n >> 63) // zigzag encoding
while zigzag > 0x7F:
emit(zigzag & 0x7F | 0x80) // 7 bits + continuation
zigzag >>= 7
emit(zigzag & 0x7F)
// Field delta encoding:
// Instead of absolute field IDs, encode delta from previous
// field_header = (delta << 4) | type (if delta fits in 4 bits)
Schema Representation (Dremel Model):
// Schema tree: groups contain fields, fields have repetition
// Repetition: REQUIRED, OPTIONAL, REPEATED
// Definition level: how many optional fields are defined
// Repetition level: which repeated field has a new value
// Example schema:
message Document {
required int64 doc_id;
optional group name {
repeated group language {
required string code;
optional string country;
}
}
}
// Flattened columns with levels:
// doc_id: [def_level=0, rep_level=0]
// name.language.code: [def_level=2, rep_level=0..1]
// name.language.country: [def_level=2..3, rep_level=0..1]
Column Statistics:
// Per-column-chunk and per-page statistics
Statistics {
min_value: bytes // minimum value in column/page
max_value: bytes // maximum value in column/page
null_count: int64 // number of nulls
distinct_count: int64 // approximate distinct values
is_min_value_exact: bool
is_max_value_exact: bool
}
// Used for predicate pushdown: skip row groups/pages
// where filter condition cannot match the value range