Implementation:Duckdb Duckdb Generate Serialization

From Leeroopedia


Overview

Concrete tool for generating DuckDB serialization and deserialization code from JSON schema definitions. The script scripts/generate_serialization.py reads JSON files that describe serializable class hierarchies, validates versioning, and produces C++ source files containing Serialize() and Deserialize() methods for each class.

Code Reference

Field | Value
Source | scripts/generate_serialization.py (lines 1-858)
Language | Python 3
API | python3 scripts/generate_serialization.py [--source dir] [--target dir]
External Dependencies | python3, json (stdlib), re (stdlib), argparse (stdlib), enum (stdlib)

Key Classes and Enums

from enum import Enum

class MemberVariableStatus(Enum):
    EXISTING = 1    # Both serialized and deserialized
    READ_ONLY = 2   # Not serialized, but is deserialized (requires default)
    DELETED = 3     # Not serialized, not deserialized

I/O Contract

Default Source/Target Directories

When invoked without arguments, the script processes three directory pairs:

Source (JSON specs) | Target (generated .cpp)
src/include/duckdb/storage/serialization/ | src/storage/serialization/
extension/parquet/include/ | extension/parquet/
extension/json/include/ | extension/json/

Custom directories can be specified via --source and --target flags.
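The choice between the default pairs and a custom pair can be sketched with argparse. This is a minimal illustration, not the script's actual code: the helper name resolve_pairs and the rule that both flags must be given together are assumptions.

```python
import argparse

# Default (source, target) pairs, copied from the table above.
DEFAULT_PAIRS = [
    ("src/include/duckdb/storage/serialization/", "src/storage/serialization/"),
    ("extension/parquet/include/", "extension/parquet/"),
    ("extension/json/include/", "extension/json/"),
]


def resolve_pairs(argv):
    """Return the (source, target) directory pairs to process (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Generate serialization code")
    parser.add_argument("--source", help="directory containing the JSON specs")
    parser.add_argument("--target", help="directory for the generated .cpp files")
    args = parser.parse_args(argv)
    # Assumption: a custom run needs both flags; one without the other is an error.
    if (args.source is None) != (args.target is None):
        parser.error("--source and --target must be specified together")
    if args.source is not None:
        return [(args.source, args.target)]
    return list(DEFAULT_PAIRS)
```

Calling resolve_pairs([]) yields the three default pairs, while passing both flags yields a single custom pair.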

Input: Serialization JSON Schema

Each JSON file defines an array of serializable classes:

[
    {
        "class": "Constraint",
        "class_type": "type",
        "includes": ["duckdb/parser/constraints/list.hpp"],
        "members": [
            {
                "id": 100,
                "name": "type",
                "type": "ConstraintType"
            }
        ]
    },
    {
        "class": "NotNullConstraint",
        "base": "Constraint",
        "enum": "NOT_NULL",
        "members": [
            {
                "id": 200,
                "name": "index",
                "type": "LogicalIndex"
            }
        ],
        "constructor": ["index"]
    }
]

JSON Field | Type | Description
class | string | C++ class name
base | string (optional) | Base class name for inheritance hierarchy
class_type | string (optional) | Name of the discriminator field for polymorphic dispatch
enum | string (optional) | Enum value for this subclass in the discriminator
members[].id | integer | Stable numeric serialization ID for the field
members[].name | string | C++ member variable name
members[].type | string | C++ type (supports containers, pointers, primitives)
members[].property | string (optional) | Accessor name if different from name
members[].default | value (optional) | Default value for backward-compatible deserialization
members[].version | string (optional) | DuckDB version when the field was introduced
members[].status | string (optional) | One of "existing", "read_only", "deleted"
constructor | array (optional) | List of member names to pass to the constructor
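To show how these fields relate, the following sketch parses the example spec and runs a few sanity checks. The function check_spec and its rules are hypothetical and are not claimed to match the script's own validation; in particular it assumes a base class is defined in the same file as its subclasses.

```python
import json

# The example spec from above, reduced to the fields the checks need.
SPEC = '''[
  {"class": "Constraint", "class_type": "type",
   "members": [{"id": 100, "name": "type", "type": "ConstraintType"}]},
  {"class": "NotNullConstraint", "base": "Constraint", "enum": "NOT_NULL",
   "members": [{"id": 200, "name": "index", "type": "LogicalIndex"}],
   "constructor": ["index"]}
]'''


def check_spec(classes):
    """Return a list of problems found in a parsed spec (hypothetical checks)."""
    problems = []
    known = {c["class"] for c in classes}
    for c in classes:
        name = c["class"]
        ids = [m["id"] for m in c.get("members", [])]
        # Serialization IDs must be unique within a class.
        if len(ids) != len(set(ids)):
            problems.append(f"{name}: duplicate member id")
        # Assumption: the base class lives in the same JSON file.
        if "base" in c and c["base"] not in known:
            problems.append(f"{name}: unknown base {c['base']}")
        # An enum value only makes sense for a subclass of a polymorphic base.
        if "enum" in c and "base" not in c:
            problems.append(f"{name}: enum given without a base class")
    return problems


print(check_spec(json.loads(SPEC)))
```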

Version Map

The script reads src/storage/version_map.json to resolve version strings to serialization version constants. This ensures that version-tagged fields map to the correct SerializationVersion enum values.
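The lookup can be pictured as a dictionary access that fails loudly on unknown versions. In this sketch the mapping is an in-memory placeholder: the version strings, the numbers, and the name resolve_version are all assumptions, and the real layout of version_map.json is not reproduced here.

```python
# Placeholder mapping; these entries are illustrative, not DuckDB's real values.
VERSION_MAP = {"v1.0.0": 1, "v1.1.0": 2, "v1.2.0": 3}


def resolve_version(version, version_map=VERSION_MAP):
    """Map a version string from a member's "version" field to a number (hypothetical)."""
    if version not in version_map:
        # Failing here keeps a typo in a spec from silently producing wrong code.
        raise ValueError(f"unknown version {version!r} in serialization spec")
    return version_map[version]
```

Rejecting unknown strings at generation time means a mistyped "version" value surfaces before any C++ is written.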

Output Files

For each input JSON file foo.json, the script produces serialize_foo.cpp in the target directory. Each generated file contains:

  • A file header warning that it is auto-generated
  • Include directives for all referenced headers
  • Serialize() methods using serializer.WriteProperty calls
  • Deserialize() methods using deserializer.ReadProperty calls
  • Switch-based polymorphic dispatch for base classes

Input JSON | Generated Output
src/include/duckdb/storage/serialization/constraint.json | src/storage/serialization/serialize_constraint.cpp
src/include/duckdb/storage/serialization/expression.json | src/storage/serialization/serialize_expression.cpp
src/include/duckdb/storage/serialization/logical_operator.json | src/storage/serialization/serialize_logical_operator.cpp
src/include/duckdb/storage/serialization/parsed_expression.json | src/storage/serialization/serialize_parsed_expression.cpp
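The naming rule (foo.json becomes serialize_foo.cpp in the target directory) can be expressed as a small path helper. The function name output_path is hypothetical, and posixpath is used only so the result is the same on every platform.

```python
import posixpath


def output_path(source_json, target_dir):
    """Map a JSON spec path to its generated .cpp path (hypothetical helper)."""
    stem, ext = posixpath.splitext(posixpath.basename(source_json))
    if ext != ".json":
        raise ValueError(f"not a JSON spec: {source_json}")
    return posixpath.join(target_dir, f"serialize_{stem}.cpp")


print(output_path(
    "src/include/duckdb/storage/serialization/constraint.json",
    "src/storage/serialization",
))
```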

Generated Code Pattern

// Serialize method
void NotNullConstraint::Serialize(Serializer &serializer) const {
    Constraint::Serialize(serializer);
    serializer.WriteProperty<LogicalIndex>(200, "index", index);
}

// Deserialize method (base class with polymorphic dispatch)
unique_ptr<Constraint> Constraint::Deserialize(Deserializer &deserializer) {
    auto type = deserializer.ReadProperty<ConstraintType>(100, "type");
    unique_ptr<Constraint> result;
    switch (type) {
    case ConstraintType::NOT_NULL:
        result = NotNullConstraint::Deserialize(deserializer);
        break;
    case ConstraintType::CHECK:
        result = CheckConstraint::Deserialize(deserializer);
        break;
    default:
        throw SerializationException("Unsupported type for deserialization of Constraint!");
    }
    return result;
}
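As a rough illustration of how a Serialize() method like the one above could be assembled from a spec entry, the sketch below builds the C++ text with plain string formatting. It is a simplified assumption, not the generator's real template logic: it ignores defaults, versions, member status, and the property accessor, and the name generate_serialize is hypothetical.

```python
def generate_serialize(class_name, members, base=None):
    """Build the C++ text of a Serialize() method (simplified, hypothetical)."""
    lines = [f"void {class_name}::Serialize(Serializer &serializer) const {{"]
    if base is not None:
        # Subclasses write the base class fields first.
        lines.append(f"\t{base}::Serialize(serializer);")
    for m in members:
        # One WriteProperty call per member: type, stable id, name, value.
        lines.append(
            f'\tserializer.WriteProperty<{m["type"]}>({m["id"]}, "{m["name"]}", {m["name"]});'
        )
    lines.append("}")
    return "\n".join(lines)


print(generate_serialize(
    "NotNullConstraint",
    [{"id": 200, "name": "index", "type": "LogicalIndex"}],
    base="Constraint",
))
```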

Key Helper Functions

Function | Purpose
get_file_list() | Discovers JSON specs and maps them to output paths
lookup_serialization_version() | Resolves a DuckDB version string to a serialization version constant
is_container(type) | Checks if a type is a template container (contains <)
is_pointer(type) | Checks if a type is a pointer or shared_ptr
requires_move(type) | Determines if a deserialized value needs std::move
replace_pointer(type) | Converts raw pointer types to unique_ptr wrappers
parse_status(status) | Converts a status string to MemberVariableStatus enum
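Three of these helpers are simple enough to sketch from their descriptions alone. The bodies below are written from the table, not copied from the script, so details such as the exact pointer test and the error raised for an unknown status are assumptions.

```python
from enum import Enum


class MemberVariableStatus(Enum):
    EXISTING = 1
    READ_ONLY = 2
    DELETED = 3


def is_container(type_name):
    # A template container is recognized by the presence of '<'.
    return "<" in type_name


def is_pointer(type_name):
    # Assumption: raw pointers end in '*', shared pointers start with 'shared_ptr<'.
    return type_name.endswith("*") or type_name.startswith("shared_ptr<")


def parse_status(status):
    # Map the three documented status strings onto the enum.
    mapping = {
        "existing": MemberVariableStatus.EXISTING,
        "read_only": MemberVariableStatus.READ_ONLY,
        "deleted": MemberVariableStatus.DELETED,
    }
    if status not in mapping:
        raise ValueError(f"unrecognized member status: {status!r}")
    return mapping[status]
```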

Usage Examples

Generate all serialization code with default directories:

python3 scripts/generate_serialization.py

Generate for a specific source/target pair:

python3 scripts/generate_serialization.py \
    --source src/include/duckdb/storage/serialization \
    --target src/storage/serialization

Typical Workflow

  1. Edit or create a JSON spec under src/include/duckdb/storage/serialization/
  2. Assign stable numeric IDs to new fields (use the next available ID in the 200+ range for subclass fields)
  3. If the field was added in a new DuckDB version, set the "version" field
  4. Run python3 scripts/generate_serialization.py
  5. The generated serialize_*.cpp files are written to the target directory
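Step 2 can be supported by a small helper that proposes the next free ID for a subclass. This is a hypothetical convenience, not part of the script; it assumes that every member still listed in the spec, including ones marked deleted, keeps its ID reserved.

```python
def next_member_id(members, floor=200):
    """Suggest the next unused member ID at or above floor (hypothetical helper)."""
    # Count every listed member so that a deleted field's ID is never reused.
    used = [m["id"] for m in members]
    return max(used + [floor - 1]) + 1


existing = [{"id": 200, "name": "index"}, {"id": 201, "name": "old", "status": "deleted"}]
print(next_member_id(existing))
```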
