Implementation:Duckdb Duckdb Generate Serialization
Overview
Concrete tool for generating DuckDB serialization and deserialization code from JSON schema definitions. The script scripts/generate_serialization.py reads JSON files that describe serializable class hierarchies, validates versioning, and produces C++ source files containing Serialize() and Deserialize() methods for each class.
Code Reference
| Field | Value |
|---|---|
| Source | scripts/generate_serialization.py (lines 1-858) |
| Language | Python 3 |
| API | python3 scripts/generate_serialization.py [--source dir] [--target dir] |
| External Dependencies | python3, json (stdlib), re (stdlib), argparse (stdlib), enum (stdlib) |
Key Classes and Enums
class MemberVariableStatus(Enum):
    EXISTING = 1   # Both serialized and deserialized
    READ_ONLY = 2  # Not serialized, but is deserialized (requires default)
    DELETED = 3    # Not serialized, not deserialized
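To make the three states concrete, the following is a minimal sketch of how a status string from a JSON spec could be mapped onto the enum and turned into read/write decisions. The names parse_status_sketch, is_written and is_read are hypothetical and only illustrate the semantics in the comments above; they are not taken from the script.

```python
from enum import Enum


class MemberVariableStatus(Enum):
    EXISTING = 1   # both serialized and deserialized
    READ_ONLY = 2  # not serialized, but still deserialized (requires a default)
    DELETED = 3    # neither serialized nor deserialized


def parse_status_sketch(status: str) -> MemberVariableStatus:
    """Map a spec's "status" string onto the enum (hypothetical helper)."""
    try:
        return MemberVariableStatus[status.upper()]
    except KeyError:
        raise ValueError(f"unknown member status: {status!r}") from None


def is_written(status: MemberVariableStatus) -> bool:
    # Only EXISTING members appear in the Serialize() output.
    return status is MemberVariableStatus.EXISTING


def is_read(status: MemberVariableStatus) -> bool:
    # EXISTING and READ_ONLY members are both handled in Deserialize().
    return status is not MemberVariableStatus.DELETED
```

Under this reading, a read_only member is the transitional state: old files that still contain the field can be loaded, but new files no longer write it.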
I/O Contract
Default Source/Target Directories
When invoked without arguments, the script processes three directory pairs:
| Source (JSON specs) | Target (generated .cpp) |
|---|---|
| src/include/duckdb/storage/serialization/ | src/storage/serialization/ |
| extension/parquet/include/ | extension/parquet/ |
| extension/json/include/ | extension/json/ |
Custom directories can be specified via --source and --target flags.
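The fallback from explicit flags to the default directory pairs can be sketched as follows. This is an illustration only: resolve_pairs and DEFAULT_PAIRS are hypothetical names, and the rule that --source and --target must be given together is an assumption, not something the script is known to enforce.

```python
import argparse

# Default (source, target) pairs, copied from the table above.
DEFAULT_PAIRS = [
    ("src/include/duckdb/storage/serialization/", "src/storage/serialization/"),
    ("extension/parquet/include/", "extension/parquet/"),
    ("extension/json/include/", "extension/json/"),
]


def resolve_pairs(argv):
    """Return the list of (source, target) directory pairs to process."""
    parser = argparse.ArgumentParser(description="Generate serialization code")
    parser.add_argument("--source", type=str, default=None)
    parser.add_argument("--target", type=str, default=None)
    args = parser.parse_args(argv)

    # Assumption: a custom run needs both directories.
    if (args.source is None) != (args.target is None):
        parser.error("--source and --target must be given together")
    if args.source is not None:
        return [(args.source, args.target)]
    return list(DEFAULT_PAIRS)
```

Calling resolve_pairs([]) yields the three default pairs, while resolve_pairs(["--source", "a", "--target", "b"]) yields a single custom pair.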
Input: Serialization JSON Schema
Each JSON file defines an array of serializable classes:
[
    {
        "class": "Constraint",
        "class_type": "type",
        "includes": ["duckdb/parser/constraints/list.hpp"],
        "members": [
            {
                "id": 100,
                "name": "type",
                "type": "ConstraintType"
            }
        ]
    },
    {
        "class": "NotNullConstraint",
        "base": "Constraint",
        "enum": "NOT_NULL",
        "members": [
            {
                "id": 200,
                "name": "index",
                "type": "LogicalIndex"
            }
        ],
        "constructor": ["index"]
    }
]
| JSON Field | Type | Description |
|---|---|---|
| class | string | C++ class name |
| base | string (optional) | Base class name for inheritance hierarchy |
| class_type | string (optional) | Name of the discriminator field for polymorphic dispatch |
| enum | string (optional) | Enum value for this subclass in the discriminator |
| members[].id | integer | Stable numeric serialization ID for the field |
| members[].name | string | C++ member variable name |
| members[].type | string | C++ type (supports containers, pointers, primitives) |
| members[].property | string (optional) | Accessor name if different from name |
| members[].default | value (optional) | Default value for backward-compatible deserialization |
| members[].version | string (optional) | DuckDB version when the field was introduced |
| members[].status | string (optional) | One of "existing", "read_only", "deleted" |
| constructor | array (optional) | List of member names to pass to the constructor |
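The field table above implies a few structural rules that can be checked before generating any code. The sketch below is not the script's own validation: check_spec is a hypothetical helper, and treating a missing status as "existing" is an assumption.

```python
import json

VALID_STATUSES = {"existing", "read_only", "deleted"}


def check_spec(classes):
    """Return a list of human-readable problems found in a parsed spec."""
    problems = []
    for entry in classes:
        name = entry.get("class", "<missing class>")
        if "class" not in entry:
            problems.append("entry without a 'class' field")
        seen_ids = set()
        for member in entry.get("members", []):
            # id, name and type are the non-optional member fields.
            for key in ("id", "name", "type"):
                if key not in member:
                    problems.append(f"{name}: member missing '{key}'")
            member_id = member.get("id")
            if member_id is not None:
                if member_id in seen_ids:
                    problems.append(f"{name}: duplicate member id {member_id}")
                seen_ids.add(member_id)
            # Assumption: an absent status means "existing".
            status = member.get("status", "existing")
            if status not in VALID_STATUSES:
                problems.append(f"{name}: unknown status '{status}'")
    return problems


EXAMPLE = json.loads("""
[
  {"class": "NotNullConstraint", "base": "Constraint", "enum": "NOT_NULL",
   "members": [{"id": 200, "name": "index", "type": "LogicalIndex"}],
   "constructor": ["index"]}
]
""")
```

Running check_spec(EXAMPLE) returns an empty list, while a spec that reuses an ID within one class produces a "duplicate member id" entry.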
Version Map
The script reads src/storage/version_map.json to resolve version strings to serialization version constants. This ensures that version-tagged fields map to the correct SerializationVersion enum values.
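The lookup itself amounts to a checked dictionary access. In the sketch below the mapping is a placeholder with invented version names and numbers; the actual layout and contents of version_map.json are not reproduced here, and lookup_serialization_version_sketch is a hypothetical stand-in for the script's helper.

```python
# Placeholder mapping with made-up entries; the real values live in
# src/storage/version_map.json.
EXAMPLE_VERSION_MAP = {"v0.0.1": 1, "v0.0.2": 2}


def lookup_serialization_version_sketch(version, version_map=EXAMPLE_VERSION_MAP):
    """Resolve a version string to its serialization version number."""
    if version not in version_map:
        known = ", ".join(sorted(version_map))
        raise KeyError(f"unknown version {version!r}; known versions: {known}")
    return version_map[version]
```

Failing loudly on an unknown version string is the useful part: a typo in a member's "version" field is caught at generation time instead of producing a mis-tagged field.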
Output Files
For each input JSON file foo.json, the script produces serialize_foo.cpp in the target directory. Each generated file contains:
- A file header warning that it is auto-generated
- Include directives for all referenced headers
- Serialize() methods using serializer.WriteProperty calls
- Deserialize() methods using deserializer.ReadProperty calls
- Switch-based polymorphic dispatch for base classes
| Input JSON | Generated Output |
|---|---|
| src/include/duckdb/storage/serialization/constraint.json | src/storage/serialization/serialize_constraint.cpp |
| src/include/duckdb/storage/serialization/expression.json | src/storage/serialization/serialize_expression.cpp |
| src/include/duckdb/storage/serialization/logical_operator.json | src/storage/serialization/serialize_logical_operator.cpp |
| src/include/duckdb/storage/serialization/parsed_expression.json | src/storage/serialization/serialize_parsed_expression.cpp |
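The naming rule in the table (foo.json becomes serialize_foo.cpp in the target directory) can be expressed in a few lines. output_path_for is a hypothetical name used only for this sketch.

```python
from pathlib import Path


def output_path_for(spec_path, target_dir):
    """Map a JSON spec path to the generated .cpp path in target_dir."""
    spec = Path(spec_path)
    # "constraint.json" -> "serialize_constraint.cpp"
    return Path(target_dir) / f"serialize_{spec.stem}.cpp"
```

For example, passing the constraint.json spec and src/storage/serialization as the target gives src/storage/serialization/serialize_constraint.cpp, matching the first row of the table.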
Generated Code Pattern
// Serialize method
void NotNullConstraint::Serialize(Serializer &serializer) const {
    Constraint::Serialize(serializer);
    serializer.WriteProperty<LogicalIndex>(200, "index", index);
}

// Deserialize method (base class with polymorphic dispatch)
unique_ptr<Constraint> Constraint::Deserialize(Deserializer &deserializer) {
    auto type = deserializer.ReadProperty<ConstraintType>(100, "type");
    unique_ptr<Constraint> result;
    switch (type) {
    case ConstraintType::NOT_NULL:
        result = NotNullConstraint::Deserialize(deserializer);
        break;
    case ConstraintType::CHECK:
        result = CheckConstraint::Deserialize(deserializer);
        break;
    default:
        throw SerializationException("Unsupported type for deserialization of Constraint!");
    }
    return std::move(result);
}
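To show how a spec entry could turn into the Serialize() text above, here is a minimal rendering sketch. render_serialize is a hypothetical function; the real generator handles defaults, versions, containers and pointers, all of which are omitted, and skipping every non-"existing" member is a simplification based on the status semantics described earlier.

```python
def render_serialize(entry):
    """Render a simplified C++ Serialize() method for one spec entry."""
    cls = entry["class"]
    lines = [f"void {cls}::Serialize(Serializer &serializer) const {{"]
    if "base" in entry:
        # Subclasses first delegate to the base class serializer.
        lines.append(f"\t{entry['base']}::Serialize(serializer);")
    for member in entry.get("members", []):
        # Assumption: a missing status means "existing"; others are not written.
        if member.get("status", "existing") != "existing":
            continue
        # "property" overrides the accessor when it differs from "name".
        accessor = member.get("property", member["name"])
        lines.append(
            f'\tserializer.WriteProperty<{member["type"]}>'
            f'({member["id"]}, "{member["name"]}", {accessor});'
        )
    lines.append("}")
    return "\n".join(lines)


NOT_NULL = {
    "class": "NotNullConstraint",
    "base": "Constraint",
    "enum": "NOT_NULL",
    "members": [{"id": 200, "name": "index", "type": "LogicalIndex"}],
    "constructor": ["index"],
}
```

Applied to the NotNullConstraint entry from the input example, render_serialize reproduces the three-statement method shown above, with tab indentation.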
Key Helper Functions
| Function | Purpose |
|---|---|
| get_file_list() | Discovers JSON specs and maps them to output paths |
| lookup_serialization_version() | Resolves a DuckDB version string to a serialization version constant |
| is_container(type) | Checks if a type is a template container (contains <) |
| is_pointer(type) | Checks if a type is a pointer or shared_ptr |
| requires_move(type) | Determines if a deserialized value needs std::move |
| replace_pointer(type) | Converts raw pointer types to unique_ptr wrappers |
| parse_status(status) | Converts a status string to MemberVariableStatus enum |
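The type-inspection helpers work on C++ type names as plain strings. The functions below are a sketch of what such checks could look like based only on the one-line descriptions in the table; the _sketch names are hypothetical, and the actual implementations in the script may differ in detail.

```python
import re


def is_container_sketch(type_name: str) -> bool:
    # A template container is recognised by the presence of '<'.
    return "<" in type_name


def is_pointer_sketch(type_name: str) -> bool:
    # Raw pointers end in '*'; shared_ptr is matched by its prefix.
    return type_name.endswith("*") or type_name.startswith("shared_ptr<")


def replace_pointer_sketch(type_name: str) -> str:
    # Rewrite every "T*" as "unique_ptr<T>", including inside containers.
    return re.sub(r"([A-Za-z_][A-Za-z0-9_:]*)\s*\*", r"unique_ptr<\1>", type_name)
```

With these definitions, "vector<Expression*>" counts as a container and is rewritten to "vector<unique_ptr<Expression>>", while "ParsedExpression*" counts as a pointer and becomes "unique_ptr<ParsedExpression>".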
Usage Examples
Generate all serialization code with default directories:
python3 scripts/generate_serialization.py
Generate for a specific source/target pair:
python3 scripts/generate_serialization.py \
--source src/include/duckdb/storage/serialization \
--target src/storage/serialization
Typical Workflow
- Edit or create a JSON spec under src/include/duckdb/storage/serialization/
- Assign stable numeric IDs to new fields (use the next available ID in the 200+ range for subclass fields)
- If the field was added in a new DuckDB version, set the "version" field
- Run python3 scripts/generate_serialization.py
- The generated serialize_*.cpp files are written to the target directory
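Picking the next available ID in the second step can be done mechanically. The helper below is a hypothetical sketch: the 200+ range for subclass fields comes from the workflow above, while the 100 starting point for base classes is only inferred from the Constraint example. Members of any status, including deleted ones, count as used so that an ID is never recycled.

```python
def next_member_id(entry, base_start=100, subclass_start=200):
    """Suggest the next free member ID for a spec entry."""
    # Entries with a "base" are subclasses and use the 200+ range.
    start = subclass_start if "base" in entry else base_start
    used = [member["id"] for member in entry.get("members", [])]
    # Continue after the highest ID in use, or begin at the range start.
    return max(used + [start - 1]) + 1
```

For the NotNullConstraint entry, whose only member has ID 200, this suggests 201; for a new subclass without members it suggests 200.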