Principle:Duckdb Duckdb Serialization Code Generation

From Leeroopedia


Overview

This principle covers generating type-safe serialization and deserialization code from declarative schema definitions. Instead of hand-writing Serialize() and Deserialize() methods for every class in a complex type hierarchy, a code generator reads JSON specifications that describe each class's fields, their types, serialization order, and version information, then produces the corresponding C++ methods.
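
As a rough illustration of the idea, the following minimal Python sketch turns a JSON entry into the text of a C++ Serialize() method. It is not the actual generator: the class name ExampleExpression, the JSON key names, and the emitted WriteProperty call are assumptions made for this example.

```python
import json

# Hypothetical spec entry. The key names ("class", "members", "id", "name",
# "type") and the class itself are assumptions chosen for illustration.
SPEC_TEXT = """
{
  "class": "ExampleExpression",
  "members": [
    {"id": 100, "name": "alias", "type": "string"},
    {"id": 101, "name": "child", "type": "unique_ptr<Expression>"}
  ]
}
"""


def generate_serialize(spec):
    """Return C++ source text for a Serialize method: one write call per member."""
    lines = [f"void {spec['class']}::Serialize(Serializer &serializer) const {{"]
    for member in spec["members"]:
        # The emitted WriteProperty call is illustrative, not a confirmed signature.
        lines.append(
            f"\tserializer.WriteProperty<{member['type']}>"
            f"({member['id']}, \"{member['name']}\", {member['name']});"
        )
    lines.append("}")
    return "\n".join(lines)


generated = generate_serialize(json.loads(SPEC_TEXT))
print(generated)
```

Because the method body is derived from the member list, the ID, name, and type of each field appear in exactly one place (the spec), and the C++ text follows from it mechanically.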

Description

Serialize/Deserialize methods for complex type hierarchies (expressions, logical operators, parsed expressions, table references, constraints, and more) are auto-generated from JSON specifications. This removes the need to maintain serialization code by hand and avoids the subtle bugs that hand-written code tends to accumulate.

DuckDB has dozens of serializable classes organized in deep inheritance hierarchies. Each class may have:

  • Typed member fields with explicit serialization IDs for backward compatibility
  • Base class relationships requiring polymorphic dispatch during deserialization
  • Version-tagged fields that were added, modified, or removed across DuckDB releases
  • Constructor requirements specifying which deserialized fields to pass to the constructor

The code generation approach provides several guarantees:

  1. Consistent serialization IDs -- each field has a numeric ID that remains stable across versions, enabling forward/backward compatible binary formats
  2. Version-aware field handling -- the MemberVariableStatus enum tracks whether a field is EXISTING (read/write), READ_ONLY (deserialize only, with default), or DELETED (skip entirely)
  3. Polymorphic dispatch -- base classes with a class_type discriminator automatically get switch-based deserialization that delegates to the correct subclass
  4. Type-safe code -- the generator knows about container types, pointer types, and move semantics, producing code that uses std::move, unique_ptr_cast, and appropriate default values
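
The version-aware handling in point 2 can be sketched as follows. This is a simplified stand-in rather than the generator's own code: the MemberStatus class only mirrors the three states named above, the status strings and "default" key are assumed, and the emitted C++ calls (in particular SkipProperty) are hypothetical placeholders.

```python
from enum import Enum


class MemberStatus(Enum):
    """Stand-in for the three states described above (not the generator's own enum)."""
    EXISTING = "existing"
    READ_ONLY = "read_only"
    DELETED = "deleted"


def emit_member(member, direction):
    """Return one line of C++ for a member, or None when nothing is emitted."""
    status = MemberStatus(member.get("status", "existing"))
    name = member["name"]
    ident = f'{member["id"]}, "{name}"'
    if direction == "serialize":
        # Only EXISTING fields are written; READ_ONLY and DELETED emit nothing.
        if status is MemberStatus.EXISTING:
            return f"serializer.WriteProperty({ident}, {name});"
        return None
    if status is MemberStatus.DELETED:
        # Hypothetical helper: consume the stored value and discard it.
        return f"deserializer.SkipProperty({ident});"
    if status is MemberStatus.READ_ONLY:
        # Hypothetical call: fall back to a default when the field is absent.
        default = member.get("default", "{}")
        return f"deserializer.ReadPropertyWithDefault({ident}, result->{name}, {default});"
    return f"deserializer.ReadProperty({ident}, result->{name});"


# Hypothetical members covering all three states.
members = [
    {"id": 100, "name": "alias"},
    {"id": 101, "name": "legacy_flag", "status": "read_only", "default": "false"},
    {"id": 102, "name": "old_hint", "status": "deleted"},
]
serialize_lines = [emit_member(m, "serialize") for m in members]
deserialize_lines = [emit_member(m, "deserialize") for m in members]
```

In this sketch only the first member produces a write call, while all three produce something on the read side, which is what lets newer code keep reading data written before a field changed status.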

Usage

This principle applies when adding new serializable types or modifying the fields of existing ones. Specific scenarios include:

  • Adding a new serializable class -- add a JSON entry with the class name, base class, enum discriminator, and member list; re-run the generator
  • Adding a field to an existing class -- append a new member with the next available ID and a version tag; the generator handles backward compatibility
  • Removing a field -- mark the member's status as "deleted" rather than removing it; the generator produces code that skips the field during deserialization
  • Changing the serialization format -- update the JSON spec and re-generate; the version map ensures old formats can still be read
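
The spec edits listed above can be sketched as plain data transformations. The helper functions, the entry shape, the class names, and the version string "v1.2.0" are all hypothetical; the point is only that a deleted member stays in the list so its ID is never handed out again.

```python
import copy


def next_member_id(spec):
    """Next free ID: one past the highest ID ever used, deleted members included."""
    return max(member["id"] for member in spec["members"]) + 1


def add_member(spec, name, type_name, version):
    """Return a copy of the spec with a new member appended under a fresh ID."""
    updated = copy.deepcopy(spec)
    updated["members"].append(
        {"id": next_member_id(spec), "name": name, "type": type_name, "version": version}
    )
    return updated


def mark_deleted(spec, name):
    """Return a copy of the spec with the named member flagged instead of removed."""
    updated = copy.deepcopy(spec)
    for member in updated["members"]:
        if member["name"] == name:
            member["status"] = "deleted"
    return updated


# Hypothetical entry; the key names and values are assumptions for illustration.
spec_v1 = {
    "class": "ExampleRef",
    "base": "ExampleBase",
    "enum": "EXAMPLE",
    "members": [
        {"id": 200, "name": "schema_name", "type": "string"},
        {"id": 201, "name": "hint", "type": "string"},
    ],
}
spec_v2 = mark_deleted(spec_v1, "hint")
spec_v3 = add_member(spec_v2, "sample_size", "idx_t", "v1.2.0")
```

After both edits the entry still lists three members: hint keeps ID 201 with a deleted status, and sample_size receives ID 202. The generator would then be re-run against the updated entry.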

Theoretical Basis

  • Schema-driven code generation -- JSON schemas are the authoritative source for serialization layout; generated code stays consistent with the schema as long as the generator is re-run after every change
  • Backward compatibility in serialization -- numeric field IDs and version maps enable reading data written by older DuckDB versions, even when fields have been added or removed
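  • Switch dispatch on a type discriminator -- polymorphic deserialization uses a generated switch statement over the type discriminator; because no object exists yet when deserialization starts, the discriminator read from the stream, rather than a virtual call, selects the subclass to construct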
  • Single source of truth -- the JSON spec defines field order, types, defaults, and version ranges in one place; the generated C++ code is a derived artifact
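
The switch-based dispatch can be sketched in the same style: given a base class, its discriminator, and the list of subclasses, emit the text of a base-class Deserialize() that delegates to the matching subclass. The class names, the field ID 100, and the exact C++ emitted here are assumptions for illustration and may differ from what the real generator produces.

```python
def generate_dispatch(base, enum_type, discriminator, subclasses):
    """Return C++ text for a base-class Deserialize that switches on the discriminator."""
    lines = [
        f"unique_ptr<{base}> {base}::Deserialize(Deserializer &deserializer) {{",
        f"\tauto {discriminator} = deserializer.ReadProperty<{enum_type}>(100, \"{discriminator}\");",
        f"\tunique_ptr<{base}> result;",
        f"\tswitch ({discriminator}) {{",
    ]
    for enum_value, class_name in subclasses:
        # One case per subclass listed in the (hypothetical) spec.
        lines.append(f"\tcase {enum_type}::{enum_value}:")
        lines.append(f"\t\tresult = {class_name}::Deserialize(deserializer);")
        lines.append("\t\tbreak;")
    lines.append("\tdefault:")
    lines.append(f"\t\tthrow SerializationException(\"Unsupported {discriminator} for {base}\");")
    lines.append("\t}")
    lines.append("\treturn result;")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical hierarchy with two subclasses.
dispatch_code = generate_dispatch(
    "ExampleNode",
    "ExampleNodeType",
    "type",
    [("LEAF", "LeafNode"), ("BRANCH", "BranchNode")],
)
print(dispatch_code)
```

Adding a subclass to the spec adds one case to the generated switch on the next run, so the dispatch table cannot drift out of sync with the list of serializable classes.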

Related

Implementation:Duckdb_Duckdb_Generate_Serialization
