Principle:Duckdb Duckdb Enum Code Generation
Overview
Generating string conversion utilities and validation code from enum type definitions. Instead of manually maintaining ToString and FromString functions for every C++ enum class, a code generator scans enum definitions in header files and produces all conversion code automatically.
Description
Auto-generating enum-to-string and string-to-enum conversion functions to avoid manual maintenance, reduce bugs, and keep enum handling consistent across the entire codebase.
In C++ projects with many enum types, each enum typically needs:
- A ToChars / ToString function that converts an enum value to its string representation
- A FromString function that parses a string back into the enum value
- Proper error handling for unknown values
Maintaining these by hand is error-prone: adding a new enum member requires updating conversion functions in a separate file, and forgetting to do so causes runtime failures rather than compile-time errors.
The DuckDB approach solves this by:
- Scanning all header files under src/ for enum class declarations using regex parsing
- Extracting enum members, including their explicit values, and handling duplicate/alias values
- Applying overrides for enums whose string representation differs from the member name (e.g., SQLNULL maps to "NULL", TIMESTAMP_TZ maps to "TIMESTAMP WITH TIME ZONE")
- Generating both a header and a source file containing template specializations of EnumUtil::ToChars and EnumUtil::FromString
- Blacklisting enums that should not have generated conversions (e.g., internal-only enums)
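The scanning and member-extraction steps can be sketched in Python as follows. This is a simplified illustration, not the actual DuckDB generator: the regex, the function names (parse_enums, resolve_values, to_chars_members, scan_headers) and the assumption that headers use the .hpp extension are all hypothetical. A member initialized with another member's name (for example VERDE = GREEN) is treated as an alias, and only the first member for each value is kept for ToChars, because a C++ switch cannot contain two case labels with the same value. The sketch does not handle enums without an explicit underlying type, nested braces, or initializer expressions such as 1 << 2.

```python
from __future__ import annotations

import re
from pathlib import Path

# Simplified pattern for "enum class Name : Type { ... };" (hypothetical).
ENUM_PATTERN = re.compile(
    r"enum\s+class\s+(\w+)\s*:\s*([\w:]+)\s*\{(.*?)\}\s*;",
    re.DOTALL,
)


def strip_comments(text: str) -> str:
    """Remove // line comments and /* block */ comments in one left-to-right pass."""
    return re.sub(r"//[^\n]*|/\*.*?\*/", "", text, flags=re.DOTALL)


def parse_enums(source: str) -> dict[str, list[tuple[str, str | None]]]:
    """Return {enum_name: [(member, initializer or None), ...]} in declaration order."""
    enums = {}
    for name, _underlying, body in ENUM_PATTERN.findall(strip_comments(source)):
        members = []
        for entry in body.split(","):
            entry = entry.strip()
            if not entry:
                continue  # trailing comma after the last member
            member, _, value = entry.partition("=")
            members.append((member.strip(), value.strip() or None))
        enums[name] = members
    return enums


def resolve_values(members):
    """Assign an integer to each member: implicit values count up from the
    previous member, and `B = A` makes B an alias with A's value."""
    resolved = {}
    next_value = 0
    for member, value in members:
        if value is None:
            current = next_value
        elif value in resolved:
            current = resolved[value]  # alias of an earlier member
        else:
            current = int(value, 0)  # decimal or 0x-prefixed literal
        resolved[member] = current
        next_value = current + 1
    return resolved


def to_chars_members(members):
    """First member for each distinct value; a C++ switch cannot repeat a
    case value, so aliases are skipped here."""
    seen = set()
    kept = []
    for member, value in resolve_values(members).items():
        if value not in seen:
            seen.add(value)
            kept.append(member)
    return kept


def scan_headers(root):
    """Collect enums from every header below `root` (assumes .hpp files)."""
    enums = {}
    for header in sorted(Path(root).rglob("*.hpp")):
        enums.update(parse_enums(header.read_text()))
    return enums
```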
A companion script handles JSON-defined enums (for extensions) in a similar way, except that it reads enum definitions from JSON specifications instead of scanning C++ headers.
Usage
This principle applies when C++ enum types need string serialization/deserialization. Specific scenarios include:
- Adding a new enum member -- re-run the generator; the new member automatically gets string conversion support
- Adding a new enum class -- if it follows the enum class Name : Type { ... } pattern in a header under src/, it is discovered automatically
- Customizing the string representation -- add an entry to the overrides dictionary in the generator script
- Excluding an enum -- add the enum name to the blacklist in the generator script
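A minimal sketch of how the overrides dictionary and the blacklist might be applied, assuming overrides is keyed first by enum name and then by member name. The dictionary layout, the InternalOnlyEnum name and the string_forms function are hypothetical; SQLNULL and TIMESTAMP_TZ are assumed here to be members of DuckDB's LogicalTypeId. Rejecting an override that names a non-existent member is an extra safeguard chosen for this sketch: it turns a renamed or deleted member into a generator error rather than a silently ignored entry.

```python
# Hypothetical configuration: overrides[enum_name][member_name] = string form.
overrides = {
    "LogicalTypeId": {
        "SQLNULL": "NULL",
        "TIMESTAMP_TZ": "TIMESTAMP WITH TIME ZONE",
    },
}

# Hypothetical set of enum names that get no generated conversions.
blacklist = {"InternalOnlyEnum"}


def string_forms(enums):
    """Map each non-blacklisted enum to [(member, string_form), ...].

    `enums` maps enum names to member names in declaration order.
    """
    result = {}
    for enum_name, members in enums.items():
        if enum_name in blacklist:
            continue  # excluded: no ToChars/FromString generated
        enum_overrides = overrides.get(enum_name, {})
        # Fail loudly if an override refers to a member that no longer exists.
        stale = set(enum_overrides) - set(members)
        if stale:
            raise ValueError(
                f"overrides for {enum_name} name unknown members: {sorted(stale)}"
            )
        # Default string form is the member name itself.
        result[enum_name] = [(m, enum_overrides.get(m, m)) for m in members]
    return result
```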
Theoretical Basis
- DRY principle -- conversion code is derived from the enum definition in the header rather than duplicated by hand
- Single source of truth -- every enum value and its string form are defined in exactly one place, eliminating consistency bugs
- Convention over configuration -- the generator discovers enums by convention (scanning headers) rather than requiring explicit registration, with overrides only for exceptions
- Template specialization pattern -- the generated code uses C++ template specializations of EnumUtil::ToChars<T> and EnumUtil::FromString<T>, providing a uniform API for all enum types
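To show the shape of the template specialization pattern, the following Python sketch emits C++ source text for one enum. It is illustrative only: the emitted C++ uses std::strcmp and std::invalid_argument, whereas DuckDB's real output relies on its own string utilities and exception types and may be structured differently. generate_specializations is a hypothetical name, and the sketch assumes aliases were already removed, that string forms need no C escaping, and that matching is case-sensitive.

```python
def generate_specializations(enum_name, members):
    """Return C++ source defining EnumUtil::ToChars<E> and EnumUtil::FromString<E>.

    `members` is [(member, string_form), ...] with aliases already removed.
    """
    lines = [
        "template <>",
        f"const char *EnumUtil::ToChars<{enum_name}>({enum_name} value) {{",
        "\tswitch (value) {",
    ]
    # One case label per member, returning its string form.
    for member, text in members:
        lines += [f"\tcase {enum_name}::{member}:", f'\t\treturn "{text}";']
    lines += [
        "\tdefault:",
        f'\t\tthrow std::invalid_argument("unknown {enum_name} value");',
        "\t}",
        "}",
        "",
        "template <>",
        f"{enum_name} EnumUtil::FromString<{enum_name}>(const char *value) {{",
    ]
    # Linear comparison against each string form; unknown strings throw.
    for member, text in members:
        lines += [
            f'\tif (std::strcmp(value, "{text}") == 0) {{',
            f"\t\treturn {enum_name}::{member};",
            "\t}",
        ]
    lines += [
        f'\tthrow std::invalid_argument(std::string("unknown {enum_name} string: ") + value);',
        "}",
    ]
    return "\n".join(lines) + "\n"
```

The emitted text assumes the generated source file includes <cstring>, <stdexcept> and <string>, and that a header declares ToChars and FromString as static member templates of EnumUtil. A generated header would also declare each specialization, so that other translation units use it instead of implicitly instantiating the primary template.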