Implementation:Duckdb Duckdb Generate Auxiliary
Overview
A concrete tool for generating DuckDB auxiliary code, including profiling metrics, embedded test queries, and compile-time constants. The implementation comprises five Python scripts, each responsible for one code generation task that produces supporting C++ artifacts for the DuckDB engine.
Code Reference
| Script | Source Location | Lines | Purpose |
|---|---|---|---|
| generate_metric_enums.py | scripts/generate_metric_enums.py | L1-52 | Profiling metric enum generation |
| generate_csv_header.py | scripts/generate_csv_header.py | L1-98 | Embed SQL queries as C++ strings |
| generate_tpcds_schema.py | scripts/generate_tpcds_schema.py | L1-116 | TPC-DS schema metadata generation |
| generate_tpcds_results.py | scripts/generate_tpcds_results.py | L1-137 | TPC-DS expected results generation |
| generate_vector_sizes.py | scripts/generate_vector_sizes.py | L1-21 | Vector size constant generation |
generate_metric_enums.py
Reads a JSON specification of profiling metric types (metric_type.json) and generates C++ enum definitions along with string-to-enum and enum-to-string conversion functions. The output files (metric_type.hpp and metric_type.cpp) provide the profiling subsystem with a consistent set of metric identifiers.
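As a rough illustration of the JSON-to-enum step described above, the following minimal sketch renders a C++ enum and an enum-to-string function from a list of metric names. The flat JSON array layout, the function name `render_metric_enum`, the `MetricType` enum name, and the example metric names are all assumptions made for this sketch; the real layout of metric_type.json and the real generated code may differ. A string-to-enum function would follow the same pattern in reverse.

```python
import json


def render_metric_enum(spec_text, enum_name="MetricType"):
    """Render a C++ enum class and an enum-to-string function.

    Assumption: spec_text is a flat JSON array of metric names. The real
    metric_type.json may be structured differently.
    """
    names = json.loads(spec_text)
    for name in names:
        # Reject names that could not be used as C++ enumerators.
        if not name.isidentifier():
            raise ValueError(f"not a valid identifier: {name!r}")

    lines = [f"enum class {enum_name} {{"]
    lines += [f"    {name}," for name in names]
    lines += ["};", ""]

    lines.append(f"const char *{enum_name}ToString({enum_name} value) {{")
    lines.append("    switch (value) {")
    for name in names:
        lines.append(f"    case {enum_name}::{name}:")
        lines.append(f'        return "{name}";')
    lines.append("    default:")
    lines.append('        return "UNKNOWN";')
    lines.append("    }")
    lines.append("}")
    return "\n".join(lines) + "\n"


# Hypothetical metric names, used only to exercise the sketch.
example = render_metric_enum('["CPU_TIME", "ROWS_RETURNED"]')
```

Generating both the enum and its conversion function from one list keeps the two from drifting apart, which is the consistency benefit the paragraph above refers to.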
generate_csv_header.py
Reads SQL query files and embeds them as C++ string constants in header and source files (profiling_utils.hpp and profiling_utils.cpp). This allows the profiling and benchmark infrastructure to reference standard queries at compile time without runtime file I/O.
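Embedding SQL text in C++ source mainly comes down to escaping. The sketch below shows one plausible way to turn a query into a C++ string literal, one quoted fragment per source line, relying on the C++ rule that adjacent string literals are concatenated. The helper names `to_cpp_string_literal` and `render_query_array` are hypothetical and are not taken from generate_csv_header.py.

```python
def to_cpp_string_literal(sql):
    """Convert SQL text into adjacent C++ string literals, one per line."""
    # Escape backslashes first so the quote escapes are not escaped again.
    escaped = sql.replace("\\", "\\\\").replace('"', '\\"')
    # Each source line becomes its own literal ending in an escaped newline.
    parts = [f'"{line}\\n"' for line in escaped.splitlines()]
    return "\n".join(parts) if parts else '""'


def render_query_array(array_name, queries):
    """Render a C++ array of string constants, one element per query."""
    body = ",\n".join(to_cpp_string_literal(query) for query in queries)
    return f"static const char *{array_name}[] = {{\n{body}\n}};\n"
```

Because the queries end up as constants in the compiled binary, code that uses them needs no file lookup at run time.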
generate_tpcds_schema.py
Produces C++ headers describing the TPC-DS schema, including table names and column definitions. These headers are consumed by the TPC-DS benchmark harness to set up test databases and validate query results.
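A schema header of this kind could be rendered roughly as follows. This is only a sketch: the function name `render_schema_header`, the generated array names, and the choice to represent column types as strings are assumptions, and the single table shown is an abbreviated illustration rather than the full TPC-DS definition.

```python
def render_schema_header(tables):
    """Render C++ arrays of table names, column names, and column types.

    tables maps a table name to an ordered list of (column, type) pairs.
    """
    lines = ["#pragma once", ""]
    table_names = ", ".join(f'"{table}"' for table in tables)
    lines.append(f"static const char *TABLE_NAMES[] = {{{table_names}}};")
    for table, columns in tables.items():
        ident = table.upper()
        col_names = ", ".join(f'"{name}"' for name, _ in columns)
        col_types = ", ".join(f'"{sql_type}"' for _, sql_type in columns)
        lines.append(f"static const char *{ident}_COLUMNS[] = {{{col_names}}};")
        lines.append(f"static const char *{ident}_TYPES[] = {{{col_types}}};")
    return "\n".join(lines) + "\n"


# Abbreviated, illustrative input: two columns of one table.
header = render_schema_header(
    {"call_center": [("cc_call_center_sk", "INTEGER"), ("cc_name", "VARCHAR")]}
)
```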
generate_tpcds_results.py
Generates C++ headers containing the expected results for TPC-DS benchmark queries. These are used to verify that query execution produces correct output during testing.
generate_vector_sizes.py
Produces a C++ header defining compile-time constants for the DuckDB vector size. The vector size governs the number of tuples processed per batch in the vectorized execution engine and is referenced throughout the codebase.
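To make the idea of a generated compile-time constant concrete, here is a minimal sketch that renders a one-macro header. The function name `render_vector_size_header`, the macro name `STANDARD_VECTOR_SIZE` as used here, and the power-of-two check are assumptions of this sketch, not a description of what generate_vector_sizes.py actually emits.

```python
def render_vector_size_header(vector_size):
    """Render a C++ header that defines the vector size as a macro."""
    # Assumption for this sketch: only positive powers of two are accepted.
    if vector_size <= 0 or vector_size & (vector_size - 1):
        raise ValueError("vector size must be a positive power of two")
    return f"#pragma once\n\n#define STANDARD_VECTOR_SIZE {vector_size}\n"
```

Validating the value at generation time means a bad configuration fails before any C++ is compiled against the header.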
I/O Contract
Inputs
| Input | Description | Consumed By |
|---|---|---|
| metric_type.json | JSON specification of profiling metric types | generate_metric_enums.py |
| TPC-H / TPC-DS SQL files | Standard benchmark SQL query files | generate_csv_header.py, generate_tpcds_results.py |
| TPC-DS schema definitions | Schema metadata for TPC-DS tables | generate_tpcds_schema.py |
| Profiling utils templates | Template files for profiling utility generation | generate_csv_header.py |
| Vector size configuration | Input specifying the desired vector size constant | generate_vector_sizes.py |
Outputs
| Output File(s) | Description | Produced By |
|---|---|---|
| metric_type.hpp, metric_type.cpp | C++ enum definitions and conversion functions for profiling metrics | generate_metric_enums.py |
| profiling_utils.hpp, profiling_utils.cpp | C++ header and source with embedded SQL query strings | generate_csv_header.py |
| TPC-DS schema headers | C++ headers describing TPC-DS table and column definitions | generate_tpcds_schema.py |
| TPC-DS results headers | C++ headers containing expected TPC-DS query results | generate_tpcds_results.py |
| Vector size header | C++ header defining the vector size compile-time constant | generate_vector_sizes.py |
External Dependencies
- python3 -- All five scripts require a Python 3 interpreter.
Usage Examples
Generating profiling metric enums:
python3 scripts/generate_metric_enums.py
Generating embedded SQL query headers:
python3 scripts/generate_csv_header.py
Generating TPC-DS schema metadata:
python3 scripts/generate_tpcds_schema.py
Generating TPC-DS expected results:
python3 scripts/generate_tpcds_results.py
Generating vector size constants:
python3 scripts/generate_vector_sizes.py
Running all auxiliary generators as part of the build pipeline:
python3 scripts/generate_metric_enums.py
python3 scripts/generate_csv_header.py
python3 scripts/generate_tpcds_schema.py
python3 scripts/generate_tpcds_results.py
python3 scripts/generate_vector_sizes.py
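The same sequence can be driven from a small Python wrapper that stops at the first failing generator. This is a sketch rather than part of the DuckDB build: the names `GENERATORS`, `build_commands`, and `run_all` are hypothetical, and it assumes the scripts take no arguments and are run from the repository root, as in the commands above.

```python
import subprocess
import sys

# Script paths as listed in the Code Reference table.
GENERATORS = [
    "scripts/generate_metric_enums.py",
    "scripts/generate_csv_header.py",
    "scripts/generate_tpcds_schema.py",
    "scripts/generate_tpcds_results.py",
    "scripts/generate_vector_sizes.py",
]


def build_commands(python=sys.executable):
    """Return one argument list per generator, in the order shown above."""
    return [[python, script] for script in GENERATORS]


def run_all(runner=subprocess.run):
    """Run every generator; check=True raises on the first non-zero exit."""
    for command in build_commands():
        runner(command, check=True)
```

Passing the runner as a parameter keeps the wrapper easy to test without launching any processes.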