Implementation:Duckdb Duckdb Generate Auxiliary


Overview

A concrete tool for generating DuckDB auxiliary code: profiling metrics, embedded test queries, and compile-time constants. The implementation comprises five Python scripts, each handling one code generation task and producing supporting C++ artifacts for the DuckDB engine.

Code Reference

Script | Source Location | Lines | Purpose
generate_metric_enums.py | scripts/generate_metric_enums.py | L1-52 | Profiling metric enum generation
generate_csv_header.py | scripts/generate_csv_header.py | L1-98 | Embed SQL queries as C++ strings
generate_tpcds_schema.py | scripts/generate_tpcds_schema.py | L1-116 | TPC-DS schema metadata generation
generate_tpcds_results.py | scripts/generate_tpcds_results.py | L1-137 | TPC-DS expected results generation
generate_vector_sizes.py | scripts/generate_vector_sizes.py | L1-21 | Vector size constant generation

generate_metric_enums.py

Reads a JSON specification of profiling metric types (metric_type.json) and generates C++ enum definitions along with string-to-enum and enum-to-string conversion functions. The output files (metric_type.hpp and metric_type.cpp) provide the profiling subsystem with a consistent set of metric identifiers.
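The JSON-to-enum step can be illustrated with a minimal sketch. It is not the script's actual code: the spec layout (a "metrics" list), the enum name MetricType, and the helper render_enum are assumptions made for illustration, and the real metric_type.json may be structured differently.

```python
import json


def render_enum(enum_name, metrics):
    """Return C++ source text for an enum class and an enum-to-string helper."""
    lines = [f"enum class {enum_name} : uint8_t {{"]
    lines += [f"\t{name}," for name in metrics]
    lines.append("};")
    lines.append("")
    lines.append(f"const char *{enum_name}ToString({enum_name} value) {{")
    lines.append("\tswitch (value) {")
    for name in metrics:
        lines.append(f"\tcase {enum_name}::{name}:")
        lines.append(f'\t\treturn "{name}";')
    lines.append("\tdefault:")
    lines.append('\t\treturn "UNKNOWN";')
    lines.append("\t}")
    lines.append("}")
    return "\n".join(lines) + "\n"


# Hypothetical spec; the layout of the real metric_type.json is assumed here.
spec = json.loads('{"metrics": ["CPU_TIME", "ROWS_RETURNED"]}')
cpp_source = render_enum("MetricType", spec["metrics"])
print(cpp_source)
```

Generating both the enum and its conversion function from one list keeps the two in sync, which is the main reason to generate this kind of code rather than maintain it by hand.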

generate_csv_header.py

Reads SQL query files and embeds them as C++ string constants in header and source files (profiling_utils.hpp and profiling_utils.cpp). This allows the profiling and benchmark infrastructure to reference standard queries at compile time without runtime file I/O.
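Embedding query text in C++ mostly comes down to escaping it correctly. The sketch below shows one way this could be done; the helper names and the array name EXAMPLE_QUERIES are hypothetical, and the escaping covers only backslashes, double quotes, and newlines, so it may differ from what the script itself does.

```python
def to_cpp_string_literal(text):
    """Escape text so it can sit inside a double-quoted C++ string literal."""
    escaped = (
        text.replace("\\", "\\\\")  # backslashes first, so later escapes survive
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return f'"{escaped}"'


def render_query_array(array_name, queries):
    """Render a list of SQL strings as a C++ array of string constants."""
    lines = [f"static const char *{array_name}[] = {{"]
    lines += [f"\t{to_cpp_string_literal(query)}," for query in queries]
    lines.append("};")
    return "\n".join(lines) + "\n"


# Illustrative queries; a real run would read them from SQL files on disk.
queries = ["SELECT 42;", 'SELECT "a"\nFROM t;']
header_text = render_query_array("EXAMPLE_QUERIES", queries)
print(header_text)
```

Backslashes are escaped before quotes and newlines; doing it in the other order would double-escape the backslashes that the later replacements introduce.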

generate_tpcds_schema.py

Produces C++ headers describing the TPC-DS schema, including table names and column definitions. These headers are consumed by the TPC-DS benchmark harness to set up test databases and validate query results.
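As a rough illustration of what such a schema header might contain, the sketch below renders table and column names as C++ constant arrays. The function render_schema_header and the constant names (TABLE_COUNT, TABLE_NAMES, *_COLUMNS) are assumptions for this example, not the names the script emits; only two TPC-DS tables with two columns each are shown.

```python
def render_schema_header(tables):
    """Render table and column names as C++ constant arrays."""
    lines = ["#pragma once", ""]
    names = ", ".join(f'"{table}"' for table in tables)
    lines.append(f"static constexpr int TABLE_COUNT = {len(tables)};")
    lines.append(f"static const char *TABLE_NAMES[] = {{{names}}};")
    for table, columns in tables.items():
        cols = ", ".join(f'"{column}"' for column in columns)
        lines.append(f"static const char *{table.upper()}_COLUMNS[] = {{{cols}}};")
    return "\n".join(lines) + "\n"


# A small excerpt of the TPC-DS schema, used here as sample input.
tables = {
    "call_center": ["cc_call_center_sk", "cc_call_center_id"],
    "date_dim": ["d_date_sk", "d_date_id"],
}
schema_header = render_schema_header(tables)
print(schema_header)
```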

generate_tpcds_results.py

Generates C++ headers containing the expected results for TPC-DS benchmark queries. These are used to verify that query execution produces correct output during testing.

generate_vector_sizes.py

Produces a C++ header defining compile-time constants for the DuckDB vector size. The vector size governs the number of tuples processed per batch in the vectorized execution engine and is referenced throughout the codebase.
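A generator for a single compile-time constant can be very small. The sketch below is an assumption-laden illustration: the macro name EXAMPLE_VECTOR_SIZE and the function render_vector_size_header are hypothetical, the power-of-two check is a constraint imposed by this sketch rather than one confirmed by the script, and 2048 is used only as a sample value.

```python
def render_vector_size_header(vector_size):
    """Return C++ header text defining a vector size constant."""
    # This sketch only accepts positive powers of two (an assumption).
    if vector_size <= 0 or (vector_size & (vector_size - 1)) != 0:
        raise ValueError(f"vector size must be a positive power of two: {vector_size}")
    return (
        "#pragma once\n"
        "\n"
        f"#define EXAMPLE_VECTOR_SIZE {vector_size}\n"
    )


vector_header = render_vector_size_header(2048)
print(vector_header)
```

Validating the value in the generator means a bad configuration fails at generation time instead of surfacing later as a compile or runtime error.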

I/O Contract

Inputs

Input | Description | Consumed By
metric_type.json | JSON specification of profiling metric types | generate_metric_enums.py
TPC-H / TPC-DS SQL files | Standard benchmark SQL query files | generate_csv_header.py, generate_tpcds_results.py
TPC-DS schema definitions | Schema metadata for TPC-DS tables | generate_tpcds_schema.py
Profiling utils templates | Template files for profiling utility generation | generate_csv_header.py
Vector size configuration | Input specifying the desired vector size constant | generate_vector_sizes.py

Outputs

Output File(s) | Description | Produced By
metric_type.hpp, metric_type.cpp | C++ enum definitions and conversion functions for profiling metrics | generate_metric_enums.py
profiling_utils.hpp, profiling_utils.cpp | C++ header and source with embedded SQL query strings | generate_csv_header.py
TPC-DS schema headers | C++ headers describing TPC-DS table and column definitions | generate_tpcds_schema.py
TPC-DS results headers | C++ headers containing expected TPC-DS query results | generate_tpcds_results.py
Vector size header | C++ header defining the vector size compile-time constant | generate_vector_sizes.py

External Dependencies

  • python3 -- All five scripts require a Python 3 interpreter.

Usage Examples

Generating profiling metric enums:

python3 scripts/generate_metric_enums.py

Generating embedded SQL query headers:

python3 scripts/generate_csv_header.py

Generating TPC-DS schema metadata:

python3 scripts/generate_tpcds_schema.py

Generating TPC-DS expected results:

python3 scripts/generate_tpcds_results.py

Generating vector size constants:

python3 scripts/generate_vector_sizes.py

Running all auxiliary generators as part of the build pipeline:

python3 scripts/generate_metric_enums.py
python3 scripts/generate_csv_header.py
python3 scripts/generate_tpcds_schema.py
python3 scripts/generate_tpcds_results.py
python3 scripts/generate_vector_sizes.py
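The same sequence could also be driven from a small Python wrapper that stops at the first failing generator. This is a sketch, not part of the DuckDB repository: run_generators and its runner parameter are hypothetical, and the script paths assume the wrapper is invoked from the repository root.

```python
import subprocess
import sys

# Script paths as listed in the Code Reference table.
GENERATORS = [
    "scripts/generate_metric_enums.py",
    "scripts/generate_csv_header.py",
    "scripts/generate_tpcds_schema.py",
    "scripts/generate_tpcds_results.py",
    "scripts/generate_vector_sizes.py",
]


def run_generators(scripts, runner=subprocess.run):
    """Run each generator in order with the current Python interpreter.

    With the default runner, check=True raises CalledProcessError as soon
    as one script exits with a non-zero status, so later scripts are skipped.
    """
    for script in scripts:
        runner([sys.executable, script], check=True)
```

Calling run_generators(GENERATORS) from the repository root runs all five scripts. The runner parameter exists only so the function can be exercised without launching processes.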
