Jump to content

Connect Leeroopedia MCP: Equip your AI agents to search best practices, build plans, verify code, diagnose failures, and look up hyperparameter defaults.

Principle:Duckdb Duckdb Auxiliary Code Generation

From Leeroopedia


Overview

This principle covers generating auxiliary code including profiling metric enums, embedded test data, and compile-time constants. DuckDB uses a collection of smaller code generation scripts to produce supporting infrastructure that does not fall under the larger function registration or settings generation categories.

Description

The Auxiliary Code Generation principle governs a collection of smaller code generation tasks that produce various supporting artifacts for the DuckDB engine:

  1. Profiling metric enums -- A generator reads a JSON specification of profiling metric types and produces C++ enum definitions along with helper functions for converting between metric names and enum values. This ensures the profiling infrastructure has a consistent, auto-generated set of metric identifiers.
  2. Embedded test data -- Generators read SQL query files (TPC-H and TPC-DS workloads) and embed them as C++ string constants. This allows the test and benchmark infrastructure to reference standard queries without runtime file I/O.
  3. TPC-DS schema metadata -- A generator produces C++ headers describing the TPC-DS schema (table names, column definitions) for use in the TPC-DS benchmark harness.
  4. Vector size constants -- A generator produces a C++ header defining compile-time constants for the DuckDB vector size, which governs the batch size used throughout the vectorized execution engine.

Each of these tasks follows the same overarching pattern: a declarative or file-based input is transformed by a Python script into C++ code that is compiled into the engine.

Usage

Apply this principle when modifying profiling metrics, updating test queries, or changing vector size constants:

  • When a new profiling metric is introduced, update the metric type JSON and re-run the metric enum generator.
  • When TPC-H or TPC-DS queries are updated, re-run the CSV header and TPC-DS generators to refresh the embedded C++ strings.
  • When the default vector size needs to change (e.g., for performance experiments), update the input and re-run the vector size generator.

Theoretical Basis

  • Compile-time code embedding -- SQL queries and other text artifacts are embedded as C++ string literals at compile time, eliminating runtime file dependencies and ensuring reproducible benchmarks.
  • Constant generation -- Values such as vector sizes and enum discriminants are generated from a single source of truth, preventing accidental inconsistencies when the same constant is referenced in multiple places.
  • Test data embedding -- Standard benchmark queries (TPC-H, TPC-DS) are baked into the binary, allowing benchmark and test harnesses to run without external data files.

Related

Implementation:Duckdb_Duckdb_Generate_Auxiliary

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment