Principle:Duckdb Duckdb Prerequisite Code Generation

From Leeroopedia


Overview

This principle ensures that all generated source files are up to date before packaging or amalgamation. DuckDB relies on a suite of Python-based code generation scripts that produce C and C++ source files from JSON specifications, grammar fragments, enum headers, and template files. These generated files must be current before any amalgamation or source packaging step can proceed.

Description

The DuckDB build process depends on multiple code generation scripts, each responsible for producing specific categories of source files:

| Generator Script | Purpose | Output Category |
|---|---|---|
| scripts/generate_c_api.py | C API header generation | Public C API headers |
| scripts/generate_enum.py | Enum class generation | Enum definitions from JSON specs |
| scripts/generate_serialization.py | Serialization/deserialization code | Serialization routines |
| scripts/generate_grammar.py | SQL grammar production rules | Parser grammar (bison/flex) |
| scripts/generate_functions.py | Built-in function registration | Function catalog entries |
| scripts/generate_settings.py | Configuration settings code | Settings registration |
| scripts/generate_metrics.py | Profiling metrics definitions | Metrics enumeration and helpers |

All of these scripts must be executed successfully before the amalgamation script (scripts/amalgamation.py) or the package build script (scripts/package_build.py) can produce correct output. If any generator is skipped or fails, the resulting amalgamated source will be missing generated code, leading to compilation failures downstream.

The pipeline follows a strict prerequisite ordering: generation runs first, amalgamation second, and packaging third. This ordering is encoded in CI workflows and Makefile targets.
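
The three-stage ordering can be modelled as a small dependency graph and resolved with a topological sort. The following is a minimal sketch using Python's standard `graphlib` module (Python 3.9+); the stage names `generate`, `amalgamate`, and `package` are hypothetical labels chosen for illustration and are not identifiers from the DuckDB build.

```python
from graphlib import TopologicalSorter

# Hypothetical stage names. Each key maps to the set of stages
# that must finish before it may start.
STAGE_DEPENDENCIES = {
    "generate": set(),
    "amalgamate": {"generate"},
    "package": {"amalgamate"},
}


def stage_order(dependencies):
    """Return the stages in an order that satisfies every prerequisite."""
    return list(TopologicalSorter(dependencies).static_order())


print(stage_order(STAGE_DEPENDENCIES))
# → ['generate', 'amalgamate', 'package']
```

Because the graph is a linear chain, only one valid order exists. A circular dependency would make `static_order()` raise `graphlib.CycleError` instead of returning an order.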

Usage

This principle applies in the following scenarios:

  • Before creating an amalgamated source file -- the amalgamation script reads from src/ and src/include/, which contain generated files. These must be fresh.
  • Before building a source package -- the package build script calls amalgamation internally, so generators must have run.
  • As the first step in the packaging pipeline -- CI workflows (e.g., .github/workflows/) invoke all generators before any packaging step.
  • During local development -- developers modifying JSON specs, grammar files, or function definitions must re-run the relevant generators before building.

```shell
# Typical invocation order in CI or local builds:
python3 scripts/generate_c_api.py
python3 scripts/generate_enum.py
python3 scripts/generate_serialization.py
python3 scripts/generate_grammar.py
python3 scripts/generate_functions.py
python3 scripts/generate_settings.py
python3 scripts/generate_metrics.py

# Only after all generators succeed:
python3 scripts/amalgamation.py
python3 scripts/package_build.py
```
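
The "only after all generators succeed" condition can be made explicit with a small fail-fast driver. This is a sketch, not part of the DuckDB repository: the `run_pipeline` function and its `runner` parameter are hypothetical, the script paths are copied from the listing above, and it assumes the scripts are invoked from the repository root without arguments.

```python
import subprocess
import sys

# Paths copied from the invocation listing; running them from the
# repository root with no arguments is an assumption.
GENERATORS = [
    "scripts/generate_c_api.py",
    "scripts/generate_enum.py",
    "scripts/generate_serialization.py",
    "scripts/generate_grammar.py",
    "scripts/generate_functions.py",
    "scripts/generate_settings.py",
    "scripts/generate_metrics.py",
]
DOWNSTREAM = ["scripts/amalgamation.py", "scripts/package_build.py"]


def run_script(script):
    """Run one script with the current interpreter and return its exit code."""
    return subprocess.run([sys.executable, script]).returncode


def run_pipeline(scripts, runner=run_script):
    """Run scripts in order and stop at the first non-zero exit code.

    Returns the list of scripts that completed successfully.
    """
    completed = []
    for script in scripts:
        if runner(script) != 0:
            raise RuntimeError(f"{script} failed; remaining steps were not run")
        completed.append(script)
    return completed
```

Calling `run_pipeline(GENERATORS + DOWNSTREAM)` would run the generators first and reach amalgamation and packaging only if every generator exits with status 0. The `runner` parameter exists so the ordering logic can be exercised without launching real processes.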

Theoretical Basis

This principle is rooted in two foundational concepts:

Build Prerequisite Ordering
In any build system, tasks that produce inputs for downstream tasks must complete before those downstream tasks begin. Code generation produces .cpp and .hpp files that amalgamation reads; therefore, generation is a strict prerequisite of amalgamation.
Dependency-Driven Generation
Each generator script reads from a well-defined set of input files (JSON specs, grammar fragments, templates) and writes to a well-defined set of output files. This makes the dependency graph explicit and deterministic. A change to any input file necessitates re-running the corresponding generator to bring outputs up-to-date.
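
One way to decide whether a generator needs to be re-run is to compare modification times of its inputs and outputs, in the style of Make. The sketch below is an illustration under that assumption; the `is_stale` helper is hypothetical, and the DuckDB generators may detect changes differently (for example by always regenerating or by comparing file contents).

```python
from pathlib import Path


def is_stale(inputs, outputs):
    """Return True if any output is missing or older than the newest input."""
    input_paths = [Path(p) for p in inputs]
    output_paths = [Path(p) for p in outputs]

    # A generator with no recorded outputs, or with a missing output,
    # always needs to run.
    if not output_paths or any(not p.exists() for p in output_paths):
        return True

    # With no inputs, newest_input stays at 0.0 and nothing is stale.
    newest_input = max((p.stat().st_mtime for p in input_paths), default=0.0)
    oldest_output = min(p.stat().st_mtime for p in output_paths)
    return newest_input > oldest_output
```

Timestamp comparison is cheap but can give false results after operations that reset modification times, such as a fresh checkout. Comparing content hashes is a more robust alternative at the cost of reading every file.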
