Principle:Duckdb Duckdb Source Amalgamation

From Leeroopedia


Overview

Source amalgamation combines multiple source files into a single compilation unit to simplify distribution and compilation. In DuckDB, hundreds of .cpp and .hpp files are merged into a single duckdb.cpp source file and a single duckdb.hpp header file, so end users can compile DuckDB without a build system.

Description

The amalgamation technique concatenates all DuckDB source files into two files:

  • duckdb.cpp -- a single C++ source file containing all implementation code
  • duckdb.hpp -- a single C++ header file containing all declarations

This approach provides several critical benefits:

Simplified Distribution
Instead of shipping hundreds of source files with a complex directory structure, DuckDB can be distributed as just two files. End users can add these two files to any project and compile directly.
Single-Compilation-Unit Optimizations
When the compiler sees the entire codebase in a single translation unit, it can optimize across what would otherwise be file boundaries without relying on link-time optimization. Inlining decisions, dead code elimination, and interprocedural analysis all benefit from full visibility.
Elimination of Build System Requirements
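End users embedding DuckDB do not need CMake, Make, or any other build system. Because duckdb.cpp is library code without a main function, it is compiled together with the application's own source in a single compiler invocation (main.cpp and my_app are placeholders for the embedding project's names):
g++ -std=c++17 -O2 -o my_app main.cpp duckdb.cpp -lpthread -ldl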
Header Deduplication
The amalgamation process tracks which headers have already been included and skips duplicate #include directives. This prevents multiple-definition errors and reduces the final file size.

How It Works

The amalgamation process follows these steps:

  1. Discover source files by parsing src/CMakeLists.txt recursively to find all .cpp files.
  2. Resolve include ordering by following #include directives depth-first, tracking which files have already been written.
  3. Concatenate source files in dependency order, replacing #include "..." with the actual file contents (for project-internal includes) and preserving #include <...> for system headers.
  4. Write the output to src/amalgamation/duckdb.cpp and src/amalgamation/duckdb.hpp.
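
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not DuckDB's actual amalgamation script: the function name amalgamate, the in-memory files dictionary (standing in for files on disk), and the sample file names are all hypothetical. It shows depth-first expansion of quoted includes, deduplication through a set of already-written files, and pass-through of system includes.

```python
import re

# Matches project-internal includes such as: #include "a.hpp"
LOCAL_INCLUDE = re.compile(r'^\s*#\s*include\s+"([^"]+)"')


def amalgamate(entry_points, files):
    """Expand quoted includes depth-first, emitting each file at most once.

    `files` maps a path to its text; it stands in for reading from disk
    (an assumption made to keep the sketch self-contained).
    """
    written = set()  # files already emitted (header deduplication)
    output = []

    def emit(path):
        if path in written:
            return  # duplicate include: skip it
        written.add(path)  # mark before recursing so include cycles terminate
        for line in files[path].splitlines():
            match = LOCAL_INCLUDE.match(line)
            if match and match.group(1) in files:
                emit(match.group(1))  # inline the dependency in place
            else:
                output.append(line)  # code and #include <...> pass through

    for path in entry_points:
        emit(path)
    return "\n".join(output) + "\n"


# Hypothetical three-file project.
files = {
    "a.hpp": "#include <vector>\nstruct A {};",
    "b.hpp": '#include "a.hpp"\nstruct B { A a; };',
    "main.cpp": '#include "a.hpp"\n#include "b.hpp"\nint f() { return 0; }',
}
result = amalgamate(["main.cpp"], files)
```

In this example a.hpp is requested twice (once by main.cpp, once by b.hpp) but appears only once in result, ahead of the code that depends on it. The sketch does not deduplicate system includes; repeating them is harmless because standard headers carry their own include guards.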

Extended Mode

The extended amalgamation (--extended flag) includes additional modules beyond the core:

  • Parquet reader/writer -- the Apache Parquet extension
  • jemalloc -- a memory allocator bundled for improved allocation performance

Split Mode

For build systems that benefit from parallel compilation, the amalgamation can be split into N separate source files (--splits N), each containing a subset of the source. This preserves the distribution simplicity while enabling parallel builds.
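
One way to picture the split is as a partition of the ordered source list into N groups. The sketch below is illustrative only: how the real tool assigns files to splits is not described here, so contiguous, near-equal chunking is an assumption, and split_sources and the sample file names are hypothetical.

```python
def split_sources(sources, splits):
    """Partition `sources` into `splits` contiguous, near-equal groups.

    Contiguous chunking is an assumption; it keeps the original ordering
    of files within each group.
    """
    if splits < 1:
        raise ValueError("splits must be at least 1")
    base, extra = divmod(len(sources), splits)
    groups, start = [], 0
    for i in range(splits):
        size = base + (1 if i < extra else 0)  # first `extra` groups get one more
        groups.append(sources[start:start + size])
        start += size
    return groups


groups = split_sources(["a.cpp", "b.cpp", "c.cpp", "d.cpp", "e.cpp"], 2)
# groups == [["a.cpp", "b.cpp", "c.cpp"], ["d.cpp", "e.cpp"]]
```

Each group would then be amalgamated into its own output file, and a build system can compile those files in parallel.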

Usage

This principle applies when:

  • Creating distributable source packages for embedding DuckDB in other projects (e.g., Python bindings, R packages, Node.js addons)
  • Preparing release artifacts that will be uploaded to GitHub Releases or package registries
  • Building two-file (source plus header) distributions where DuckDB is vendored directly into another project's source tree
  • Optimizing compilation through unity build techniques in CI pipelines

Theoretical Basis

  • Single Compilation Unit (SCU) -- A technique where all source files are combined into one translation unit, enabling the compiler to see and optimize the entire program at once.
  • Include Resolution and Ordering -- Topological sorting of header dependencies to ensure each header is included exactly once, in the correct order relative to its dependents.
  • Header Deduplication -- Tracking already-included headers to prevent duplicate definitions, analogous to #pragma once or include guards but applied at the amalgamation level.
  • Unity Builds -- A build technique (used in game engines and large C++ projects) where multiple source files are #include-d into a single file to reduce build times and enable cross-TU optimizations.
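
To contrast unity builds with amalgamation: a unity file references its sources through #include directives and leaves expansion to the preprocessor, whereas an amalgamation copies their contents into the output. The following sketch generates such a unity file; unity_source and the file names are hypothetical and not part of any DuckDB tooling.

```python
def unity_source(sources):
    """Return the text of a unity file that #includes each source once."""
    seen = set()
    lines = ["// Generated unity build file (illustrative)"]
    for src in sources:
        if src in seen:
            continue  # same deduplication idea, applied to whole sources
        seen.add(src)
        lines.append(f'#include "{src}"')
    return "\n".join(lines) + "\n"


unity_text = unity_source(["a.cpp", "b.cpp", "a.cpp"])
```

Unlike an amalgamated file, the generated unity file still requires the original source tree at compile time, so it speeds up builds without simplifying distribution.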
