Principle: DuckDB Core Library Compilation
Overview
Building the main library artifact from organized source modules is a foundational step in the DuckDB compilation pipeline. This principle governs how disparate source files, organized into logical modules, are compiled and assembled into a single linkable library (shared or static) that downstream targets can consume.
Description
The DuckDB codebase is organized into a set of well-defined source modules, each responsible for a distinct area of the database engine. These modules include:
- catalog -- metadata and schema management
- common -- shared utilities, types, and data structures
- core_functions -- built-in scalar and aggregate functions
- execution -- physical execution engine and operators
- function -- function registry and binding
- main -- database lifecycle, client context, and configuration
- optimizer -- query plan optimization passes
- parallel -- parallel execution and task scheduling
- parser -- SQL parsing and transformation
- planner -- logical plan generation
- storage -- persistent storage and buffer management
- transaction -- transaction management and MVCC
To compile these modules efficiently, DuckDB employs a unity build strategy. In a unity build, multiple .cpp source files are combined into a single translation unit before compilation. This dramatically reduces:
- Compilation time -- the compiler processes fewer translation units, reducing redundant header parsing and template instantiation.
- Link time -- fewer object files mean the linker has less work to do.
- Redundant work -- shared headers are parsed once per unity chunk rather than once per source file.
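The chunking step above can be sketched as a small generator script. This is a hypothetical illustration of the general unity-build technique (DuckDB's real grouping is driven by its build scripts); the chunk size of 8 and the `unity_N.cpp` naming are assumptions made for the example.

```python
from pathlib import Path

def write_unity_chunks(sources, out_dir, chunk_size=8):
    """Group .cpp files into unity translation units of at most
    chunk_size files each, writing one generated unity_N.cpp per chunk."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks = []
    for n, i in enumerate(range(0, len(sources), chunk_size)):
        unity = out_dir / f"unity_{n}.cpp"
        # Each unity file simply #includes its member source files,
        # so compiling unity_N.cpp compiles all of its members at once.
        unity.write_text("".join(f'#include "{src}"\n'
                                 for src in sources[i:i + chunk_size]))
        chunks.append(unity)
    return chunks
```

Compiling `unity_0.cpp` then compiles its eight member files as a single translation unit, parsing their shared headers only once.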
The build system produces two primary library targets:
| Target | Type | Description |
|---|---|---|
| duckdb | Shared library (.so / .dylib / .dll) | Dynamically linkable library for embedding |
| duckdb_static | Static library (.a / .lib) | Statically linkable archive for self-contained builds |
Usage
This principle applies after all third-party dependencies have been compiled (e.g., fmt, re2, hyperloglog, miniz, pg_query, mbedtls). Once the dependency object files are available, the core library compilation step:
- Gathers all source files from each module directory under src/.
- Optionally groups them into unity build chunks (unless DISABLE_UNITY is set).
- Compiles them into object files.
- Links the object files together with dependency objects into the final shared and/or static library.
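The four steps above can be sketched end to end. The module names come from the list earlier in this document, but the `c++` command lines, chunk size, and output names are generic placeholders, not DuckDB's actual build rules.

```python
from pathlib import Path

MODULES = ["catalog", "common", "core_functions", "execution", "function",
           "main", "optimizer", "parallel", "parser", "planner",
           "storage", "transaction"]

def plan_core_build(src_root, dep_objects, disable_unity=False, chunk_size=8):
    # 1. Gather all source files from each module directory under src/.
    sources = sorted(str(p) for m in MODULES
                     for p in Path(src_root, m).rglob("*.cpp"))
    # 2. Group into unity chunks, unless DISABLE_UNITY is set, in which
    #    case each source file remains its own translation unit.
    if disable_unity:
        units = [[s] for s in sources]
    else:
        units = [sources[i:i + chunk_size]
                 for i in range(0, len(sources), chunk_size)]
    # 3. One compile command per translation unit.
    compile_cmds = [f"c++ -O2 -fPIC -c unity_{n}.cpp -o unity_{n}.o"
                    for n in range(len(units))]
    objects = [f"unity_{n}.o" for n in range(len(units))]
    # 4. Link the core objects together with the dependency objects.
    link_cmd = ("c++ -shared -o libduckdb.so "
                + " ".join(objects + list(dep_objects)))
    return units, compile_cmds, link_cmd
```

Note how the dependency objects (fmt, re2, and so on) enter only at the final link step, which is why this principle applies only after the third-party builds complete.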
Theoretical Basis
Unity Build Optimization
A unity build (also called a jumbo build or amalgamation build) compiles multiple source files as a single translation unit by #include-ing them into a generated source file. The key theoretical benefits are:
- Reduced redundant parsing -- headers included by multiple source files are parsed only once per unity chunk.
- Improved inlining opportunities -- the compiler can see more code within a single translation unit, enabling better inlining decisions.
- Fewer I/O operations -- the build system opens and reads fewer files.
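To make the parsing reduction concrete, here is the arithmetic with hypothetical numbers (1,000 source files and chunks of 8 are illustrative, not DuckDB's actual counts):

```python
num_sources = 1000   # hypothetical count of .cpp files
chunk_size = 8       # sources combined per unity chunk

# Without unity: shared headers are re-parsed once per source file.
parses_without_unity = num_sources
# With unity: shared headers are parsed once per unity chunk.
translation_units = -(-num_sources // chunk_size)  # ceiling division
parses_with_unity = translation_units

print(parses_without_unity, parses_with_unity)  # 1000 125
```

An 8x reduction in header parsing work, before accounting for the corresponding drop in object files handed to the linker.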
Link-Time Optimization (LTO)
When combined with link-time optimization, the unity build approach allows the compiler and linker to perform whole-program analysis across module boundaries, further improving the generated binary through:
- Cross-module inlining
- Dead code elimination
- Interprocedural constant propagation
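Enabling LTO typically means passing `-flto` at both compile and link time. The sketch below uses standard GCC/Clang flags; the exact flags and file names in DuckDB's build are an assumption here.

```python
def lto_build_commands(unit_names, lib="libduckdb.so"):
    """Sketch: -flto at compile time emits intermediate representation
    into the object files; -flto at link time lets the linker perform
    whole-program optimization across all of them (standard GCC/Clang
    usage, not DuckDB's actual build configuration)."""
    compile_cmds = [f"c++ -O2 -flto -fPIC -c {u}.cpp -o {u}.o"
                    for u in unit_names]
    # At this final step the toolchain sees every translation unit at
    # once, enabling cross-module inlining, dead code elimination, and
    # interprocedural constant propagation.
    link_cmd = (f"c++ -shared -flto -o {lib} "
                + " ".join(f"{u}.o" for u in unit_names))
    return compile_cmds, link_cmd
```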
The dual-target approach (shared and static) serves different deployment scenarios:
- Shared libraries enable smaller binaries and allow library updates without recompilation of consumers.
- Static libraries produce fully self-contained binaries with no runtime dependency resolution.
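The two packaging modes differ only in the final step applied to the same object files. A minimal sketch, using generic Unix toolchain commands (`ar` for the archive, `-shared` for the dynamic library) rather than DuckDB's actual build rules:

```python
def package_library(objects, static=False):
    """Produce the command that packages compiled objects into either
    a static archive or a shared library (generic Unix toolchain)."""
    if static:
        # Static: archive the objects; consumers copy the code they
        # need into their own binary at link time.
        return "ar rcs libduckdb_static.a " + " ".join(objects)
    # Shared: consumers resolve symbols at run time and can pick up
    # library updates without relinking.
    return "c++ -shared -o libduckdb.so " + " ".join(objects)
```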