Principle:Duckdb Duckdb Core Library Compilation

From Leeroopedia


Overview

Building the main library artifact from organized source modules is a foundational step in the DuckDB compilation pipeline. This principle governs how disparate source files, organized into logical modules, are compiled and assembled into a single linkable library (shared or static) that downstream targets can consume.

Description

The DuckDB codebase is organized into a set of well-defined source modules, each responsible for a distinct area of the database engine. These modules include:

  • catalog -- metadata and schema management
  • common -- shared utilities, types, and data structures
  • core_functions -- built-in scalar and aggregate functions
  • execution -- physical execution engine and operators
  • function -- function registry and binding
  • main -- database lifecycle, client context, and configuration
  • optimizer -- query plan optimization passes
  • parallel -- parallel execution and task scheduling
  • parser -- SQL parsing and transformation
  • planner -- logical plan generation
  • storage -- persistent storage and buffer management
  • transaction -- transaction management and MVCC

To compile these modules efficiently, DuckDB employs a unity build strategy. In a unity build, multiple .cpp source files are combined into a single translation unit before compilation. This reduces:

  1. Compilation time -- the compiler processes fewer translation units, so shared headers are parsed and common templates are instantiated once per unity chunk rather than once per source file.
  2. Link time -- fewer object files mean fewer symbol tables and duplicate inline or template definitions for the linker to merge.
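The combining step can be sketched in a few lines of Python. This is an illustrative sketch only, not DuckDB's actual build script: the function name make_unity_source and the two file paths are hypothetical, chosen just to show what a generated unity translation unit looks like.

```python
def make_unity_source(sources):
    """Return the text of a unity translation unit that #includes each source."""
    lines = ["// Generated unity chunk (illustrative sketch)"]
    for src in sorted(sources):
        # Each original .cpp is pulled in textually, so the compiler sees
        # one translation unit instead of len(sources) separate ones.
        lines.append(f'#include "{src}"')
    return "\n".join(lines) + "\n"


# Hypothetical file list for a single module.
catalog_sources = ["src/catalog/catalog_entry.cpp", "src/catalog/catalog.cpp"]
unity_text = make_unity_source(catalog_sources)
print(unity_text)
```

Sorting the inputs keeps the generated file stable between runs, which avoids needless rebuilds when the file list is gathered in a different order.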

The build system produces two primary library targets:

Target | Type | Description
duckdb | Shared library (.so / .dylib / .dll) | Dynamically linkable library for embedding
duckdb_static | Static library (.a / .lib) | Statically linkable archive for self-contained builds

Usage

This principle applies after all third-party dependencies have been compiled (e.g., fmt, re2, hyperloglog, miniz, pg_query, mbedtls). Once the dependency object files are available, the core library compilation step:

  1. Gathers all source files from each module directory under src/.
  2. Optionally groups them into unity build chunks (unless DISABLE_UNITY is set).
  3. Compiles them into object files.
  4. Combines the object files with the dependency objects into the final artifacts: the shared library is produced by the linker, and the static library by archiving the objects.
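Steps 1 and 2 above can be sketched as follows. This is a minimal sketch under stated assumptions, not DuckDB's build logic: the helper names gather_sources and plan_translation_units are hypothetical, grouping one chunk per module directory is an assumption of the sketch, and the disable_unity parameter merely stands in for the DISABLE_UNITY setting. Compilation and linking (steps 3 and 4) are left to the real build system.

```python
import tempfile
from pathlib import Path


def gather_sources(src_root, modules):
    """Step 1: collect the .cpp files under each module directory."""
    return {
        module: sorted(
            path.relative_to(src_root).as_posix()
            for path in (src_root / module).rglob("*.cpp")
        )
        for module in modules
    }


def plan_translation_units(sources_by_module, disable_unity=False):
    """Step 2: decide which files are compiled together."""
    if disable_unity:
        # One translation unit per source file.
        return [[src] for files in sources_by_module.values() for src in files]
    # Assumption of this sketch: one unity chunk per module directory.
    return [files for files in sources_by_module.values() if files]


# Build a tiny throwaway tree so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    src_root = Path(tmp) / "src"
    for rel in ["catalog/a.cpp", "catalog/b.cpp", "parser/c.cpp"]:
        target = src_root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("// placeholder\n")

    sources = gather_sources(src_root, ["catalog", "parser"])
    unity_units = plan_translation_units(sources)
    plain_units = plan_translation_units(sources, disable_unity=True)

print(len(unity_units), "unity translation units")
print(len(plain_units), "translation units with unity disabled")
```

With the three placeholder files, the unity plan yields two translation units (one per module) and the non-unity plan yields three.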

Theoretical Basis

Unity Build Optimization

A unity build (also called a jumbo build or amalgamation build) compiles multiple source files as a single translation unit by #include-ing them into a generated source file. The key theoretical benefits are:

  • Reduced redundant parsing -- headers included by multiple source files are parsed only once per unity chunk.
  • Improved inlining opportunities -- the compiler can see more code within a single translation unit, enabling better inlining decisions.
  • Fewer I/O operations -- each shared header is opened and read once per chunk instead of once per source file, and fewer object files are written.
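The parsing benefit can be made concrete with a small counting model. This is a simplified sketch: it assumes every header has an include guard (so it is parsed at most once per translation unit) and counts only direct includes. The file names and include sets are invented for illustration.

```python
def header_parses(includes_by_source, chunks):
    """Count header parses when each chunk is one translation unit."""
    total = 0
    for chunk in chunks:
        seen = set()
        for src in chunk:
            # With include guards, a header already seen in this
            # translation unit is not parsed again.
            seen.update(includes_by_source[src])
        total += len(seen)
    return total


# Hypothetical sources and the headers each one includes.
includes = {
    "a.cpp": {"types.hpp", "vector.hpp"},
    "b.cpp": {"types.hpp", "catalog.hpp"},
    "c.cpp": {"types.hpp", "vector.hpp", "catalog.hpp"},
}

separate = header_parses(includes, [["a.cpp"], ["b.cpp"], ["c.cpp"]])
unity = header_parses(includes, [["a.cpp", "b.cpp", "c.cpp"]])
print("separate:", separate, "unity:", unity)
```

In this toy model the three separate translation units parse 7 headers in total, while the single unity chunk parses 3. The gap grows with the number of sources that share the same headers.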

Link-Time Optimization (LTO)

A unity build widens the compiler's view only within a single chunk. Link-time optimization extends that view across translation units: the compiler and linker can analyze code across chunk and module boundaries, further improving the generated binary through:

  • Cross-module inlining
  • Dead code elimination
  • Interprocedural constant propagation

Shared vs. Static Linking

The dual-target approach (shared and static) serves different deployment scenarios:

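  • Shared libraries enable smaller consumer binaries and allow the library to be updated without recompiling consumers, provided the interface stays binary-compatible.
  • Static libraries embed the library code in the consuming binary, so the library itself does not need to be located and loaded at run time.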
