Workflow:Duckdb Duckdb Source Amalgamation And Packaging

From Leeroopedia


Knowledge Sources
Domains Database_Engineering, Build_Systems, Distribution
Last Updated 2026-02-07 11:00 GMT

Overview

End-to-end process for creating distributable DuckDB source packages, including single-header/single-source amalgamations and self-contained source archives for language-specific client builds.

Description

This workflow produces two types of distributable DuckDB source packages. The first is the amalgamation: a single C++ header file (duckdb.hpp) and a single C++ source file (duckdb.cpp) that together contain the entire DuckDB codebase, so another project can embed DuckDB by adding just two files. The second is the package build: a self-contained source archive that includes all headers, source files, third-party dependencies, and build configuration needed to compile DuckDB as part of a client library build (Python wheels, R packages, etc.). Both approaches resolve include dependencies so that the output is self-contained: the amalgamation inlines them, while the package build collects the required files alongside the sources.

Usage

Execute this workflow when preparing DuckDB for distribution as an embeddable library, building language-specific client packages (Python, R, Java), creating release artifacts, or when a downstream project requires a single-file DuckDB integration.

Execution Steps

Step 1: Run Code Generation

Ensure all generated source files are up to date before packaging. This includes C API headers, enum utilities, serialization code, grammar files, and function registrations. The packaging scripts depend on these generated files being present and current.

Key considerations:

  • All generate_*.py scripts must be run before packaging
  • Some generated files are not committed to the repository, so a fresh checkout may be missing them
  • Out-of-date generated files will produce incorrect packages
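
The "run every generator first" rule can be sketched as a small driver. This is a minimal illustration, not part of the DuckDB tooling: it assumes the generate_*.py scripts sit in one directory, take no arguments, and can run in name order. Individual generators may need arguments or a specific order, so treat the function names and defaults here as hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def find_generators(scripts_dir):
    """Return every generate_*.py script in scripts_dir, sorted by name."""
    return sorted(Path(scripts_dir).glob("generate_*.py"))


def run_generators(scripts_dir, repo_root=".", dry_run=False):
    """Run each generator from repo_root and return the names that were handled.

    Assumption: generators take no arguments and are order-independent.
    """
    handled = []
    for script in find_generators(scripts_dir):
        if not dry_run:
            # check=True raises CalledProcessError when a generator fails,
            # so packaging never continues with stale generated files.
            subprocess.run(
                [sys.executable, str(script.resolve())],
                cwd=repo_root,
                check=True,
            )
        handled.append(script.name)
    return handled
```

Calling run_generators("scripts", dry_run=True) lists what would run without executing anything, which is a cheap way to confirm the script set before a packaging job.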

Step 2: Create Amalgamation

Run the amalgamation script to combine all DuckDB source files into a single header and a single source file. The script resolves all include directives, inlines header contents in dependency order, and handles conditional compilation. An extended mode includes additional headers for embedding scenarios that need access to internal APIs.

Key considerations:

  • Output files: src/amalgamation/duckdb.hpp and src/amalgamation/duckdb.cpp
  • The --extended flag includes additional internal headers
  • Include resolution follows the dependency graph to avoid duplicate definitions
  • Skip flags control whether DuckDB-internal includes are resolved
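
The idea of inlining headers in dependency order while avoiding duplicate definitions can be shown with a toy amalgamator. This is a simplified sketch, not the logic of the actual amalgamation script: it assumes quoted includes resolve against a single include directory, treats "#pragma once" as the only include guard, and ignores conditional compilation entirely.

```python
import re
from pathlib import Path

# Matches quoted includes only; angle-bracket system includes are left untouched.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s+"([^"]+)"')


def inline_file(path, include_dir, seen, out):
    """Append the contents of path to out, expanding local includes depth-first."""
    path = Path(path).resolve()
    if path in seen:
        return  # already emitted once; skipping avoids duplicate definitions
    seen.add(path)
    for line in path.read_text().splitlines():
        match = INCLUDE_RE.match(line)
        target = Path(include_dir) / match.group(1) if match else None
        if target is not None and target.is_file():
            # A header is emitted before the code that depends on it.
            inline_file(target, include_dir, seen, out)
        elif line.strip() != "#pragma once":
            out.append(line)


def amalgamate(entry_points, include_dir):
    """Return one self-contained text built from the given entry files."""
    seen, out = set(), []
    for entry in entry_points:
        inline_file(entry, include_dir, seen, out)
    return "\n".join(out) + "\n"
```

The shared seen set is what keeps a header that is included from several places from being emitted twice. Quoted includes that do not resolve inside include_dir are kept verbatim, loosely mirroring the idea that some includes can be left unresolved.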

Step 3: Build Source Package

Run the package build script to create a self-contained source directory with all files needed for compilation. The script collects core source files, headers, third-party dependency sources and headers, and generates a CMakeLists.txt appropriate for the target client library. Excluded files (like utf8proc_data.cpp and dummy loaders) are filtered out.

Key considerations:

  • Third-party includes cover 20+ vendored libraries
  • Third-party sources are collected from their respective directories
  • The generated CMakeLists.txt configures include paths and compile targets
  • Platform-specific files are included for cross-platform compatibility
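
The collect-filter-generate flow described above might look roughly like the following. This is a hedged sketch rather than the package build script itself: the directory names, the target name, and the single-entry exclusion set are illustrative assumptions, and the real script handles many more cases.

```python
from pathlib import Path

# utf8proc_data.cpp is named in the text above; the real exclusion list may be longer.
EXCLUDED_NAMES = {"utf8proc_data.cpp"}
SOURCE_SUFFIXES = {".cpp", ".cc", ".c"}


def collect_sources(base, subdirs, excluded=EXCLUDED_NAMES):
    """Return source paths under base/subdir, relative to base, minus exclusions."""
    base = Path(base)
    found = []
    for subdir in subdirs:
        for path in sorted((base / subdir).rglob("*")):
            if not path.is_file():
                continue
            if path.suffix in SOURCE_SUFFIXES and path.name not in excluded:
                found.append(path.relative_to(base).as_posix())
    return found


def render_cmake(target, sources, include_dirs):
    """Render a small CMakeLists.txt that builds the sources as a static library."""
    lines = [
        "cmake_minimum_required(VERSION 3.5)",
        f"project({target} CXX C)",
        f"add_library({target} STATIC",
    ]
    lines += [f"  {src}" for src in sources]
    lines.append(")")
    lines.append(f"target_include_directories({target} PUBLIC")
    lines += [f"  {inc}" for inc in include_dirs]
    lines.append(")")
    return "\n".join(lines) + "\n"
```

Returning paths relative to the package root keeps the generated CMakeLists.txt relocatable, which matters when the archive is unpacked inside a client library's own build tree.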

Step 4: Validate Package

Test that the generated source package compiles correctly by building it with the target client's build system. For the amalgamation, verify that compiling the single source file produces a working DuckDB library. For source packages, verify compilation with test_compile.py, which checks that every source file compiles independently.

Key considerations:

  • test_compile.py verifies independent compilation of each source file
  • Missing includes are caught by independent compilation checks
  • The package should produce identical functionality to a normal build
  • Size and symbol checks validate that the package is complete
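
An independent-compilation check can be approximated by invoking the compiler once per source file and recording which files fail. The sketch below is not test_compile.py: the compiler name, the -std value, and the function names are assumptions chosen for illustration. It relies on -fsyntax-only, which GCC and Clang both accept, to parse each file without producing an object file.

```python
import subprocess


def compile_command(source, include_dirs, compiler="c++", std="c++11"):
    """Build a syntax-only compile command for one source file.

    Assumption: the compiler is GCC- or Clang-compatible.
    """
    cmd = [compiler, f"-std={std}", "-fsyntax-only"]
    for directory in include_dirs:
        cmd += ["-I", directory]
    return cmd + [source]


def check_independent_compilation(sources, include_dirs, run=subprocess.run):
    """Compile each file on its own and return (source, stderr) for failures."""
    failures = []
    for source in sources:
        result = run(
            compile_command(source, include_dirs),
            capture_output=True,
            text=True,
        )
        if result.returncode != 0:
            # A file that only compiles when another file is compiled first
            # usually has a missing include, which surfaces here.
            failures.append((source, result.stderr))
    return failures
```

The run parameter exists so the loop can be exercised with a stand-in runner on machines that have no compiler installed.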

Step 5: Upload Release Artifacts

Upload the compiled packages and binaries as release artifacts. For tagged releases, assets are uploaded to GitHub Releases via the GitHub API. For CI builds, artifacts are uploaded to staging S3 buckets organized by git commit hash and architecture. Python wheels are published to PyPI using the release-pip.py tool.

Key considerations:

  • asset-upload-gha.py handles GitHub Actions release uploads
  • upload-s3.py uploads to the WebDAV-based download server
  • upload-assets-to-staging.sh handles staging S3 uploads with safety guards
  • release-pip.py automates PyPI wheel publishing
  • macOS binaries require code signing via imported certificates
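
Organizing staged artifacts by commit hash and architecture, with a guard against malformed inputs, can be sketched as a pure planning step. Everything here is hypothetical: the key layout, the "staging" prefix, and the function names are illustrative and are not taken from the upload scripts named above. No network call is made; the actual upload is left to whatever client the pipeline uses.

```python
import re
from pathlib import PurePosixPath

FULL_SHA = re.compile(r"^[0-9a-f]{40}$")


def staging_key(commit, arch, filename, prefix="staging"):
    """Return a hypothetical object key of the form prefix/commit/arch/name."""
    if not FULL_SHA.match(commit):
        # Guard: refuse abbreviated or malformed hashes so artifacts
        # from different commits cannot collide.
        raise ValueError(f"expected a full 40-character commit hash, got {commit!r}")
    if not arch or "/" in arch:
        raise ValueError(f"invalid architecture label: {arch!r}")
    name = PurePosixPath(filename).name  # drop local directories from the key
    return str(PurePosixPath(prefix) / commit / arch / name)


def plan_uploads(files, commit, arch):
    """Pair each local file with its destination key without uploading anything."""
    return [(path, staging_key(commit, arch, path)) for path in files]
```

Computing the full plan before any transfer starts means a bad commit hash or architecture label fails the job up front instead of after a partial upload.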
