Workflow:Duckdb Duckdb Source Amalgamation And Packaging
| Knowledge Sources | |
|---|---|
| Domains | Database_Engineering, Build_Systems, Distribution |
| Last Updated | 2026-02-07 11:00 GMT |
Overview
End-to-end process for creating distributable DuckDB source packages, including single-header/single-source amalgamations and self-contained source archives for language-specific client builds.
Description
This workflow produces two types of distributable DuckDB source packages. The first is the amalgamation: a single C++ header file (duckdb.hpp) and a single C++ source file (duckdb.cpp) that contain the entire DuckDB codebase, making it trivial to embed DuckDB in other projects by including just two files. The second is the package build: a self-contained source archive that includes all headers, source files, third-party dependencies, and build configuration needed to compile DuckDB as part of a client library build (Python wheels, R packages, etc.). Both packaging approaches resolve and inline all include dependencies to produce fully self-contained outputs.
Usage
Execute this workflow when preparing DuckDB for distribution as an embeddable library, building language-specific client packages (Python, R, Java), creating release artifacts, or supporting a downstream project that requires a single-file DuckDB integration.
Execution Steps
Step 1: Run Code Generation
Ensure all generated source files are up to date before packaging. This includes C API headers, enum utilities, serialization code, grammar files, and function registrations. The packaging scripts depend on these generated files being present and current.
Key considerations:
- All generate_*.py scripts must be run before packaging
- Some generated files are not committed to the repository, so they must be regenerated locally
- Out-of-date generated files will produce incorrect packages
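As a minimal sketch of the step above, the helper below discovers every generate_*.py script in a directory and runs each one in turn. The directory location, the absence of per-script arguments, and the function names are assumptions made for illustration; the real scripts may need specific arguments or a particular execution order.

```python
import subprocess
import sys
from pathlib import Path


def find_generator_scripts(scripts_dir):
    """Return all generate_*.py files directly inside scripts_dir, sorted by name."""
    return sorted(Path(scripts_dir).glob("generate_*.py"))


def build_commands(scripts_dir, python=sys.executable):
    """Build one command line per generator script without running anything."""
    return [[python, str(script)] for script in find_generator_scripts(scripts_dir)]


def run_generators(scripts_dir, repo_root):
    """Run every generator from repo_root and stop at the first failure.

    Assumption: each script can be invoked with no arguments.
    """
    for cmd in build_commands(scripts_dir):
        subprocess.run(cmd, cwd=repo_root, check=True)
```

Keeping command construction separate from execution makes it easy to print the planned commands first and confirm that no generator is missed before packaging.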
Step 2: Create Amalgamation
Run the amalgamation script to combine all DuckDB source files into a single header and a single source file. The script resolves all include directives, inlines header contents in dependency order, and handles conditional compilation. An extended mode includes additional headers for embedding scenarios that need access to internal APIs.
Key considerations:
- Output files: src/amalgamation/duckdb.hpp and src/amalgamation/duckdb.cpp
- The --extended flag includes additional internal headers
- Include resolution follows the dependency graph to avoid duplicate definitions
- Skip flags control whether DuckDB-internal includes are resolved
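The include-resolution idea can be illustrated with a toy inliner. This is a simplified sketch, not the amalgamation script itself: it works on an in-memory mapping of file names to contents (an assumption made to keep it self-contained), expands quoted includes depth-first, and emits each file only once so that definitions are not duplicated.

```python
import re

# Matches lines such as: #include "duckdb/common/types.hpp"
INCLUDE_RE = re.compile(r'^\s*#\s*include\s+"([^"]+)"\s*$')


def inline_includes(name, files, seen=None):
    """Return the text of `name` with known quoted includes expanded in place.

    `files` maps file names to their contents. A file is emitted the first
    time it is reached; later includes of the same file expand to nothing.
    Includes that are not keys of `files` (for example <vector>) are kept
    verbatim.
    """
    if seen is None:
        seen = set()
    if name in seen:
        return ""
    seen.add(name)
    out = []
    for line in files[name].splitlines():
        match = INCLUDE_RE.match(line)
        if match and match.group(1) in files:
            expanded = inline_includes(match.group(1), files, seen)
            if expanded:
                out.append(expanded)
        else:
            out.append(line)
    return "\n".join(out)
```

Because expansion is depth-first, a header always appears before the first file that includes it, which is the ordering property a single-file build needs. The sketch ignores conditional compilation, which the real script has to handle.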
Step 3: Build Source Package
Run the package build script to create a self-contained source directory with all files needed for compilation. The script collects core source files, headers, third-party dependency sources and headers, and generates a CMakeLists.txt appropriate for the target client library. Excluded files (like utf8proc_data.cpp and dummy loaders) are filtered out.
Key considerations:
- Third-party includes cover 20+ vendored libraries
- Third-party sources are collected from their respective directories
- The generated CMakeLists.txt configures include paths and compile targets
- Platform-specific files are included for cross-platform compatibility
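The collection and filtering described above might look roughly like the following sketch. The directory names passed in, the default exclusion list, the suffix set, and the shape of the generated CMake fragment are illustrative assumptions and do not reproduce the package build script's actual output.

```python
from pathlib import Path


def collect_sources(root, subdirs, excluded_names=("utf8proc_data.cpp",),
                    suffixes=(".cpp", ".cc", ".c")):
    """Collect source files under root/<subdir>, skipping excluded file names.

    Returns POSIX-style paths relative to root, sorted for reproducible output.
    """
    root = Path(root)
    found = []
    for subdir in subdirs:
        for path in (root / subdir).rglob("*"):
            if (path.is_file() and path.suffix in suffixes
                    and path.name not in excluded_names):
                found.append(path.relative_to(root).as_posix())
    return sorted(found)


def render_cmake(target, sources, include_dirs):
    """Render a minimal CMake fragment (hypothetical layout) for a static library."""
    lines = [f"add_library({target} STATIC"]
    lines += [f"  {src}" for src in sources]
    lines.append(")")
    lines.append(f"target_include_directories({target} PUBLIC")
    lines += [f"  {inc}" for inc in include_dirs]
    lines.append(")")
    return "\n".join(lines) + "\n"
```

Sorting the collected paths keeps the generated build file stable between runs, which makes diffs of the package contents easier to review.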
Step 4: Validate Package
Test that the generated source package compiles correctly by building it with the target client's build system. For the amalgamation, verify that the single-source compilation produces a working DuckDB library. For source packages, verify compilation with test_compile.py, which checks that each source file compiles independently.
Key considerations:
- test_compile.py verifies independent compilation of each source file
- Missing includes are caught by independent compilation checks
- A library built from the package should behave identically to one produced by a normal build
- Size and symbol checks validate that the package is complete
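To illustrate what an independent compilation check involves, the sketch below issues one syntax-only compiler invocation per source file and reports the files that fail. It is not the logic of test_compile.py: the compiler name, the C++11 standard flag, and the function names are assumptions, and `-fsyntax-only` is a GCC/Clang option.

```python
import subprocess


def compile_command(source, include_dirs, compiler="c++"):
    """Build a syntax-only compile command for a single source file."""
    cmd = [compiler, "-std=c++11", "-fsyntax-only"]
    for inc in include_dirs:
        cmd += ["-I", inc]
    cmd.append(source)
    return cmd


def find_failures(sources, include_dirs, run=subprocess.run):
    """Compile each source on its own and return those that fail.

    `run` is injectable so the check can be exercised without a compiler.
    """
    failures = []
    for source in sources:
        result = run(compile_command(source, include_dirs),
                     capture_output=True, text=True)
        if result.returncode != 0:
            failures.append(source)
    return failures
```

Compiling files one at a time is what exposes missing includes: a file that only builds because another translation unit happened to pull in the right header will fail when checked in isolation.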
Step 5: Upload Release Artifacts
Upload the compiled packages and binaries as release artifacts. For tagged releases, assets are uploaded to GitHub Releases via the GitHub API. For CI builds, artifacts are uploaded to staging S3 buckets organized by git commit hash and architecture. Python wheels are published to PyPI using the release-pip.py tool.
Key considerations:
- asset-upload-gha.py handles GitHub Actions release uploads
- upload-s3.py uploads to the WebDAV-based download server
- upload-assets-to-staging.sh handles staging S3 uploads with safety guards
- release-pip.py automates PyPI wheel publishing
- macOS binaries require code signing via imported certificates
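Organizing staged artifacts by commit hash and architecture can be sketched as a small key-building helper. The key layout, the "staging" prefix, the architecture label, and the validation rules are hypothetical and chosen only for illustration; they are not taken from the upload scripts named above.

```python
import re


def staging_key(commit_hash, architecture, filename, prefix="staging"):
    """Build an object key of the form <prefix>/<commit>/<arch>/<filename>.

    Rejects inputs that do not look like a hex commit hash or that contain
    path separators, as a simple guard against writing to unexpected locations.
    """
    if not re.fullmatch(r"[0-9a-f]{7,40}", commit_hash):
        raise ValueError(f"not a valid commit hash: {commit_hash!r}")
    for part in (architecture, filename):
        if not part or "/" in part or part in (".", ".."):
            raise ValueError(f"invalid path component: {part!r}")
    return "/".join([prefix, commit_hash, architecture, filename])
```

Validating the components before any upload happens keeps a malformed commit hash or file name from producing a key outside the intended staging area.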