Implementation:Duckdb Duckdb Test Compile Py

From Leeroopedia


Overview

Concrete tool for verifying that DuckDB's amalgamated source compiles independently. The test_compile.py script invokes a C++ compiler on the amalgamated source files to confirm they are syntactically correct, include-complete, and free of unresolved names at compile time, all without the original DuckDB build system. Because the compiler runs with -S, nothing is linked, so link-time symbol errors are outside the scope of this check.

Code Reference

Source Location
scripts/test_compile.py (lines 1--86)

Key Functions

| Function | Signature | Purpose |
| --- | --- | --- |
| get_git_hash | get_git_hash() | Returns the current git HEAD commit hash. Used to determine whether the compilation cache is still valid. |
| try_compilation | try_compilation(fpath, cache) | Attempts to compile a single source file using clang++ -std=c++17 -S -O0. Records success in the cache. Returns True on success, False on failure. |
| compile_dir | compile_dir(dir, cache) | Recursively walks a directory and compiles every .cpp file found. Skips files already in the cache (when resuming). |
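
The directory walk described for compile_dir can be sketched as follows. This is a minimal illustration, not the script's actual code: the helper name find_uncompiled_sources and the use of a plain set for the cache are assumptions made for the example.

```python
import os


def find_uncompiled_sources(root, compiled):
    """Return the .cpp files under `root` that are not in the `compiled` set.

    Hypothetical helper: `compiled` stands in for the cache of file paths
    that already compiled successfully.
    """
    pending = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sort in place so the traversal order is deterministic
        for fname in sorted(filenames):
            fpath = os.path.join(dirpath, fname)
            if fname.endswith(".cpp") and fpath not in compiled:
                pending.append(fpath)
    return pending
```

Separating "which files still need compiling" from the compilation itself keeps the resume logic easy to test without a compiler installed.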

Compilation Command

The script uses the following compilation command internally:

clang++ -std=c++17 -S -O0 -o /dev/null <source_file>
  • -std=c++17 -- DuckDB requires C++17
  • -S -- compile to assembly only (no object file, no linking)
  • -O0 -- no optimization (fastest compilation for validation purposes)
  • -o /dev/null -- discard the assembly output
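
As a rough sketch, the command above could be assembled and run from Python like this. The function names build_compile_command and try_compile, and the compiler parameter, are hypothetical and chosen for illustration; the script's real implementation may differ.

```python
import subprocess


def build_compile_command(source_file, compiler="clang++"):
    """Build the argument list for a syntax-only compile of one file."""
    return [compiler, "-std=c++17", "-S", "-O0", "-o", "/dev/null", source_file]


def try_compile(source_file, compiler="clang++"):
    """Return (ok, stderr_text) for one compilation attempt."""
    try:
        result = subprocess.run(
            build_compile_command(source_file, compiler),
            capture_output=True,  # requires Python 3.7+
            text=True,
        )
    except FileNotFoundError:
        # The compiler binary is not installed or not on PATH.
        return False, "compiler not found: " + compiler
    return result.returncode == 0, result.stderr
```

Passing the command as a list avoids shell quoting problems when a source path contains spaces.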

Caching Mechanism

The script uses Python's pickle module to maintain a compilation cache (amalgamation.cache). This cache stores:

  • The git commit hash at the time of caching
  • A set of file paths that compiled successfully
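
A cache with these two pieces of information could be written and read back with pickle roughly as shown here. The dictionary keys commit_hash and compiled and both function names are assumptions for the example; the layout of the real amalgamation.cache file is not specified above.

```python
import pickle


def save_cache(path, commit_hash, compiled_files):
    """Persist the commit hash and the set of successfully compiled files."""
    with open(path, "wb") as fh:
        pickle.dump({"commit_hash": commit_hash, "compiled": set(compiled_files)}, fh)


def load_cache(path):
    """Load the cache, falling back to an empty one if it is missing or unreadable."""
    try:
        with open(path, "rb") as fh:
            return pickle.load(fh)
    except (OSError, EOFError, pickle.UnpicklingError):
        return {"commit_hash": None, "compiled": set()}
```

Since unpickling can execute arbitrary code, a cache file of this kind should only be loaded when it comes from a trusted source such as the local working tree.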

The cache has three resume modes:

| Mode | Constant | Behavior |
| --- | --- | --- |
| Auto | RESUME_AUTO | Resume from cache only if the current git hash matches the cached hash. Otherwise, start fresh. |
| Always | RESUME_ALWAYS | Always resume from cache, regardless of git hash. Useful during iterative development. |
| Never | RESUME_NEVER | Ignore any existing cache and recompile everything from scratch. |
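
The three modes reduce to a small decision about which cached entries to trust. The sketch below is one way to express it; the constant names come from the table, but their numeric values and the function select_compiled_set are assumptions.

```python
# Numeric values are assumed for illustration; only the names appear in the source.
RESUME_AUTO, RESUME_ALWAYS, RESUME_NEVER = range(3)


def select_compiled_set(mode, cached_hash, cached_files, current_hash):
    """Return the set of files that may be skipped under the given resume mode."""
    if mode == RESUME_NEVER:
        return set()  # ignore the cache entirely
    if mode == RESUME_ALWAYS:
        return set(cached_files)  # trust the cache regardless of commit
    if mode == RESUME_AUTO:
        # trust the cache only when it was written for the same commit
        return set(cached_files) if cached_hash == current_hash else set()
    raise ValueError("unknown resume mode: %r" % (mode,))
```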

I/O Contract

Command-Line Interface

python3 scripts/test_compile.py [OPTIONS]

Options:
  --resume     Use RESUME_ALWAYS mode (resume from cache regardless of commit)
  --restart    Use RESUME_NEVER mode (ignore cache, recompile everything)

Default (no flags): RESUME_AUTO mode
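
A minimal sketch of how these flags might map to a resume mode is shown below. The function parse_resume_mode is hypothetical, and the behavior when both flags are given (the last one wins) is an assumption, since the interface above does not define it.

```python
def parse_resume_mode(argv):
    """Map command-line arguments to the name of a resume mode."""
    mode = "RESUME_AUTO"  # default when neither flag is present
    for arg in argv:
        if arg == "--resume":
            mode = "RESUME_ALWAYS"
        elif arg == "--restart":
            mode = "RESUME_NEVER"
        # Assumption: unrecognized arguments are ignored, and if both
        # flags appear, the one listed last takes effect.
    return mode
```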

External Dependencies

| Dependency | Version | Purpose |
| --- | --- | --- |
| python3 | 3.7+ | Script runtime |
| clang++ or g++ | C++17 support required | Compilation verification |

Inputs

  • Amalgamated source: src/amalgamation/duckdb.cpp
  • Amalgamated header: src/amalgamation/duckdb.hpp
  • Cache file (optional): amalgamation.cache (pickle format)

Outputs

| Output | Description |
| --- | --- |
| Exit code 0 | All files compiled successfully; the amalgamation is valid. |
| Exit code non-zero | One or more files failed to compile; error messages are printed to stderr. |
| amalgamation.cache | Updated cache file recording which files compiled successfully. |
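
When the check is driven from another Python program rather than a shell, the exit-code contract above can be consumed along these lines. The helper run_and_report is hypothetical and works for any command that follows the same zero/non-zero convention.

```python
import subprocess
import sys


def run_and_report(command):
    """Run a command and turn its exit code into a pass/fail verdict."""
    completed = subprocess.run(command)
    if completed.returncode == 0:
        return "PASSED"
    return "FAILED (exit code %d)" % completed.returncode


# Example call, assuming it is run from the repository root:
# print(run_and_report([sys.executable, "scripts/test_compile.py"]))
```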

Usage Examples

Basic Validation

# After creating the amalgamation, validate it:
python3 scripts/amalgamation.py
python3 scripts/test_compile.py

echo "Exit code: $?"
# 0 = success, non-zero = failure

Force Full Recompilation

# Ignore any cached results and recompile everything
python3 scripts/test_compile.py --restart

Resume from Cache (Iterative Development)

# When iterating on amalgamation fixes, resume from where you left off
# (even if the commit hash has changed)
python3 scripts/test_compile.py --resume

CI Pipeline Integration

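#!/usr/bin/env bash
set -euo pipefail

# Full validation pipeline in CI
python3 scripts/amalgamation.py --extended

# Run the check inside the `if` condition: under `set -e`, a failing command
# outside a condition aborts the script before a later `$?` test can run.
if python3 scripts/test_compile.py --restart; then
    echo "Amalgamation validation PASSED"
else
    echo "Amalgamation validation FAILED" >&2
    exit 1
fi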

Using g++ Instead of clang++

# The script defaults to clang++. To use g++, set the CXX environment variable
# (if the script supports it) or modify the script:
CXX=g++ python3 scripts/test_compile.py
