Workflow:Duckdb Duckdb Benchmark Execution

Knowledge Sources	DuckDB Benchmark Guide
Domains	Database_Engineering, Performance_Testing, Benchmarking
Last Updated	2026-02-07 11:00 GMT

Overview

End-to-end process for running performance benchmarks against DuckDB, including micro-benchmarks, TPC-H/TPC-DS standard benchmarks, and regression detection.

Description

This workflow covers compiling DuckDB with benchmark support, discovering and executing benchmarks, and analyzing results for performance regressions. DuckDB includes a benchmark runner that supports both compiled C++ benchmarks and an interpreted benchmark DSL (declarative .benchmark files). The system supports micro-benchmarks for individual operations, standard TPC-H and TPC-DS industry benchmarks, and the IMDB Join Order Benchmark (JOB). Results can be compared across versions to detect performance regressions in query execution time, query plan cost, storage size, extension binary size, and Python client data interchange throughput.

Usage

Execute this workflow when you need to measure DuckDB query performance, validate that code changes do not introduce performance regressions, compare execution plans between versions, or produce standardized benchmark results for evaluation against other database systems.

Execution Steps

Step 1: Build With Benchmark Support

Compile DuckDB with the benchmark runner enabled by setting BUILD_BENCHMARK=1. For TPC-H benchmarks, also enable BUILD_TPCH=1. For TPC-DS benchmarks, enable BUILD_TPCDS=1. The build produces the benchmark_runner executable in build/release/benchmark/.

Key considerations:

Use a release build for performance measurement; debug builds disable compiler optimizations and enable extra checks, so their timings are not representative
The IMDB JOB benchmark definitions (schema and query text) are compiled into the benchmark binary as embedded constants
TPC-H and TPC-DS query sets and schema definitions are embedded via generated C++ headers
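
The build step above can be sketched as a small Python wrapper. This is a minimal sketch, not part of DuckDB: the helper names benchmark_build_env and build_duckdb are hypothetical, and it assumes that running make in the repository root with these environment variables set produces a release build (the default target).

```python
import os
import subprocess
from pathlib import Path


def benchmark_build_env(tpch=False, tpcds=False, base_env=None):
    """Return a copy of the environment with the benchmark build flags set."""
    env = dict(os.environ if base_env is None else base_env)
    env["BUILD_BENCHMARK"] = "1"
    if tpch:
        env["BUILD_TPCH"] = "1"
    if tpcds:
        env["BUILD_TPCDS"] = "1"
    return env


def build_duckdb(repo_dir, tpch=False, tpcds=False):
    """Run `make` in repo_dir (assumed to build release) and return the runner path."""
    env = benchmark_build_env(tpch=tpch, tpcds=tpcds)
    subprocess.run(["make"], cwd=repo_dir, env=env, check=True)
    return Path(repo_dir) / "build" / "release" / "benchmark" / "benchmark_runner"
```

Calling build_duckdb("path/to/duckdb", tpch=True) would start a full compile, so it is left uncalled here. Keeping the environment construction in its own function makes the flag handling easy to check without building anything.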

Step 2: Discover Available Benchmarks

Use the benchmark runner's discovery mechanism to list all available benchmarks. The runner recursively scans the benchmark/ directory for .benchmark files and also registers compiled C++ benchmarks. Each benchmark has metadata including display name, group, and subgroup.

Key considerations:

Use the --list flag to enumerate all benchmarks
Use the --info flag to get metadata about a specific benchmark
Regex patterns can filter which benchmarks to run
Benchmarks are organized into groups: micro, tpch, tpcds, imdb
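
To illustrate the discovery step, the following sketch mirrors the recursive scan for .benchmark files and the regex filtering in plain Python. It does not call the benchmark runner and is not its implementation; the function names are hypothetical, and full-match semantics for the pattern are an assumption.

```python
import re
from pathlib import Path


def discover_benchmark_files(root):
    """Recursively collect *.benchmark files under root.

    Paths are returned relative to root's parent, so a root directory named
    'benchmark' yields names such as 'benchmark/micro/x.benchmark'.
    """
    root = Path(root)
    return sorted(
        p.relative_to(root.parent).as_posix() for p in root.rglob("*.benchmark")
    )


def select_benchmarks(names, pattern):
    """Keep the names that match the regular expression in full (assumed semantics)."""
    regex = re.compile(pattern)
    return [name for name in names if regex.fullmatch(name)]
```

For example, select_benchmarks(names, "benchmark/micro/.*") keeps only the micro-benchmarks from a discovered list.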

Step 3: Execute Benchmarks

Run selected benchmarks through the benchmark runner. For each benchmark, the runner executes the setup phase (creating tables, loading data), then performs multiple timed iterations of the query. The runner supports timeout configuration, hot-run vs cold-run modes, and can output profiling information including query plans.

Key considerations:

Output format is CSV with columns: name, run number, timing
The --out flag writes raw timings to a file
The --profile flag outputs a pretty-printed query tree
The --query flag shows the SQL being benchmarked
A benchmark that exceeds its configured timeout is interrupted by the runner
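
The name, run number, timing output described above can be post-processed with a few lines of Python. This is a minimal sketch: runner_command and parse_timings are hypothetical helpers, and because the exact delimiter may differ between versions, the parser accepts either tabs or commas.

```python
import re
from collections import defaultdict


def runner_command(runner, benchmark, out_file=None):
    """Build the argument list for one benchmark run (nothing is executed here)."""
    cmd = [str(runner), benchmark]
    if out_file is not None:
        cmd.append(f"--out={out_file}")  # write raw timings to a file
    return cmd


def parse_timings(text):
    """Parse 'name, run, timing' rows into {name: [timing, ...]}.

    Lines that do not have three fields, or whose run/timing fields are not
    numeric (such as a header row), are skipped.
    """
    timings = defaultdict(list)
    for line in text.splitlines():
        fields = [field.strip() for field in re.split(r"[\t,]", line)]
        if len(fields) != 3:
            continue
        name, run, timing = fields
        try:
            int(run)
            value = float(timing)
        except ValueError:
            continue
        timings[name].append(value)
    return dict(timings)
```

The command list can be passed to subprocess.run with capture_output=True and text=True, and the captured stdout handed to parse_timings.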

Step 4: Collect And Compare Results

Gather benchmark timing results and compare them against a baseline run to detect regressions. The regression checking system compares old and new timing files, flagging significant performance differences. Additional regression checks cover query plan cost (intermediate cardinalities), storage size, extension binary size, and Python client performance.

Key considerations:

regression_check.py compares timing results between two runs
plan_cost_runner.py detects changes in query plan cardinality estimates
regression_test_storage_size.py checks for storage size regressions
regression_test_extension_size.py checks for binary size regressions
regression_test_python.py benchmarks Python client data interchange paths
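
The idea behind a timing comparison can be sketched as follows. This is not the logic of regression_check.py; it is an illustrative approach that compares median timings per benchmark, and the 10% threshold is an arbitrary assumption chosen for the example.

```python
from statistics import median


def find_regressions(old, new, threshold=0.10):
    """Return {name: (old_median, new_median)} for benchmarks that got slower.

    old and new map benchmark names to lists of timings in seconds. A benchmark
    is flagged when its median slows down by more than `threshold` (a relative
    fraction; the default is an assumption, not a DuckDB setting).
    """
    regressions = {}
    for name, old_times in old.items():
        new_times = new.get(name)
        if not old_times or not new_times:
            continue  # benchmark missing from one of the runs
        old_med = median(old_times)
        new_med = median(new_times)
        if old_med > 0 and (new_med - old_med) / old_med > threshold:
            regressions[name] = (old_med, new_med)
    return regressions
```

Using the median rather than the mean keeps a single slow iteration, for example one disturbed by background load, from triggering a false alarm.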

Step 5: Analyze And Report

Review regression results and benchmark profiles to identify performance bottlenecks or improvements. Use query profiling to understand execution plan changes, and compare intermediate cardinalities to validate optimizer behavior. Generate reports suitable for CI integration or manual review.

Key considerations:

Query profiles show operator-level timing breakdown
Plan cost comparison reveals optimizer regressions, such as a join order that produces larger intermediate results
Results should be interpreted in the context of hardware and system load
CI integration runs benchmarks automatically on pull requests
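
Comparing intermediate cardinalities amounts to summing the row counts produced by every operator in a plan and checking whether that total grew. The sketch below assumes a plan represented as nested dictionaries with "cardinality" and "children" keys; these field names are an assumption for illustration and may not match the profiling output of a given DuckDB version, and the code is not taken from plan_cost_runner.py.

```python
def total_cardinality(node, key="cardinality"):
    """Sum the cardinality of a plan node and all of its descendants."""
    total = int(node.get(key, 0))
    for child in node.get("children", []):
        total += total_cardinality(child, key)
    return total


def plan_cost_increased(old_plan, new_plan, key="cardinality"):
    """Return True when the new plan produces more intermediate rows overall."""
    return total_cardinality(new_plan, key) > total_cardinality(old_plan, key)
```

A higher total for the same query usually indicates that the optimizer chose a plan that materializes more intermediate rows, which is worth investigating even when wall-clock timings look unchanged.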
