

Workflow:Duckdb Duckdb Benchmark Execution

From Leeroopedia
Revision as of 11:00, 16 February 2026 by Admin (Auto-imported from workflows/Duckdb_Duckdb_Benchmark_Execution.md)



Domains: Database_Engineering, Performance_Testing, Benchmarking
Last Updated: 2026-02-07 11:00 GMT

Overview

End-to-end process for running performance benchmarks against DuckDB, including micro-benchmarks, TPC-H/TPC-DS standard benchmarks, and regression detection.

Description

This workflow covers compiling DuckDB with benchmark support, discovering and executing benchmarks, and analyzing results for performance regressions. DuckDB includes a benchmark runner that supports both compiled C++ benchmarks and an interpreted benchmark DSL (declarative .benchmark files). The system supports micro-benchmarks for individual operations, standard TPC-H and TPC-DS industry benchmarks, and the IMDB Join Order Benchmark (JOB). Results can be compared across versions to detect performance regressions in query execution time, query plan cost, storage size, extension binary size, and Python client data interchange throughput.

Usage

Execute this workflow when you need to measure DuckDB query performance, validate that code changes do not introduce performance regressions, compare execution plans between versions, or produce standardized benchmark results for evaluation against other database systems.

Execution Steps

Step 1: Build With Benchmark Support

Compile DuckDB with the benchmark runner enabled by setting BUILD_BENCHMARK=1 in the environment of the make invocation. For TPC-H benchmarks, also set BUILD_TPCH=1; for TPC-DS benchmarks, set BUILD_TPCDS=1. The build produces the benchmark_runner executable in build/release/benchmark/.

Key considerations:

  • Release builds should be used for accurate performance measurement
  • The IMDB JOB benchmark data is compiled into the benchmark binary as embedded constants
  • TPC-H and TPC-DS query sets and schema definitions are embedded via generated C++ headers
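As a Python sketch of this step, the helper below assembles the environment and command for a benchmark-enabled build. It assumes the BUILD_* variables are read by the default `make` target of a DuckDB checkout; `build_invocation` and `RUNNER_PATH` are hypothetical names, and actually running the (long) compile is left to the caller.

```python
import os
from pathlib import Path

# Runner location reported in the text, relative to the DuckDB checkout.
RUNNER_PATH = Path("build/release/benchmark/benchmark_runner")


def build_invocation(tpch: bool = False, tpcds: bool = False) -> tuple[dict[str, str], list[str]]:
    """Return (environment, argv) for a benchmark-enabled build.

    Assumption: the BUILD_* flags are read from the environment by the
    default `make` target.
    """
    env = dict(os.environ)
    env["BUILD_BENCHMARK"] = "1"
    if tpch:
        env["BUILD_TPCH"] = "1"
    if tpcds:
        env["BUILD_TPCDS"] = "1"
    return env, ["make"]


# Usage (run from the DuckDB source directory):
#   import subprocess
#   env, argv = build_invocation(tpch=True)
#   subprocess.run(argv, env=env, check=True)
```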

Step 2: Discover Available Benchmarks

Use the benchmark runner's discovery mechanism to list all available benchmarks. The runner recursively scans the benchmark/ directory for .benchmark files and also registers compiled C++ benchmarks. Each benchmark has metadata including display name, group, and subgroup.

Key considerations:

  • Use the --list flag to enumerate all benchmarks
  • Use the --info flag to get metadata about a specific benchmark
  • Pass a regex pattern to select which benchmarks to run
  • Benchmarks are organized into groups: micro, tpch, tpcds, imdb
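The discovery step can be approximated outside the runner. This sketch assumes each .benchmark file declares its metadata on `name ...`, `group ...` and `subgroup ...` lines and that `#` starts a comment line; `parse_benchmark_metadata` and `discover` are hypothetical helpers, and the runner's own regex matching may not use the full-match semantics chosen here.

```python
import re
from pathlib import Path

# Directive names assumed to carry the metadata (one "key value" per line).
METADATA_KEYS = ("name", "group", "subgroup")


def parse_benchmark_metadata(text: str) -> dict[str, str]:
    """Extract name/group/subgroup directives from .benchmark source text."""
    meta: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition(" ")
        # Keep the first occurrence so later lines cannot overwrite it.
        if key in METADATA_KEYS and key not in meta:
            meta[key] = value.strip()
    return meta


def discover(root: Path, pattern: str = ".*") -> dict[str, dict[str, str]]:
    """Map each .benchmark path under root that fully matches pattern to its metadata."""
    regex = re.compile(pattern)
    found: dict[str, dict[str, str]] = {}
    for path in sorted(root.rglob("*.benchmark")):  # recursive scan
        path_str = path.as_posix()
        if regex.fullmatch(path_str):
            found[path_str] = parse_benchmark_metadata(path.read_text())
    return found
```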

Step 3: Execute Benchmarks

Run selected benchmarks through the benchmark runner. For each benchmark, the runner executes the setup phase (creating tables, loading data), then performs multiple timed iterations of the query. The runner supports timeout configuration, hot-run vs cold-run modes, and can output profiling information including query plans.
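To make the setup-then-iterate pattern concrete, here is a generic Python harness. It is not DuckDB's C++ runner: `run_timed`, its parameters, and the warm-up approach to hot runs are assumptions for illustration. Unlike a real timeout, it only checks the time budget between runs.

```python
import time
from typing import Callable


def run_timed(setup: Callable[[], None], query: Callable[[], object],
              nruns: int = 5, timeout_s: float = 60.0,
              hot: bool = True) -> list[float]:
    """Run setup once, then time `query` nruns times.

    hot=True executes the query once untimed first, approximating a hot
    run with warm caches. hot=False skips that warm-up, which is only a
    rough stand-in for a cold run (OS and buffer caches are not cleared).
    Raises TimeoutError once the total timed duration exceeds timeout_s;
    a single slow run is not interrupted.
    """
    setup()  # e.g. create tables and load data
    if hot:
        query()  # untimed warm-up run
    timings: list[float] = []
    started = time.perf_counter()
    for _ in range(nruns):
        t0 = time.perf_counter()
        query()
        timings.append(time.perf_counter() - t0)
        if time.perf_counter() - started > timeout_s:
            raise TimeoutError(f"budget of {timeout_s}s exceeded after {len(timings)} runs")
    return timings
```

With the `duckdb` Python package, for instance, one could pass `setup=lambda: con.execute("CREATE TABLE t AS SELECT range AS i FROM range(1000000)")` and `query=lambda: con.execute("SELECT sum(i) FROM t").fetchall()` for a connection `con = duckdb.connect()`.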

Key considerations:

  • The output format is CSV with the columns name, run number, and timing
  • The --out flag writes raw timings to a file
  • The --profile flag outputs a pretty-printed query tree
  • The --query flag shows the SQL being benchmarked
  • Benchmarks can be interrupted via a timeout mechanism
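Runner output can be aggregated per benchmark before any comparison. This sketch assumes one row per run carrying the name, run and timing columns listed above, tolerates either comma or tab delimiters in case the actual output differs, and assumes a --out file holds one timing per line. All function names are hypothetical.

```python
import csv
import io
import statistics
from collections import defaultdict


def parse_runner_output(text: str) -> dict[str, list[float]]:
    """Group timings by benchmark from name/run/timing rows.

    Skips the header, blank lines, and rows whose timing is not a number.
    """
    delimiter = "\t" if "\t" in text else ","
    timings: dict[str, list[float]] = defaultdict(list)
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if len(row) != 3:
            continue
        name, _run, timing = row
        try:
            value = float(timing)
        except ValueError:
            continue  # header row or a non-numeric status
        timings[name].append(value)
    return dict(timings)


def parse_out_file(text: str) -> list[float]:
    """Parse a --out file, assumed to hold one timing per line."""
    return [float(line) for line in text.split()]


def summarize(timings: dict[str, list[float]]) -> dict[str, float]:
    """Median per benchmark; the median is less sensitive to outlier runs."""
    return {name: statistics.median(values) for name, values in timings.items()}
```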

Step 4: Collect And Compare Results

Gather benchmark timing results and compare them against a baseline run to detect regressions. The regression checking system compares old and new timing files, flagging significant performance differences. Additional regression checks cover query plan cost (intermediate cardinalities), storage size, extension binary size, and Python client performance.
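A simplified regression check might compare per-benchmark summaries (for example, median seconds) from a baseline run and a candidate run. This is a hypothetical sketch, not the logic of regression_check.py: the `find_regressions` helper and both thresholds are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Regression:
    name: str
    old: float  # baseline timing in seconds
    new: float  # candidate timing in seconds


def find_regressions(old: dict[str, float], new: dict[str, float],
                     rel_threshold: float = 0.10,
                     abs_threshold: float = 0.005) -> list[Regression]:
    """Flag benchmarks present in both runs that got slower by more than
    rel_threshold (a fraction) AND more than abs_threshold (seconds).

    Both default values are illustrative, not DuckDB settings.
    """
    flagged = []
    for name in sorted(old.keys() & new.keys()):
        before, after = old[name], new[name]
        if after - before > abs_threshold and after > before * (1 + rel_threshold):
            flagged.append(Regression(name, before, after))
    return flagged
```

Requiring both thresholds is a design choice: a relative-only rule flags timer jitter on very short benchmarks, while an absolute-only rule flags small percentage drifts on long-running queries.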

Key considerations:

  • regression_check.py compares timing results between two runs
  • plan_cost_runner.py detects changes in query plan cardinality estimates
  • regression_test_storage_size.py checks for storage size regressions
  • regression_test_extension_size.py checks for binary size regressions
  • regression_test_python.py benchmarks Python client data interchange paths
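The plan cost idea can be sketched as a sum of intermediate cardinalities over an operator tree, where more intermediate rows means a more expensive plan. The tree layout (dicts with a cardinality field and a `children` list) is an assumption about a JSON-style profile rather than a documented format, and the field name is a parameter because it may not match your profiler's output; plan_cost_runner.py may compute its cost differently.

```python
from typing import Any


def total_cardinality(node: dict[str, Any], key: str = "cardinality") -> int:
    """Sum the per-operator output cardinality over a profile tree.

    Assumes each node is a dict holding an integer under `key` and an
    optional "children" list.
    """
    total = int(node.get(key, 0))
    for child in node.get("children", []):
        total += total_cardinality(child, key)
    return total


def compare_plan_cost(old_tree: dict[str, Any], new_tree: dict[str, Any],
                      key: str = "cardinality") -> tuple[int, int, str]:
    """Return (old cost, new cost, verdict); more intermediate rows is worse."""
    before = total_cardinality(old_tree, key)
    after = total_cardinality(new_tree, key)
    if after > before:
        verdict = "regression"
    elif after < before:
        verdict = "improvement"
    else:
        verdict = "unchanged"
    return before, after, verdict
```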

Step 5: Analyze And Report

Review regression results and benchmark profiles to identify performance bottlenecks or improvements. Use query profiling to understand execution plan changes, and compare intermediate cardinalities to validate optimizer behavior. Generate reports suitable for CI integration or manual review.
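When reviewing a profile, ranking operators by their share of total time is a quick way to spot bottlenecks. The sketch assumes a JSON-style profile tree of dicts, each with a name field, a timing field in seconds, and an optional `children` list; both key names are parameters, and the helpers are hypothetical rather than a DuckDB API.

```python
from typing import Any, Iterator


def iter_operators(node: dict[str, Any], name_key: str = "name",
                   time_key: str = "timing") -> Iterator[tuple[str, float]]:
    """Yield (operator name, seconds) for every node in a profile tree."""
    yield str(node.get(name_key, "?")), float(node.get(time_key, 0.0))
    for child in node.get("children", []):
        yield from iter_operators(child, name_key, time_key)


def hottest_operators(tree: dict[str, Any], top: int = 3,
                      **keys: str) -> list[tuple[str, float, float]]:
    """Return the `top` operators by time as (name, seconds, share of total)."""
    ops = list(iter_operators(tree, **keys))
    total = sum(seconds for _, seconds in ops) or 1.0  # avoid division by zero
    ops.sort(key=lambda op: op[1], reverse=True)
    return [(name, seconds, seconds / total) for name, seconds in ops[:top]]
```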

Key considerations:

  • Query profiles show operator-level timing breakdown
  • Plan cost comparison reveals optimizer regressions
  • Results should be interpreted in the context of hardware and system load
  • CI integration runs benchmarks automatically on pull requests
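Because results depend on hardware and system load, it can help to check run-to-run variation before trusting a difference between two runs. This sketch uses the coefficient of variation (sample standard deviation divided by the mean) with an illustrative 5% cutoff; neither the metric nor the cutoff comes from DuckDB's tooling.

```python
import statistics


def coefficient_of_variation(samples: list[float]) -> float:
    """Sample standard deviation divided by the mean (needs >= 2 samples)."""
    mean = statistics.mean(samples)
    return statistics.stdev(samples) / mean if mean else float("inf")


def is_stable(samples: list[float], max_cv: float = 0.05) -> bool:
    """Treat a benchmark as stable enough to compare if its CV <= max_cv."""
    return len(samples) >= 2 and coefficient_of_variation(samples) <= max_cv
```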
