Implementation:Duckdb Duckdb Regression Check

Overview

Regression_Check is DuckDB's script for detecting benchmark timing regressions. It is a Python script that loads two CSV files of benchmark timing results (one from a baseline run, one from a candidate run), computes the median timing for each benchmark, and reports every benchmark whose new median exceeds the old one by more than the configured thresholds. The script uses the DuckDB Python package internally to perform the comparison via SQL queries.

Code Reference

Source: scripts/regression_check.py:L1-115

The script is approximately 115 lines of Python and performs the following steps:

Parses command-line arguments to obtain paths to old and new CSV files.
Loads both CSV files into an in-memory DuckDB instance.
Computes the median timing for each benchmark in both datasets.
Joins the old and new medians by benchmark name.
Applies the dual-threshold regression criteria.
Outputs a report of any detected regressions.
Exits with code 1 if regressions are found, 0 otherwise.
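
The steps above can be sketched in plain Python. This is a minimal illustration that uses only the standard library instead of the SQL queries the script runs through DuckDB. The column names `name` and `timing`, the helper names, and the sample data are assumptions made for this example, since the exact CSV layout is not shown here.

```python
import csv
import io
import statistics
from collections import defaultdict


def load_medians(csv_file):
    """Return {benchmark name: median timing} from an open CSV file.

    Assumes hypothetical columns `name` and `timing`, one row per run.
    """
    timings = defaultdict(list)
    for row in csv.DictReader(csv_file):
        timings[row["name"]].append(float(row["timing"]))
    return {name: statistics.median(values) for name, values in timings.items()}


def find_regressions(old, new, rel_threshold=0.10, abs_threshold=0.01):
    """Join medians by benchmark name and keep those that exceed both thresholds."""
    regressions = []
    for name in sorted(old.keys() & new.keys()):
        diff = new[name] - old[name]
        if new[name] > old[name] * (1 + rel_threshold) and diff > abs_threshold:
            regressions.append((name, old[name], new[name]))
    return regressions


# Tiny in-memory stand-ins for the two CSV files (made-up data).
OLD_CSV = "name,timing\njoin,0.044\njoin,0.045\njoin,0.046\nscan,0.100\nscan,0.101\nscan,0.099\n"
NEW_CSV = "name,timing\njoin,0.061\njoin,0.062\njoin,0.063\nscan,0.100\nscan,0.102\nscan,0.101\n"

old_medians = load_medians(io.StringIO(OLD_CSV))
new_medians = load_medians(io.StringIO(NEW_CSV))
regressions = find_regressions(old_medians, new_medians)

for name, old_t, new_t in regressions:
    print(f"{name}: {old_t:.3f}s -> {new_t:.3f}s")

# The real script would pass this value to the process exit status.
exit_code = 1 if regressions else 0
```

In this sketch only `join` is reported, because `scan` slows down by roughly 1%, which is well under the relative threshold.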

Regression Threshold Logic

# A benchmark is flagged as a regression when BOTH conditions are true:
# 1. The new median is more than 10% slower than the old median
# 2. The absolute difference exceeds 0.01 seconds

The dual-threshold approach prevents false positives on very fast benchmarks where a small absolute increase (e.g., 0.001s) could appear as a large relative increase (e.g., 50%).
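
The dual-threshold rule can be written as a small predicate. This is a hedged sketch of the two conditions as stated above, not the script's actual SQL; the function name `is_regression` and its parameter names are chosen here for illustration.

```python
def is_regression(old_median, new_median, rel_threshold=0.10, abs_threshold=0.01):
    """True only when the slowdown exceeds both the relative and absolute limits."""
    slower_by_ratio = new_median > old_median * (1 + rel_threshold)
    slower_by_seconds = (new_median - old_median) > abs_threshold
    return slower_by_ratio and slower_by_seconds


# 0.002s -> 0.003s is 50% slower, but only 0.001s in absolute terms: not flagged.
print(is_regression(0.002, 0.003))
# 0.045s -> 0.062s is about 37.8% slower and 0.017s in absolute terms: flagged.
print(is_regression(0.045, 0.062))
# 1.00s -> 1.05s is 0.05s slower, but only 5% in relative terms: not flagged.
print(is_regression(1.00, 1.05))
```

Requiring both conditions means a benchmark must be slower in a way that is meaningful both proportionally and in wall-clock time before it is reported.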

API

python3 scripts/regression_check.py --old <old_csv> --new <new_csv>

Argument	Type	Description
`--old`	`string` (file path)	Path to the CSV file containing baseline benchmark timing results.
`--new`	`string` (file path)	Path to the CSV file containing candidate (new) benchmark timing results.

External Dependencies

Python 3 -- The script requires Python 3.x.
DuckDB Python package -- The script uses the duckdb Python module to load CSV files and execute comparison queries. This must be installed (e.g., pip install duckdb).

I/O Contract

Inputs

Old CSV file (--old) -- A CSV file containing benchmark timing results from the baseline run. Expected columns include the benchmark name and timing value(s) for each run.
New CSV file (--new) -- A CSV file containing benchmark timing results from the candidate run, in the same format as the old CSV.

Outputs

Regression report -- Printed to standard output. Lists each benchmark that exceeds the regression threshold, showing the old median, new median, absolute difference, and percentage change.
Exit code -- 1 if one or more regressions are detected; 0 if no regressions are found. This enables integration with CI/CD pipelines that gate on exit codes.

Usage Examples

Basic Regression Check

# Run benchmarks on the old and new versions, saving results to CSV
build/old/benchmark/benchmark_runner --out=old_results.csv "benchmark/.*"
build/new/benchmark/benchmark_runner --out=new_results.csv "benchmark/.*"

# Compare the results
python3 scripts/regression_check.py --old old_results.csv --new new_results.csv

Interpreting Output

When regressions are detected, the script prints a report like:

REGRESSION DETECTED:
  benchmark/micro/join/hash_join_small.benchmark:
    old median: 0.045s
    new median: 0.062s
    difference: +0.017s (+37.8%)

When no regressions are detected:

No regressions detected.

CI/CD Integration

# In a CI pipeline, use the exit code to gate merging
python3 scripts/regression_check.py --old baseline.csv --new candidate.csv
if [ $? -ne 0 ]; then
    echo "Performance regression detected. Blocking merge."
    exit 1
fi
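
The same gate can be expressed in Python for pipelines that drive their steps from a script. This is a minimal sketch that relies only on the exit-code contract described above; the helper name `regression_gate` is hypothetical, and stand-in commands are used so the example runs without the DuckDB repository.

```python
import subprocess
import sys


def regression_gate(command):
    """Run a check command and return True when it exits with code 0."""
    result = subprocess.run(command)
    return result.returncode == 0


# In CI the command would be the invocation shown above, for example:
# ["python3", "scripts/regression_check.py", "--old", "baseline.csv", "--new", "candidate.csv"]
# The stand-in commands below only imitate the two possible exit codes.
passing = [sys.executable, "-c", "raise SystemExit(0)"]
failing = [sys.executable, "-c", "raise SystemExit(1)"]

print(regression_gate(passing))
print(regression_gate(failing))
```

A pipeline step would typically block the merge when `regression_gate` returns `False`.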
