Implementation:Scikit learn Scikit learn BenchHistGBThreading
| Knowledge Sources | |
|---|---|
| Domains | Machine Learning, Benchmarking |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
A benchmark script in the scikit-learn repository that measures how HistGradientBoosting fit times and scores change with the number of threads.
Description
This benchmark script measures the fit times and scores of scikit-learn's HistGradientBoostingClassifier and HistGradientBoostingRegressor as the number of threads varies. The script uses threadpoolctl to control the number of threads and runs either a classification or a regression task (selected with --problem) on a synthetic dataset. With the --lightgbm, --xgboost, and --catboost flags it also runs the corresponding LightGBM, XGBoost, and CatBoost models for comparison.
Usage
Use this benchmark to evaluate how histogram-based gradient boosting models scale with different numbers of threads, and to compare scikit-learn's implementation against other gradient boosting libraries.
Code Reference
Source Location
- Repository: scikit-learn
- File: benchmarks/bench_hist_gradient_boosting_threading.py
Signature
# Command-line benchmark script
parser = argparse.ArgumentParser()
parser.add_argument("--n-leaf-nodes", type=int, default=31)
parser.add_argument("--n-trees", type=int, default=10)
parser.add_argument("--lightgbm", action="store_true", default=False)
parser.add_argument("--xgboost", action="store_true", default=False)
parser.add_argument("--catboost", action="store_true", default=False)
parser.add_argument("--learning-rate", type=float, default=0.1)
parser.add_argument("--problem", type=str, default="classification",
                    choices=["classification", "regression"])
parser.add_argument("--n-samples", type=int, default=int(1e6))
parser.add_argument("--n-features", type=int, default=100)
parser.add_argument("--max-bins", type=int, default=255)
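To see how these options resolve at run time, the sketch below rebuilds a subset of the parser and parses a sample argument list. The argument definitions are copied from the signature above; the sample argument list is an assumption chosen for illustration. Note that argparse exposes hyphenated flags as attributes with underscores.

```python
import argparse

# Subset of the options listed in the signature above.
parser = argparse.ArgumentParser()
parser.add_argument("--n-leaf-nodes", type=int, default=31)
parser.add_argument("--n-trees", type=int, default=10)
parser.add_argument("--lightgbm", action="store_true", default=False)
parser.add_argument("--problem", type=str, default="classification",
                    choices=["classification", "regression"])
parser.add_argument("--n-samples", type=int, default=int(1e6))

# Illustrative command line: only three options are overridden.
args = parser.parse_args(
    ["--problem", "regression", "--n-samples", "100000", "--lightgbm"]
)

# "--n-leaf-nodes" becomes args.n_leaf_nodes, "--n-samples" becomes args.n_samples, etc.
print(args.problem, args.n_samples, args.n_leaf_nodes, args.n_trees, args.lightgbm)
```

Options that are not passed keep their defaults, so a bare invocation runs the classification problem with one million samples.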
Import
from sklearn.ensemble import HistGradientBoostingClassifier, HistGradientBoostingRegressor
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| --n-leaf-nodes | int | No | Maximum number of leaf nodes per tree (default: 31) |
| --n-trees | int | No | Number of boosting iterations (default: 10) |
| --problem | str | No | Task type: classification or regression (default: classification) |
| --n-samples | int | No | Number of samples to generate (default: 1000000) |
| --n-features | int | No | Number of features in synthetic data (default: 100) |
| --max-bins | int | No | Maximum number of bins for histogram construction (default: 255) |
| --learning-rate | float | No | Learning rate for boosting (default: 0.1) |
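| --plot | flag | No | Show a plot of results |
| --lightgbm | flag | No | Also benchmark LightGBM (default: off) |
| --xgboost | flag | No | Also benchmark XGBoost (default: off) |
| --catboost | flag | No | Also benchmark CatBoost (default: off) |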
Outputs
| Name | Type | Description |
|---|---|---|
| Console output | text | Fit times and scores for each threading configuration |
| Plot | matplotlib figure | Optional visualization of threading scaling performance |
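One way to interpret the reported fit times is to convert them into speedup and parallel efficiency relative to the smallest thread count. The sketch below is not part of the benchmark script; `scaling_table` is a hypothetical helper, and the timings passed to it are placeholders, not measured results.

```python
def scaling_table(fit_times):
    """Map {n_threads: fit_seconds} to {n_threads: (speedup, efficiency)}.

    Speedup is measured against the smallest thread count present;
    efficiency is the speedup divided by the relative increase in threads.
    """
    base_threads = min(fit_times)
    base_time = fit_times[base_threads]
    table = {}
    for n_threads in sorted(fit_times):
        speedup = base_time / fit_times[n_threads]
        efficiency = speedup * base_threads / n_threads
        table[n_threads] = (speedup, efficiency)
    return table


# Placeholder timings for illustration only; these are NOT benchmark results.
example_times = {1: 8.0, 2: 4.0, 4: 2.5}
for n_threads, (speedup, efficiency) in scaling_table(example_times).items():
    print(f"{n_threads} thread(s): speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

An efficiency well below 100% at higher thread counts suggests that adding threads is yielding diminishing returns for that dataset size.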
Usage Examples
Basic Usage
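# Run the benchmark from the repository root:
# python benchmarks/bench_hist_gradient_boosting_threading.py --n-samples 100000 --plot

# Standalone fit using the benchmark's default tree settings
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier

# Synthetic binary classification data
X, y = make_classification(n_samples=10000, n_features=100, random_state=42)
# max_leaf_nodes and max_iter correspond to --n-leaf-nodes and --n-trees
clf = HistGradientBoostingClassifier(max_leaf_nodes=31, max_iter=10)
clf.fit(X, y)
# Accuracy on the training data
print(clf.score(X, y))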