Implementation:Scikit learn Scikit learn BenchHistGBThreading

Knowledge Sources	Scikit_learn Scikit-learn Docs
Domains	Machine Learning, Benchmarking
Last Updated	2026-02-08 15:00 GMT

Overview

A benchmark script in the scikit-learn repository that measures how HistGradientBoosting performance scales with the number of threads.

Description

This benchmark script measures the performance of scikit-learn's HistGradientBoostingClassifier and HistGradientBoostingRegressor under varying threading configurations. It can optionally compare them against the LightGBM, XGBoost, and CatBoost implementations (enabled with the --lightgbm, --xgboost, and --catboost flags). The script uses threadpoolctl to control the number of threads and runs either a classification or a regression task, selected with --problem, on a synthetic dataset.

Usage

Use this benchmark to evaluate how histogram-based gradient boosting models scale with different numbers of threads, and to compare scikit-learn's implementation against other gradient boosting libraries.
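One way to read the resulting fit times is to convert them into speedup and parallel efficiency relative to the smallest thread count. The sketch below is a hedged illustration: the function name scaling_summary and the example timings are hypothetical, not output from the script.

```python
def scaling_summary(fit_times):
    """Map n_threads -> (speedup, efficiency) relative to the smallest thread count.

    fit_times is a dict of {n_threads: seconds}. Hypothetical helper.
    """
    base_threads = min(fit_times)
    base_time = fit_times[base_threads]
    summary = {}
    for n_threads, seconds in sorted(fit_times.items()):
        speedup = base_time / seconds
        # Efficiency of 1.0 means perfectly linear scaling.
        efficiency = speedup / (n_threads / base_threads)
        summary[n_threads] = (speedup, efficiency)
    return summary


# Made-up timings in seconds, for illustration only.
example_times = {1: 8.0, 2: 4.0, 4: 2.5}
summary = scaling_summary(example_times)
for n_threads, (speedup, efficiency) in summary.items():
    print(f"{n_threads} threads: speedup={speedup:.2f}, efficiency={efficiency:.2f}")
```

Efficiency that drops well below 1.0 as threads are added suggests the workload is too small or that overhead limits scaling.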

Code Reference

Source Location

Repository: scikit-learn
File: benchmarks/bench_hist_gradient_boosting_threading.py

Signature

# Command-line benchmark script
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--n-leaf-nodes", type=int, default=31)
parser.add_argument("--n-trees", type=int, default=10)
parser.add_argument("--lightgbm", action="store_true", default=False)
parser.add_argument("--xgboost", action="store_true", default=False)
parser.add_argument("--catboost", action="store_true", default=False)
parser.add_argument("--learning-rate", type=float, default=0.1)
parser.add_argument("--problem", type=str, default="classification",
                    choices=["classification", "regression"])
parser.add_argument("--n-samples", type=int, default=int(1e6))
parser.add_argument("--n-features", type=int, default=100)
parser.add_argument("--max-bins", type=int, default=255)
parser.add_argument("--plot", action="store_true", default=False)

Import

from sklearn.ensemble import HistGradientBoostingClassifier, HistGradientBoostingRegressor

I/O Contract

Inputs

Name	Type	Required	Description
--n-leaf-nodes	int	No	Maximum number of leaf nodes per tree (default: 31)
--n-trees	int	No	Number of boosting iterations (default: 10)
--lightgbm	flag	No	Also benchmark LightGBM (default: off)
--xgboost	flag	No	Also benchmark XGBoost (default: off)
--catboost	flag	No	Also benchmark CatBoost (default: off)
--learning-rate	float	No	Learning rate for boosting (default: 0.1)
--problem	str	No	Task type: classification or regression (default: classification)
--n-samples	int	No	Number of samples to generate (default: 1000000)
--n-features	int	No	Number of features in synthetic data (default: 100)
--max-bins	int	No	Maximum number of bins for histogram construction (default: 255)
--plot	flag	No	Show a plot of the results (default: off)

Outputs

Name	Type	Description
Console output	text	Fit times and scores for each threading configuration
Plot	matplotlib figure	Optional visualization of threading scaling performance

Usage Examples

Basic Usage

# Run from command line
# python benchmarks/bench_hist_gradient_boosting_threading.py --n-samples 100000 --plot

from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=10000, n_features=100, random_state=42)
clf = HistGradientBoostingClassifier(max_leaf_nodes=31, max_iter=10)
clf.fit(X, y)
print(clf.score(X, y))

Related Pages

Principle:Scikit_learn_Scikit_learn_Gradient_Boosting_Classification
