Implementation:Scikit learn Scikit learn BenchHistGBThreading

From Leeroopedia


Knowledge Sources
Domains: Machine Learning, Benchmarking
Last Updated: 2026-02-08 15:00 GMT

Overview

A concrete benchmarking tool, shipped with scikit-learn, for measuring how the HistGradientBoosting estimators perform under different thread counts.

Description

This benchmark script measures the fit time and score of scikit-learn's HistGradientBoostingClassifier and HistGradientBoostingRegressor as the number of threads varies. The thread count is controlled with threadpoolctl. Each run targets one task, classification or regression (selected with --problem), on a synthetic dataset. Passing --lightgbm, --xgboost, or --catboost adds the corresponding library to the comparison.

Usage

Use this benchmark to evaluate how histogram-based gradient boosting models scale with different numbers of threads, and to compare scikit-learn's implementation against other gradient boosting libraries.
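One way to summarize how a model scales is to convert the measured fit times into speedup and parallel efficiency relative to the smallest thread count. The sketch below assumes a hypothetical helper, `scaling_summary`, and uses placeholder timings invented purely to exercise it; neither comes from the benchmark script or from real measurements.

```python
def scaling_summary(fit_times):
    """Return {n_threads: (speedup, efficiency)} relative to the smallest thread count.

    `fit_times` maps a thread count to a fit time in seconds.
    """
    base_threads = min(fit_times)
    base_time = fit_times[base_threads]
    summary = {}
    for n_threads, seconds in sorted(fit_times.items()):
        speedup = base_time / seconds
        # An efficiency of 1.0 means perfect linear scaling from the baseline.
        efficiency = speedup * base_threads / n_threads
        summary[n_threads] = (speedup, efficiency)
    return summary


# Placeholder timings in seconds (NOT real benchmark results).
example_times = {1: 8.0, 2: 4.0, 4: 2.5}
for n_threads, (speedup, efficiency) in scaling_summary(example_times).items():
    print(f"{n_threads} threads: speedup {speedup:.2f}x, efficiency {efficiency:.2f}")
```

Efficiency that drops well below 1.0 as threads are added suggests that overhead or serial sections are limiting the gain.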

Code Reference

Source Location

benchmarks/bench_hist_gradient_boosting_threading.py

Signature

# Command-line benchmark script (excerpt of the argument parser)
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--n-leaf-nodes", type=int, default=31)
parser.add_argument("--n-trees", type=int, default=10)
parser.add_argument("--lightgbm", action="store_true", default=False)
parser.add_argument("--xgboost", action="store_true", default=False)
parser.add_argument("--catboost", action="store_true", default=False)
parser.add_argument("--learning-rate", type=float, default=0.1)
parser.add_argument("--problem", type=str, default="classification",
                    choices=["classification", "regression"])
parser.add_argument("--n-samples", type=int, default=int(1e6))
parser.add_argument("--n-features", type=int, default=100)
parser.add_argument("--max-bins", type=int, default=255)
parser.add_argument("--plot", action="store_true", default=False)

Import

from sklearn.ensemble import HistGradientBoostingClassifier, HistGradientBoostingRegressor

I/O Contract

Inputs

Name             Type   Required  Description
--n-leaf-nodes   int    No        Maximum number of leaf nodes per tree (default: 31)
--n-trees        int    No        Number of boosting iterations (default: 10)
--problem        str    No        Task type: classification or regression (default: classification)
--n-samples      int    No        Number of samples to generate (default: 1000000)
--n-features     int    No        Number of features in synthetic data (default: 100)
--max-bins       int    No        Maximum number of bins for histogram construction (default: 255)
--learning-rate  float  No        Learning rate for boosting (default: 0.1)
--lightgbm       flag   No        Include LightGBM in the comparison (default: off)
--xgboost        flag   No        Include XGBoost in the comparison (default: off)
--catboost       flag   No        Include CatBoost in the comparison (default: off)
--plot           flag   No        Show a plot of results
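The --problem, --n-samples, and --n-features inputs determine the synthetic dataset. A minimal sketch of that step, assuming scikit-learn's `make_classification` and `make_regression` generators and a hypothetical helper `make_data` (the 75/25 train/test split and the seed are illustrative choices, not taken from the script):

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.model_selection import train_test_split


def make_data(problem, n_samples, n_features, seed=0):
    """Generate a synthetic dataset and split it into train and test parts."""
    if problem == "classification":
        X, y = make_classification(
            n_samples=n_samples, n_features=n_features, random_state=seed
        )
    elif problem == "regression":
        X, y = make_regression(
            n_samples=n_samples, n_features=n_features, random_state=seed
        )
    else:
        raise ValueError(f"unknown problem: {problem!r}")
    # Hold out a quarter of the samples for scoring (illustrative choice).
    return train_test_split(X, y, test_size=0.25, random_state=seed)


# Much smaller than the benchmark defaults so the sketch runs quickly.
X_train, X_test, y_train, y_test = make_data(
    "regression", n_samples=1000, n_features=20
)
print(X_train.shape, X_test.shape)
```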

Outputs

Name            Type               Description
Console output  text               Fit times and scores for each threading configuration
Plot            matplotlib figure  Optional visualization of threading scaling performance

Usage Examples

Basic Usage

# Run the benchmark itself from the command line (inside a scikit-learn source checkout):
# python benchmarks/bench_hist_gradient_boosting_threading.py --n-samples 100000 --plot

# Minimal standalone fit using the benchmark's default tree settings (no thread control)
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier

X, y = make_classification(n_samples=10000, n_features=100, random_state=42)
clf = HistGradientBoostingClassifier(max_leaf_nodes=31, max_iter=10)
clf.fit(X, y)
print(clf.score(X, y))  # accuracy on the training data
