Implementation: scikit-learn GradientBoostingClassifier
Overview
Concrete tool for creating a gradient boosting classifier ensemble, provided by scikit-learn. GradientBoostingClassifier builds an additive model in a forward stage-wise fashion, allowing the optimization of arbitrary differentiable loss functions. In each stage, n_classes_ regression trees are fit on the negative gradient of the loss function. Binary classification is a special case where only a single regression tree is induced per stage.
Constructor Signature
from sklearn.ensemble import GradientBoostingClassifier
GradientBoostingClassifier(
*,
loss="log_loss",
learning_rate=0.1,
n_estimators=100,
subsample=1.0,
criterion="deprecated",
min_samples_split=2,
min_samples_leaf=1,
min_weight_fraction_leaf=0.0,
max_depth=3,
min_impurity_decrease=0.0,
init=None,
random_state=None,
max_features=None,
verbose=0,
max_leaf_nodes=None,
warm_start=False,
validation_fraction=0.1,
n_iter_no_change=None,
tol=1e-4,
ccp_alpha=0.0,
)
Parameters
- loss ({"log_loss", "exponential"}, default="log_loss") -- The loss function to be optimized. "log_loss" refers to binomial and multinomial deviance. "exponential" recovers the AdaBoost algorithm (binary classification only).
- learning_rate (float, default=0.1) -- Shrinks the contribution of each tree by this factor. There is a trade-off between learning_rate and n_estimators.
- n_estimators (int, default=100) -- The number of boosting stages to perform. Gradient boosting is fairly robust to over-fitting, so a large number usually results in better performance.
- subsample (float, default=1.0) -- The fraction of samples used for fitting individual base learners. Values below 1.0 result in Stochastic Gradient Boosting, which reduces variance at the cost of increased bias.
- criterion ({"friedman_mse", "squared_error"}, default="friedman_mse") -- Deprecated since version 1.9; will be removed in 1.11.
- min_samples_split (int or float, default=2) -- Minimum number of samples required to split an internal node.
- min_samples_leaf (int or float, default=1) -- Minimum number of samples required at a leaf node.
- min_weight_fraction_leaf (float, default=0.0) -- Minimum weighted fraction of the sum total of weights required at a leaf node.
- max_depth (int or None, default=3) -- Maximum depth of the individual regression estimators. Controls the interaction order of the model.
- min_impurity_decrease (float, default=0.0) -- A node is split only if the impurity decrease is at least this value.
- init (estimator or "zero", default=None) -- An estimator object used to compute the initial predictions. Must provide fit and predict_proba. If "zero", the initial raw predictions are set to zero. By default, a DummyEstimator predicting the class priors is used.
- random_state (int, RandomState instance or None, default=None) -- Controls the random seed given to each tree at each boosting iteration.
- max_features ({"sqrt", "log2"}, int or float, default=None) -- Number of features to consider when looking for the best split. If None, all features are used.
- verbose (int, default=0) -- Enable verbose output. If 1, prints progress periodically. If greater than 1, prints for every tree.
- max_leaf_nodes (int, default=None) -- Grow trees with at most this many leaf nodes in best-first fashion.
- warm_start (bool, default=False) -- When True, reuse the previous solution and add more estimators to the ensemble.
- validation_fraction (float, default=0.1) -- Proportion of training data set aside as validation for early stopping. Only used if n_iter_no_change is set.
- n_iter_no_change (int, default=None) -- Number of iterations with no improvement to trigger early stopping. If None, early stopping is disabled.
- tol (float, default=1e-4) -- Tolerance for early stopping. Training stops when the loss is not improving by at least tol for n_iter_no_change iterations.
- ccp_alpha (non-negative float, default=0.0) -- Complexity parameter for Minimal Cost-Complexity Pruning.
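The early-stopping parameters (n_iter_no_change, validation_fraction, tol) work together at fit time. A minimal sketch, using a synthetic dataset chosen purely for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic binary classification data (illustrative choice).
X, y = make_classification(n_samples=1000, random_state=0)

# With n_iter_no_change set, validation_fraction (10%) of the training
# data is held out internally; boosting stops once the validation loss
# fails to improve by at least tol for 5 consecutive iterations.
clf = GradientBoostingClassifier(
    n_estimators=500,
    n_iter_no_change=5,
    validation_fraction=0.1,
    tol=1e-4,
    random_state=0,
).fit(X, y)

# n_estimators_ reports how many stages were actually fit,
# which may be far fewer than the 500 requested.
print(clf.n_estimators_)
```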
Fitted Attributes
- n_estimators_ -- The number of estimators as selected by early stopping; otherwise set to n_estimators.
- n_trees_per_iteration_ -- Number of trees built at each iteration (1 for binary classification, n_classes_ for multiclass).
- feature_importances_ -- Impurity-based feature importances (ndarray of shape (n_features,)).
- oob_improvement_ -- Improvement in loss on the out-of-bag samples relative to the previous iteration (only if subsample < 1.0).
- oob_scores_ -- Full history of loss values on the out-of-bag samples (only if subsample < 1.0).
- oob_score_ -- The last value of the loss on out-of-bag samples (only if subsample < 1.0).
- train_score_ -- The loss of the model at each iteration on the in-bag sample.
- init_ -- The estimator providing initial predictions.
- estimators_ -- The collection of fitted sub-estimators (ndarray of DecisionTreeRegressor of shape (n_estimators, n_trees_per_iteration_)).
- classes_ -- The class labels.
- n_classes_ -- The number of classes.
- n_features_in_ -- Number of features seen during fit.
- feature_names_in_ -- Names of features seen during fit (only when X has string feature names).
- max_features_ -- The inferred value of max_features.
Example Usage
from sklearn.datasets import make_hastie_10_2
from sklearn.ensemble import GradientBoostingClassifier
X, y = make_hastie_10_2(random_state=0)
X_train, X_test = X[:2000], X[2000:]
y_train, y_test = y[:2000], y[2000:]
clf = GradientBoostingClassifier(
n_estimators=100, learning_rate=1.0, max_depth=1, random_state=0
)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
# 0.913
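Because the model is additive, predictions can also be examined stage by stage with staged_predict, which yields the ensemble's prediction after each boosting iteration. A sketch reusing the same setup as above:

```python
from sklearn.datasets import make_hastie_10_2
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

X, y = make_hastie_10_2(random_state=0)
X_train, X_test = X[:2000], X[2000:]
y_train, y_test = y[:2000], y[2000:]

clf = GradientBoostingClassifier(
    n_estimators=100, learning_rate=1.0, max_depth=1, random_state=0
).fit(X_train, y_train)

# staged_predict yields test-set predictions after each stage,
# showing how accuracy evolves as trees are added.
staged_acc = [
    accuracy_score(y_test, y_pred) for y_pred in clf.staged_predict(X_test)
]
print(len(staged_acc))  # 100: one entry per boosting stage
print(staged_acc[-1])   # final entry matches clf.score(X_test, y_test)
```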
Source Location
Defined in sklearn/ensemble/_gb.py, class GradientBoostingClassifier (lines 1145-1754).