Implementation:Scikit learn Scikit learn StackingClassifier Init

Overview

Concrete tool, provided by scikit-learn, for creating a stacking ensemble with a meta-learner. StackingClassifier stacks the outputs of the individual base estimators and uses a final classifier to compute the final prediction. The base estimators are fitted on the full training data, while the final estimator is trained on cross-validated predictions of the base estimators (generated via cross_val_predict).
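The two-stage procedure described above can be sketched by hand. This is an illustrative approximation, not the library's internal code: the choice of base estimators, the use of predict_proba for every estimator, and the helper name stacked_predict are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Illustrative base estimators (any classifiers with predict_proba would do).
base_estimators = [
    DecisionTreeClassifier(max_depth=3, random_state=0),
    KNeighborsClassifier(n_neighbors=5),
]

# Step 1: out-of-fold class probabilities from each base estimator.
# Each sample is predicted by a model that did not see it during training.
meta_features = np.hstack([
    cross_val_predict(est, X, y, cv=5, method="predict_proba")
    for est in base_estimators
])

# Step 2: train the final estimator on the stacked out-of-fold predictions.
final_estimator = LogisticRegression(max_iter=1000).fit(meta_features, y)

# Step 3: refit every base estimator on the full training data for prediction time.
fitted_base = [est.fit(X, y) for est in base_estimators]


def stacked_predict(X_new):
    """Hypothetical helper: stack base predictions, then apply the final estimator."""
    features = np.hstack([est.predict_proba(X_new) for est in fitted_base])
    return final_estimator.predict(features)


print(meta_features.shape)
print(stacked_predict(X[:5]))
```

Training the final estimator on out-of-fold predictions rather than on in-sample predictions is what keeps it from simply learning to trust whichever base estimator memorized the training set.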

Constructor Signature

from sklearn.ensemble import StackingClassifier

StackingClassifier(
    estimators,
    final_estimator=None,
    *,
    cv=None,
    stack_method="auto",
    n_jobs=None,
    passthrough=False,
    verbose=0,
)

Parameters

estimators (list of (str, estimator) tuples) -- Base estimators to be stacked. Each element is a tuple of a name string and an estimator instance. An estimator can be set to "drop" using set_params. The type of estimator is generally expected to be a classifier, though regressors can be passed for use cases such as ordinal regression.
final_estimator (estimator, default=None) -- A classifier used to combine the base estimators. The default is LogisticRegression.
cv (int, cross-validation generator, iterable, or "prefit", default=None) -- Determines the cross-validation splitting strategy used in cross_val_predict to train the final estimator. Possible inputs:
- None: default 5-fold cross-validation.
- integer: number of folds in a (Stratified) KFold.
- An object to be used as a cross-validation generator.
- An iterable yielding (train, test) splits.
- "prefit": assumes the base estimators are already fitted; they are not refitted. The final estimator is trained on the base estimators' predictions on the full training set, so there is a risk of overfitting if that set was also used to fit the base estimators.
stack_method ({"auto", "predict_proba", "decision_function", "predict"}, default="auto") -- The method called on each base estimator to generate meta-features. If "auto", tries predict_proba, then decision_function, then predict, in that order.
n_jobs (int, default=None) -- Number of jobs to run in parallel for fit of all base estimators. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.
passthrough (bool, default=False) -- When True, the final estimator is trained on both the base estimators' predictions and the original training data. When False, only the predictions are used as meta-features.
verbose (int, default=0) -- Verbosity level.
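To make the effect of stack_method and passthrough concrete, the sketch below compares the shape of the meta-feature matrix returned by transform under three settings. The base estimators and the helper name meta_feature_shape are assumptions chosen for illustration; the data set is iris (150 samples, 4 features, 3 classes).

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features, 3 classes


def meta_feature_shape(**stacking_kwargs):
    """Hypothetical helper: fit a stacker and return the shape of its meta-features."""
    clf = StackingClassifier(
        estimators=[
            ("tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
            ("knn", KNeighborsClassifier(n_neighbors=5)),
        ],
        final_estimator=LogisticRegression(max_iter=1000),
        **stacking_kwargs,
    )
    return clf.fit(X, y).transform(X).shape


# "auto" resolves to predict_proba here: 2 estimators x 3 class probabilities.
shape_default = meta_feature_shape()

# "predict" yields a single column of predicted labels per estimator.
shape_predict = meta_feature_shape(stack_method="predict")

# passthrough=True appends the 4 original features to the predictions.
shape_passthrough = meta_feature_shape(passthrough=True)

print(shape_default, shape_predict, shape_passthrough)
```

Note that for binary problems scikit-learn drops one of the two predict_proba columns, so the column count per estimator depends on the number of classes as well as on stack_method.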

Fitted Attributes

classes_ -- Class labels (ndarray of shape (n_classes,), or a list of ndarrays for multilabel targets).
estimators_ -- The elements of the estimators parameter, having been fitted on the training data. Estimators set to "drop" are excluded. When cv="prefit", these are set to the provided estimators without refitting.
named_estimators_ -- A Bunch object allowing access to fitted sub-estimators by name.
n_features_in_ -- Number of features seen during fit (only defined if the underlying estimators expose this attribute).
feature_names_in_ -- Names of features seen during fit (only defined if the underlying estimators expose this attribute).
final_estimator_ -- The classifier fit on the output of estimators_, responsible for final predictions.
stack_method_ -- The method used by each base estimator to generate meta-features (list of str).
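The attributes listed above can be inspected after fit. The following is a minimal sketch; the estimator names "tree" and "knn" are arbitrary labels chosen for the example, and the final step shows the effect of setting an estimator to "drop" through set_params.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

clf = StackingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
)
clf.fit(X, y)

print(clf.classes_)                          # class labels seen during fit
methods = list(clf.stack_method_)            # method resolved per base estimator
print(methods)
print(clf.n_features_in_)                    # number of input features
print(type(clf.final_estimator_).__name__)   # fitted meta-learner

# Fitted sub-estimators are reachable by the names given in `estimators`.
fitted_tree = clf.named_estimators_["tree"]
print(fitted_tree.get_depth())

# Dropping an estimator and refitting removes it from estimators_.
clf.set_params(knn="drop").fit(X, y)
print(len(clf.estimators_))
```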

Example Usage

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
estimators = [
    ("rf", RandomForestClassifier(n_estimators=10, random_state=42)),
    ("svr", make_pipeline(StandardScaler(), LinearSVC(random_state=42))),
]
clf = StackingClassifier(
    estimators=estimators,
    final_estimator=LogisticRegression(),
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42
)
clf.fit(X_train, y_train).score(X_test, y_test)
# 0.9...

Source Location

sklearn/ensemble/_stacking.py, class StackingClassifier (lines 422-839).

Related Pages

Principle:Scikit_learn_Scikit_learn_Stacking_Ensemble
