
Implementation:Scikit learn Scikit learn StackingClassifier Init

From Leeroopedia



Overview

Concrete scikit-learn tool for building a stacking ensemble with a meta-learner. StackingClassifier stacks the outputs of the individual base estimators and uses a final classifier to compute the final prediction. The base estimators are fitted on the full training data, while the final estimator is trained on cross-validated predictions of the base estimators (generated via cross_val_predict).
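The training scheme described above can be sketched by hand. This is a simplified illustration rather than the library's internal code: the choice of base estimators, the 5-fold setting, and the helper name stacked_predict are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # 150 samples, 3 classes

# Illustrative base estimators (an assumption, any classifiers would do).
base_estimators = [
    RandomForestClassifier(n_estimators=10, random_state=0),
    KNeighborsClassifier(n_neighbors=5),
]

# Step 1: out-of-fold class probabilities from each base estimator.
# cross_val_predict clones the estimator, so nothing is fitted in place here.
meta_blocks = [
    cross_val_predict(est, X, y, cv=5, method="predict_proba")
    for est in base_estimators
]
meta_features = np.hstack(meta_blocks)  # one block of 3 columns per estimator

# Step 2: the final estimator learns from the out-of-fold predictions only.
final_estimator = LogisticRegression(max_iter=1000).fit(meta_features, y)

# Step 3: the base estimators are refitted on the full training data
# so that they can be used at prediction time.
fitted_base = [est.fit(X, y) for est in base_estimators]


def stacked_predict(X_new):
    """Hypothetical helper: base predictions feed the final estimator."""
    blocks = [est.predict_proba(X_new) for est in fitted_base]
    return final_estimator.predict(np.hstack(blocks))


predictions = stacked_predict(X[:5])
```

Training the final estimator on out-of-fold predictions keeps it from seeing outputs that the base estimators produced for samples they were fitted on, which is the motivation for the cross-validation step.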

Constructor Signature

from sklearn.ensemble import StackingClassifier

StackingClassifier(
    estimators,
    final_estimator=None,
    *,
    cv=None,
    stack_method="auto",
    n_jobs=None,
    passthrough=False,
    verbose=0,
)

Parameters

  • estimators (list of (str, estimator) tuples) -- Base estimators to be stacked. Each element is a tuple of a name string and an estimator instance. An estimator can be set to "drop" using set_params. Estimators are generally expected to be classifiers, though regressors can be passed for use cases such as ordinal regression.
  • final_estimator (estimator, default=None) -- A classifier used to combine the base estimators. The default is LogisticRegression.
  • cv (int, cross-validation generator, iterable, or "prefit", default=None) -- Determines the cross-validation splitting strategy used in cross_val_predict to train the final estimator. Possible inputs:
    • None: default 5-fold cross-validation.
    • integer: number of folds in a (Stratified) KFold.
    • An object to be used as a cross-validation generator.
    • An iterable yielding (train, test) splits.
    • "prefit": assumes the base estimators are already fitted; they are not refitted. The final estimator is trained on the base estimators' predictions on the full training set, which are not cross-validated, so there is a risk of overfitting if the base estimators were trained on that same data.
  • stack_method ({"auto", "predict_proba", "decision_function", "predict"}, default="auto") -- The method called on each base estimator to generate meta-features. If "auto", tries predict_proba, then decision_function, then predict, in that order, for each estimator.
  • n_jobs (int, default=None) -- Number of jobs to run in parallel for fit of all base estimators. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.
  • passthrough (bool, default=False) -- When True, the final estimator is trained on both the base estimators' predictions and the original training data. When False, only the predictions are used as meta-features.
  • verbose (int, default=0) -- Verbosity level.
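A rough way to see what stack_method, passthrough, and the "drop" option do is to look at the shape of the meta-feature matrix returned by transform. The sketch below assumes the iris data (4 features, 3 classes) and two illustrative base estimators; make_estimators is a hypothetical helper introduced only for this example.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features, 3 classes


def make_estimators():
    """Hypothetical helper returning fresh (name, estimator) tuples."""
    return [
        ("rf", RandomForestClassifier(n_estimators=10, random_state=0)),
        ("knn", KNeighborsClassifier()),
    ]


# stack_method="auto": both estimators expose predict_proba, so each
# contributes one column per class (2 estimators * 3 classes).
proba_stack = StackingClassifier(
    estimators=make_estimators(),
    final_estimator=LogisticRegression(max_iter=1000),
).fit(X, y)
proba_shape = proba_stack.transform(X).shape

# stack_method="predict": each estimator contributes a single label column.
label_stack = StackingClassifier(
    estimators=make_estimators(),
    final_estimator=LogisticRegression(max_iter=1000),
    stack_method="predict",
).fit(X, y)
label_shape = label_stack.transform(X).shape

# passthrough=True: the 4 original features are appended to the predictions.
pass_stack = StackingClassifier(
    estimators=make_estimators(),
    final_estimator=LogisticRegression(max_iter=1000),
    passthrough=True,
).fit(X, y)
pass_shape = pass_stack.transform(X).shape

# Setting an estimator to "drop" via set_params removes it on the next fit.
pass_stack.set_params(knn="drop").fit(X, y)
dropped_shape = pass_stack.transform(X).shape

print(proba_shape, label_shape, pass_shape, dropped_shape)
```

Note that the column counts depend on the problem: for binary targets, scikit-learn keeps only one probability column per estimator, so the widths seen here apply to the three-class case.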

Fitted Attributes

  • classes_ -- Class labels (ndarray of shape (n_classes,), or list of ndarray for multilabel).
  • estimators_ -- The elements of the estimators parameter, having been fitted on the training data. Estimators set to "drop" are excluded. When cv="prefit", these are set to the provided estimators without refitting.
  • named_estimators_ -- A Bunch object allowing access to fitted sub-estimators by name.
  • n_features_in_ -- Number of features seen during fit (only defined if the underlying estimators expose this attribute).
  • feature_names_in_ -- Names of features seen during fit (only defined if the underlying estimators expose this attribute).
  • final_estimator_ -- The classifier fit on the output of estimators_, responsible for final predictions.
  • stack_method_ -- The method used by each base estimator to generate meta-features (list of str).
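The fitted attributes listed above can be inspected after calling fit. The following is a minimal sketch assuming the iris data and two illustrative base estimators; the names "rf" and "svc" are arbitrary labels chosen for the example.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)

clf = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=10, random_state=0)),
        ("svc", make_pipeline(StandardScaler(), LinearSVC(random_state=0))),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
).fit(X, y)

# Class labels seen during fit.
print(clf.classes_)

# Method resolved for each base estimator under stack_method="auto":
# the random forest has predict_proba; LinearSVC has no predict_proba,
# so its pipeline falls back to decision_function.
print(clf.stack_method_)

# Fitted sub-estimators are reachable by name (key or attribute access).
fitted_rf = clf.named_estimators_["rf"]
fitted_svc = clf.named_estimators_.svc

# The fitted meta-learner and the number of input features.
print(type(clf.final_estimator_).__name__, clf.n_features_in_)
```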

Example Usage

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)

# Base estimators as (name, estimator) tuples; the names are arbitrary labels.
estimators = [
    ("rf", RandomForestClassifier(n_estimators=10, random_state=42)),
    ("svc", make_pipeline(StandardScaler(), LinearSVC(random_state=42))),
]

# The final estimator combines the base estimators' predictions.
clf = StackingClassifier(
    estimators=estimators,
    final_estimator=LogisticRegression(),
)

# Hold out a stratified test split, then fit and report test accuracy.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42
)
clf.fit(X_train, y_train).score(X_test, y_test)
# 0.9...

Source Location

Template:Code, class StackingClassifier (lines 422-839).
