Principle:Scikit learn Scikit learn Stacking Ensemble

Overview

An ensemble strategy that trains a meta-learner on the cross-validated predictions of multiple base estimators.

Description

Stacking (stacked generalization) is a two-level ensemble architecture. At the first level, each of a diverse set of base estimators is trained on the full training data; these fitted models are the ones used at prediction time. Separately, predictions for the training samples are generated via cross-validation on the training set and collected to form a new set of meta-features. At the second level, a final estimator (the meta-learner) is trained on these meta-features to learn how best to combine the base learners' outputs.
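The two-level architecture described above can be sketched with scikit-learn's StackingClassifier. This is a minimal illustration, not a recommended configuration: the synthetic dataset, the choice of base estimators, and the hyperparameters are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data, used only for illustration.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Level 0: diverse base estimators, given as (name, estimator) pairs.
base_estimators = [
    ("rf", RandomForestClassifier(n_estimators=25, random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
]

# Level 1: the final estimator is fit on 5-fold cross-validated predictions.
stack = StackingClassifier(
    estimators=base_estimators,
    final_estimator=LogisticRegression(),
    cv=5,
)
stack.fit(X_train, y_train)

test_accuracy = stack.score(X_test, y_test)
print(f"held-out accuracy: {test_accuracy:.3f}")
```

After fitting, stack.estimators_ holds the base estimators refit on all of X_train, and stack.final_estimator_ holds the fitted meta-learner.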

The critical design choice in stacking is the use of cross-validated predictions for constructing the meta-features. If base learners were simply asked to predict on their own training data, the meta-learner would be trained on overly optimistic predictions, leading to severe overfitting. By using k-fold cross-validation, each training sample's meta-feature is generated by a model that did not see that sample during training, producing honest out-of-fold predictions.
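To make the out-of-fold mechanism concrete, the sketch below builds the meta-features by hand with cross_val_predict rather than relying on the stacking estimator. It assumes a binary problem and uses only the positive-class probability from each base learner; the dataset, the base learners, and the helper stacked_predict are hypothetical choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data, used only for illustration.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

base_learners = [DecisionTreeClassifier(random_state=0), KNeighborsClassifier()]

# One column per base learner. Each row's value comes from a model
# that was fit on the other folds, so it never saw that sample.
oof_columns = [
    cross_val_predict(est, X, y, cv=5, method="predict_proba")[:, 1]
    for est in base_learners
]
meta_features = np.column_stack(oof_columns)

# Level 1: train the meta-learner on the out-of-fold predictions.
meta_learner = LogisticRegression().fit(meta_features, y)

# For inference, refit every base learner on the full training data.
fitted_base = [est.fit(X, y) for est in base_learners]


def stacked_predict(X_new):
    """Hypothetical helper: route new samples through both levels."""
    cols = [est.predict_proba(X_new)[:, 1] for est in fitted_base]
    return meta_learner.predict(np.column_stack(cols))


print(meta_features.shape)
print(stacked_predict(X[:5]))
```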

The meta-features passed to the final estimator can be class probabilities (via predict_proba), decision function scores (via decision_function), or raw class predictions (via predict). Optionally, the original input features can be passed through alongside the meta-features, giving the final estimator access to both raw inputs and base learner outputs.
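The effect of these options on the meta-feature matrix can be inspected through StackingClassifier's stack_method and passthrough parameters, using transform to view what the final estimator receives. The sketch below is an illustration on the three-class iris data; the base estimators and the helper meta_feature_shape are assumptions for the example.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Iris: 150 samples, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)


def meta_feature_shape(stack_method, passthrough):
    """Hypothetical helper: fit a stack and report the meta-feature shape."""
    stack = StackingClassifier(
        estimators=[
            ("lr", LogisticRegression(max_iter=1000)),
            ("tree", DecisionTreeClassifier(random_state=0)),
        ],
        final_estimator=LogisticRegression(max_iter=1000),
        stack_method=stack_method,
        passthrough=passthrough,
        cv=5,
    )
    stack.fit(X, y)
    # transform returns the columns fed to the final estimator.
    return stack.transform(X).shape


# Class probabilities: 3 columns per base estimator.
proba_shape = meta_feature_shape("predict_proba", passthrough=False)
# Raw class predictions: 1 column per base estimator.
label_shape = meta_feature_shape("predict", passthrough=False)
# Probabilities plus the 4 original input features.
passthrough_shape = meta_feature_shape("predict_proba", passthrough=True)

print(proba_shape, label_shape, passthrough_shape)
```

Note that for binary problems scikit-learn keeps only one probability column per estimator, since the two columns are redundant; a multiclass dataset is used here so that the column counts are easier to read.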

Usage

Stacking ensembles are appropriate when:

You have multiple strong but diverse base learners and want to learn from the data how best to weight and combine them.
Simple averaging or majority voting does not capture the complementary patterns in base learner predictions.
You are willing to accept the additional computational cost of cross-validated meta-feature generation.
You want a principled method that adapts the combination strategy to the data rather than using fixed weights.

Theoretical Basis

The theoretical foundations of stacking include:

Stacked Generalization (Wolpert, 1992): The original framework proposes using "level-0" generalizers (base learners) whose outputs feed into a "level-1" generalizer (meta-learner). The key insight is that the level-1 generalizer can learn to correct the biases and exploit the complementary strengths of the level-0 generalizers.
Meta-Learning: The final estimator performs meta-learning -- it learns about the behavior of the base learners rather than learning directly from the raw features. This allows it to discover patterns such as "classifier A is reliable when classifier B is uncertain."
Cross-Validated Meta-Features to Avoid Overfitting: The use of cross-validation to generate out-of-fold predictions for the meta-features is essential. Without this mechanism, the meta-learner would train on predictions that are unrealistically accurate (since each base learner would be predicting on data it was trained on), leading to overfitting in the stacking layer. Cross-validation ensures the meta-features reflect each base learner's true generalization ability.
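The gap between in-sample and out-of-fold predictions can be observed directly. The sketch below compares the two for a single unpruned decision tree, a learner chosen here because it tends to memorize its training data; the dataset and the model are assumptions for illustration, and the size of the gap will vary with both.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, used only for illustration.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
tree = DecisionTreeClassifier(random_state=0)

# In-sample: the tree predicts on the same rows it was trained on.
in_sample_pred = tree.fit(X, y).predict(X)

# Out-of-fold: each row is predicted by a clone fit on the other folds.
oof_pred = cross_val_predict(tree, X, y, cv=5)

in_sample_acc = accuracy_score(y, in_sample_pred)
oof_acc = accuracy_score(y, oof_pred)

print(f"in-sample accuracy:   {in_sample_acc:.3f}")
print(f"out-of-fold accuracy: {oof_acc:.3f}")
```

A meta-learner trained on the in-sample column would see a base learner that looks far more reliable than it is on unseen data, which is the overfitting risk described above.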

Related Pages

Implementation:Scikit_learn_Scikit_learn_StackingClassifier_Init
