Principle:Scikit learn Scikit learn Cross Validated Predictions

Metadata

Domains: Statistics, Model_Evaluation
Sources: scikit-learn documentation, "The Elements of Statistical Learning" Hastie et al.
Last Updated: 2026-02-08 15:00 GMT

Overview

A prediction strategy that generates out-of-fold predictions for every sample by using each fold's held-out test partition.

While cross_val_score and cross_validate return aggregate scores per fold, cross-validated predictions return predictions for every sample in the dataset. Each sample's prediction is generated by a model that was not trained on that sample, providing out-of-fold (or out-of-sample) predictions that avoid the optimistic bias inherent in predicting on training data.

Description

How cross_val_predict works differently from cross_val_score:

cross_val_score fits the model on each training fold, scores it on the corresponding test fold, and returns an array of k scalar scores.
cross_val_predict fits the model on each training fold, generates predictions on the corresponding test fold, and then assembles the test-fold predictions into a single array aligned with the original sample order. Provided the splitter assigns every sample to exactly one test fold (as KFold and StratifiedKFold do), each sample has exactly one prediction, generated by a model that never saw that sample during training.

The key distinction is that cross_val_predict returns a prediction vector of the same length as the dataset, not an array of scores. This enables downstream analyses that require per-sample predictions.
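The difference in return shape can be seen in a minimal sketch. The dataset (iris), the estimator (LogisticRegression), and cv=5 are illustrative assumptions, not part of the original page.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, cross_val_score

# Illustrative data and estimator (assumptions for this sketch).
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# cross_val_score: one scalar score per fold.
scores = cross_val_score(model, X, y, cv=5)

# cross_val_predict: one prediction per sample, aligned with the rows of X.
y_pred = cross_val_predict(model, X, y, cv=5)

print(scores.shape, y_pred.shape)
```

With method="predict_proba" the result has one row per sample and one column per class instead of a flat vector.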

Out-of-fold predictions:

The term "out-of-fold" emphasizes that each prediction is made by a model trained on all data except the fold containing that sample. This property makes the predictions:

Leakage-free at the sample level: No sample's prediction is contaminated by having been seen during training.
Suitable for computing sample-level diagnostics: Residual plots, confusion matrices, calibration curves, and ROC curves can be constructed from these predictions.

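Important caveat: Computing a single global metric on the full set of out-of-fold predictions (e.g., accuracy_score(y_true, y_pred)) may not produce the same result as the mean of per-fold scores from cross_val_score, because the samples are grouped differently. The two agree when the metric is a plain per-sample average and all folds have the same size; they can diverge when fold sizes differ or when the metric does not decompose over samples (for example ROC AUC or R^2). The pooled metric also mixes predictions from k different models, so it should not be read as the generalization score of any single fitted model.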
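The caveat can be checked with a short sketch. It assumes a regression setting (the diabetes dataset, LinearRegression, and R^2 are choices made for illustration). R^2 is a useful example because each fold normalizes by its own variance of y, so the pooled value and the mean of per-fold values generally differ.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold, cross_val_predict, cross_val_score

# Illustrative data and estimator (assumptions for this sketch).
X, y = load_diabetes(return_X_y=True)
model = LinearRegression()
cv = KFold(n_splits=5)  # no shuffling, so both calls see identical splits

fold_scores = cross_val_score(model, X, y, cv=cv, scoring="r2")
y_oof = cross_val_predict(model, X, y, cv=cv)

# Pooled metric: a single R^2 over all out-of-fold predictions.
pooled_r2 = r2_score(y, y_oof)

# Per-fold R^2 recomputed from the same out-of-fold predictions;
# these match what cross_val_score reports.
per_fold_r2 = np.array(
    [r2_score(y[test], y_oof[test]) for _, test in cv.split(X)]
)

print("mean of per-fold R^2:", fold_scores.mean())
print("pooled R^2:          ", pooled_r2)
```

The predictions are identical in both views; only the grouping used by the metric changes.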

Use cases for cross-validated predictions:

Stacking (blending): Out-of-fold predictions from a base model serve as features for a meta-learner. This is the standard approach for constructing stacked ensembles without data leakage.
Probability calibration: Out-of-fold predicted probabilities can be used to fit a calibration model (e.g., Platt scaling, isotonic regression) without optimistic bias.
Visualization and diagnostics: Plotting predicted vs. actual values, residual distributions, or confusion matrices using out-of-fold predictions gives a more honest picture of model behavior than using in-sample predictions.
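Error analysis: Identifying which samples are mispredicted when held out. Because each sample receives only one out-of-fold prediction per run, judging whether a sample is consistently mispredicted requires repeating the procedure with different splits.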
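As a sketch of the stacking use case, out-of-fold probabilities from two base models can be assembled into a meta-feature matrix. The dataset, the choice of base models, and the variable names (base_models, meta_features, meta_learner) are assumptions for illustration. scikit-learn also provides StackingClassifier, which handles this internally.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative binary classification data (assumption for this sketch).
X, y = load_breast_cancer(return_X_y=True)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Hypothetical base models chosen for the example.
base_models = [
    RandomForestClassifier(n_estimators=50, random_state=0),
    make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
]

# Each column holds one base model's out-of-fold probability
# of the positive class, so no row was predicted by a model that saw it.
meta_features = np.column_stack([
    cross_val_predict(m, X, y, cv=cv, method="predict_proba")[:, 1]
    for m in base_models
])

# The meta-learner is trained only on out-of-fold predictions.
meta_learner = LogisticRegression().fit(meta_features, y)
```

To predict on new data, the base models would additionally need to be refit on the full training set, since cross_val_predict does not return fitted estimators.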

Usage

Cross-validated predictions should be used when:

You need per-sample predictions for diagnostic analysis (residual plots, confusion matrices).
You are building a stacked ensemble and need unbiased first-level predictions as meta-features.
You want to calibrate probabilities without introducing optimistic bias from in-sample predictions.
You want to visualize predicted vs. actual values for the full dataset.
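For the diagnostic case, a minimal sketch builds a confusion matrix and a list of misclassified samples from out-of-fold predictions. The iris data and LogisticRegression are again illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict

# Illustrative data and estimator (assumptions for this sketch).
X, y = load_iris(return_X_y=True)
y_oof = cross_val_predict(LogisticRegression(max_iter=1000), X, y, cv=5)

# Every sample contributes exactly one out-of-fold prediction.
cm = confusion_matrix(y, y_oof)
print(cm)

# Row indices of samples that were mispredicted when held out.
misclassified = np.flatnonzero(y_oof != y)
print(misclassified)
```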

Theoretical Basis

Out-of-fold predictions approximate the predictions a model would make on genuinely unseen data. For each sample i in fold j, the prediction is generated by a model f_hat_{-j} trained on all samples not in fold j. This is analogous to the leave-one-out prediction concept, generalized to k folds.

However, unlike cross-validated scores, which estimate a population-level quantity (expected loss), cross-validated predictions do not have a single clean statistical interpretation as an estimator. Different samples are predicted by different models (each trained on a different subset containing roughly (k-1)/k of the data), so the combined prediction vector is a composite of k distinct models rather than predictions from a single model.
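The composite nature of the prediction vector is easiest to see by writing the procedure out by hand. The sketch below assumes a deterministic estimator (Ridge) and an unshuffled KFold so that a manual loop and cross_val_predict see the same splits. It illustrates the idea and is not the library's actual implementation.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_predict

# Illustrative data and estimator (assumptions for this sketch).
X, y = load_diabetes(return_X_y=True)
model = Ridge(alpha=1.0)
cv = KFold(n_splits=5)

# Manual out-of-fold loop: one freshly fitted model per fold.
oof = np.empty_like(y, dtype=float)
for train_idx, test_idx in cv.split(X):
    fold_model = clone(model).fit(X[train_idx], y[train_idx])
    # Write predictions back at the original positions of the held-out samples.
    oof[test_idx] = fold_model.predict(X[test_idx])

# Compare with the library routine on the same splits.
reference = cross_val_predict(model, X, y, cv=cv)
print(np.allclose(oof, reference))
```

Five separate models contribute to oof here, each responsible only for the samples in its own held-out fold.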

Related Pages

Implementation:Scikit_learn_Scikit_learn_Cross_Val_Predict
