Implementation:Scikit learn Scikit learn Cross Val Predict

Metadata

Domains: Statistics, Model_Evaluation
Source File: sklearn/model_selection/_validation.py
Last Updated: 2026-02-08 15:00 GMT

Overview

scikit-learn's concrete tool for generating out-of-fold predictions through cross-validation. The cross_val_predict function splits the data into folds, fits a fresh copy of the estimator on each training partition, and predicts on the matching test partition. It returns a single array in which every sample's prediction comes from a model that was not trained on that sample.

API Signature

from sklearn.model_selection import cross_val_predict

cross_val_predict(
    estimator,
    X,
    y=None,
    *,
    groups=None,
    cv=None,
    n_jobs=None,
    verbose=0,
    params=None,
    pre_dispatch="2*n_jobs",
    method="predict",
)

Parameters:

estimator (estimator) -- The estimator instance to use to fit the data. It must implement a fit method and the method given by the method parameter.
X (array-like or sparse matrix of shape (n_samples, n_features)) -- The data to fit.
y (array-like or sparse matrix of shape (n_samples,) or (n_samples, n_outputs), default=None) -- Target variable for supervised learning.
groups (array-like of shape (n_samples,), default=None) -- Group labels for use with Group CV splitters (e.g., GroupKFold).
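cv (int, CV splitter, or iterable, default=None) -- Cross-validation splitting strategy. None uses the default 5-fold cross-validation. An integer sets the number of folds: StratifiedKFold is used when the estimator is a classifier and y is binary or multiclass, and KFold otherwise.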
n_jobs (int, default=None) -- Number of parallel jobs. -1 uses all processors.
verbose (int, default=0) -- Verbosity level.
params (dict, default=None) -- Parameters to pass to the estimator's fit and the CV splitter. (Added in version 1.4.)
pre_dispatch (int or str, default='2*n_jobs') -- Controls the number of pre-dispatched parallel jobs.
method (str, default='predict') -- The estimator method to invoke. Must be one of:
- 'predict' -- Standard point predictions.
- 'predict_proba' -- Class probability predictions.
- 'predict_log_proba' -- Log-probability predictions.
- 'decision_function' -- Decision function values (for linear models, the signed distance to the separating hyperplane).
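
The groups parameter listed above can be illustrated with a short sketch. The data, the Ridge estimator, and the group labels here are made up for illustration; only cross_val_predict and GroupKFold come from scikit-learn.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GroupKFold, cross_val_predict

# Synthetic regression data (an assumption for this sketch).
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=12)

# Hypothetical group labels: four subjects with three samples each.
groups = np.repeat([0, 1, 2, 3], 3)

# GroupKFold never places samples of one group in both the training and test side of a split.
y_pred = cross_val_predict(Ridge(), X, y, groups=groups, cv=GroupKFold(n_splits=4))
print(y_pred.shape)
```

Each prediction in y_pred is produced by a model that saw none of the samples from the same group.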

Returns:

predictions (ndarray) -- Out-of-fold predictions. The shape depends on method:
- 'predict', or 'decision_function' with a binary target: (n_samples,)
- 'predict_proba', 'predict_log_proba', or 'decision_function' with a multiclass target: (n_samples, n_classes)
- If the estimator is multioutput, an extra dimension n_outputs is appended to the end of each shape above.
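
The shape rules above can be checked with a small sketch on a binary problem. The synthetic dataset and the choice of LogisticRegression are assumptions made only for this illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

# Synthetic binary classification data (two classes by default).
X, y = make_classification(n_samples=100, n_features=5, random_state=0)
clf = LogisticRegression(max_iter=1000)

labels = cross_val_predict(clf, X, y, cv=5)                              # 1-D class labels
scores = cross_val_predict(clf, X, y, cv=5, method="decision_function")  # 1-D for a binary target
proba = cross_val_predict(clf, X, y, cv=5, method="predict_proba")       # one column per class

print(labels.shape, scores.shape, proba.shape)
```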

Constraint: cross_val_predict only works for cross-validation splitters that produce partitions (i.e., every sample appears in exactly one test fold). Splitters with overlapping test sets are not supported.
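
A minimal sketch of this constraint, assuming ShuffleSplit as the non-partitioning splitter: its test sets are drawn independently, so together they do not cover each sample exactly once, and cross_val_predict is expected to reject them with a ValueError.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import ShuffleSplit, cross_val_predict
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Three independent random splits, each holding out 30% of the samples.
cv = ShuffleSplit(n_splits=3, test_size=0.3, random_state=0)

try:
    cross_val_predict(DecisionTreeClassifier(random_state=0), X, y, cv=cv)
    raised = False
except ValueError as exc:
    raised = True
    print(f"ValueError: {exc}")
```

Splitters such as KFold, StratifiedKFold, and GroupKFold do produce partitions and can be used instead.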

Examples

Basic usage -- Point predictions

from sklearn import datasets, linear_model
from sklearn.model_selection import cross_val_predict

diabetes = datasets.load_diabetes()
X = diabetes.data[:150]
y = diabetes.target[:150]
lasso = linear_model.Lasso()

y_pred = cross_val_predict(lasso, X, y, cv=3)
# y_pred has shape (150,), one prediction per sample

Probability predictions for stacking

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_predict

X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier(n_estimators=100, random_state=42)

# Get out-of-fold probability predictions
proba_pred = cross_val_predict(clf, X, y, cv=5, method='predict_proba')
# proba_pred has shape (150, 3) for 3 classes
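
One way the out-of-fold probabilities might feed a stacked model is sketched below. The meta-learner (a LogisticRegression) and the refit-then-chain step are assumptions for illustration, not something this page prescribes; scikit-learn also ships StackingClassifier, which packages a similar workflow.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

X, y = load_iris(return_X_y=True)
base = RandomForestClassifier(n_estimators=100, random_state=42)

# Out-of-fold probabilities: each row comes from a model that never saw that sample.
oof_proba = cross_val_predict(base, X, y, cv=5, method="predict_proba")

# Hypothetical meta-learner trained on the out-of-fold features.
meta = LogisticRegression(max_iter=1000).fit(oof_proba, y)

# For new data, refit the base model on all samples, then chain the two models.
base.fit(X, y)
final_pred = meta.predict(base.predict_proba(X[:5]))
print(final_pred.shape)
```

Training the meta-learner on out-of-fold rather than in-sample probabilities keeps it from learning on base-model outputs that have already memorized the targets.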

Diagnostic visualization

import matplotlib.pyplot as plt
from sklearn import datasets, linear_model
from sklearn.model_selection import cross_val_predict

diabetes = datasets.load_diabetes()
X = diabetes.data[:150]
y = diabetes.target[:150]
lasso = linear_model.Lasso()

y_pred = cross_val_predict(lasso, X, y, cv=5)

fig, ax = plt.subplots()
ax.scatter(y, y_pred, alpha=0.5)
ax.plot([y.min(), y.max()], [y.min(), y.max()], 'r--', lw=2)
ax.set_xlabel('Actual')
ax.set_ylabel('Predicted (out-of-fold)')
ax.set_title('Cross-validated predictions')
plt.show()

Important Caveats

Not always a valid way to measure generalization performance: Computing a single metric on the pooled out-of-fold predictions (e.g., accuracy_score(y, cross_val_predict(clf, X, y))) can differ from the mean of per-fold scores returned by cross_val_score. The two agree only when all test folds have equal size and the metric decomposes over samples.
Absent classes in folds: If a training fold contains no samples of some class, the per-class output columns for that class are filled with a default value: 0 for predict_proba, and the minimum finite float value of the output dtype for decision_function and predict_log_proba (an approximation of negative infinity that keeps the output finite).
Predictions come from several models: The returned array is a composite of predictions from k different fits, one per fold, not the output of a single model trained on all the data.
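
The first caveat can be illustrated with a short comparison. This is a sketch under assumed choices (a decision tree on iris for accuracy, Lasso on the first 150 diabetes samples for R^2), not a general benchmark.

```python
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import accuracy_score, r2_score
from sklearn.model_selection import cross_val_predict, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Case 1: accuracy decomposes over samples, and iris splits into 5 folds of 30 samples,
# so the pooled score and the mean of per-fold scores are expected to agree.
X, y = datasets.load_iris(return_X_y=True)
clf = DecisionTreeClassifier(random_state=0)
pooled_acc = accuracy_score(y, cross_val_predict(clf, X, y, cv=5))
mean_acc = cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean()
print("accuracy agrees:", np.isclose(pooled_acc, mean_acc))

# Case 2: R^2 depends on the mean of the targets being scored, so it does not
# decompose over samples; the pooled value generally differs from the fold mean.
diabetes = datasets.load_diabetes()
Xd, yd = diabetes.data[:150], diabetes.target[:150]
lasso = linear_model.Lasso()
pooled_r2 = r2_score(yd, cross_val_predict(lasso, Xd, yd, cv=3))
mean_r2 = cross_val_score(lasso, Xd, yd, cv=3, scoring="r2").mean()
print("pooled R^2:", pooled_r2, "mean per-fold R^2:", mean_r2)
```

When a generalization estimate is the goal, the per-fold scores from cross_val_score are the safer choice; the pooled predictions are better suited to diagnostics and stacking.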

Related Pages

Principle:Scikit_learn_Scikit_learn_Cross_Validated_Predictions
