Implementation:Scikit learn Scikit learn Cross Val Predict
Metadata
- Domains: Statistics, Model_Evaluation
- Source File: sklearn/model_selection/_validation.py
- Last Updated: 2026-02-08 15:00 GMT
Overview
Concrete tool for generating out-of-fold predictions using cross-validation provided by scikit-learn. The cross_val_predict function splits data into folds, trains the estimator on each training partition, and collects predictions on each test partition, returning a single array of predictions covering every sample in the dataset.
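The fold-by-fold procedure described above can be sketched by hand. This is a minimal illustration, not scikit-learn's internal implementation; the dataset slice, the Lasso estimator, and the 3-fold KFold splitter are assumptions chosen for the example.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold, cross_val_predict

X, y = load_diabetes(return_X_y=True)
X, y = X[:150], y[:150]

cv = KFold(n_splits=3)  # deterministic splitter (no shuffling)
estimator = Lasso()

# Fill one slot per sample with the prediction made while that sample was held out.
manual = np.empty(len(y), dtype=float)
for train_idx, test_idx in cv.split(X, y):
    model = clone(estimator).fit(X[train_idx], y[train_idx])  # fresh fit per fold
    manual[test_idx] = model.predict(X[test_idx])

# The library call should give the same out-of-fold predictions.
reference = cross_val_predict(estimator, X, y, cv=cv)
print(np.allclose(manual, reference))
```

Each entry of the result comes from a model that never saw that sample during training, which is what makes the predictions "out-of-fold".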
API Signature
from sklearn.model_selection import cross_val_predict
cross_val_predict(
estimator,
X,
y=None,
*,
groups=None,
cv=None,
n_jobs=None,
verbose=0,
params=None,
pre_dispatch="2*n_jobs",
method="predict",
)
Parameters:
- estimator (estimator) -- The estimator instance to use to fit the data. It must implement a fit method and the method given by the method parameter.
- X (array-like or sparse matrix of shape (n_samples, n_features)) -- The data to fit.
- y (array-like or sparse matrix of shape (n_samples,) or (n_samples, n_outputs), default=None) -- Target variable for supervised learning.
- groups (array-like of shape (n_samples,), default=None) -- Group labels for use with Group CV splitters (e.g., GroupKFold).
- cv (int, CV splitter, or iterable, default=None) -- Cross-validation splitting strategy. None defaults to 5-fold. An integer specifies the number of folds in (Stratified)KFold.
- n_jobs (int, default=None) -- Number of parallel jobs. -1 uses all processors.
- verbose (int, default=0) -- Verbosity level.
- params (dict, default=None) -- Parameters to pass to the estimator's fit and the CV splitter. (Added in version 1.4.)
- pre_dispatch (int or str, default='2*n_jobs') -- Controls the number of pre-dispatched parallel jobs.
- method (str, default='predict') -- The estimator method to invoke. Must be one of:
  - 'predict' -- Standard point predictions.
  - 'predict_proba' -- Class probability predictions.
  - 'predict_log_proba' -- Log-probability predictions.
  - 'decision_function' -- Decision function values (distance to hyperplane).
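As an illustration of the groups and cv parameters working together, the sketch below passes group labels to a GroupKFold splitter. The group assignment is hypothetical (iris has no natural groups) and is constructed only so that every group contains all three classes; it assumes the default configuration in which metadata routing is not enabled.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold, cross_val_predict

X, y = load_iris(return_X_y=True)

# Hypothetical grouping: 10 groups of 15 samples, interleaved across the dataset.
groups = np.arange(len(y)) % 10

cv = GroupKFold(n_splits=5)  # samples of one group never straddle train and test
clf = LogisticRegression(max_iter=1000)

y_pred = cross_val_predict(clf, X, y, groups=groups, cv=cv)
print(y_pred.shape)  # (150,)
```

With a group-aware splitter, each sample is predicted by a model that saw no samples from the same group.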
Returns:
- predictions (ndarray) -- Shape depends on the method:
  - 'predict' and special case of 'decision_function' with binary target: (n_samples,)
  - 'predict_proba', 'predict_log_proba', 'decision_function': (n_samples, n_classes)
  - If the estimator is multioutput, an extra dimension n_outputs is appended.
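The shape rules above can be checked directly. The sketch below is an assumption-laden example (LogisticRegression on iris, with a binary subset made from the first two classes) rather than anything prescribed by the source.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)

# Multiclass target (3 classes, 150 samples)
labels = cross_val_predict(clf, X, y, cv=5)
proba = cross_val_predict(clf, X, y, cv=5, method="predict_proba")
scores = cross_val_predict(clf, X, y, cv=5, method="decision_function")
print(labels.shape, proba.shape, scores.shape)  # (150,) (150, 3) (150, 3)

# Binary target: keep only classes 0 and 1 (100 samples)
mask = y < 2
X_bin, y_bin = X[mask], y[mask]
scores_bin = cross_val_predict(clf, X_bin, y_bin, cv=5, method="decision_function")
proba_bin = cross_val_predict(clf, X_bin, y_bin, cv=5, method="predict_proba")
print(scores_bin.shape, proba_bin.shape)  # (100,) (100, 2)
```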
Constraint: cross_val_predict only works for cross-validation splitters that produce partitions (i.e., every sample appears in exactly one test fold). Splitters with overlapping test sets are not supported.
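To illustrate the constraint, the sketch below uses ShuffleSplit, which draws each test set independently and therefore does not partition the data. The specific settings (3 splits, half the data per test set) are assumptions chosen so that the test sets cannot form a partition: 3 x 75 test indices for 150 samples.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.model_selection import ShuffleSplit, cross_val_predict

X, y = load_diabetes(return_X_y=True)
X, y = X[:150], y[:150]

# Three independent random test sets of 75 samples each: they must overlap.
cv = ShuffleSplit(n_splits=3, test_size=0.5, random_state=0)

raised = False
try:
    cross_val_predict(Lasso(), X, y, cv=cv)
except ValueError as exc:
    raised = True
    print(f"ValueError: {exc}")
```

Splitters such as KFold, StratifiedKFold, GroupKFold, and LeaveOneOut do produce partitions and are safe choices here.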
Examples
Basic usage -- Point predictions
from sklearn import datasets, linear_model
from sklearn.model_selection import cross_val_predict
diabetes = datasets.load_diabetes()
X = diabetes.data[:150]
y = diabetes.target[:150]
lasso = linear_model.Lasso()
y_pred = cross_val_predict(lasso, X, y, cv=3)
# y_pred has shape (150,), one prediction per sample
Probability predictions for stacking
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_predict
X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier(n_estimators=100, random_state=42)
# Get out-of-fold probability predictions
proba_pred = cross_val_predict(clf, X, y, cv=5, method='predict_proba')
# proba_pred has shape (150, 3) for 3 classes
Diagnostic visualization
import matplotlib.pyplot as plt
from sklearn import datasets, linear_model
from sklearn.model_selection import cross_val_predict
diabetes = datasets.load_diabetes()
X = diabetes.data[:150]
y = diabetes.target[:150]
lasso = linear_model.Lasso()
y_pred = cross_val_predict(lasso, X, y, cv=5)
fig, ax = plt.subplots()
ax.scatter(y, y_pred, alpha=0.5)
ax.plot([y.min(), y.max()], [y.min(), y.max()], 'r--', lw=2)
ax.set_xlabel('Actual')
ax.set_ylabel('Predicted (out-of-fold)')
ax.set_title('Cross-validated predictions')
plt.show()
Important Caveats
- Not a valid way to compute global metrics in all cases: Computing a single metric on the full out-of-fold predictions (e.g., accuracy_score(y, cross_val_predict(clf, X, y))) can differ from the mean of per-fold scores from cross_val_score. Results are equivalent only when all test folds have equal size and the metric decomposes over samples.
- Absent classes in folds: If a training fold lacks samples from a class, the function handles this by assigning default values: 0 for predict_proba and the minimum finite float value for decision_function and predict_log_proba.
- Each sample's prediction comes from a different model: The returned array is a composite of predictions from k different model fits, not a single model.
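The first caveat can be made concrete with a small comparison. The sketch below assumes a Lasso model on a 150-sample slice of the diabetes data with an unshuffled 5-fold KFold, so every test fold has 30 samples. Mean absolute error is an average over samples, so the pooled value and the mean of per-fold values should agree; R^2 is not a per-sample average, so the two numbers generally differ.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import KFold, cross_val_predict, cross_val_score

X, y = load_diabetes(return_X_y=True)
X, y = X[:150], y[:150]
cv = KFold(n_splits=5)  # 150 samples -> five equal test folds of 30

y_pred = cross_val_predict(Lasso(), X, y, cv=cv)

# Metric that decomposes over samples: pooled value matches the fold average.
pooled_mae = mean_absolute_error(y, y_pred)
fold_mae = -cross_val_score(Lasso(), X, y, cv=cv, scoring="neg_mean_absolute_error")
print("MAE  pooled:", pooled_mae, " mean of folds:", fold_mae.mean())

# Metric that does not decompose: each fold's R^2 uses that fold's own target mean.
pooled_r2 = r2_score(y, y_pred)
fold_r2 = cross_val_score(Lasso(), X, y, cv=cv, scoring="r2")
print("R^2  pooled:", pooled_r2, " mean of folds:", fold_r2.mean())
```

When a per-fold estimate of generalization performance is the goal, cross_val_score (or cross_validate) is the more direct tool; cross_val_predict is better suited to diagnostics and to building features for stacking.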