
Implementation:Scikit-learn Cross Validate

From Leeroopedia


Metadata

  • Domains: Statistics, Model_Evaluation
  • Source File: sklearn/model_selection/_validation.py
  • Last Updated: 2026-02-08 15:00 GMT

Overview

A concrete tool, provided by scikit-learn, for evaluating estimators with cross-validation and multiple metrics. This page covers two functions: cross_validate, the full-featured cross-validation evaluator that supports multiple metrics, timing, and estimator return; and cross_val_score, a simplified wrapper for single-metric evaluation.

API Signatures

cross_validate

from sklearn.model_selection import cross_validate

cross_validate(
    estimator,
    X,
    y=None,
    *,
    groups=None,
    scoring=None,
    cv=None,
    n_jobs=None,
    verbose=0,
    params=None,
    pre_dispatch="2*n_jobs",
    return_train_score=False,
    return_estimator=False,
    return_indices=False,
    error_score=np.nan,
)

Parameters:

  • estimator (estimator object implementing 'fit' ) -- The object to use to fit the data.
  • X (array-like or sparse matrix of shape (n_samples, n_features)) -- The data to fit.
  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) -- Target variable for supervised learning.
  • groups (array-like of shape (n_samples,), default=None) -- Group labels for use with Group CV splitters (e.g., GroupKFold).
  • scoring (str, callable, list, tuple, or dict, default=None) -- Scoring strategy. Accepts:
    • A single string (e.g., 'accuracy', 'neg_mean_squared_error').
    • A callable scorer with signature scorer(estimator, X, y).
    • A list or tuple of strings for multi-metric evaluation.
    • A dictionary mapping metric names to callable scorers.
    • None uses the estimator's default scorer.
  • cv (int, CV splitter, or iterable, default=None) -- Cross-validation splitting strategy. None defaults to 5-fold. An integer specifies the number of folds in (Stratified)KFold.
  • n_jobs (int, default=None) -- Number of parallel jobs. -1 uses all processors.
  • verbose (int, default=0) -- Verbosity level.
  • params (dict, default=None) -- Parameters to pass to the estimator's fit, the scorer, and the CV splitter. (Added in version 1.4.)
  • pre_dispatch (int or str, default='2*n_jobs' ) -- Controls the number of pre-dispatched parallel jobs.
  • return_train_score (bool, default=False) -- Whether to include training scores in the results.
  • return_estimator (bool, default=False) -- Whether to return the fitted estimator for each split.
  • return_indices (bool, default=False) -- Whether to return train/test indices for each split. (Added in version 1.3.)
  • error_score ("raise" or numeric, default=np.nan) -- Value assigned to the score if an error occurs in estimator fitting.
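
The groups parameter pairs with a group-aware CV splitter so that samples from the same group never appear in both the training and test fold. A minimal sketch on synthetic data (the array contents here are illustrative, not from the scikit-learn docs):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GroupKFold, cross_validate

rng = np.random.RandomState(0)
X = rng.randn(12, 3)
y = rng.randn(12)
# Four groups of three samples each; GroupKFold keeps each group
# entirely inside either the train or the test side of a split.
groups = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3])

cv_results = cross_validate(Ridge(), X, y, groups=groups,
                            cv=GroupKFold(n_splits=4))
print(len(cv_results["test_score"]))  # one score per group-based fold: 4
```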

Returns:

  • scores (dict of float arrays of shape (n_splits,)) -- Dictionary containing:
    • test_score -- Test scores for each fold (or test_<metric> for multi-metric).
    • train_score -- Train scores (only if return_train_score=True).
    • fit_time -- Time in seconds for fitting each fold.
    • score_time -- Time in seconds for scoring each fold.
    • estimator -- Fitted estimators (only if return_estimator=True).
    • indices -- Train/test index arrays (only if return_indices=True).

Example -- Single metric:

from sklearn import datasets, linear_model
from sklearn.model_selection import cross_validate

diabetes = datasets.load_diabetes()
X = diabetes.data[:150]
y = diabetes.target[:150]
lasso = linear_model.Lasso()

cv_results = cross_validate(lasso, X, y, cv=3)
print(sorted(cv_results.keys()))
# ['fit_time', 'score_time', 'test_score']
print(cv_results['test_score'])
# array([0.3315057 , 0.08022103, 0.03531816])

Example -- Multiple metrics:

scores = cross_validate(
    lasso, X, y, cv=3,
    scoring=('r2', 'neg_mean_squared_error'),
    return_train_score=True
)
print(scores['test_neg_mean_squared_error'])
# [-3635.5 -3573.3 -6114.7]
print(scores['train_r2'])
# [0.28009951 0.3908844  0.22784907]
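
Dict-form scoring mixes built-in metric names with custom callables built via make_scorer, and return_estimator exposes the per-fold fitted models. A sketch reusing the diabetes setup above (the metric name "neg_mae" is an illustrative choice, not a built-in scorer name):

```python
from sklearn import datasets, linear_model
from sklearn.metrics import make_scorer, mean_absolute_error
from sklearn.model_selection import cross_validate

diabetes = datasets.load_diabetes()
X, y = diabetes.data[:150], diabetes.target[:150]
lasso = linear_model.Lasso()

scoring = {
    "r2": "r2",  # built-in scorer referenced by name
    # greater_is_better=False negates the error so higher is better
    "neg_mae": make_scorer(mean_absolute_error, greater_is_better=False),
}
scores = cross_validate(lasso, X, y, cv=3, scoring=scoring,
                        return_estimator=True)
print(sorted(scores.keys()))
# ['estimator', 'fit_time', 'score_time', 'test_neg_mae', 'test_r2']
```

Each entry of scores["estimator"] is the Lasso instance fitted on that fold's training split, so fold-level coefficients can be inspected directly.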

cross_val_score

from sklearn.model_selection import cross_val_score

cross_val_score(
    estimator,
    X,
    y=None,
    *,
    groups=None,
    scoring=None,
    cv=None,
    n_jobs=None,
    verbose=0,
    params=None,
    pre_dispatch="2*n_jobs",
    error_score=np.nan,
)

Parameters:

  • estimator (estimator object implementing 'fit' ) -- The object to use to fit the data.
  • X (array-like or sparse matrix of shape (n_samples, n_features)) -- The data to fit.
  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) -- Target variable for supervised learning.
  • groups (array-like of shape (n_samples,), default=None) -- Group labels for Group CV splitters.
  • scoring (str or callable, default=None) -- A single scoring strategy (string or callable). Unlike cross_validate, multi-metric scoring is not supported.
  • cv (int, CV splitter, or iterable, default=None) -- Cross-validation splitting strategy. None defaults to 5-fold.
  • n_jobs (int, default=None) -- Number of parallel jobs.
  • verbose (int, default=0) -- Verbosity level.
  • params (dict, default=None) -- Parameters to pass to the estimator's fit, the scorer, and the CV splitter. (Added in version 1.4.)
  • pre_dispatch (int or str, default='2*n_jobs' ) -- Controls pre-dispatched parallel jobs.
  • error_score ("raise" or numeric, default=np.nan) -- Value assigned if an error occurs.

Returns:

  • scores (ndarray of float of shape (len(list(cv)),)) -- Array of scores for each cross-validation fold.

Example:

from sklearn import datasets, linear_model
from sklearn.model_selection import cross_val_score

diabetes = datasets.load_diabetes()
X = diabetes.data[:150]
y = diabetes.target[:150]
lasso = linear_model.Lasso()

print(cross_val_score(lasso, X, y, cv=3))
# [0.3315057  0.08022103 0.03531816]
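
Passing a metric name overrides the estimator's default scorer (R² for regressors). A minimal sketch with the same setup, selecting negated mean squared error:

```python
from sklearn import datasets, linear_model
from sklearn.model_selection import cross_val_score

diabetes = datasets.load_diabetes()
X, y = diabetes.data[:150], diabetes.target[:150]
lasso = linear_model.Lasso()

# neg_* scorers negate the error so that higher is always better,
# which keeps the convention consistent across metrics.
mse_scores = cross_val_score(lasso, X, y, cv=3,
                             scoring="neg_mean_squared_error")
print(len(mse_scores))  # 3, one score per fold
```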

Choosing Between cross_validate and cross_val_score

Feature                      cross_validate               cross_val_score
Multi-metric scoring         Yes (list, tuple, or dict)   No (single metric only)
Returns fit/score times      Yes (always)                 No
Returns fitted estimators    Yes (optional)               No
Returns train scores         Yes (optional)               No
Returns train/test indices   Yes (optional)               No
Return type                  dict of arrays               ndarray

Use cross_val_score for quick, single-metric evaluation. Use cross_validate when you need multiple metrics, timing diagnostics, or access to fitted estimators.
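
The two functions agree on the scores themselves: for the same estimator, data, and splitter, cross_val_score returns exactly the test_score array that cross_validate computes. An illustrative sanity check:

```python
import numpy as np
from sklearn import datasets, linear_model
from sklearn.model_selection import cross_val_score, cross_validate

diabetes = datasets.load_diabetes()
X, y = diabetes.data[:150], diabetes.target[:150]
lasso = linear_model.Lasso()

# Identical cv=3 (unshuffled KFold) makes both calls deterministic.
simple = cross_val_score(lasso, X, y, cv=3)
full = cross_validate(lasso, X, y, cv=3)["test_score"]
print(np.allclose(simple, full))  # True
```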
