Implementation:Scikit learn Scikit learn Cross Validate
Metadata
- Domains: Statistics, Model_Evaluation
- Source File: sklearn/model_selection/_validation.py
- Last Updated: 2026-02-08 15:00 GMT
Overview
A concrete tool, provided by scikit-learn, for evaluating estimators with cross-validation and one or more metrics. This page covers two functions: cross_validate, the full-featured evaluator that supports multiple metrics, fit and score timing, and returning the fitted estimators, and cross_val_score, a simplified wrapper for single-metric evaluation.
API Signatures
cross_validate
from sklearn.model_selection import cross_validate
cross_validate(
estimator,
X,
y=None,
*,
groups=None,
scoring=None,
cv=None,
n_jobs=None,
verbose=0,
params=None,
pre_dispatch="2*n_jobs",
return_train_score=False,
return_estimator=False,
return_indices=False,
error_score=np.nan,
)
Parameters:
- estimator (estimator object implementing 'fit') -- The object to use to fit the data.
- X (array-like or sparse matrix of shape (n_samples, n_features)) -- The data to fit.
- y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) -- Target variable for supervised learning.
- groups (array-like of shape (n_samples,), default=None) -- Group labels for use with Group CV splitters (e.g., GroupKFold).
- scoring (str, callable, list, tuple, or dict, default=None) -- Scoring strategy. Accepts:
  - A single string (e.g., 'accuracy', 'neg_mean_squared_error').
  - A callable scorer with signature scorer(estimator, X, y).
  - A list or tuple of strings for multi-metric evaluation.
  - A dictionary mapping metric names to callable scorers.
  - None, which uses the estimator's default scorer.
- cv (int, CV splitter, or iterable, default=None) -- Cross-validation splitting strategy. None defaults to 5-fold. An integer specifies the number of folds in (Stratified)KFold.
- n_jobs (int, default=None) -- Number of parallel jobs. -1 uses all processors.
- verbose (int, default=0) -- Verbosity level.
- params (dict, default=None) -- Parameters to pass to the estimator's fit, the scorer, and the CV splitter. (Added in version 1.4.)
- pre_dispatch (int or str, default='2*n_jobs') -- Controls the number of pre-dispatched parallel jobs.
- return_train_score (bool, default=False) -- Whether to include training scores in the results.
- return_estimator (bool, default=False) -- Whether to return the fitted estimator for each split.
- return_indices (bool, default=False) -- Whether to return train/test indices for each split. (Added in version 1.3.)
- error_score ("raise" or numeric, default=np.nan) -- Value assigned to the score if an error occurs in estimator fitting.
Returns:
- scores (dict of float arrays of shape (n_splits,)) -- Dictionary containing:
  - test_score -- Test scores for each fold (or test_<metric> for multi-metric).
  - train_score -- Train scores (only if return_train_score=True).
  - fit_time -- Time in seconds for fitting each fold.
  - score_time -- Time in seconds for scoring each fold.
  - estimator -- Fitted estimators (only if return_estimator=True).
  - indices -- Train/test index arrays (only if return_indices=True).
Example -- Single metric:
from sklearn import datasets, linear_model
from sklearn.model_selection import cross_validate
diabetes = datasets.load_diabetes()
X = diabetes.data[:150]
y = diabetes.target[:150]
lasso = linear_model.Lasso()
cv_results = cross_validate(lasso, X, y, cv=3)
print(sorted(cv_results.keys()))
# ['fit_time', 'score_time', 'test_score']
print(cv_results['test_score'])
# [0.3315057 0.08022103 0.03531816]
Example -- Multiple metrics:
scores = cross_validate(
lasso, X, y, cv=3,
scoring=('r2', 'neg_mean_squared_error'),
return_train_score=True
)
print(scores['test_neg_mean_squared_error'])
# [-3635.5 -3573.3 -6114.7]
print(scores['train_r2'])
# [0.28009951 0.3908844 0.22784907]
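The dictionary form of scoring can be sketched as follows. This is a minimal illustration, assuming scorers built with sklearn.metrics.make_scorer; the metric names "r2" and "neg_mae" are arbitrary labels chosen for this example, and they determine the test_<metric> keys in the result.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.metrics import make_scorer, mean_absolute_error, r2_score
from sklearn.model_selection import cross_validate

X, y = load_diabetes(return_X_y=True)
X, y = X[:150], y[:150]

# Map illustrative metric names to callable scorers.
scoring = {
    "r2": make_scorer(r2_score),
    # greater_is_better=False negates the error so that higher is still better.
    "neg_mae": make_scorer(mean_absolute_error, greater_is_better=False),
}

scores = cross_validate(Lasso(), X, y, cv=3, scoring=scoring)

score_keys = sorted(scores.keys())
print(score_keys)
print(scores["test_neg_mae"])
```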
cross_val_score
from sklearn.model_selection import cross_val_score
cross_val_score(
estimator,
X,
y=None,
*,
groups=None,
scoring=None,
cv=None,
n_jobs=None,
verbose=0,
params=None,
pre_dispatch="2*n_jobs",
error_score=np.nan,
)
Parameters:
- estimator (estimator object implementing 'fit') -- The object to use to fit the data.
- X (array-like or sparse matrix of shape (n_samples, n_features)) -- The data to fit.
- y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) -- Target variable for supervised learning.
- groups (array-like of shape (n_samples,), default=None) -- Group labels for Group CV splitters.
- scoring (str or callable, default=None) -- A single scoring strategy (string or callable). Unlike cross_validate, multi-metric scoring is not supported.
- cv (int, CV splitter, or iterable, default=None) -- Cross-validation splitting strategy. None defaults to 5-fold.
- n_jobs (int, default=None) -- Number of parallel jobs.
- verbose (int, default=0) -- Verbosity level.
- params (dict, default=None) -- Parameters to pass to the estimator's fit, the scorer, and the CV splitter. (Added in version 1.4.)
- pre_dispatch (int or str, default='2*n_jobs') -- Controls pre-dispatched parallel jobs.
- error_score ("raise" or numeric, default=np.nan) -- Value assigned if an error occurs.
Returns:
- scores (ndarray of float of shape (len(list(cv)),)) -- Array of scores for each cross-validation fold.
Example:
from sklearn import datasets, linear_model
from sklearn.model_selection import cross_val_score
diabetes = datasets.load_diabetes()
X = diabetes.data[:150]
y = diabetes.target[:150]
lasso = linear_model.Lasso()
print(cross_val_score(lasso, X, y, cv=3))
# [0.3315057 0.08022103 0.03531816]
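The groups parameter only has an effect with a group-aware splitter. The sketch below is one way to combine it with GroupKFold; the synthetic data, the six groups of ten samples, and the choice of Ridge are assumptions made for illustration, not part of the original example.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GroupKFold, cross_val_score

# Synthetic regression data (illustrative): 60 samples in 6 groups of 10.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=60)
groups = np.repeat(np.arange(6), 10)

cv = GroupKFold(n_splits=3)
group_scores = cross_val_score(Ridge(), X, y, groups=groups, cv=cv)
print(group_scores)

# Check that no group appears on both sides of any split.
groups_overlap = any(
    set(groups[train_idx]) & set(groups[test_idx])
    for train_idx, test_idx in cv.split(X, y, groups)
)
print(groups_overlap)
```

Splitting by group like this is a common way to keep correlated samples (for example, repeated measurements from one subject) out of both the training and test folds at once.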
Choosing Between cross_validate and cross_val_score
| Feature | cross_validate | cross_val_score |
|---|---|---|
| Multi-metric scoring | Yes (list, tuple, or dict) | No (single metric only) |
| Returns fit/score times | Yes (always) | No |
| Returns fitted estimators | Yes (optional) | No |
| Returns train scores | Yes (optional) | No |
| Returns train/test indices | Yes (optional) | No |
| Return type | dict of arrays | ndarray |
Use cross_val_score for quick, single-metric evaluation. Use cross_validate when you need multiple metrics, timing diagnostics, or access to fitted estimators.
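As a rough check of how the two functions relate, the sketch below runs both with the same estimator, splitter, and metric and compares the fold scores. It assumes a deterministic splitter (KFold without shuffling) so both calls see identical folds; the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold, cross_val_score, cross_validate

X, y = load_diabetes(return_X_y=True)
X, y = X[:150], y[:150]

# Unshuffled KFold yields the same folds on every call.
cv = KFold(n_splits=3)

quick = cross_val_score(Lasso(), X, y, cv=cv, scoring="r2")   # ndarray
full = cross_validate(Lasso(), X, y, cv=cv, scoring="r2")     # dict of arrays

same = np.allclose(quick, full["test_score"])
print(type(quick).__name__, type(full).__name__, same)
```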
Related Pages
- Principle:Scikit_learn_Scikit_learn_Cross_Validation
- Environment:Scikit_learn_Scikit_learn_Python_Runtime_Environment
- Environment:Scikit_learn_Scikit_learn_OpenMP_Thread_Configuration
- Heuristic:Scikit_learn_Scikit_learn_Data_Leakage_Prevention
- Heuristic:Scikit_learn_Scikit_learn_N_Jobs_Parallelism_Tips
- Heuristic:Scikit_learn_Scikit_learn_Random_State_Management
- Heuristic:Scikit_learn_Scikit_learn_Working_Memory_Tuning
- Heuristic:Scikit_learn_Scikit_learn_Convergence_Warning_Handling