
Implementation:Scikit-learn Cross Validate

From Leeroopedia


Metadata

  • Domains: Statistics, Model_Evaluation
  • Source File: sklearn/model_selection/_validation.py
  • Last Updated: 2026-02-08 15:00 GMT

Overview

A concrete tool, provided by scikit-learn, for evaluating estimators with cross-validation and multiple metrics. This page covers two functions: cross_validate, the full-featured cross-validation evaluator that supports multiple metrics, timing, and estimator return; and cross_val_score, a simplified wrapper for single-metric evaluation.

API Signatures

cross_validate

from sklearn.model_selection import cross_validate

cross_validate(
    estimator,
    X,
    y=None,
    *,
    groups=None,
    scoring=None,
    cv=None,
    n_jobs=None,
    verbose=0,
    params=None,
    pre_dispatch="2*n_jobs",
    return_train_score=False,
    return_estimator=False,
    return_indices=False,
    error_score=np.nan,
)

Parameters:

  • estimator (estimator object implementing 'fit' ) -- The object to use to fit the data.
  • X (array-like or sparse matrix of shape (n_samples, n_features)) -- The data to fit.
  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) -- Target variable for supervised learning.
  • groups (array-like of shape (n_samples,), default=None) -- Group labels for use with Group CV splitters (e.g., GroupKFold).
  • scoring (str, callable, list, tuple, or dict, default=None) -- Scoring strategy. Accepts:
    • A single string (e.g., 'accuracy', 'neg_mean_squared_error').
    • A callable scorer with signature scorer(estimator, X, y).
    • A list or tuple of strings for multi-metric evaluation.
    • A dictionary mapping metric names to callable scorers.
    • None uses the estimator's default scorer.
  • cv (int, CV splitter, or iterable, default=None) -- Cross-validation splitting strategy. None defaults to 5-fold. An integer specifies the number of folds in (Stratified)KFold.
  • n_jobs (int, default=None) -- Number of parallel jobs. -1 uses all processors.
  • verbose (int, default=0) -- Verbosity level.
  • params (dict, default=None) -- Parameters to pass to the estimator's fit, the scorer, and the CV splitter. (Added in version 1.4.)
  • pre_dispatch (int or str, default='2*n_jobs' ) -- Controls the number of pre-dispatched parallel jobs.
  • return_train_score (bool, default=False) -- Whether to include training scores in the results.
  • return_estimator (bool, default=False) -- Whether to return the fitted estimator for each split.
  • return_indices (bool, default=False) -- Whether to return train/test indices for each split. (Added in version 1.3.)
  • error_score ("raise" or numeric, default=np.nan) -- Value assigned to the score if an error occurs in estimator fitting.
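
The groups parameter pairs with a group-aware CV splitter so that samples from the same group never appear in both the training and test fold. A minimal sketch on synthetic data (the array contents here are illustrative, not from the scikit-learn docs):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GroupKFold, cross_validate

rng = np.random.RandomState(0)
X = rng.randn(12, 3)
y = rng.randn(12)
# Four groups of three samples each; GroupKFold keeps each group
# entirely inside either the train or the test side of a split.
groups = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3])

cv_results = cross_validate(Ridge(), X, y, groups=groups,
                            cv=GroupKFold(n_splits=4))
print(len(cv_results["test_score"]))  # one score per group-based fold: 4
```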

Returns:

  • scores (dict of float arrays of shape (n_splits,)) -- Dictionary containing:
    • test_score -- Test scores for each fold (or test_<metric> for multi-metric).
    • train_score -- Train scores (only if return_train_score=True).
    • fit_time -- Time in seconds for fitting each fold.
    • score_time -- Time in seconds for scoring each fold.
    • estimator -- Fitted estimators (only if return_estimator=True).
    • indices -- Train/test index arrays (only if return_indices=True).

Example -- Single metric:

from sklearn import datasets, linear_model
from sklearn.model_selection import cross_validate

diabetes = datasets.load_diabetes()
X = diabetes.data[:150]
y = diabetes.target[:150]
lasso = linear_model.Lasso()

cv_results = cross_validate(lasso, X, y, cv=3)
print(sorted(cv_results.keys()))
# ['fit_time', 'score_time', 'test_score']
print(cv_results['test_score'])
# array([0.3315057 , 0.08022103, 0.03531816])

Example -- Multiple metrics:

scores = cross_validate(
    lasso, X, y, cv=3,
    scoring=('r2', 'neg_mean_squared_error'),
    return_train_score=True
)
print(scores['test_neg_mean_squared_error'])
# [-3635.5 -3573.3 -6114.7]
print(scores['train_r2'])
# [0.28009951 0.3908844  0.22784907]
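
Dict-form scoring mixes built-in metric names with custom callables built via make_scorer, and return_estimator exposes the per-fold fitted models. A sketch reusing the diabetes setup above (the metric name "neg_mae" is an illustrative choice, not a built-in scorer name):

```python
from sklearn import datasets, linear_model
from sklearn.metrics import make_scorer, mean_absolute_error
from sklearn.model_selection import cross_validate

diabetes = datasets.load_diabetes()
X, y = diabetes.data[:150], diabetes.target[:150]
lasso = linear_model.Lasso()

scoring = {
    "r2": "r2",  # built-in scorer referenced by name
    # greater_is_better=False negates the error so higher is better
    "neg_mae": make_scorer(mean_absolute_error, greater_is_better=False),
}
scores = cross_validate(lasso, X, y, cv=3, scoring=scoring,
                        return_estimator=True)
print(sorted(scores.keys()))
# ['estimator', 'fit_time', 'score_time', 'test_neg_mae', 'test_r2']
```

Each entry of scores["estimator"] is the Lasso instance fitted on that fold's training split, so fold-level coefficients can be inspected directly.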

cross_val_score

from sklearn.model_selection import cross_val_score

cross_val_score(
    estimator,
    X,
    y=None,
    *,
    groups=None,
    scoring=None,
    cv=None,
    n_jobs=None,
    verbose=0,
    params=None,
    pre_dispatch="2*n_jobs",
    error_score=np.nan,
)

Parameters:

  • estimator (estimator object implementing 'fit' ) -- The object to use to fit the data.
  • X (array-like or sparse matrix of shape (n_samples, n_features)) -- The data to fit.
  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) -- Target variable for supervised learning.
  • groups (array-like of shape (n_samples,), default=None) -- Group labels for Group CV splitters.
  • scoring (str or callable, default=None) -- A single scoring strategy (string or callable). Unlike cross_validate, multi-metric scoring is not supported.
  • cv (int, CV splitter, or iterable, default=None) -- Cross-validation splitting strategy. None defaults to 5-fold.
  • n_jobs (int, default=None) -- Number of parallel jobs.
  • verbose (int, default=0) -- Verbosity level.
  • params (dict, default=None) -- Parameters to pass to the estimator's fit, the scorer, and the CV splitter. (Added in version 1.4.)
  • pre_dispatch (int or str, default='2*n_jobs' ) -- Controls pre-dispatched parallel jobs.
  • error_score ("raise" or numeric, default=np.nan) -- Value assigned if an error occurs.

Returns:

  • scores (ndarray of float of shape (len(list(cv)),)) -- Array of scores for each cross-validation fold.

Example:

from sklearn import datasets, linear_model
from sklearn.model_selection import cross_val_score

diabetes = datasets.load_diabetes()
X = diabetes.data[:150]
y = diabetes.target[:150]
lasso = linear_model.Lasso()

print(cross_val_score(lasso, X, y, cv=3))
# [0.3315057  0.08022103 0.03531816]
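
Passing a metric name overrides the estimator's default scorer (R² for regressors). A minimal sketch with the same setup, selecting negated mean squared error:

```python
from sklearn import datasets, linear_model
from sklearn.model_selection import cross_val_score

diabetes = datasets.load_diabetes()
X, y = diabetes.data[:150], diabetes.target[:150]
lasso = linear_model.Lasso()

# neg_* scorers negate the error so that higher is always better,
# which keeps the convention consistent across metrics.
mse_scores = cross_val_score(lasso, X, y, cv=3,
                             scoring="neg_mean_squared_error")
print(len(mse_scores))  # 3, one score per fold
```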

Choosing Between cross_validate and cross_val_score

Feature                      cross_validate               cross_val_score
Multi-metric scoring         Yes (list, tuple, or dict)   No (single metric only)
Returns fit/score times      Yes (always)                 No
Returns fitted estimators    Yes (optional)               No
Returns train scores         Yes (optional)               No
Returns train/test indices   Yes (optional)               No
Return type                  dict of arrays               ndarray

Use cross_val_score for quick, single-metric evaluation. Use cross_validate when you need multiple metrics, timing diagnostics, or access to fitted estimators.
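
The two functions agree on the scores themselves: for the same estimator, data, and splitter, cross_val_score returns exactly the test_score array that cross_validate computes. An illustrative sanity check:

```python
import numpy as np
from sklearn import datasets, linear_model
from sklearn.model_selection import cross_val_score, cross_validate

diabetes = datasets.load_diabetes()
X, y = diabetes.data[:150], diabetes.target[:150]
lasso = linear_model.Lasso()

# Identical cv=3 (unshuffled KFold) makes both calls deterministic.
simple = cross_val_score(lasso, X, y, cv=3)
full = cross_validate(lasso, X, y, cv=3)["test_score"]
print(np.allclose(simple, full))  # True
```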
