
Implementation:Scikit-learn BaseSearchCV Fit

From Leeroopedia



Overview

Concrete scikit-learn method for executing a hyperparameter search by fitting and scoring the estimator across all candidate parameter settings and CV folds.

The BaseSearchCV.fit method is the central execution engine for all search-based hyperparameter tuners in scikit-learn. It orchestrates the parallel clone-fit-score loop, aggregates results, selects the best configuration, and optionally refits a final estimator on the full dataset.

Code Reference

Method Signature

def fit(self, X, y=None, **params):
    """Run fit with all sets of parameters.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features) or (n_samples, n_samples)
        Training vectors, where n_samples is the number of samples and
        n_features is the number of features. For precomputed kernel or
        distance matrix, the expected shape of X is (n_samples, n_samples).

    y : array-like of shape (n_samples, n_output)
        or (n_samples,), default=None
        Target relative to X for classification or regression;
        None for unsupervised learning.

    **params : dict of str -> object
        Parameters passed to the fit method of the estimator, the scorer,
        and the CV splitter.

    Returns
    -------
    self : object
        Instance of fitted estimator.
    """

I/O Contract

Input:

  • X -- training feature matrix, array-like of shape (n_samples, n_features).
  • y -- target values, array-like of shape (n_samples,) or (n_samples, n_output). May be None for unsupervised estimators.
  • **params -- additional parameters routed to the estimator's fit, the scorer's score, or the CV splitter's split (e.g., sample_weight, groups).
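The routing of `**params` can be seen with a grouped cross-validator: a sketch, assuming synthetic data, where `groups` is consumed by the CV splitter (GroupKFold) rather than by the estimator's fit.

```python
# Sketch: `groups` passed to search.fit is routed to GroupKFold.split,
# not to LogisticRegression.fit. Data is synthetic for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, GroupKFold

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = rng.integers(0, 2, size=60)
groups = np.repeat(np.arange(6), 10)  # 6 groups of 10 samples each

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1.0]},
    cv=GroupKFold(n_splits=3),
)
search.fit(X, y, groups=groups)  # groups flow to the splitter
print(search.best_params_)
```

Each of the three folds holds out two whole groups, so no group's samples appear in both train and test.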

Output:

  • Returns self (the fitted search object), with the following attributes populated:
  • cv_results_ (dict of numpy arrays) -- Comprehensive results dictionary containing per-split scores, mean/std aggregations, rankings, fit/score times, and parameter values for every candidate.
  • best_params_ (dict) -- The parameter configuration that achieved the highest mean test score (rank 1).
  • best_score_ (float) -- The mean cross-validated score of the best candidate. Not available when refit is a callable.
  • best_index_ (int) -- The index into the cv_results_ arrays corresponding to the best candidate.
  • best_estimator_ (estimator) -- A clone of the base estimator, fitted on the full dataset with best_params_. Only available when refit is not False.
  • n_splits_ (int) -- The number of cross-validation splits used.
  • refit_time_ (float) -- Seconds spent refitting on the full dataset. Only available when refit is not False.
  • multimetric_ (bool) -- Whether multiple scoring metrics were used.
  • scorer_ (function or dict) -- The scorer(s) used; a dict for multi-metric evaluation.
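A quick way to see these attributes in practice is to load cv_results_ into a DataFrame after a small search; a sketch using a decision tree on iris:

```python
# Illustrative: cv_results_ is a dict of equal-length arrays,
# one entry per candidate, so it maps directly onto a DataFrame.
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [2, 3, 4]},
    cv=3,
)
search.fit(X, y)

df = pd.DataFrame(search.cv_results_)  # one row per candidate
print(df[["params", "mean_test_score", "std_test_score", "rank_test_score"]])
print(search.best_index_)  # row index of the rank-1 candidate
```

Note that best_index_ always points at the row whose rank_test_score is 1.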

Execution Flow

The fit method proceeds through these stages:

1. Setup:

estimator = self.estimator
scorers, refit_metric = self._get_scorers()
X, y = indexable(X, y)
params = _check_method_params(X, params=params)
routed_params = self._get_routed_params_for_fit(params)
cv_orig = check_cv(self.cv, y, classifier=is_classifier(estimator))
n_splits = cv_orig.get_n_splits(X, y, **routed_params.splitter.split)
base_estimator = clone(self.estimator)
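The check_cv call in the setup stage is public API and can be exercised directly; it turns an integer cv into a concrete splitter, stratified for classifiers and plain KFold otherwise:

```python
# check_cv resolves cv=3 to a splitter object; stratification depends
# on whether the estimator is a classifier.
import numpy as np
from sklearn.model_selection import check_cv

y_class = np.array([0, 0, 1, 1, 0, 1])
cv_clf = check_cv(3, y_class, classifier=True)
cv_reg = check_cv(3, y_class, classifier=False)
print(type(cv_clf).__name__)  # StratifiedKFold
print(type(cv_reg).__name__)  # KFold
print(cv_clf.get_n_splits())  # 3
```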

2. Parallel evaluation via evaluate_candidates callback:

parallel = Parallel(n_jobs=self.n_jobs, pre_dispatch=self.pre_dispatch)

def evaluate_candidates(candidate_params, cv=None, more_results=None):
    cv = cv or cv_orig
    candidate_params = list(candidate_params)
    n_candidates = len(candidate_params)

    out = parallel(
        delayed(_fit_and_score)(
            clone(base_estimator), X, y,
            train=train, test=test, parameters=parameters,
            split_progress=(split_idx, n_splits),
            candidate_progress=(cand_idx, n_candidates),
            **fit_and_score_kwargs,
        )
        for (cand_idx, parameters), (split_idx, (train, test)) in product(
            enumerate(candidate_params),
            enumerate(cv.split(X, y, **routed_params.splitter.split)),
        )
    )
    ...
    results = self._format_results(
        all_candidate_params, n_splits, all_out, all_more_results
    )
    return results
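The itertools.product over candidates and splits means every (candidate, fold) pair becomes one independent fit-and-score task for joblib. A minimal sketch of that enumeration, with placeholder folds standing in for real index arrays:

```python
# Each (candidate, split) pair is one parallel task: with 4 candidates
# and 5 folds, 4 * 5 = 20 independent fit-and-score jobs are dispatched.
from itertools import product

candidate_params = [{"C": c, "kernel": k}
                    for c in (0.1, 1) for k in ("linear", "rbf")]
splits = [(f"train_{i}", f"test_{i}") for i in range(5)]  # placeholder folds

tasks = [
    (cand_idx, split_idx)
    for (cand_idx, params), (split_idx, fold) in product(
        enumerate(candidate_params), enumerate(splits)
    )
]
print(len(tasks))  # 20
```

This is why n_jobs parallelism scales with n_candidates * n_splits, not with either factor alone.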

3. Best selection and refit:

# Select best
self.best_index_ = self._select_best_index(self.refit, refit_metric, results)
self.best_score_ = results[f"mean_test_{refit_metric}"][self.best_index_]
self.best_params_ = results["params"][self.best_index_]

# Refit on full data
self.best_estimator_ = clone(base_estimator).set_params(
    **clone(self.best_params_, safe=False)
)
self.best_estimator_.fit(X, y, **routed_params.estimator.fit)
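When refit=False, this entire stage is skipped: a sketch showing that best_estimator_ and refit_time_ are then never set, while the selection attributes remain available.

```python
# With refit=False the search still selects a best candidate but never
# fits a final estimator on the full dataset.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
search = GridSearchCV(SVC(), {"C": [0.1, 1]}, cv=3, refit=False)
search.fit(X, y)

print(hasattr(search, "best_estimator_"))  # False
print(hasattr(search, "refit_time_"))      # False
print(search.best_params_)                 # still available
```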

4. Store final results:

self.cv_results_ = results
self.n_splits_ = n_splits
return self

Usage Examples

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

search = GridSearchCV(
    SVC(),
    param_grid={'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']},
    cv=5,
    n_jobs=-1,
    scoring='accuracy',
    return_train_score=True,
)
search.fit(X, y)

# After fit, all result attributes are available
print(search.best_params_)       # e.g. {'C': 1, 'kernel': 'linear'}
print(search.best_score_)        # e.g. 0.98
print(search.best_estimator_)    # SVC(C=1, kernel='linear')
print(search.n_splits_)          # 5
print(search.refit_time_)        # e.g. 0.002 (seconds; machine-dependent)
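With multiple scoring metrics, refit must name the metric used for best-candidate selection; a sketch showing how multimetric_ and scorer_ change in that case:

```python
# Multi-metric search: scoring is a dict of named scorers and refit
# picks which metric drives best_index_/best_score_.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
search = GridSearchCV(
    SVC(),
    param_grid={'C': [0.1, 1]},
    cv=3,
    scoring={'acc': 'accuracy', 'f1': 'f1_macro'},
    refit='acc',  # best_score_ reports mean_test_acc
)
search.fit(X, y)

print(search.multimetric_)            # True
print(sorted(search.scorer_.keys()))  # ['acc', 'f1']
print(search.best_score_)
```

cv_results_ then contains per-metric columns such as mean_test_acc and mean_test_f1 instead of a single mean_test_score.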
