Implementation:Scikit learn Scikit learn BaseSearchCV Fit
Template:Implementation Metadata
Overview
Concrete tool for executing a hyperparameter search in scikit-learn by fitting and scoring every candidate configuration across all CV folds.
The BaseSearchCV.fit method is the central execution engine for all search-based hyperparameter tuners in scikit-learn. It orchestrates the parallel clone-fit-score loop, aggregates results, selects the best configuration, and optionally refits a final estimator on the full dataset.
Code Reference
Method Signature
def fit(self, X, y=None, **params):
"""Run fit with all sets of parameters.
Parameters
----------
X : array-like of shape (n_samples, n_features) or (n_samples, n_samples)
Training vectors, where n_samples is the number of samples and
n_features is the number of features. For precomputed kernel or
distance matrix, the expected shape of X is (n_samples, n_samples).
y : array-like of shape (n_samples, n_output)
or (n_samples,), default=None
Target relative to X for classification or regression;
None for unsupervised learning.
**params : dict of str -> object
Parameters passed to the fit method of the estimator, the scorer,
and the CV splitter.
Returns
-------
self : object
Instance of fitted estimator.
"""
I/O Contract
Input:
- X -- training feature matrix, array-like of shape (n_samples, n_features).
- y -- target values, array-like of shape (n_samples,) or (n_samples, n_output). May be None for unsupervised estimators.
- **params -- additional parameters routed to the estimator's fit, the scorer's score, or the CV splitter's split (e.g., sample_weight, groups).
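For instance, a group-based splitter consumes the groups entry of **params. A minimal sketch (the estimator, grid, and data here are arbitrary choices for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, GroupKFold

rng = np.random.RandomState(0)
X = rng.rand(12, 3)
y = np.array([0, 1] * 6)
groups = np.repeat(np.arange(6), 2)  # two samples per group

search = GridSearchCV(
    LogisticRegression(),
    {"C": [0.1, 1.0]},
    cv=GroupKFold(n_splits=3),
)
# groups is routed to GroupKFold.split, keeping each group in one fold
search.fit(X, y, groups=groups)
print(search.n_splits_)  # 3
```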
Output:
- Returns self (the fitted search object), with the following attributes populated:
| Attribute | Type | Description |
|---|---|---|
| cv_results_ | dict of numpy arrays | Comprehensive results dictionary containing per-split scores, mean/std aggregations, rankings, fit/score times, and parameter values for every candidate. |
| best_params_ | dict | The parameter configuration that achieved the highest mean test score (or best rank). |
| best_score_ | float | The mean cross-validated score of the best candidate. Not available when refit is a callable. |
| best_index_ | int | The index into the cv_results_ arrays corresponding to the best candidate. |
| best_estimator_ | estimator | A clone of the base estimator, fitted on the full dataset with best_params_. Only available when refit is not False. |
| n_splits_ | int | The number of cross-validation splits used. |
| refit_time_ | float | Seconds spent refitting on the full dataset. Only available when refit is not False. |
| multimetric_ | bool | Whether multiple scoring metrics were used. |
| scorer_ | function or dict | The scorer(s) used. A dict for multi-metric evaluation. |
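The multimetric_ and scorer_ attributes are easiest to see with a multi-metric search, where refit must name the metric that drives the best_* selection. A small sketch (the metric labels "acc" and "f1" are arbitrary keys chosen here):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
search = GridSearchCV(
    SVC(),
    {"C": [0.1, 1]},
    cv=3,
    scoring={"acc": "accuracy", "f1": "f1_macro"},
    refit="acc",  # best_index_/best_params_ are chosen by this metric
).fit(X, y)

print(search.multimetric_)                    # True
print(sorted(search.scorer_))                 # ['acc', 'f1']
print("mean_test_f1" in search.cv_results_)   # True
```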
Execution Flow
The fit method proceeds through these stages:
1. Setup:
estimator = self.estimator
scorers, refit_metric = self._get_scorers()
X, y = indexable(X, y)
params = _check_method_params(X, params=params)
routed_params = self._get_routed_params_for_fit(params)
cv_orig = check_cv(self.cv, y, classifier=is_classifier(estimator))
n_splits = cv_orig.get_n_splits(X, y, **routed_params.splitter.split)
base_estimator = clone(self.estimator)
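The check_cv step above is what promotes an integer cv into a concrete splitter, stratifying when the estimator is a classifier. A quick sketch of that behavior using the public check_cv helper:

```python
import numpy as np
from sklearn.model_selection import check_cv

y_class = np.array([0, 1, 0, 1, 0, 1])
# With a classifier and a discrete target, an int cv becomes StratifiedKFold
cv_clf = check_cv(3, y_class, classifier=True)
# Otherwise it becomes plain KFold
cv_reg = check_cv(3, y_class, classifier=False)

print(type(cv_clf).__name__)  # StratifiedKFold
print(type(cv_reg).__name__)  # KFold
```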
2. Parallel evaluation via evaluate_candidates callback:
parallel = Parallel(n_jobs=self.n_jobs, pre_dispatch=self.pre_dispatch)
def evaluate_candidates(candidate_params, cv=None, more_results=None):
cv = cv or cv_orig
candidate_params = list(candidate_params)
n_candidates = len(candidate_params)
out = parallel(
delayed(_fit_and_score)(
clone(base_estimator), X, y,
train=train, test=test, parameters=parameters,
split_progress=(split_idx, n_splits),
candidate_progress=(cand_idx, n_candidates),
**fit_and_score_kwargs,
)
for (cand_idx, parameters), (split_idx, (train, test)) in product(
enumerate(candidate_params),
enumerate(cv.split(X, y, **routed_params.splitter.split)),
)
)
...
results = self._format_results(
all_candidate_params, n_splits, all_out, all_more_results
)
return results
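The product(...) enumeration above schedules one _fit_and_score task per (candidate, split) pair, so the Parallel call dispatches n_candidates * n_splits fits. A stripped-down sketch of that job count (stand-in values, not the real splitter output):

```python
from itertools import product

candidate_params = [{"C": 0.1}, {"C": 1}, {"C": 10}]
splits = [("train_idx", "test_idx") for _ in range(5)]  # stand-in for cv.split(X, y)

# Mirror the loop structure: every candidate is paired with every fold
jobs = [
    (cand_idx, split_idx)
    for (cand_idx, params), (split_idx, (train, test)) in product(
        enumerate(candidate_params), enumerate(splits)
    )
]
print(len(jobs))  # 15 = 3 candidates * 5 splits
```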
3. Best selection and refit:
# Select best
self.best_index_ = self._select_best_index(self.refit, refit_metric, results)
self.best_score_ = results[f"mean_test_{refit_metric}"][self.best_index_]
self.best_params_ = results["params"][self.best_index_]
# Refit on full data
self.best_estimator_ = clone(base_estimator).set_params(
**clone(self.best_params_, safe=False)
)
self.best_estimator_.fit(X, y, **routed_params.estimator.fit)
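The default _select_best_index is equivalent (up to rank_test_* tie-breaking) to taking the argmax of the mean test score for the refit metric; passing a callable as refit replaces this selection entirely. A toy sketch with made-up scores:

```python
import numpy as np

def select_best_index(cv_results):
    # Mimics the default selection: highest mean test score wins
    return int(np.argmax(cv_results["mean_test_score"]))

cv_results = {"mean_test_score": np.array([0.91, 0.97, 0.88])}
print(select_best_index(cv_results))  # 1
```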
4. Store final results:
self.cv_results_ = results
self.n_splits_ = n_splits
return self
Usage Examples
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
search = GridSearchCV(
SVC(),
param_grid={'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']},
cv=5,
n_jobs=-1,
scoring='accuracy',
return_train_score=True,
)
search.fit(X, y)
# After fit, all result attributes are available
print(search.best_params_) # {'C': 1, 'kernel': 'linear'}
print(search.best_score_) # 0.98
print(search.best_estimator_) # SVC(C=1, kernel='linear')
print(search.n_splits_) # 5
print(search.refit_time_) # 0.002 (seconds)
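A common follow-up, assuming pandas is available: because cv_results_ is a dict of equal-length arrays, it loads straight into a DataFrame for sorting and inspection.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
search = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=3).fit(X, y)

# One row per candidate, ordered best-first by rank
df = pd.DataFrame(search.cv_results_).sort_values("rank_test_score")
print(df[["params", "mean_test_score", "std_test_score"]].head(3))
```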
Related Pages
- Principle:Scikit_learn_Scikit_learn_Search_Execution
- Environment:Scikit_learn_Scikit_learn_Python_Runtime_Environment
- Environment:Scikit_learn_Scikit_learn_OpenMP_Thread_Configuration
- Heuristic:Scikit_learn_Scikit_learn_Convergence_Warning_Handling
- Heuristic:Scikit_learn_Scikit_learn_N_Jobs_Parallelism_Tips
- Heuristic:Scikit_learn_Scikit_learn_Working_Memory_Tuning