
Implementation:Scikit learn Scikit learn GridSearchCV Init

From Leeroopedia



Overview

A concrete scikit-learn tool for exhaustive hyperparameter search with cross-validation.

GridSearchCV is a class that inherits from BaseSearchCV and performs exhaustive search over a specified parameter grid. It evaluates every combination of hyperparameters using cross-validation, selects the best configuration, and optionally refits a final estimator on the full dataset.

Code Reference

Class Signature

class GridSearchCV(BaseSearchCV):
    def __init__(
        self,
        estimator,
        param_grid,
        *,
        scoring=None,
        n_jobs=None,
        refit=True,
        cv=None,
        verbose=0,
        pre_dispatch="2*n_jobs",
        error_score=np.nan,
        return_train_score=False,
    ):

Constructor Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| estimator | estimator object | (required) | The base estimator to fit. Must implement a fit method. If scoring is not provided, the estimator must also have a score method. |
| param_grid | dict or list of dicts | (required) | Dictionary mapping parameter names (strings) to lists of values to try. A list of dicts specifies multiple sub-grids, allowing disjoint regions of the parameter space to be explored. |
| scoring | str, callable, list, tuple, dict, or None | None | Strategy to evaluate model performance. A string names a built-in scorer; a callable must accept (estimator, X, y); a list/dict enables multi-metric evaluation. None uses the estimator's default score method. |
| n_jobs | int or None | None | Number of parallel jobs. None means 1 unless inside a joblib.parallel_backend context; -1 uses all processors. |
| refit | bool, str, or callable | True | Whether to refit the best estimator on the full dataset after the search. For multi-metric scoring, pass a string naming the scorer to optimize, or a callable that receives cv_results_ and returns the best_index_. |
| cv | int, CV splitter, iterable, or None | None | Cross-validation splitting strategy. None defaults to 5-fold; an integer specifies the number of folds. For classifiers, StratifiedKFold is used automatically when cv is an integer or None. |
| verbose | int | 0 | Controls verbosity of output during fitting. Higher values produce more detailed logging (0 = silent, >=1 = summary, >=2 = per-fold timing, >=3 = fold details, >=10 = per-candidate messages). |
| pre_dispatch | int or str | "2*n_jobs" | Controls the number of jobs dispatched during parallel execution. Reducing this helps control memory consumption. |
| error_score | "raise" or numeric | np.nan | Value assigned to the score when fitting fails. If "raise", the error is propagated. If numeric, a FitFailedWarning is issued. |
| return_train_score | bool | False | Whether to include training scores in cv_results_. Useful for diagnosing overfitting but adds computational cost. |
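As a sketch of the list-of-dicts form of param_grid described above, the snippet below defines two disjoint sub-grids so that gamma is only paired with the RBF kernel (the specific parameter values are illustrative, not recommendations):

```python
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

X, y = datasets.load_iris(return_X_y=True)

# Two sub-grids: 'gamma' is meaningless for the linear kernel,
# so listing dicts avoids wasted linear-kernel/gamma combinations.
param_grid = [
    {"kernel": ["linear"], "C": [1, 10]},
    {"kernel": ["rbf"], "C": [1, 10], "gamma": [0.01, 0.1]},
]

search = GridSearchCV(svm.SVC(), param_grid, cv=3)
search.fit(X, y)

# 2 linear + 4 RBF candidates = 6, instead of the 12 a single
# flat grid over all three parameters would produce.
print(len(search.cv_results_["params"]))  # 6
```

A single flat grid over kernel, C, and gamma would evaluate every cross-combination; the sub-grid form keeps the search restricted to meaningful regions.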

Key Attributes After Fitting

| Attribute | Type | Description |
|---|---|---|
| cv_results_ | dict of numpy arrays | Full results dictionary with per-split scores, mean scores, standard deviations, rankings, timing, and parameter values. |
| best_estimator_ | estimator | The estimator refitted on the full dataset with the best parameters. Only available when refit=True. |
| best_score_ | float | Mean cross-validated score of the best configuration. |
| best_params_ | dict | The parameter setting that achieved the best score. |
| best_index_ | int | Index into the cv_results_ arrays for the best candidate. |
| n_splits_ | int | The number of cross-validation splits. |
| refit_time_ | float | Time in seconds for the refit step. Only available when refit is not False. |
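The attributes above can be inspected after a fit; a minimal sketch (the estimator and grid are illustrative choices, not part of the API):

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = datasets.load_iris(return_X_y=True)

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1.0, 10.0]},
    cv=5,
    return_train_score=True,
)
search.fit(X, y)

print(search.best_params_)   # the winning setting, e.g. {'C': ...}
print(search.n_splits_)      # 5, since cv=5
# Training scores appear in cv_results_ only because
# return_train_score=True was requested.
print("mean_train_score" in search.cv_results_)  # True
```

best_index_ points into every cv_results_ array, so search.cv_results_["params"][search.best_index_] equals search.best_params_.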

Internal Search Mechanism

The core search logic is implemented in a single method:

def _run_search(self, evaluate_candidates):
    """Search all candidates in param_grid"""
    evaluate_candidates(ParameterGrid(self.param_grid))

This delegates all candidates at once to the evaluate_candidates callback defined in BaseSearchCV.fit, which handles parallel fitting and scoring across all folds.
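The candidate list that _run_search hands to evaluate_candidates can be reproduced directly with ParameterGrid, which materializes the Cartesian product of the grid:

```python
from sklearn.model_selection import ParameterGrid

# The same grid GridSearchCV would expand internally.
grid = {"kernel": ("linear", "rbf"), "C": [1, 10]}

# ParameterGrid yields one dict per hyperparameter combination.
candidates = list(ParameterGrid(grid))
print(len(candidates))  # 4 = 2 kernels x 2 values of C
for params in candidates:
    print(params)
```

Because the full candidate list is produced in one call, GridSearchCV can dispatch all fits in a single parallel batch rather than deciding candidates adaptively.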

Usage Examples

Basic Grid Search

from sklearn import svm, datasets
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()
parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 10]}
svc = svm.SVC()

clf = GridSearchCV(svc, parameters, cv=5, scoring='accuracy')
clf.fit(iris.data, iris.target)

print(clf.best_params_)    # e.g., {'C': 1, 'kernel': 'linear'}
print(clf.best_score_)     # e.g., 0.98

Multi-metric Evaluation

from sklearn import datasets
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)

scoring = {'accuracy': 'accuracy', 'f1_macro': 'f1_macro'}

clf = GridSearchCV(
    SVC(),
    param_grid={'C': [0.1, 1, 10], 'kernel': ['rbf']},
    scoring=scoring,
    refit='accuracy',  # refit the final estimator on the best accuracy
    cv=5,
)
clf.fit(X, y)

Randomized Alternative: RandomizedSearchCV

When the parameter space is large, RandomizedSearchCV provides a more efficient alternative. Its constructor replaces param_grid with param_distributions (which may map parameter names to scipy.stats distributions as well as lists) and adds n_iter (the number of configurations to sample) and random_state:

class RandomizedSearchCV(BaseSearchCV):
    def __init__(
        self,
        estimator,
        param_distributions,
        *,
        n_iter=10,
        scoring=None,
        n_jobs=None,
        refit=True,
        cv=None,
        verbose=0,
        pre_dispatch="2*n_jobs",
        random_state=None,
        error_score=np.nan,
        return_train_score=False,
    ):

The key difference is that RandomizedSearchCV samples n_iter configurations from param_distributions via ParameterSampler, rather than evaluating all combinations from a grid.
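A minimal sketch of the sampling behavior described above, assuming scipy is available (loguniform draws the regularization strength C on a log scale; the bounds and n_iter are illustrative):

```python
from scipy.stats import loguniform
from sklearn import datasets, svm
from sklearn.model_selection import RandomizedSearchCV

X, y = datasets.load_iris(return_X_y=True)

# Sample 8 configurations instead of exhausting a grid; continuous
# distributions like loguniform cannot be expressed as a finite grid.
search = RandomizedSearchCV(
    svm.SVC(),
    param_distributions={
        "C": loguniform(1e-2, 1e2),
        "kernel": ["linear", "rbf"],
    },
    n_iter=8,
    cv=3,
    random_state=0,  # makes the sampled candidates reproducible
)
search.fit(X, y)
print(len(search.cv_results_["params"]))  # 8 sampled candidates
```

Exactly n_iter candidates are evaluated regardless of how large the underlying space is, which makes the total cost predictable.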
