Implementation:Scikit-learn GridSearchCV Init
Overview
Concrete tool for exhaustive hyperparameter search with cross-validation provided by scikit-learn.
GridSearchCV is a class that inherits from BaseSearchCV and performs exhaustive search over a specified parameter grid. It evaluates every combination of hyperparameters using cross-validation, selects the best configuration, and optionally refits a final estimator on the full dataset.
Code Reference
Class Signature
```python
class GridSearchCV(BaseSearchCV):
    def __init__(
        self,
        estimator,
        param_grid,
        *,
        scoring=None,
        n_jobs=None,
        refit=True,
        cv=None,
        verbose=0,
        pre_dispatch="2*n_jobs",
        error_score=np.nan,
        return_train_score=False,
    ):
```
Constructor Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `estimator` | estimator object | (required) | The base estimator to fit. Must implement a `fit` method. If `scoring` is not provided, the estimator must also have a `score` method. |
| `param_grid` | dict or list of dicts | (required) | Dictionary mapping parameter names (strings) to lists of values to try. A list of dicts specifies multiple sub-grids, allowing disjoint regions of the parameter space to be explored. |
| `scoring` | str, callable, list, tuple, dict, or None | `None` | Strategy to evaluate model performance. A string names a built-in scorer; a callable must accept `(estimator, X, y)`; a list/dict enables multi-metric evaluation. `None` uses the estimator's default `score` method. |
| `n_jobs` | int or None | `None` | Number of parallel jobs. `None` means 1 (no parallelism); `-1` uses all processors. |
| `refit` | bool, str, or callable | `True` | Whether to refit the best estimator on the full dataset after the search. For multi-metric scoring, pass a string naming the scorer to optimize, or a callable that receives `cv_results_` and returns `best_index_`. |
| `cv` | int, CV splitter, iterable, or None | `None` | Cross-validation splitting strategy. `None` defaults to 5-fold. An integer specifies the number of folds. `StratifiedKFold` is used automatically for classifiers. |
| `verbose` | int | `0` | Controls verbosity of output during fitting. Higher values produce more detailed logging (0 = silent, >=1 = summary, >=2 = per-fold timing, >=3 = fold details, >=10 = per-candidate messages). |
| `pre_dispatch` | int or str | `"2*n_jobs"` | Controls the number of jobs dispatched during parallel execution. Reducing this helps control memory consumption. |
| `error_score` | `"raise"` or numeric | `np.nan` | Value assigned to the score when fitting fails. If `"raise"`, the error is propagated; if numeric, a `FitFailedWarning` is issued. |
| `return_train_score` | bool | `False` | Whether to include training scores in `cv_results_`. Useful for diagnosing overfitting but adds computational cost. |
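The list-of-dicts form of `param_grid` can be sanity-checked with `ParameterGrid` (the same helper `GridSearchCV` uses internally to enumerate candidates). A small sketch of two disjoint sub-grids, where `gamma` only applies to the `rbf` kernel:

```python
from sklearn.model_selection import ParameterGrid

# Two sub-grids: 'gamma' is only meaningful for the 'rbf' kernel,
# so it appears only in the second dict.
param_grid = [
    {"kernel": ["linear"], "C": [1, 10]},
    {"kernel": ["rbf"], "C": [1, 10], "gamma": [0.1, 1.0]},
]

# 1*2 + 1*2*2 = 6 candidate configurations in total
print(len(ParameterGrid(param_grid)))  # 6
```

This keeps the search from wasting evaluations on parameter combinations that have no effect (e.g. `gamma` with a linear kernel).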
Key Attributes After Fitting
| Attribute | Type | Description |
|---|---|---|
| `cv_results_` | dict of numpy arrays | Full results dictionary with per-split scores, mean scores, standard deviations, rankings, timing, and parameter values. |
| `best_estimator_` | estimator | The estimator refitted on the full dataset with the best parameters. Only available when `refit=True`. |
| `best_score_` | float | Mean cross-validated score of the best configuration. |
| `best_params_` | dict | The parameter setting that achieved the best score. |
| `best_index_` | int | Index into the `cv_results_` arrays for the best candidate. |
| `n_splits_` | int | The number of cross-validation splits. |
| `refit_time_` | float | Time in seconds for the refit step. Only available when `refit` is not `False`. |
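To show how these attributes fit together, here is a small sketch on the iris dataset: `best_index_` points into the parallel arrays of `cv_results_`, and `best_score_` is simply `cv_results_["mean_test_score"]` at that index.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
search = GridSearchCV(SVC(), {"C": [1, 10]}, cv=5)
search.fit(X, y)

# best_index_ indexes the per-candidate arrays in cv_results_
i = search.best_index_
assert search.cv_results_["mean_test_score"][i] == search.best_score_
print(search.best_params_, search.n_splits_)
```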
Internal Search Mechanism
The core search logic is implemented in a single method:
```python
def _run_search(self, evaluate_candidates):
    """Search all candidates in param_grid"""
    evaluate_candidates(ParameterGrid(self.param_grid))
```
This delegates all candidates at once to the evaluate_candidates callback defined in BaseSearchCV.fit, which handles parallel fitting and scoring across all folds.
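`ParameterGrid` is itself iterable and yields one parameter dict per candidate, which is the sequence handed to `evaluate_candidates`. A quick illustration of the expansion:

```python
from sklearn.model_selection import ParameterGrid

grid = ParameterGrid({"kernel": ["linear", "rbf"], "C": [1, 10]})

# Expands to the Cartesian product: 2 kernels x 2 values of C = 4 candidates
for params in grid:
    print(params)  # e.g. {'C': 1, 'kernel': 'linear'}
```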
Usage Examples
Basic Grid Search
```python
from sklearn import svm, datasets
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()
parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 10]}
svc = svm.SVC()
clf = GridSearchCV(svc, parameters, cv=5, scoring='accuracy')
clf.fit(iris.data, iris.target)
print(clf.best_params_)  # e.g., {'C': 1, 'kernel': 'linear'}
print(clf.best_score_)   # e.g., 0.98
```
Multi-metric Evaluation
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
scoring = {'accuracy': 'accuracy', 'f1_macro': 'f1_macro'}
clf = GridSearchCV(
    SVC(),
    param_grid={'C': [0.1, 1, 10], 'kernel': ['rbf']},
    scoring=scoring,
    refit='accuracy',  # refit the final model on the best accuracy
    cv=5,
)
clf.fit(X, y)
```
Randomized Alternative: RandomizedSearchCV
When the parameter space is large, RandomizedSearchCV provides a more efficient alternative. Its constructor adds param_distributions (which can include scipy.stats distributions), n_iter (number of configurations to sample), and random_state:
```python
class RandomizedSearchCV(BaseSearchCV):
    def __init__(
        self,
        estimator,
        param_distributions,
        *,
        n_iter=10,
        scoring=None,
        n_jobs=None,
        refit=True,
        cv=None,
        verbose=0,
        pre_dispatch="2*n_jobs",
        random_state=None,
        error_score=np.nan,
        return_train_score=False,
    ):
```
The key difference is that RandomizedSearchCV samples n_iter configurations from param_distributions via ParameterSampler, rather than evaluating all combinations from a grid.
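As a sketch, a randomized search over a continuous range of `C` can use a `scipy.stats` distribution: each of the `n_iter` candidates is drawn independently, and `random_state` makes the draw reproducible.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
search = RandomizedSearchCV(
    SVC(),
    param_distributions={
        "C": loguniform(1e-2, 1e2),   # sampled continuously on a log scale
        "kernel": ["linear", "rbf"],  # lists are sampled uniformly
    },
    n_iter=8,         # evaluate 8 sampled configurations, not a full grid
    cv=5,
    random_state=0,   # reproducible sampling
)
search.fit(X, y)
print(search.best_params_)
```

Because `C` is a distribution rather than a list, no finite grid could enumerate the same space; the cost is controlled entirely by `n_iter`.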