Implementation:Scikit learn Scikit learn Learning Curve

From Leeroopedia
Revision as of 16:35, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Scikit_learn_Scikit_learn_Learning_Curve.md)


Metadata

  • Domains: Statistics, Model_Evaluation
  • Source Files: sklearn/model_selection/_validation.py, sklearn/model_selection/_plot.py
  • Last Updated: 2026-02-08 15:00 GMT

Overview

A concrete scikit-learn tool for computing learning curves across training set sizes. This implementation covers the learning_curve function, which computes cross-validated training and test scores at varying training set sizes, and the LearningCurveDisplay class, which provides a visualization API for plotting the results.

API Signatures

learning_curve

from sklearn.model_selection import learning_curve

learning_curve(
    estimator,
    X,
    y,
    *,
    groups=None,
    train_sizes=np.linspace(0.1, 1.0, 5),
    cv=None,
    scoring=None,
    exploit_incremental_learning=False,
    n_jobs=None,
    pre_dispatch="all",
    verbose=0,
    shuffle=False,
    random_state=None,
    error_score=np.nan,
    return_times=False,
    params=None,
)

Parameters:

  • estimator (object implementing 'fit') -- An object that is cloned for each validation. It must also implement predict unless scoring is a callable that does not rely on predict.
  • X (array-like or sparse matrix of shape (n_samples, n_features)) -- Training vector.
  • y (array-like of shape (n_samples,) or (n_samples, n_outputs) or None) -- Target variable; None for unsupervised learning.
  • groups (array-like of shape (n_samples,), default=None) -- Group labels for Group CV splitters.
  • train_sizes (array-like of shape (n_ticks,), default=np.linspace(0.1, 1.0, 5)) -- Relative or absolute numbers of training examples. If dtype is float, treated as fractions of the maximum training set size (must be within (0, 1]). If dtype is int, treated as absolute sizes.
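  • cv (int, CV splitter, or iterable, default=None) -- Cross-validation splitting strategy. None defaults to 5-fold. For None or an int, StratifiedKFold is used when the estimator is a classifier and y is binary or multiclass; otherwise KFold is used.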
  • scoring (str or callable, default=None) -- Scoring method. None uses the estimator's default scorer.
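  • exploit_incremental_learning (bool, default=False) -- If True, fit incrementally with the estimator's partial_fit method to speed up fitting across training set sizes. The estimator must implement partial_fit.
  • n_jobs (int, default=None) -- Number of jobs to run in parallel. None means 1 unless in a joblib.parallel_backend context; -1 means using all processors.
  • pre_dispatch (int or str, default='all') -- Number of jobs pre-dispatched for parallel execution. A str may be an expression such as '2*n_jobs'.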
  • verbose (int, default=0) -- Verbosity level.
  • shuffle (bool, default=False) -- Whether to shuffle training data before taking prefixes based on train_sizes.
  • random_state (int, RandomState instance or None, default=None) -- Used when shuffle=True for reproducibility.
  • error_score ("raise" or numeric, default=np.nan) -- Value assigned to the score if an error occurs while fitting the estimator. If set to "raise", the error is raised instead.
  • return_times (bool, default=False) -- Whether to return fit and score times.
  • params (dict, default=None) -- Parameters to pass to the fit method and the scorer. (Added in version 1.6.)
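
The two interpretations of train_sizes described above can be compared with a minimal sketch. The synthetic dataset, the Ridge estimator, and the explicit KFold splitter are illustrative assumptions, chosen so that every training split holds exactly 80 samples.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, learning_curve

# Illustrative data: 100 samples, so each of the 5 KFold training splits has 80.
X, y = make_regression(n_samples=100, n_features=5, noise=1.0, random_state=0)
cv = KFold(n_splits=5)

# Floats are read as fractions of the largest training split (80 samples here).
sizes_from_fractions, _, _ = learning_curve(
    Ridge(), X, y, cv=cv, train_sizes=[0.25, 0.5, 1.0]
)

# Integers are read as absolute sample counts.
sizes_from_counts, train_scores, test_scores = learning_curve(
    Ridge(), X, y, cv=cv, train_sizes=[20, 40, 80]
)

print(sizes_from_fractions, sizes_from_counts)
print(train_scores.shape, test_scores.shape)
```

Under these assumptions both calls should evaluate the same three training set sizes, because the fractions are taken of the training split rather than of the full dataset.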

Returns:

  • train_sizes_abs (ndarray of shape (n_unique_ticks,)) -- Absolute numbers of training examples used.
  • train_scores (ndarray of shape (n_ticks, n_cv_folds)) -- Scores on training sets.
  • test_scores (ndarray of shape (n_ticks, n_cv_folds)) -- Scores on test sets.
  • fit_times (ndarray of shape (n_ticks, n_cv_folds)) -- Fit times in seconds. Only present if return_times=True.
  • score_times (ndarray of shape (n_ticks, n_cv_folds)) -- Score times in seconds. Only present if return_times=True.
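
Because return_times changes the number of returned arrays, the unpacking has to change with it. The following is a minimal sketch on an assumed synthetic dataset; the variable names are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)
clf = DecisionTreeClassifier(random_state=0)

# With return_times=True the function returns five arrays instead of three.
train_sizes_abs, train_scores, test_scores, fit_times, score_times = learning_curve(
    clf, X, y, cv=5, train_sizes=[0.2, 0.6, 1.0], return_times=True
)

# Timing arrays share the (n_ticks, n_cv_folds) layout of the score arrays,
# so averaging over axis=1 gives one value per training set size.
mean_fit_time = fit_times.mean(axis=1)
for n, t in zip(train_sizes_abs, mean_fit_time):
    print(f"{int(n):4d} samples: {t:.4f} s mean fit time")
```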

Example:

from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import learning_curve
import numpy as np

X, y = make_classification(n_samples=1000, random_state=42)
clf = DecisionTreeClassifier(random_state=42)

train_sizes, train_scores, test_scores = learning_curve(
    clf, X, y, cv=5,
    train_sizes=np.linspace(0.1, 1.0, 10)
)

# Compute mean and std across folds
train_mean = np.mean(train_scores, axis=1)
train_std = np.std(train_scores, axis=1)
test_mean = np.mean(test_scores, axis=1)
test_std = np.std(test_scores, axis=1)

LearningCurveDisplay

from sklearn.model_selection import LearningCurveDisplay

# Constructor (typically not called directly)
LearningCurveDisplay(
    *,
    train_sizes,
    train_scores,
    test_scores,
    score_name=None,
)

Constructor Parameters:

  • train_sizes (ndarray of shape (n_unique_ticks,)) -- Numbers of training examples used.
  • train_scores (ndarray of shape (n_ticks, n_cv_folds)) -- Scores on training sets.
  • test_scores (ndarray of shape (n_ticks, n_cv_folds)) -- Scores on test sets.
  • score_name (str, default=None) -- The name of the score used to label the y-axis.

Attributes:

  • ax_ -- matplotlib Axes containing the learning curve.
  • figure_ -- matplotlib Figure containing the learning curve.
  • errorbar_ -- List of ErrorbarContainer objects when std_display_style="errorbar"; None otherwise.
  • lines_ -- List of Line2D objects for the mean scores when std_display_style is "fill_between" or None; None when it is "errorbar".
  • fill_between_ -- List of PolyCollection objects when std_display_style="fill_between"; None otherwise.

LearningCurveDisplay.from_estimator (class method)

LearningCurveDisplay.from_estimator(
    estimator,
    X,
    y,
    *,
    groups=None,
    train_sizes=np.linspace(0.1, 1.0, 5),
    cv=None,
    scoring=None,
    exploit_incremental_learning=False,
    n_jobs=None,
    pre_dispatch="all",
    verbose=0,
    shuffle=False,
    random_state=None,
    error_score=np.nan,
    fit_params=None,
    ax=None,
    negate_score=False,
    score_name=None,
    score_type="both",
    std_display_style="fill_between",
    line_kw=None,
    fill_between_kw=None,
    errorbar_kw=None,
)

Key Parameters (beyond those shared with learning_curve):

  • fit_params (dict, default=None) -- Parameters to pass to the fit method of the estimator.
  • ax (matplotlib Axes, default=None) -- Axes to plot on. If None, a new figure and axes are created.
  • negate_score (bool, default=False) -- Whether to negate the scores. Useful for neg_* scoring metrics.
  • score_name (str, default=None) -- Custom name for the y-axis label. Overrides the name inferred from scoring.
  • score_type ("test", "train", or "both", default="both") -- Which score curves to plot.
  • std_display_style ("errorbar", "fill_between", or None, default="fill_between") -- How to display the standard deviation around the mean score.
  • line_kw (dict, default=None) -- Additional keyword arguments for plt.plot.
  • fill_between_kw (dict, default=None) -- Additional keyword arguments for plt.fill_between.
  • errorbar_kw (dict, default=None) -- Additional keyword arguments for plt.errorbar.

Returns:

  • display (LearningCurveDisplay) -- Object that stores computed values and the plot.

LearningCurveDisplay.plot (instance method)

display.plot(
    ax=None,
    *,
    negate_score=False,
    score_name=None,
    score_type="both",
    std_display_style="fill_between",
    line_kw=None,
    fill_between_kw=None,
    errorbar_kw=None,
)

Returns:

  • display (LearningCurveDisplay) -- The display object (self).

Examples

One-step visualization with from_estimator

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import LearningCurveDisplay
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(random_state=0)

LearningCurveDisplay.from_estimator(tree, X, y, cv=5)
plt.show()

Two-step: compute then display

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import LearningCurveDisplay, learning_curve
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(random_state=0)

train_sizes, train_scores, test_scores = learning_curve(tree, X, y, cv=5)

display = LearningCurveDisplay(
    train_sizes=train_sizes,
    train_scores=train_scores,
    test_scores=test_scores,
    score_name="Score"
)
display.plot()
plt.show()

Using negate_score for loss metrics

import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.model_selection import LearningCurveDisplay
from sklearn.linear_model import Ridge

X, y = load_diabetes(return_X_y=True)
ridge = Ridge(alpha=1.0)

LearningCurveDisplay.from_estimator(
    ridge, X, y, cv=5,
    scoring='neg_mean_squared_error',
    negate_score=True,
    score_name='Mean Squared Error'
)
plt.show()
