Implementation:Scikit learn Scikit learn Learning Curve

Metadata

Domains: Statistics, Model_Evaluation
Source Files: sklearn/model_selection/_validation.py, sklearn/model_selection/_plot.py
Last Updated: 2026-02-08 15:00 GMT

Overview

Concrete tool, provided by scikit-learn, for computing learning curves across training set sizes. This implementation covers the learning_curve function, which computes cross-validated training and test scores at varying training set sizes, and the LearningCurveDisplay class, which plots those results.

API Signatures

learning_curve

from sklearn.model_selection import learning_curve

learning_curve(
    estimator,
    X,
    y,
    *,
    groups=None,
    train_sizes=np.linspace(0.1, 1.0, 5),
    cv=None,
    scoring=None,
    exploit_incremental_learning=False,
    n_jobs=None,
    pre_dispatch="all",
    verbose=0,
    shuffle=False,
    random_state=None,
    error_score=np.nan,
    return_times=False,
    params=None,
)

Parameters:

estimator (object implementing "fit") -- An object that is cloned for each validation. It must also implement "predict" unless scoring is a callable that does not rely on "predict" to compute a score.
X (array-like or sparse matrix of shape (n_samples, n_features)) -- Training vector.
y (array-like of shape (n_samples,) or (n_samples, n_outputs) or None) -- Target variable; None for unsupervised learning.
groups (array-like of shape (n_samples,), default=None) -- Group labels for the samples. Only used with a group-aware CV splitter such as GroupKFold.
train_sizes (array-like of shape (n_ticks,), default=np.linspace(0.1, 1.0, 5)) -- Relative or absolute numbers of training examples. If dtype is float, treated as fractions of the maximum training set size (must be within (0, 1]). If dtype is int, treated as absolute sizes.
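cv (int, CV splitter, or iterable, default=None) -- Cross-validation splitting strategy. None defaults to 5-fold. For None or an int, StratifiedKFold is used when the estimator is a classifier and y is binary or multiclass; otherwise KFold is used.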
scoring (str or callable, default=None) -- Scoring method. None uses the estimator's default scorer.
exploit_incremental_learning (bool, default=False) -- If the estimator supports incremental learning, use it to speed up fitting for different training set sizes.
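n_jobs (int, default=None) -- Number of jobs to run in parallel. None means 1 unless in a joblib.parallel_backend context; -1 uses all processors.
pre_dispatch (int or str, default='all') -- Number of jobs pre-dispatched for parallel execution.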
verbose (int, default=0) -- Verbosity level.
shuffle (bool, default=False) -- Whether to shuffle training data before taking prefixes based on train_sizes.
random_state (int, RandomState instance or None, default=None) -- Used when shuffle=True for reproducibility.
error_score ("raise" or numeric, default=np.nan) -- Value assigned to the score if an error occurs during fitting. If "raise", the error is raised instead.
return_times (bool, default=False) -- Whether to return fit and score times.
params (dict, default=None) -- Parameters to pass to the fit method and the scorer. (Added in version 1.6.)
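
The two interpretations of train_sizes can be compared directly. The sketch below is illustrative only: the dataset, the estimator, and the explicit KFold splitter are assumptions chosen so that the maximum training set size is exactly 80 samples.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, learning_curve
from sklearn.tree import DecisionTreeClassifier

# 100 samples with 5-fold CV leave 80 samples per training fold.
X, y = make_classification(n_samples=100, random_state=0)
clf = DecisionTreeClassifier(random_state=0)
cv = KFold(n_splits=5)

# Floats are read as fractions of the maximum training set size (80 here).
sizes_from_fractions, _, _ = learning_curve(
    clf, X, y, cv=cv, train_sizes=[0.1, 0.5, 1.0]
)

# Integers are read as absolute numbers of training samples.
sizes_from_counts, _, _ = learning_curve(
    clf, X, y, cv=cv, train_sizes=[10, 40, 80]
)

print(sizes_from_fractions.tolist())  # [8, 40, 80]
print(sizes_from_counts.tolist())     # [10, 40, 80]
```

Fractions are convenient when the dataset size may change; absolute counts make the x-axis comparable across datasets of different sizes.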

Returns:

train_sizes_abs (ndarray of shape (n_unique_ticks,)) -- Absolute numbers of training examples used. May contain fewer entries than n_ticks because duplicate sizes are removed.
train_scores (ndarray of shape (n_ticks, n_cv_folds)) -- Scores on training sets.
test_scores (ndarray of shape (n_ticks, n_cv_folds)) -- Scores on test sets.
fit_times (ndarray of shape (n_ticks, n_cv_folds)) -- Fit times in seconds. Only present if return_times=True.
score_times (ndarray of shape (n_ticks, n_cv_folds)) -- Score times in seconds. Only present if return_times=True.
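
As a minimal sketch of the return values listed above, the following call sets return_times=True and unpacks all five arrays. The dataset, estimator, and the choice of cv=4 with three training sizes are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)
clf = DecisionTreeClassifier(random_state=0)

# With return_times=True the function returns five arrays instead of three.
train_sizes_abs, train_scores, test_scores, fit_times, score_times = learning_curve(
    clf, X, y, cv=4, train_sizes=[0.25, 0.5, 1.0], return_times=True
)

# One row per training size, one column per CV fold.
print(train_sizes_abs.shape)  # (3,)
print(train_scores.shape)     # (3, 4)
print(fit_times.shape)        # (3, 4)
```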

Example:

from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import learning_curve
import numpy as np

X, y = make_classification(n_samples=1000, random_state=42)
clf = DecisionTreeClassifier(random_state=42)

train_sizes, train_scores, test_scores = learning_curve(
    clf, X, y, cv=5,
    train_sizes=np.linspace(0.1, 1.0, 10)
)

# Compute mean and std across folds
train_mean = np.mean(train_scores, axis=1)
train_std = np.std(train_scores, axis=1)
test_mean = np.mean(test_scores, axis=1)
test_std = np.std(test_scores, axis=1)
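
The example above relies on make_classification, which returns rows in shuffled order. On data sorted by class, the shuffle parameter matters, because each training subset is a prefix of the training fold. The sketch below uses the iris dataset as an assumed illustration: its rows are sorted by class, so an unshuffled 10% prefix contains a single class.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(random_state=0)

# Without shuffling, the 12-sample prefix of every training fold holds only
# class 0, so the tree predicts class 0 for every test sample.
_, _, test_plain = learning_curve(tree, X, y, cv=5, train_sizes=[0.1])

# With shuffling, the training fold is permuted before the prefix is taken.
_, _, test_shuffled = learning_curve(
    tree, X, y, cv=5, train_sizes=[0.1], shuffle=True, random_state=0
)

print(np.mean(test_plain))     # one third: only class 0 is ever predicted
print(np.mean(test_shuffled))
```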

LearningCurveDisplay

from sklearn.model_selection import LearningCurveDisplay

# Constructor (typically not called directly)
LearningCurveDisplay(
    *,
    train_sizes,
    train_scores,
    test_scores,
    score_name=None,
)

Constructor Parameters:

train_sizes (ndarray of shape (n_unique_ticks,)) -- Numbers of training examples used.
train_scores (ndarray of shape (n_ticks, n_cv_folds)) -- Scores on training sets.
test_scores (ndarray of shape (n_ticks, n_cv_folds)) -- Scores on test sets.
score_name (str, default=None) -- The name of the score used to label the y-axis.

Attributes:

ax_ -- matplotlib Axes containing the learning curve.
figure_ -- matplotlib Figure containing the learning curve.
errorbar_ -- List of ErrorbarContainer objects when std_display_style="errorbar"; None otherwise.
lines_ -- List of Line2D objects when std_display_style="fill_between"; None otherwise.
fill_between_ -- List of PolyCollection objects when std_display_style="fill_between"; None otherwise.

LearningCurveDisplay.from_estimator (class method)

LearningCurveDisplay.from_estimator(
    estimator,
    X,
    y,
    *,
    groups=None,
    train_sizes=np.linspace(0.1, 1.0, 5),
    cv=None,
    scoring=None,
    exploit_incremental_learning=False,
    n_jobs=None,
    pre_dispatch="all",
    verbose=0,
    shuffle=False,
    random_state=None,
    error_score=np.nan,
    fit_params=None,
    ax=None,
    negate_score=False,
    score_name=None,
    score_type="both",
    std_display_style="fill_between",
    line_kw=None,
    fill_between_kw=None,
    errorbar_kw=None,
)

Key Parameters (beyond those shared with learning_curve):

fit_params (dict, default=None) -- Parameters to pass to the fit method of the estimator.
ax (matplotlib Axes, default=None) -- Axes to plot on. If None, a new figure and axes are created.
negate_score (bool, default=False) -- Whether to negate the scores before plotting. Useful with scorers whose names start with neg_, such as neg_mean_squared_error.
score_name (str, default=None) -- Custom name for the y-axis label. Overrides the name inferred from scoring.
score_type ("test", "train", or "both", default="both") -- Which score curves to plot.
std_display_style ("errorbar", "fill_between", or None, default="fill_between") -- How to display the standard deviation around the mean score.
line_kw (dict, default=None) -- Additional keyword arguments for plt.plot.
fill_between_kw (dict, default=None) -- Additional keyword arguments for plt.fill_between.
errorbar_kw (dict, default=None) -- Additional keyword arguments for plt.errorbar.

Returns:

display (LearningCurveDisplay) -- Object that stores computed values and the plot.

LearningCurveDisplay.plot (instance method)

display.plot(
    ax=None,
    *,
    negate_score=False,
    score_name=None,
    score_type="both",
    std_display_style="fill_between",
    line_kw=None,
    fill_between_kw=None,
    errorbar_kw=None,
)

Returns:

display (LearningCurveDisplay) -- The display object (self).

Examples

One-step visualization with from_estimator

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import LearningCurveDisplay
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(random_state=0)

LearningCurveDisplay.from_estimator(tree, X, y, cv=5)
plt.show()

Two-step: compute then display

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import LearningCurveDisplay, learning_curve
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(random_state=0)

train_sizes, train_scores, test_scores = learning_curve(tree, X, y, cv=5)

display = LearningCurveDisplay(
    train_sizes=train_sizes,
    train_scores=train_scores,
    test_scores=test_scores,
    score_name="Score"
)
display.plot()
plt.show()

Using negate_score for loss metrics

import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.model_selection import LearningCurveDisplay
from sklearn.linear_model import Ridge

X, y = load_diabetes(return_X_y=True)
ridge = Ridge(alpha=1.0)

LearningCurveDisplay.from_estimator(
    ridge, X, y, cv=5,
    scoring='neg_mean_squared_error',
    negate_score=True,
    score_name='Mean Squared Error'
)
plt.show()

Related Pages

Principle:Scikit_learn_Scikit_learn_Learning_Curve_Analysis
