Implementation:Scikit learn Scikit learn Learning Curve
Metadata
- Domains: Statistics, Model_Evaluation
- Source Files: sklearn/model_selection/_validation.py, sklearn/model_selection/_plot.py
- Last Updated: 2026-02-08 15:00 GMT
Overview
Concrete tool, provided by scikit-learn, for computing learning curves across training set sizes. This implementation covers the learning_curve function, which computes cross-validated training and test scores for a range of training set sizes, and the LearningCurveDisplay class, which provides a visualization API for plotting the results.
API Signatures
learning_curve
from sklearn.model_selection import learning_curve
learning_curve(
    estimator,
    X,
    y,
    *,
    groups=None,
    train_sizes=np.linspace(0.1, 1.0, 5),
    cv=None,
    scoring=None,
    exploit_incremental_learning=False,
    n_jobs=None,
    pre_dispatch="all",
    verbose=0,
    shuffle=False,
    random_state=None,
    error_score=np.nan,
    return_times=False,
    params=None,
)
Parameters:
- estimator (object implementing 'fit') -- An object that is cloned for each validation. It must also implement predict unless scoring is a callable that does not rely on predict.
- X (array-like or sparse matrix of shape (n_samples, n_features)) -- Training vector.
- y (array-like of shape (n_samples,) or (n_samples, n_outputs) or None) -- Target variable; None for unsupervised learning.
- groups (array-like of shape (n_samples,), default=None) -- Group labels for Group CV splitters.
- train_sizes (array-like of shape (n_ticks,), default=np.linspace(0.1, 1.0, 5)) -- Relative or absolute numbers of training examples. If dtype is float, treated as fractions of the maximum training set size (must be within (0, 1]). If dtype is int, treated as absolute sizes.
- cv (int, CV splitter, or iterable, default=None) -- Cross-validation splitting strategy. None defaults to 5-fold.
- scoring (str or callable, default=None) -- Scoring method. None uses the estimator's default scorer.
- exploit_incremental_learning (bool, default=False) -- If the estimator supports incremental learning, use it to speed up fitting for different training set sizes.
- n_jobs (int, default=None) -- Number of parallel jobs.
- pre_dispatch (int or str, default='all') -- Controls pre-dispatched parallel jobs.
- verbose (int, default=0) -- Verbosity level.
- shuffle (bool, default=False) -- Whether to shuffle training data before taking prefixes based on train_sizes.
- random_state (int, RandomState instance or None, default=None) -- Used when shuffle=True for reproducibility.
- error_score ("raise" or numeric, default=np.nan) -- Value assigned if an error occurs during fitting.
- return_times (bool, default=False) -- Whether to return fit and score times.
- params (dict, default=None) -- Parameters to pass to the fit method and the scorer. (Added in version 1.6.)
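The float-versus-int handling of train_sizes described above can be checked directly. The following is a minimal sketch, assuming a synthetic dataset of 200 samples and an unshuffled 5-fold KFold splitter (both chosen only for illustration), so that every training split holds 160 samples:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, learning_curve
from sklearn.tree import DecisionTreeClassifier

# Illustrative data: 200 samples, so 5-fold CV leaves 160 training samples.
X, y = make_classification(n_samples=200, random_state=0)
clf = DecisionTreeClassifier(random_state=0)
cv = KFold(n_splits=5)

# Float values are read as fractions of the maximum training set size (160).
sizes_from_fractions, _, _ = learning_curve(
    clf, X, y, cv=cv, train_sizes=[0.25, 0.5, 1.0]
)

# Integer values are read as absolute sample counts.
sizes_from_counts, _, _ = learning_curve(
    clf, X, y, cv=cv, train_sizes=[40, 80, 160]
)

print(sizes_from_fractions)
print(sizes_from_counts)
```

Both calls should report the same absolute sizes, since 0.25, 0.5, and 1.0 of 160 are 40, 80, and 160.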
Returns:
- train_sizes_abs (ndarray of shape (n_unique_ticks,)) -- Absolute numbers of training examples used.
- train_scores (ndarray of shape (n_ticks, n_cv_folds)) -- Scores on training sets.
- test_scores (ndarray of shape (n_ticks, n_cv_folds)) -- Scores on test sets.
- fit_times (ndarray of shape (n_ticks, n_cv_folds)) -- Fit times in seconds. Only present if return_times=True.
- score_times (ndarray of shape (n_ticks, n_cv_folds)) -- Score times in seconds. Only present if return_times=True.
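As a small illustration of the return values listed above, the sketch below requests timing information and inspects the array shapes. The dataset, the three train_sizes values, and cv=4 are assumptions made only for this example:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)
clf = DecisionTreeClassifier(random_state=0)

# With return_times=True the function returns five arrays instead of three.
train_sizes_abs, train_scores, test_scores, fit_times, score_times = learning_curve(
    clf, X, y, cv=4, train_sizes=[0.3, 0.6, 1.0], return_times=True
)

# Three ticks and four folds give arrays of shape (3, 4).
print(train_scores.shape, test_scores.shape)
print(fit_times.shape, score_times.shape)

# Mean fit time per training set size, in seconds.
print(fit_times.mean(axis=1))
```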
Example:
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import learning_curve
import numpy as np
X, y = make_classification(n_samples=1000, random_state=42)
clf = DecisionTreeClassifier(random_state=42)
train_sizes, train_scores, test_scores = learning_curve(
    clf, X, y, cv=5,
    train_sizes=np.linspace(0.1, 1.0, 10)
)
# Compute mean and std across folds
train_mean = np.mean(train_scores, axis=1)
train_std = np.std(train_scores, axis=1)
test_mean = np.mean(test_scores, axis=1)
test_std = np.std(test_scores, axis=1)
LearningCurveDisplay
from sklearn.model_selection import LearningCurveDisplay
# Constructor (typically not called directly)
LearningCurveDisplay(
    *,
    train_sizes,
    train_scores,
    test_scores,
    score_name=None,
)
Constructor Parameters:
- train_sizes (ndarray of shape (n_unique_ticks,)) -- Numbers of training examples used.
- train_scores (ndarray of shape (n_ticks, n_cv_folds)) -- Scores on training sets.
- test_scores (ndarray of shape (n_ticks, n_cv_folds)) -- Scores on test sets.
- score_name (str, default=None) -- The name of the score used to label the y-axis.
Attributes:
- ax_ -- matplotlib Axes containing the learning curve.
- figure_ -- matplotlib Figure containing the learning curve.
- errorbar_ -- List of ErrorbarContainer objects when std_display_style="errorbar"; None otherwise.
- lines_ -- List of Line2D objects when std_display_style="fill_between"; None otherwise.
- fill_between_ -- List of PolyCollection objects when std_display_style="fill_between"; None otherwise.
LearningCurveDisplay.from_estimator (class method)
LearningCurveDisplay.from_estimator(
    estimator,
    X,
    y,
    *,
    groups=None,
    train_sizes=np.linspace(0.1, 1.0, 5),
    cv=None,
    scoring=None,
    exploit_incremental_learning=False,
    n_jobs=None,
    pre_dispatch="all",
    verbose=0,
    shuffle=False,
    random_state=None,
    error_score=np.nan,
    fit_params=None,
    ax=None,
    negate_score=False,
    score_name=None,
    score_type="both",
    std_display_style="fill_between",
    line_kw=None,
    fill_between_kw=None,
    errorbar_kw=None,
)
Key Parameters (beyond those shared with learning_curve):
- fit_params (dict, default=None) -- Parameters to pass to the fit method of the estimator.
- ax (matplotlib Axes, default=None) -- Axes to plot on. If None, a new figure and axes are created.
- negate_score (bool, default=False) -- Whether to negate the scores. Useful for neg_* scoring metrics.
- score_name (str, default=None) -- Custom name for the y-axis label. Overrides the name inferred from scoring.
- score_type ("test", "train", or "both", default="both") -- Which score curves to plot.
- std_display_style ("errorbar", "fill_between", or None, default="fill_between") -- How to display the standard deviation around the mean score.
- line_kw (dict, default=None) -- Additional keyword arguments for plt.plot.
- fill_between_kw (dict, default=None) -- Additional keyword arguments for plt.fill_between.
- errorbar_kw (dict, default=None) -- Additional keyword arguments for plt.errorbar.
Returns:
- display (LearningCurveDisplay) -- Object that stores computed values and the plot.
LearningCurveDisplay.plot (instance method)
display.plot(
    ax=None,
    *,
    negate_score=False,
    score_name=None,
    score_type="both",
    std_display_style="fill_between",
    line_kw=None,
    fill_between_kw=None,
    errorbar_kw=None,
)
Returns:
- display (LearningCurveDisplay) -- The display object (self).
Examples
One-step visualization with from_estimator
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import LearningCurveDisplay
from sklearn.tree import DecisionTreeClassifier
X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(random_state=0)
LearningCurveDisplay.from_estimator(tree, X, y, cv=5)
plt.show()
Two-step: compute then display
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import LearningCurveDisplay, learning_curve
from sklearn.tree import DecisionTreeClassifier
X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(random_state=0)
train_sizes, train_scores, test_scores = learning_curve(tree, X, y, cv=5)
display = LearningCurveDisplay(
    train_sizes=train_sizes,
    train_scores=train_scores,
    test_scores=test_scores,
    score_name="Score"
)
display.plot()
plt.show()
Using negate_score for loss metrics
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.model_selection import LearningCurveDisplay
from sklearn.linear_model import Ridge
X, y = load_diabetes(return_X_y=True)
ridge = Ridge(alpha=1.0)
LearningCurveDisplay.from_estimator(
    ridge, X, y, cv=5,
    scoring='neg_mean_squared_error',
    negate_score=True,
    score_name='Mean Squared Error'
)
plt.show()