Implementation: DistrictDataLabs Yellowbrick LearningCurve Visualizer
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, Model_Selection, Visualization |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Concrete tool for diagnosing model performance as a function of training set size, provided by the Yellowbrick library.
Description
The LearningCurve visualizer wraps scikit-learn's sklearn.model_selection.learning_curve utility and produces a plot of training and cross-validated test scores against increasing training set sizes. For each training size, k-fold cross-validation is performed and the mean score is plotted with a shaded band representing one standard deviation of variability. Two curves are drawn: one for the training score and one for the cross-validation score.
The class extends ModelVisualizer from the Yellowbrick base module. When fit(X, y) is called, the visualizer delegates to scikit-learn's learning_curve function, passing parameters such as train_sizes, cv, scoring, and optional flags for incremental learning and shuffling. The resulting score arrays are stored as attributes, their means and standard deviations are computed, and draw() renders the plot. The default train_sizes parameter is np.linspace(0.1, 1.0, 5), which evaluates at 10%, 32.5%, 55%, 77.5%, and 100% of the available training data.
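As a minimal sketch of the computation that fit() delegates to, the following uses scikit-learn's learning_curve directly; the synthetic dataset and the cv/scoring choices are illustrative, not Yellowbrick defaults:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.naive_bayes import GaussianNB

# Synthetic data purely for illustration
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# The same underlying call the visualizer's fit() delegates to
train_sizes, train_scores, test_scores = learning_curve(
    GaussianNB(), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5),  # the default tick positions
    cv=3,
    scoring="accuracy",
)

# The visualizer stores the raw arrays and aggregates across folds,
# which is what draw() plots as curves and shaded bands
train_scores_mean = train_scores.mean(axis=1)
train_scores_std = train_scores.std(axis=1)
test_scores_mean = test_scores.mean(axis=1)
test_scores_std = test_scores.std(axis=1)
```

The raw score arrays have shape (n_ticks, n_cv_folds); averaging over axis 1 collapses the folds to give one point per training size.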
Usage
Use this visualizer when you want to determine whether your model would benefit from more training data or whether it has reached a performance plateau. It works with any scikit-learn estimator that implements fit and predict.
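One rough way to automate that diagnosis (not part of the Yellowbrick API; the 0.005 threshold is an arbitrary illustration) is to check whether the cross-validated score is still climbing at the largest training size, here computed with scikit-learn's learning_curve directly:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import learning_curve
from sklearn.naive_bayes import GaussianNB

X, y = load_digits(return_X_y=True)

# Cross-validated test scores at increasing training sizes
sizes, _, test_scores = learning_curve(
    GaussianNB(), X, y, train_sizes=np.linspace(0.1, 1.0, 5), cv=5
)
test_mean = test_scores.mean(axis=1)

# Heuristic: if the final step still improves the CV score noticeably,
# more data may help; otherwise the curve has likely plateaued
gain = test_mean[-1] - test_mean[-2]
print("plateaued" if gain < 0.005 else "more data may help")
```

A persistent gap between the training and cross-validation curves at the largest size suggests high variance, which more data can reduce; two low, converged curves suggest high bias, which more data will not fix.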
Code Reference
Source Location
- Repository: yellowbrick
- File: yellowbrick/model_selection/learning_curve.py
- Class Lines: L38-305 (class), L171-185 (__init__), L210-262 (fit)
- Quick Method Lines: L313-448
Signature
class LearningCurve(ModelVisualizer):
def __init__(
self,
estimator,
ax=None,
groups=None,
train_sizes=DEFAULT_TRAIN_SIZES,
cv=None,
scoring=None,
exploit_incremental_learning=False,
n_jobs=1,
pre_dispatch="all",
shuffle=False,
random_state=None,
**kwargs
):
Import
from yellowbrick.model_selection import LearningCurve
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| estimator | scikit-learn estimator | Yes | An object implementing fit and predict. Cloned for each validation. |
| ax | matplotlib.Axes | No | The axes object to plot the figure on. |
| groups | array-like, shape (n_samples,) | No | Group labels for samples used in train/test splitting. |
| train_sizes | array-like, shape (n_ticks,) | No | Relative or absolute training set sizes. Default: np.linspace(0.1, 1.0, 5). |
| cv | int, CV generator, or iterable | No | Cross-validation splitting strategy. Default: None (3-fold). |
| scoring | string, callable, or None | No | Scoring metric. Default: None (estimator's default scorer). |
| exploit_incremental_learning | boolean | No | If True and the estimator supports incremental learning (partial_fit), uses it to speed up fitting at different training sizes. Default: False. |
| n_jobs | integer | No | Number of parallel jobs. Default: 1. |
| pre_dispatch | integer or string | No | Number of predispatched jobs. Default: "all". |
| shuffle | boolean | No | Whether to shuffle training data before taking prefixes. Default: False. |
| random_state | int, RandomState, or None | No | Seed for random number generator when shuffle is True. Default: None. |
The fit(X, y) method accepts:
| Name | Type | Required | Description |
|---|---|---|---|
| X | array-like, shape (n_samples, n_features) | Yes | Training feature matrix. |
| y | array-like, shape (n_samples,) | No | Target values. None for unsupervised learning. |
Outputs
| Name | Type | Description |
|---|---|---|
| train_sizes_ | array, shape (n_unique_ticks,), dtype int | Actual numbers of training examples used (duplicates removed). |
| train_scores_ | array, shape (n_ticks, n_cv_folds) | Raw scores on training sets for each training size and fold. |
| train_scores_mean_ | array, shape (n_ticks,) | Mean training score for each training size. |
| train_scores_std_ | array, shape (n_ticks,) | Standard deviation of training scores for each training size. |
| test_scores_ | array, shape (n_ticks, n_cv_folds) | Raw scores on test sets for each training size and fold. |
| test_scores_mean_ | array, shape (n_ticks,) | Mean cross-validated test score for each training size. |
| test_scores_std_ | array, shape (n_ticks,) | Standard deviation of cross-validated test scores for each training size. |
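A sketch of how the aggregate attributes relate to the raw score arrays (the fold data here is simulated with numpy; names mirror the table above):

```python
import numpy as np

# Simulated raw scores, shape (n_ticks, n_cv_folds) = (5, 3),
# standing in for test_scores_ after fit(); values are illustrative
rng = np.random.default_rng(42)
test_scores = rng.uniform(0.7, 0.9, size=(5, 3))

# The *_mean_ and *_std_ attributes aggregate across folds (axis=1)
test_scores_mean = test_scores.mean(axis=1)  # shape (5,)
test_scores_std = test_scores.std(axis=1)    # shape (5,)

# The plot's shaded band spans mean +/- one standard deviation
upper = test_scores_mean + test_scores_std
lower = test_scores_mean - test_scores_std
```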
Usage Examples
Basic Usage
from sklearn.datasets import load_digits
from sklearn.naive_bayes import GaussianNB
from yellowbrick.model_selection import LearningCurve

# Load example data (any feature matrix and target will do)
X_train, y_train = load_digits(return_X_y=True)

# Create and fit the visualizer, then render the plot
viz = LearningCurve(GaussianNB(), cv=5, scoring="accuracy")
viz.fit(X_train, y_train)
viz.show()
Quick Method
from sklearn.datasets import load_digits
from sklearn.naive_bayes import GaussianNB
from yellowbrick.model_selection import learning_curve

X_train, y_train = load_digits(return_X_y=True)

# One-line alternative: fits, draws, and returns the visualizer
learning_curve(GaussianNB(), X_train, y_train, cv=5, scoring="accuracy")