
Implementation:DistrictDataLabs Yellowbrick LearningCurve Visualizer

From Leeroopedia


Knowledge Sources
Domains Machine_Learning, Model_Selection, Visualization
Last Updated 2026-02-08 00:00 GMT

Overview

Concrete tool for diagnosing model performance as a function of training set size, provided by the Yellowbrick library.

Description

The LearningCurve visualizer wraps scikit-learn's sklearn.model_selection.learning_curve utility and produces a plot of training and cross-validated test scores against increasing training set sizes. For each training size, k-fold cross-validation is performed and the mean score is plotted with a shaded band representing one standard deviation of variability. Two curves are drawn: one for the training score and one for the cross-validation score.

The class extends ModelVisualizer from the Yellowbrick base module. When fit(X, y) is called, the visualizer delegates to scikit-learn's learning_curve function, passing parameters such as train_sizes, cv, scoring, and optional flags for incremental learning and shuffling. The resulting score arrays are stored as attributes, their means and standard deviations are computed, and draw() renders the plot. The default train_sizes parameter is np.linspace(0.1, 1.0, 5), which evaluates at 10%, 32.5%, 55%, 77.5%, and 100% of the available training data.
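The default grid described above can be reproduced directly with NumPy; this is a sketch of the arithmetic, not Yellowbrick code, and the 1000-sample pool is a hypothetical figure for illustration:

```python
import numpy as np

# Default train_sizes grid: five evenly spaced fractions of the
# available training data, from 10% to 100%.
DEFAULT_TRAIN_SIZES = np.linspace(0.1, 1.0, 5)

# For a hypothetical pool of 1000 training samples, the absolute
# training set sizes evaluated at each tick:
n_samples = 1000
absolute_sizes = np.round(DEFAULT_TRAIN_SIZES * n_samples).astype(int)
print(absolute_sizes)
```

Fractional sizes are converted to absolute sample counts against the cross-validation training folds, so the counts the visualizer actually uses depend on the `cv` setting.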

Usage

Use this visualizer when you want to determine whether your model would benefit from more training data or whether it has reached a performance plateau. It works with any scikit-learn estimator that implements fit and predict.
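The "more data or plateau?" question can be sketched numerically with the scikit-learn utility the visualizer wraps. The 0.01 tolerance below is an illustrative threshold, not part of Yellowbrick or scikit-learn:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import learning_curve
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

# Compute the same quantities the visualizer plots.
sizes, train_scores, test_scores = learning_curve(
    GaussianNB(), X, y, cv=5, scoring="accuracy",
    train_sizes=np.linspace(0.1, 1.0, 5),
)
test_mean = test_scores.mean(axis=1)

# Illustrative plateau check: if the cross-validation score barely
# improves between the last two training sizes, more data of the
# same kind is unlikely to help much.
plateaued = abs(test_mean[-1] - test_mean[-2]) < 0.01
print("CV means:", np.round(test_mean, 3), "plateaued:", plateaued)
```

A large persistent gap between the training and cross-validation curves suggests variance (more data may help); two low curves that have converged suggest bias (a more expressive model may help).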

Code Reference

Source Location

  • Repository: yellowbrick
  • File: yellowbrick/model_selection/learning_curve.py
  • Class Lines: L38-305 (class), L171-185 (__init__), L210-262 (fit)
  • Quick Method Lines: L313-448

Signature

class LearningCurve(ModelVisualizer):
    def __init__(
        self,
        estimator,
        ax=None,
        groups=None,
        train_sizes=DEFAULT_TRAIN_SIZES,
        cv=None,
        scoring=None,
        exploit_incremental_learning=False,
        n_jobs=1,
        pre_dispatch="all",
        shuffle=False,
        random_state=None,
        **kwargs
    ):

Import

from yellowbrick.model_selection import LearningCurve

I/O Contract

Inputs

  • estimator (scikit-learn estimator, required): An object implementing fit and predict. Cloned for each validation.
  • ax (matplotlib.Axes, optional): The axes object to plot the figure on.
  • groups (array-like, shape (n_samples,), optional): Group labels for samples used in train/test splitting.
  • train_sizes (array-like, shape (n_ticks,), optional): Relative or absolute training set sizes. Default: np.linspace(0.1, 1.0, 5).
  • cv (int, CV generator, or iterable, optional): Cross-validation splitting strategy. Default: None (3-fold).
  • scoring (string, callable, or None, optional): Scoring metric. Default: None (the estimator's default scorer).
  • exploit_incremental_learning (boolean, optional): If True, uses incremental learning to speed up fitting. Default: False.
  • n_jobs (integer, optional): Number of parallel jobs. Default: 1.
  • pre_dispatch (integer or string, optional): Number of predispatched jobs. Default: "all".
  • shuffle (boolean, optional): Whether to shuffle training data before taking prefixes. Default: False.
  • random_state (int, RandomState, or None, optional): Seed for the random number generator when shuffle is True. Default: None.

The fit(X, y) method accepts:

  • X (array-like, shape (n_samples, n_features), required): Training feature matrix.
  • y (array-like, shape (n_samples,), optional): Target values; None for unsupervised learning.

Outputs

  • train_sizes_ (array, shape (n_unique_ticks,), dtype int): Actual numbers of training examples used (duplicates removed).
  • train_scores_ (array, shape (n_ticks, n_cv_folds)): Raw scores on the training sets for each training size and fold.
  • train_scores_mean_ (array, shape (n_ticks,)): Mean training score for each training size.
  • train_scores_std_ (array, shape (n_ticks,)): Standard deviation of training scores for each training size.
  • test_scores_ (array, shape (n_ticks, n_cv_folds)): Raw scores on the test sets for each training size and fold.
  • test_scores_mean_ (array, shape (n_ticks,)): Mean cross-validated test score for each training size.
  • test_scores_std_ (array, shape (n_ticks,)): Standard deviation of cross-validated test scores for each training size.
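As described above, draw() renders each mean curve with a shaded one-standard-deviation band. The following matplotlib sketch mimics that rendering using stand-in arrays shaped like the fitted attributes; the values are random placeholders, not real scores:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for this sketch
import matplotlib.pyplot as plt
import numpy as np

# Stand-in arrays shaped like the fitted attributes: 5 ticks, 3
# folds. Real values come from calling fit(X, y) on the visualizer.
rng = np.random.default_rng(0)
train_sizes_ = np.array([100, 325, 550, 775, 1000])
test_scores_ = 0.7 + 0.2 * rng.random((5, 3))
test_scores_mean_ = test_scores_.mean(axis=1)
test_scores_std_ = test_scores_.std(axis=1)

fig, ax = plt.subplots()
ax.plot(train_sizes_, test_scores_mean_, "o-", label="cross validation score")
ax.fill_between(
    train_sizes_,
    test_scores_mean_ - test_scores_std_,  # band: mean minus one std
    test_scores_mean_ + test_scores_std_,  # band: mean plus one std
    alpha=0.25,
)
ax.set_xlabel("training instances")
ax.set_ylabel("score")
ax.legend()
```

In the real visualizer the same mean/band pair is drawn twice, once for train_scores_mean_/train_scores_std_ and once for test_scores_mean_/test_scores_std_.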

Usage Examples

Basic Usage

from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB
from yellowbrick.model_selection import LearningCurve

# Load data; the visualizer performs its own cross-validation splits
X_train, y_train = load_iris(return_X_y=True)

# Create and fit the visualizer, then render the plot
viz = LearningCurve(GaussianNB(), cv=5, scoring="accuracy")
viz.fit(X_train, y_train)
viz.show()

Quick Method

from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB
from yellowbrick.model_selection import learning_curve

X_train, y_train = load_iris(return_X_y=True)

# One-call equivalent: fits, draws, and returns the visualizer
viz = learning_curve(GaussianNB(), X_train, y_train, cv=5, scoring="accuracy")

Related Pages

Implements Principle

Requires Environment
