Implementation: DistrictDataLabs Yellowbrick LearningCurve Visualizer
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, Model_Selection, Visualization |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Concrete tool for diagnosing model performance as a function of training set size, provided by the Yellowbrick library.
Description
The LearningCurve visualizer wraps scikit-learn's sklearn.model_selection.learning_curve utility and produces a plot of training and cross-validated test scores against increasing training set sizes. For each training size, k-fold cross-validation is performed and the mean score is plotted with a shaded band representing one standard deviation of variability. Two curves are drawn: one for the training score and one for the cross-validation score.
The class extends ModelVisualizer from the Yellowbrick base module. When fit(X, y) is called, the visualizer delegates to scikit-learn's learning_curve function, passing parameters such as train_sizes, cv, scoring, and optional flags for incremental learning and shuffling. The resulting score arrays are stored as attributes, their means and standard deviations are computed, and draw() renders the plot. The default train_sizes parameter is np.linspace(0.1, 1.0, 5), which evaluates at 10%, 32.5%, 55%, 77.5%, and 100% of the available training data.
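As a minimal sketch of the computation that fit() delegates to, the following uses scikit-learn's learning_curve directly; the synthetic dataset and the cv/scoring choices are illustrative, not Yellowbrick defaults:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.naive_bayes import GaussianNB

# Synthetic data purely for illustration
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# The same underlying call the visualizer's fit() delegates to
train_sizes, train_scores, test_scores = learning_curve(
    GaussianNB(), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5),  # the default tick positions
    cv=3,
    scoring="accuracy",
)

# The visualizer stores the raw arrays and aggregates across folds,
# which is what draw() plots as curves and shaded bands
train_scores_mean = train_scores.mean(axis=1)
train_scores_std = train_scores.std(axis=1)
test_scores_mean = test_scores.mean(axis=1)
test_scores_std = test_scores.std(axis=1)
```

The raw score arrays have shape (n_ticks, n_cv_folds); averaging over axis 1 collapses the folds to give one point per training size.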
Usage
Use this visualizer when you want to determine whether your model would benefit from more training data or whether it has reached a performance plateau. It works with any scikit-learn estimator that implements fit and predict.
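One rough way to automate that diagnosis (not part of the Yellowbrick API; the 0.005 threshold is an arbitrary illustration) is to check whether the cross-validated score is still climbing at the largest training size, here computed with scikit-learn's learning_curve directly:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import learning_curve
from sklearn.naive_bayes import GaussianNB

X, y = load_digits(return_X_y=True)

# Cross-validated test scores at increasing training sizes
sizes, _, test_scores = learning_curve(
    GaussianNB(), X, y, train_sizes=np.linspace(0.1, 1.0, 5), cv=5
)
test_mean = test_scores.mean(axis=1)

# Heuristic: if the final step still improves the CV score noticeably,
# more data may help; otherwise the curve has likely plateaued
gain = test_mean[-1] - test_mean[-2]
print("plateaued" if gain < 0.005 else "more data may help")
```

A persistent gap between the training and cross-validation curves at the largest size suggests high variance, which more data can reduce; two low, converged curves suggest high bias, which more data will not fix.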
Code Reference
Source Location
- Repository: yellowbrick
- File: yellowbrick/model_selection/learning_curve.py
- Class Lines: L38-305 (class), L171-185 (__init__), L210-262 (fit)
- Quick Method Lines: L313-448
Signature
class LearningCurve(ModelVisualizer):
def __init__(
self,
estimator,
ax=None,
groups=None,
train_sizes=DEFAULT_TRAIN_SIZES,
cv=None,
scoring=None,
exploit_incremental_learning=False,
n_jobs=1,
pre_dispatch="all",
shuffle=False,
random_state=None,
**kwargs
):
Import
from yellowbrick.model_selection import LearningCurve
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| estimator | scikit-learn estimator | Yes | An object implementing fit and predict. Cloned for each validation. |
| ax | matplotlib.Axes | No | The axes object to plot the figure on. |
| groups | array-like, shape (n_samples,) | No | Group labels for samples used in train/test splitting. |
| train_sizes | array-like, shape (n_ticks,) | No | Relative or absolute training set sizes. Default: np.linspace(0.1, 1.0, 5). |
| cv | int, CV generator, or iterable | No | Cross-validation splitting strategy. Default: None (3-fold). |
| scoring | string, callable, or None | No | Scoring metric. Default: None (estimator's default scorer). |
| exploit_incremental_learning | boolean | No | If True and the estimator supports incremental learning (partial_fit), uses it to speed up fitting at different training sizes. Default: False. |
| n_jobs | integer | No | Number of parallel jobs. Default: 1. |
| pre_dispatch | integer or string | No | Number of predispatched jobs. Default: "all". |
| shuffle | boolean | No | Whether to shuffle training data before taking prefixes. Default: False. |
| random_state | int, RandomState, or None | No | Seed for random number generator when shuffle is True. Default: None. |
The fit(X, y) method accepts:
| Name | Type | Required | Description |
|---|---|---|---|
| X | array-like, shape (n_samples, n_features) | Yes | Training feature matrix. |
| y | array-like, shape (n_samples,) | No | Target values. None for unsupervised learning. |
Outputs
| Name | Type | Description |
|---|---|---|
| train_sizes_ | array, shape (n_unique_ticks,), dtype int | Actual numbers of training examples used (duplicates removed). |
| train_scores_ | array, shape (n_ticks, n_cv_folds) | Raw scores on training sets for each training size and fold. |
| train_scores_mean_ | array, shape (n_ticks,) | Mean training score for each training size. |
| train_scores_std_ | array, shape (n_ticks,) | Standard deviation of training scores for each training size. |
| test_scores_ | array, shape (n_ticks, n_cv_folds) | Raw scores on test sets for each training size and fold. |
| test_scores_mean_ | array, shape (n_ticks,) | Mean cross-validated test score for each training size. |
| test_scores_std_ | array, shape (n_ticks,) | Standard deviation of cross-validated test scores for each training size. |
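A sketch of how the aggregate attributes relate to the raw score arrays (the fold data here is simulated with numpy; names mirror the table above):

```python
import numpy as np

# Simulated raw scores, shape (n_ticks, n_cv_folds) = (5, 3),
# standing in for test_scores_ after fit(); values are illustrative
rng = np.random.default_rng(42)
test_scores = rng.uniform(0.7, 0.9, size=(5, 3))

# The *_mean_ and *_std_ attributes aggregate across folds (axis=1)
test_scores_mean = test_scores.mean(axis=1)  # shape (5,)
test_scores_std = test_scores.std(axis=1)    # shape (5,)

# The plot's shaded band spans mean +/- one standard deviation
upper = test_scores_mean + test_scores_std
lower = test_scores_mean - test_scores_std
```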
Usage Examples
Basic Usage
from sklearn.datasets import load_digits
from sklearn.naive_bayes import GaussianNB
from yellowbrick.model_selection import LearningCurve

# Load example data (any feature matrix and target will do)
X_train, y_train = load_digits(return_X_y=True)

# Create and fit the visualizer, then render the plot
viz = LearningCurve(GaussianNB(), cv=5, scoring="accuracy")
viz.fit(X_train, y_train)
viz.show()
Quick Method
from sklearn.datasets import load_digits
from sklearn.naive_bayes import GaussianNB
from yellowbrick.model_selection import learning_curve

X_train, y_train = load_digits(return_X_y=True)

# One-line alternative: fits, draws, and returns the visualizer
learning_curve(GaussianNB(), X_train, y_train, cv=5, scoring="accuracy")