Principle:Scikit learn Scikit learn Learning Curve Analysis

Metadata

Domains: Statistics, Model_Evaluation
Sources: scikit-learn documentation, "The Elements of Statistical Learning" Hastie et al., "Learning from Data" Abu-Mostafa et al.
Last Updated: 2026-02-08 15:00 GMT

Overview

A diagnostic technique that plots model performance as a function of training set size to detect underfitting or overfitting.

Learning curve analysis evaluates how a model's training and validation performance change as the number of training samples increases. By examining the shape and convergence behavior of these curves, practitioners can diagnose whether a model suffers from high bias (underfitting), high variance (overfitting), or whether it would benefit from additional training data.

Description

What learning curves show:

A learning curve consists of two lines plotted against the number of training samples:

Training score curve: The model's performance on the training data used to fit it. This score typically starts high (perfect or near-perfect when the training set is tiny) and decreases as the training set grows (because larger datasets are harder to memorize).
Validation score curve: The model's performance on held-out validation data. This score typically starts low (poor generalization from very few training samples) and increases as the training set grows (more data improves the model's ability to generalize).
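The two curves described above can be computed with scikit-learn's learning_curve function. The following is a minimal sketch, not a reference implementation: the synthetic dataset, the LogisticRegression estimator, and the five training set sizes are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

# Synthetic data (an assumption, used only to make the sketch runnable).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Five training set sizes, from 10% to 100% of each cross-validation training split.
train_sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=1000),
    X,
    y,
    train_sizes=np.linspace(0.1, 1.0, 5),
    cv=5,
    scoring="accuracy",
)

# Each row corresponds to one training set size, each column to one CV fold.
train_mean = train_scores.mean(axis=1)
val_mean = val_scores.mean(axis=1)

for n, tr, va in zip(train_sizes, train_mean, val_mean):
    print(f"n={int(n):4d}  train={tr:.3f}  validation={va:.3f}")
```

The returned train_sizes array holds absolute sample counts, so it can be used directly as the horizontal axis of a plot.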

Diagnosing underfitting and overfitting:

The relationship between the two curves reveals the model's bias-variance characteristics:

Underfitting (high bias): Both training and validation curves converge to a low score. Adding more data will not help because the model is too simple to capture the underlying pattern. The gap between curves is small, but both are far from the desired performance level. Remedy: Use a more complex model, add features, or reduce regularization.

Overfitting (high variance): The training score remains high while the validation score is substantially lower, leaving a large gap between the two curves. Adding more data may help close the gap if the validation curve is still rising and has not yet plateaued. Remedy: Use a simpler model, add regularization, reduce features, or collect more training data.

Good fit: Both curves converge to a high score with a small gap between them. The model has sufficient complexity to capture the pattern and enough data to generalize well.
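The three cases above can be turned into a rough rule of thumb applied to the scores at the largest training set size. This is only a sketch: the function name diagnose_fit and the thresholds target and max_gap are hypothetical, and sensible values depend on the metric and the problem.

```python
def diagnose_fit(train_score, val_score, target=0.90, max_gap=0.05):
    """Label the end of a learning curve using two hypothetical thresholds.

    target  -- assumed minimum acceptable validation score
    max_gap -- assumed largest acceptable train/validation gap
    """
    gap = train_score - val_score
    if gap > max_gap:
        # Training score well above validation score.
        return "overfitting (high variance)"
    if val_score < target:
        # Curves are close together but both below the desired level.
        return "underfitting (high bias)"
    return "good fit"


print(diagnose_fit(0.72, 0.70))  # small gap, low scores
print(diagnose_fit(0.99, 0.80))  # large gap
print(diagnose_fit(0.95, 0.93))  # small gap, high scores
```

A rule like this only summarizes the final point of each curve; the shape of the curves still needs to be inspected before deciding on a remedy.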

Practical considerations:

Learning curves are computed using cross-validation at each training set size, so each plotted point is an average over several train/validation splits, and the spread across splits reflects the variability of the evaluation procedure.
Plotting standard deviation bands around the mean curves helps assess the stability of the scores.
The choice of training set sizes should span a meaningful range -- from very small (to see early learning behavior) to the maximum available (to see whether the curves have plateaued).
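As a sketch of the standard deviation bands mentioned above, the per-fold scores at each training set size can be reduced to a mean and a one-standard-deviation band. The helper name score_bands is hypothetical, and the score values below are made up purely to show the array layout (rows are training set sizes, columns are folds).

```python
import numpy as np


def score_bands(fold_scores):
    """Return (mean, lower, upper) per training set size from per-fold scores."""
    fold_scores = np.asarray(fold_scores, dtype=float)
    mean = fold_scores.mean(axis=1)
    std = fold_scores.std(axis=1)
    return mean, mean - std, mean + std


# Made-up validation scores: 3 training set sizes x 4 folds.
val_scores = [
    [0.60, 0.70, 0.65, 0.65],
    [0.78, 0.82, 0.80, 0.80],
    [0.85, 0.85, 0.86, 0.84],
]

mean, lower, upper = score_bands(val_scores)
for m, lo, hi in zip(mean, lower, upper):
    print(f"mean={m:.3f}  band=[{lo:.3f}, {hi:.3f}]")
```

The lower and upper arrays can then be drawn as a shaded region around the mean curve, for example with matplotlib's fill_between.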

Usage

Learning curve analysis should be used when:

You want to determine whether collecting more data would improve your model's performance.
You need to diagnose whether your model is underfitting or overfitting.
You are deciding between model complexity levels (e.g., choosing the depth of a decision tree, the number of layers in a neural network).
You want to understand the data efficiency of your model -- how much training data is needed to reach a target performance level.
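For the data-efficiency question in particular, one simple approach is to read off the smallest evaluated training set size whose mean validation score meets a target. The sketch below assumes the sizes are sorted in increasing order; the function name samples_needed and the example numbers are hypothetical.

```python
def samples_needed(train_sizes, val_means, target):
    """Smallest evaluated training set size whose mean validation score
    reaches the target, or None if no evaluated size reaches it."""
    for n, score in zip(train_sizes, val_means):
        if score >= target:
            return n
    return None


# Made-up curve for illustration.
sizes = [50, 100, 200, 400]
scores = [0.70, 0.81, 0.86, 0.88]

print(samples_needed(sizes, scores, target=0.85))
print(samples_needed(sizes, scores, target=0.95))
```

A result of None means the target was not reached within the evaluated sizes; it does not by itself say whether more data would reach it.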

Theoretical Basis

Training score behavior:

As the training set size n increases:

For very small n, a sufficiently flexible model can effectively memorize the training data, producing perfect or near-perfect training scores.
As n grows, the training data becomes more diverse and harder to fit exactly, so the training score decreases and eventually stabilizes.
The asymptotic training score reflects the model's capacity to fit the data-generating distribution.

Validation score behavior:

For very small n, the model learns a poor approximation of the true pattern, so validation performance is low.
As n grows, the model receives more representative training data and generalizes better, so the validation score increases.
The validation score eventually plateaus at a level determined by the model's capacity and the inherent noise in the data.
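The behavior described above can be checked with a small simulation. The sketch below assumes a linear data-generating process with Gaussian noise and fits ordinary least squares with NumPy; all names and settings are illustrative. It reports mean squared error rather than a score, so the directions are mirrored: training error rises with n while error on fresh data falls.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features = 5
true_w = rng.normal(size=n_features)  # assumed "true" linear pattern


def mean_errors(n_train, n_trials=200, n_test=1000, noise=1.0):
    """Average training and held-out MSE of least squares over many trials."""
    train_err, test_err = [], []
    for _ in range(n_trials):
        X = rng.normal(size=(n_train, n_features))
        y = X @ true_w + noise * rng.normal(size=n_train)
        w, *_ = np.linalg.lstsq(X, y, rcond=None)

        # Fresh data from the same distribution plays the role of validation data.
        X_new = rng.normal(size=(n_test, n_features))
        y_new = X_new @ true_w + noise * rng.normal(size=n_test)

        train_err.append(np.mean((X @ w - y) ** 2))
        test_err.append(np.mean((X_new @ w - y_new) ** 2))
    return float(np.mean(train_err)), float(np.mean(test_err))


sizes = [10, 20, 50, 200]
errors = [mean_errors(n) for n in sizes]
for n, (tr, te) in zip(sizes, errors):
    print(f"n={n:3d}  train MSE={tr:.3f}  held-out MSE={te:.3f}")
```

In this setup both errors approach the noise variance as n grows, which corresponds to the plateau determined by the noise in the data.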

Convergence gap and bias/variance:

The gap between the training score and the validation score at a given training set size reflects the model's variance component. A large gap indicates that the model is fitting noise in the training data (high variance); for a fixed model, this gap typically narrows as n grows. A low level for both curves once they have converged indicates that the model cannot capture the signal (high bias).

The rate of convergence of the validation score indicates data efficiency. Models that converge quickly need less data; models that converge slowly (or have not converged given the available data) may benefit from more training samples.
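The gap and the rate of convergence can both be summarized from the mean curves. The following sketch is one possible heuristic, not an established procedure: the function name convergence_summary and the plateau_tol threshold are hypothetical, and the curve values are made up.

```python
def convergence_summary(train_means, val_means, plateau_tol=0.01):
    """Summarize the end of a learning curve.

    gap        -- train minus validation score at the largest training size
    last_gain  -- validation improvement over the final step in training size
    plateau_tol is an assumed threshold below which the curve counts as flat.
    """
    gap = train_means[-1] - val_means[-1]
    last_gain = val_means[-1] - val_means[-2]
    return {
        "gap": gap,
        "last_gain": last_gain,
        "still_improving": last_gain > plateau_tol,
    }


# Made-up mean scores at three increasing training set sizes.
summary = convergence_summary([1.00, 0.98, 0.97], [0.70, 0.80, 0.86])
print(summary)
```

A large gap together with still_improving being true is the situation in which collecting more training samples is most likely to pay off.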

Related Pages

Implementation:Scikit_learn_Scikit_learn_Learning_Curve
