
Principle:DistrictDataLabs Yellowbrick Prediction Error Analysis

From Leeroopedia


Knowledge Sources
Domains Machine_Learning, Regression, Model_Evaluation
Last Updated 2026-02-08 00:00 GMT

Overview

Prediction error analysis is a diagnostic technique that evaluates regression model accuracy by plotting predicted values against actual observed values and comparing the result to the identity line.

Description

In a prediction error plot, the actual target values y are placed on the horizontal axis and the corresponding model predictions ŷ are placed on the vertical axis. Each observation becomes a single point in this scatter plot. If the model were perfect, every point would fall exactly on the 45-degree identity line ŷ = y. Deviations from this line reveal the nature and magnitude of prediction errors.

By overlaying both the identity line and a best-fit line through the scatter, an analyst can quickly diagnose systematic bias. When the best-fit line closely follows the identity line, the model is well-calibrated. When the best-fit line diverges, the slope and intercept reveal whether the model is systematically over-predicting or under-predicting. For example, a best-fit line with a slope less than 1 indicates that the model under-predicts high values and over-predicts low values, a phenomenon known as regression toward the mean. Conversely, a slope greater than 1 suggests the opposite pattern.
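The slope diagnosis above can be sketched numerically. A minimal NumPy example (the synthetic data and the 0.7 shrinkage factor are illustrative assumptions, not from any particular model):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.uniform(0, 100, size=500)  # actual target values

# Illustrative model that shrinks predictions toward the mean (~50):
# high actuals are under-predicted, low actuals are over-predicted.
y_hat = 0.7 * y + 15 + rng.normal(0, 2, size=500)

# Best-fit line through the (actual, predicted) scatter.
slope, intercept = np.polyfit(y, y_hat, deg=1)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
# A slope well below 1 flags regression toward the mean.
```

Here the estimated slope recovers the assumed 0.7 shrinkage, exactly the pattern an analyst would read off the diverging best-fit line.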

The prediction error plot also helps detect heteroscedasticity: if the scatter of points fans out or narrows across the range of actual values, the variance of the model errors is not constant. Clusters or gaps in the plot can reveal regions of the target domain where the model performs well or poorly, guiding targeted model improvement.
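The fanning pattern can likewise be checked numerically by comparing residual spread across regions of the target. A sketch under an assumed synthetic error model where noise grows with y:

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.uniform(0, 100, size=5000)

# Assumed error model: noise standard deviation grows with the target,
# so the prediction error plot would fan out toward high values.
y_hat = y + rng.normal(0, 0.5 + 0.05 * y)
residuals = y_hat - y

# Compare residual spread in the lower and upper halves of the target range.
spread_low = residuals[y < 50].std()
spread_high = residuals[y >= 50].std()
print(f"low-half spread={spread_low:.2f}, high-half spread={spread_high:.2f}")
# A large ratio indicates heteroscedastic (non-constant) error variance.
```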

Usage

Prediction error analysis is most useful when:

  • Assessing whether a regression model is well-calibrated across the full range of the target variable
  • Diagnosing systematic over-prediction or under-prediction bias
  • Detecting heteroscedasticity (non-constant error variance) across different regions of the target domain
  • Comparing multiple models by overlaying their prediction error plots
  • Communicating model performance to stakeholders who may find residual plots less intuitive

Theoretical Basis

The prediction error plot is based on the relationship between actual and predicted values. For a perfect model:

\hat{y}_i = y_i \quad \forall\, i

which corresponds to the identity line with slope 1 and intercept 0.

The best-fit line through the scatter of (y_i, \hat{y}_i) pairs is computed via ordinary least squares:

\hat{y} = \beta_0 + \beta_1 y

where:

\beta_1 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})}{\sum_{i=1}^{n} (y_i - \bar{y})^2}

\beta_0 = \bar{\hat{y}} - \beta_1 \bar{y}

For an ideal model, β₁ = 1 and β₀ = 0. Deviations from these values quantify the systematic bias.

The goodness-of-fit is summarized by the R² score:

R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
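These formulas can be verified directly. A short NumPy sketch (the synthetic data and the 0.9 slope are assumptions for illustration), cross-checking the closed-form coefficients against `np.polyfit`:

```python
import numpy as np

rng = np.random.default_rng(7)
y = rng.normal(50, 15, size=400)                   # actual values
y_hat = 0.9 * y + 4 + rng.normal(0, 3, size=400)   # assumed model predictions

y_bar, y_hat_bar = y.mean(), y_hat.mean()

# Ordinary least squares for the best-fit line through (y, y_hat).
beta_1 = np.sum((y - y_bar) * (y_hat - y_hat_bar)) / np.sum((y - y_bar) ** 2)
beta_0 = y_hat_bar - beta_1 * y_bar

# R^2 of the predictions themselves against the actuals.
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y_bar) ** 2)

print(f"beta_1={beta_1:.3f}, beta_0={beta_0:.3f}, R^2={r2:.3f}")
```

An ideal model would give β₁ ≈ 1 and β₀ ≈ 0; here the assumed 0.9 shrinkage shows up directly in the estimated slope, while the R² score stays high.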

Using shared axis limits (so both axes span the same range) creates a square plot where the identity line is a true 45-degree diagonal, making it visually straightforward to assess the magnitude and direction of errors.
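Putting the pieces together, a dependency-light sketch of the plot with matplotlib (the synthetic data is an assumption; the Yellowbrick library packages this same plot as its PredictionError visualizer):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
y = rng.uniform(0, 100, size=300)                  # actual values
y_hat = 0.85 * y + 7 + rng.normal(0, 5, size=300)  # assumed predictions

fig, ax = plt.subplots(figsize=(6, 6))
ax.scatter(y, y_hat, alpha=0.5, label="observations")

# Shared axis limits make the identity line a true 45-degree diagonal.
lims = [min(y.min(), y_hat.min()), max(y.max(), y_hat.max())]
ax.plot(lims, lims, "k--", label="identity")

# Best-fit line through the scatter.
b1, b0 = np.polyfit(y, y_hat, deg=1)
ax.plot(lims, [b0 + b1 * v for v in lims], "r-",
        label=f"best fit (slope={b1:.2f})")

ax.set_xlim(lims)
ax.set_ylim(lims)
ax.set_xlabel("actual value")
ax.set_ylabel("predicted value")
ax.legend()
fig.savefig("prediction_error.png")
```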

Related Pages

Implemented By
