Implementation: DistrictDataLabs Yellowbrick ResidualsPlot Visualizer
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, Regression, Visualization |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Concrete tool for visualizing regression residuals provided by the Yellowbrick library.
Description
The ResidualsPlot visualizer plots the residuals (actual value minus predicted value, y − ŷ) on the vertical axis against the predicted values on the horizontal axis for a fitted regression model. It supports separate coloring and opacity for training and test data splits, allowing direct visual comparison of residual behavior. A horizontal zero line is drawn to serve as the baseline for residual evaluation.
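The quantity being plotted is easy to compute directly. Below is a minimal NumPy sketch of the residual calculation, using synthetic data and `np.polyfit` as a stand-in for any fitted regressor (the variable names are illustrative, not Yellowbrick internals):

```python
import numpy as np

# Synthetic regression data: y depends linearly on x plus noise
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=200)
y = 3.0 * x + 1.5 + rng.normal(0, 1.0, size=200)

# Fit a line by least squares (stand-in for any fitted regressor)
slope, intercept = np.polyfit(x, y, deg=1)
y_pred = slope * x + intercept

# Residuals as the plot defines them: actual minus predicted
residuals = y - y_pred

# The horizontal zero line is the baseline: for ordinary least
# squares with an intercept, residuals are centred on it by
# construction, so their mean is numerically zero
print(abs(residuals.mean()) < 1e-8)
```

A well-behaved model scatters these residuals randomly around the zero line; structure in the scatter is what the visualizer is designed to expose.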
Optionally, a histogram of the residuals can be appended to the right side of the scatter plot to inspect the distribution of errors. This histogram can display either raw frequency counts or a probability density estimate. As an alternative to the histogram, a Q-Q (quantile-quantile) plot can be shown instead, comparing the residual quantiles against a standard normal distribution. The histogram and the Q-Q plot are mutually exclusive; only one may be enabled at a time.
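The comparison behind the Q-Q option can be sketched with only the Python standard library: sort the standardized residuals and pair each with the corresponding standard-normal quantile. This is an illustration of the idea, not Yellowbrick's internal code:

```python
import random
import statistics

random.seed(0)
# Hypothetical residuals; standardise to zero mean, unit variance
resid = [random.gauss(0, 1) for _ in range(500)]
mu = statistics.fmean(resid)
sigma = statistics.pstdev(resid)
standardised = sorted((r - mu) / sigma for r in resid)

# Theoretical standard-normal quantiles at plotting positions
n = len(standardised)
norm = statistics.NormalDist()
theoretical = [norm.inv_cdf((i + 0.5) / n) for i in range(n)]

# For normally distributed residuals, the (theoretical, sample)
# pairs fall close to the 45-degree line y = x
pairs = list(zip(theoretical, standardised))
print(len(pairs))
```

If the paired points bow away from the diagonal, the residuals have heavier or lighter tails than a normal distribution, which the Q-Q panel makes visible at a glance.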
The visualizer wraps a Scikit-Learn regressor and extends RegressionScoreVisualizer. Its primary entry points are the fit() method (which fits the estimator and draws training residuals) and the score() method (which generates predictions on test data and draws test residuals). Both train and test scores are displayed in the legend.
Usage
Use ResidualsPlot when you need to:
- Visually diagnose whether a linear regression model is appropriate for the data
- Check for heteroscedasticity or non-random patterns in the residuals
- Compare training versus test residual distributions
- Inspect the normality of residuals via a histogram or Q-Q plot
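As a rough numeric counterpart to the visual heteroscedasticity check, one can compare residual spread across the prediction range. The sketch below uses synthetic, deliberately heteroscedastic data; the split-in-half variance comparison is an illustration of what the funnel shape in a residuals plot means, not Yellowbrick's method:

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.uniform(0, 10, size=1000)
# Noise scale grows with x: deliberately heteroscedastic data
y = 2.0 * x + rng.normal(0, 0.2 + 0.3 * x)

slope, intercept = np.polyfit(x, y, deg=1)
y_pred = slope * x + intercept
residuals = y - y_pred

# Compare residual spread for low versus high predicted values;
# a large ratio is the numeric signature of the funnel shape
# seen in a residuals plot
order = np.argsort(y_pred)
low, high = np.array_split(residuals[order], 2)
ratio = high.std() / low.std()
print(ratio > 1.5)
```

For homoscedastic residuals this ratio stays near 1; here the noise scale triples across the range, so the high-prediction half is visibly wider.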
Code Reference
Source Location
- Repository: yellowbrick
- File: yellowbrick/regressor/residuals.py
  - Class: lines 47-401
  - Quick Method: lines 408-556
Signature
class ResidualsPlot(RegressionScoreVisualizer):
def __init__(
self,
estimator,
ax=None,
hist=True,
qqplot=False,
train_color="b",
test_color="g",
line_color=LINE_COLOR,
train_alpha=0.75,
test_alpha=0.75,
is_fitted="auto",
**kwargs
)
Import
from yellowbrick.regressor import ResidualsPlot
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| estimator | Scikit-Learn regressor | Yes | A regression estimator instance to wrap. Must be a regressor or a YellowbrickTypeError is raised. |
| ax | matplotlib Axes | No | The axes to plot on. If None, the current axes are used or created. |
| hist | bool, str, or None | No | Controls the residuals histogram. True or 'frequency' shows frequency counts; 'density' shows probability density; False or None disables it. Default: True. Requires Matplotlib >= 2.0.2. |
| qqplot | bool | No | If True, draws a Q-Q plot instead of a histogram. Cannot be True while hist is enabled. Default: False. |
| train_color | color | No | Color for training data residual points. Default: 'b' (blue). |
| test_color | color | No | Color for test data residual points. Default: 'g' (green). |
| line_color | color | No | Color for the zero error line. Default: dark grey. |
| train_alpha | float | No | Transparency for training data points (0 = transparent, 1 = opaque). Default: 0.75. |
| test_alpha | float | No | Transparency for test data points (0 = transparent, 1 = opaque). Default: 0.75. |
| is_fitted | bool or str | No | Whether the estimator is already fitted. False means it will be fit during visualizer.fit(); 'auto' checks automatically. Default: 'auto'. |
Outputs
| Name | Type | Description |
|---|---|---|
| train_score_ | float | The score on the training data. |
| test_score_ | float | The score on the test data. |
| ax | matplotlib Axes | The axes containing the residuals scatter plot with optional histogram or Q-Q plot. |
Usage Examples
Basic Usage
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from yellowbrick.regressor import ResidualsPlot
from yellowbrick.datasets import load_concrete
# Load dataset
X, y = load_concrete()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create and fit the visualizer
viz = ResidualsPlot(Ridge())
viz.fit(X_train, y_train)
viz.score(X_test, y_test)
viz.show()
Quick Method
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from yellowbrick.regressor import residuals_plot
from yellowbrick.datasets import load_concrete
X, y = load_concrete()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
viz = residuals_plot(Ridge(), X_train, y_train, X_test, y_test)