Implementation:DistrictDataLabs Yellowbrick CooksDistance Visualizer
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, Regression, Visualization |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Concrete tool for visualizing Cook's Distance to detect influential outliers in regression data, provided by the Yellowbrick library.
Description
The CooksDistance visualizer computes Cook's Distance for every observation in the dataset and displays the values as a stem plot. Each vertical stem represents the influence of a single instance on the fitted ordinary least squares (OLS) regression model. A horizontal dashed threshold line at 4/n, where n is the number of observations, is optionally drawn to flag potentially influential outliers, and the legend reports the percentage of observations exceeding this threshold.
Unlike other Yellowbrick regression visualizers, CooksDistance does not wrap a user-supplied estimator. Instead, it internally fits a sklearn.linear_model.LinearRegression model to compute the residuals and the mean squared error. The implementation computes leverage as the diagonal of the projection matrix, using the pseudoinverse of the feature matrix X. Studentized residuals and leverage values are then combined to produce the Cook's Distance for each observation. The associated p-values are derived from the F-distribution.
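The computation described above can be sketched in plain NumPy. This is a minimal illustration, not the library's code: the function name `cooks_distance_sketch` is hypothetical, the intercept is fitted with `np.linalg.lstsq` in place of scikit-learn's LinearRegression, and the choices of degrees of freedom (n minus the rank of X) and of dividing by the number of features are assumptions consistent with the textbook formula.

```python
import numpy as np


def cooks_distance_sketch(X, y):
    """Return (distance, leverage) per observation for an OLS fit of y on X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_samples, n_features = X.shape

    # OLS fit with an intercept, solved by least squares
    # (assumption: stands in for sklearn's LinearRegression).
    X_design = np.column_stack([np.ones(n_samples), X])
    coef, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    residuals = y - X_design @ coef

    # Leverage: diagonal of the projection matrix X @ pinv(X),
    # computed row by row without forming the full n x n matrix.
    leverage = (X * np.linalg.pinv(X).T).sum(axis=1)

    # Mean squared error over the residual degrees of freedom (assumed n - rank).
    df = n_samples - np.linalg.matrix_rank(X)
    mse = residuals @ residuals / df

    # Studentized residuals, then combined with leverage into Cook's Distance.
    studentized = residuals / np.sqrt(mse) / np.sqrt(1.0 - leverage)
    distance = studentized**2 / n_features * leverage / (1.0 - leverage)
    return distance, leverage


# Synthetic data with one deliberately corrupted observation at index 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = X @ np.array([1.0, 2.0]) + 0.1 * rng.normal(size=50)
X[0] = [5.0, 5.0]
y[0] = -20.0

distance, leverage = cooks_distance_sketch(X, y)
print(int(np.argmax(distance)))  # index of the most influential observation
```

Computing only the diagonal of the projection matrix keeps memory use linear in the number of observations, which matters because the full matrix is n by n.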
The visualizer extends the base Visualizer class (not RegressionScoreVisualizer). Its primary entry point is the fit() method, which computes the distances and draws the plot in one step.
Usage
Use CooksDistance when you need to:
- Identify data points that disproportionately influence regression coefficient estimates
- Perform outlier screening before training a production regression model
- Diagnose unexpected model behavior by locating high-influence observations
- Quantify what percentage of the dataset consists of influential outliers
Code Reference
Source Location
- Repository: yellowbrick
- File: yellowbrick/regressor/influence.py
- Class: Lines 32-216
- Quick Method: Lines 219-302
Signature
class CooksDistance(Visualizer):
    def __init__(
        self, ax=None, draw_threshold=True, linefmt="C0-", markerfmt=",", **kwargs
    )
Import
from yellowbrick.regressor import CooksDistance
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| ax | matplotlib Axes | No | The axes to plot on. If None, the current axes are used or created. |
| draw_threshold | bool | No | If True, draws a horizontal dashed line at 4/n (n = number of observations) and shows the percentage of outliers in the legend. Default: True. |
| linefmt | str | No | Format string for the vertical lines of the stem plot (color and line style). Default: 'C0-' (solid line, first color in the cycle). |
| markerfmt | str | No | Format string for the markers at the top of the stem lines. Default: ',' (pixel marker, essentially invisible). |
Outputs
| Name | Type | Description |
|---|---|---|
| distance_ | ndarray, 1D | The Cook's Distance value for each observation. Shape: (n_samples,). |
| p_values_ | ndarray, 1D | The p-values of the Cook's Distance values, derived from the F-distribution. Shape matches distance_. |
| influence_threshold_ | float | The influence threshold 4/n (n = number of observations), used as the rule-of-thumb cutoff. |
| outlier_percentage_ | float | Percentage of observations with Cook's Distance above the threshold (range 0.0 to 100.0). |
| ax | matplotlib Axes | The axes containing the stem plot with optional threshold line. |
Usage Examples
Basic Usage
from yellowbrick.regressor import CooksDistance
from yellowbrick.datasets import load_concrete
# Load dataset
X, y = load_concrete()
# Create and fit the visualizer
viz = CooksDistance()
viz.fit(X, y)
viz.show()
Quick Method
from yellowbrick.regressor import cooks_distance
from yellowbrick.datasets import load_concrete
X, y = load_concrete()
viz = cooks_distance(X, y)