Implementation:DistrictDataLabs Yellowbrick CooksDistance Visualizer

From Leeroopedia


Knowledge Sources
Domains Machine_Learning, Regression, Visualization
Last Updated 2026-02-08 00:00 GMT

Overview

Concrete tool for visualizing Cook's Distance to detect influential outliers in regression data, provided by the Yellowbrick library.

Description

The CooksDistance visualizer computes and displays Cook's Distance for every observation in the dataset using a stem plot. Each vertical stem represents the influence of a single instance on the fitted ordinary least squares (OLS) regression model. A horizontal dashed threshold line at $D_i = 4/n$, where $n$ is the number of observations, is optionally drawn to flag potentially influential outliers, and the legend reports the percentage of observations exceeding this threshold.

Unlike other Yellowbrick regression visualizers, CooksDistance does not wrap a user-supplied estimator. Instead, it internally uses a sklearn.linear_model.LinearRegression to compute residuals and the mean squared error. The implementation computes leverage as the diagonal of the projection matrix $H = X(X^T X)^{-1} X^T$, using the pseudoinverse of $X$. Studentized residuals and leverage values are then combined to produce the Cook's Distance for each observation. The associated p-values are derived from the F-distribution.
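The computation described above can be sketched in plain NumPy. This is a hedged illustration rather than the library's source: the helper name cooks_distance_sketch is hypothetical, np.linalg.lstsq with an explicit intercept column stands in for LinearRegression, and both the degrees of freedom (n minus the rank of X) and the division by the number of features are assumptions of this sketch.

```python
import numpy as np


def cooks_distance_sketch(X, y):
    """Hypothetical helper: Cook's Distance for an OLS fit with intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    # Leverage: diagonal of H = X @ pinv(X), taken row by row so the
    # full n-by-n projection matrix is never materialized.
    leverage = (X * np.linalg.pinv(X).T).sum(axis=1)

    # OLS residuals from a least-squares fit with an intercept column
    # (assumed stand-in for sklearn's LinearRegression).
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef

    # Mean squared error with n - rank(X) degrees of freedom (assumption).
    dof = n - np.linalg.matrix_rank(X)
    mse = residuals @ residuals / dof

    # Studentized residuals, then combine with leverage.
    studentized = residuals / np.sqrt(mse * (1.0 - leverage))
    return studentized**2 / p * leverage / (1.0 - leverage)


# Deterministic toy data: a perfect line with the last point pushed off it.
X = np.arange(1.0, 21.0).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0
y[-1] += 10.0

distances = cooks_distance_sketch(X, y)
print(int(np.argmax(distances)))  # index of the most influential observation
```

In this toy data the perturbed observation combines the largest residual with the highest leverage, so it receives the largest distance.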

The visualizer extends the base Visualizer class (not RegressionScoreVisualizer) and its primary entry point is the fit() method, which computes the distances and draws the plot in one step.

Usage

Use CooksDistance when you need to:

  • Identify data points that disproportionately influence regression coefficient estimates
  • Perform outlier screening before training a production regression model
  • Diagnose unexpected model behavior by locating high-influence observations
  • Quantify what percentage of the dataset consists of influential outliers
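As a rough sketch of the screening and quantification steps listed above, the 4/n rule of thumb can be applied to an array of distances as follows. The helper flag_influential and the sample values are made up for illustration; in practice the array would typically come from the distance_ attribute of a fitted visualizer.

```python
import numpy as np


def flag_influential(distances):
    """Hypothetical helper: apply the 4/n rule of thumb to Cook's Distances."""
    distances = np.asarray(distances, dtype=float)
    n = distances.shape[0]
    threshold = 4.0 / n                  # rule-of-thumb cutoff
    mask = distances > threshold         # True where an observation is flagged
    percentage = 100.0 * mask.sum() / n  # share of flagged observations
    return mask, threshold, percentage


# Illustrative values only, not real output from any dataset.
distances = np.array([0.01, 0.02, 0.90, 0.03, 0.05, 0.01, 0.02, 0.60])
mask, threshold, percentage = flag_influential(distances)

print(threshold)             # 0.5
print(np.flatnonzero(mask))  # [2 7]
print(percentage)            # 25.0
```

The boolean mask can then be used to inspect the flagged rows or to exclude them (for example X[~mask], y[~mask]) before refitting. Whether exclusion is appropriate depends on the data and is a judgment call.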

Code Reference

Source Location

  • Repository: yellowbrick
  • File: yellowbrick/regressor/influence.py
  • Class: Lines 32-216
  • Quick Method: Lines 219-302

Signature

class CooksDistance(Visualizer):
    def __init__(
        self, ax=None, draw_threshold=True, linefmt="C0-", markerfmt=",", **kwargs
    )

Import

from yellowbrick.regressor import CooksDistance

I/O Contract

Inputs

| Name | Type | Required | Description |
|------|------|----------|-------------|
| ax | matplotlib Axes | No | The axes to plot on. If None, the current axes are used or created. |
| draw_threshold | bool | No | If True, draws a horizontal dashed line at $D_i = 4/n$ and shows the percentage of outliers in the legend. Default: True. |
| linefmt | str | No | Format string for the vertical lines of the stem plot (color and line style). Default: 'C0-' (solid line, first color in the cycle). |
| markerfmt | str | No | Format string for the markers at the top of the stem lines. Default: ',' (pixel marker, essentially invisible). |

Outputs

| Name | Type | Description |
|------|------|-------------|
| distance_ | ndarray, 1D | The Cook's Distance value for each observation. Shape: (n_samples,). |
| p_values_ | ndarray, 1D | The p-values from the F-test of the Cook's Distance distribution. Shape matches distance_. |
| influence_threshold_ | float | The influence threshold $I_t = 4/n$, used as the rule-of-thumb cutoff. |
| outlier_percentage_ | float | Percentage of observations with Cook's Distance above the threshold (range 0.0 to 100.0). |
| ax | matplotlib Axes | The axes containing the stem plot with the optional threshold line. |

Usage Examples

Basic Usage

from yellowbrick.regressor import CooksDistance
from yellowbrick.datasets import load_concrete

# Load dataset
X, y = load_concrete()

# Create and fit the visualizer
viz = CooksDistance()
viz.fit(X, y)
viz.show()

Quick Method

from yellowbrick.regressor import cooks_distance
from yellowbrick.datasets import load_concrete

X, y = load_concrete()

viz = cooks_distance(X, y)
