
Implementation:DistrictDataLabs Yellowbrick KneeLocator

From Leeroopedia


Knowledge Sources
Domains Machine_Learning, Clustering, Visualization
Last Updated 2026-02-08 00:00 GMT

Overview

Concrete utility for automatically detecting the knee (or elbow) point on a curve using the Kneedle algorithm, provided by the Yellowbrick library.

Description

KneeLocator is a Yellowbrick utility class that implements the Kneedle algorithm for knee point detection. It is a port of the kneed package by Kevin Arvai, maintained with permission by the Yellowbrick contributors. The class takes a set of x-y data points representing a curve and, on initialization, computes the knee point: the x-value at which the curve exhibits maximum curvature.

The algorithm works by normalizing the input data, computing a difference curve between the normalized values and a diagonal reference line, finding local maxima of this difference curve, and applying a sensitivity-controlled threshold to identify the true knee. It supports all four combinations of curve nature (concave/convex) and direction (increasing/decreasing), making it applicable to a wide variety of elbow and knee curves.
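The normalize, difference-curve, local-maximum, and threshold sequence can be sketched in plain Python. This is a minimal, hypothetical re-implementation (`normalize` and `kneedle_concave_increasing` are illustrative names, not Yellowbrick API). It assumes x is sorted ascending and the curve is already in concave/increasing form, and it does not reproduce every detail of the ported kneed code.

```python
def normalize(values):
    """Min-max scale a sequence of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def kneedle_concave_increasing(x, y, S=1.0):
    """Hypothetical helper: return the x-value of the first confirmed knee, or None.

    Assumes x is sorted ascending, x and y are not constant, and the curve is
    already concave and increasing.
    """
    xn, yn = normalize(x), normalize(y)
    # Difference curve: height of the normalized curve above the diagonal y = x.
    diff = [b - a for a, b in zip(xn, yn)]
    # Mean spacing of the normalized x-values; xn spans [0, 1], so this is 1/(n-1).
    mean_dx = (xn[-1] - xn[0]) / (len(xn) - 1)

    threshold = None  # no threshold until the first local maximum is reached
    knee_index = None
    for i in range(1, len(diff) - 1):
        # A local maximum of the difference curve sets a new threshold below it;
        # S scales how far below the peak that threshold sits.
        if diff[i] >= diff[i - 1] and diff[i] >= diff[i + 1]:
            threshold = diff[i] - S * mean_dx
            knee_index = i
        # When the next point drops below the threshold, confirm the knee.
        if threshold is not None and diff[i + 1] < threshold:
            return x[knee_index]
    return None
```

With the silhouette-style scores used later on this page, the first local maximum of the difference curve is at k = 5 and the following point falls below its threshold, so the sketch returns 5. Raising S to 5 pushes the threshold below every remaining difference value, so no knee is confirmed.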

Within Yellowbrick, KneeLocator is used internally by KElbowVisualizer to automatically identify the optimal number of clusters. The KElbowVisualizer.fit() method passes the k values and their corresponding scores to a KneeLocator instance, configuring the curve nature and direction based on the chosen metric (convex/decreasing for distortion, concave/increasing for silhouette and Calinski-Harabasz).
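The metric-to-curve-shape mapping described above can be captured as a lookup table of KneeLocator keyword arguments. This is a hypothetical helper (`CURVE_SHAPE_BY_METRIC` and `locator_kwargs` are not Yellowbrick names), and the metric strings are assumed to match the values accepted by KElbowVisualizer's metric parameter.

```python
# Hypothetical lookup: curve shape to assume for each elbow metric.
CURVE_SHAPE_BY_METRIC = {
    "distortion": {"curve_nature": "convex", "curve_direction": "decreasing"},
    "silhouette": {"curve_nature": "concave", "curve_direction": "increasing"},
    "calinski_harabasz": {"curve_nature": "concave", "curve_direction": "increasing"},
}


def locator_kwargs(metric):
    """Return a fresh dict of KneeLocator keyword arguments for a metric name."""
    try:
        # Copy so callers cannot mutate the shared table.
        return dict(CURVE_SHAPE_BY_METRIC[metric])
    except KeyError:
        raise ValueError(f"no curve shape registered for metric {metric!r}") from None
```

A call would then look like `KneeLocator(k_values, scores, **locator_kwargs("distortion"))`. The sketch raises on an unknown metric instead of silently falling back to KneeLocator's concave/increasing defaults, which makes a misconfigured metric easier to spot.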

Usage

KneeLocator is primarily used as an internal dependency of KElbowVisualizer. However, it can also be imported and used directly for any knee or elbow detection task on arbitrary curves. Instantiate it with x and y arrays and the appropriate curve parameters; the knee attribute is immediately available after construction.

Code Reference

Source Location

  • Repository: yellowbrick
  • File: yellowbrick/utils/kneed.py
  • Class Definition: Lines 50-260
  • Key Methods: __init__ (L86-160), find_knee (L187-260)

Signature

class KneeLocator(object):

    def __init__(
        self,
        x,
        y,
        S=1.0,
        curve_nature="concave",
        curve_direction="increasing",
        online=False,
    ):

Import

from yellowbrick.utils import KneeLocator

I/O Contract

Inputs

Name | Type | Required | Description
x | list or array-like | Yes | The x-axis values (e.g., k values in cluster analysis). Must be monotonically ordered.
y | list or array-like | Yes | The y-axis values (e.g., scores corresponding to each k value).
S | float | No | Sensitivity parameter controlling how aggressive the knee detection is. Lower values detect subtler knees. Default: 1.0.
curve_nature | str | No | The nature of the curve: "concave" or "convex". Default: "concave".
curve_direction | str | No | The direction of the curve: "increasing" or "decreasing". Default: "increasing".
online | bool | No | If True, may correct earlier knee points as more data is processed. If False, returns the first detected knee. Default: False.

Outputs

Name | Type | Description
knee | int, float, or None | The x-value at the detected knee point, or None if no knee was found.
knee_y | float or None | The original y-value at the detected knee point, or None if no knee was found.
norm_knee | float or None | The normalized x-value at the knee point, or None if no knee was found.
norm_knee_y | float or None | The normalized y-value at the knee point, or None if no knee was found.
all_knees | set | All detected knee x-values (relevant when online=True).
all_norm_knees | set | All detected normalized knee x-values.
all_knees_y | list | The y-values at all detected knees.
all_norm_knees_y | list | The normalized y-values at all detected knees.
x_normalized | array | The normalized x-axis values (range [0, 1]), after the curve-shape transformation.
y_normalized | array | The normalized y-axis values (range [0, 1]), after the curve-shape transformation.
y_difference | array | The difference curve y_normalized - x_normalized.
Tmx | array | The detection thresholds computed for each local maximum of the difference curve.

Usage Examples

Basic Usage (Standalone)

from yellowbrick.utils import KneeLocator

# Example: distortion scores for k = 2 through 10
k_values = [2, 3, 4, 5, 6, 7, 8, 9, 10]
distortions = [890, 650, 420, 310, 280, 260, 250, 245, 242]

# Distortion is a convex, decreasing curve
kl = KneeLocator(
    k_values,
    distortions,
    curve_nature="convex",
    curve_direction="decreasing",
)

print("Optimal k:", kl.knee)         # e.g., 5
print("Score at knee:", kl.knee_y)   # e.g., 310

Internal Usage by KElbowVisualizer

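# Inside KElbowVisualizer.fit() (simplified; `self` is the visualizer instance):
from yellowbrick.utils import KneeLocator

# After computing k_values_ and k_scores_:
elbow_locator = KneeLocator(
    self.k_values_,
    self.k_scores_,
    curve_nature="convex",       # for the distortion metric
    curve_direction="decreasing",
)

if elbow_locator.knee is None:
    # No knee detected: skip the lookup, since list.index(None) would raise ValueError
    self.elbow_value_ = None
    self.elbow_score_ = 0
else:
    self.elbow_value_ = elbow_locator.knee
    self.elbow_score_ = self.k_scores_[self.k_values_.index(self.elbow_value_)]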

Silhouette Curve Detection

from yellowbrick.utils import KneeLocator

k_values = [2, 3, 4, 5, 6, 7, 8, 9, 10]
silhouette_scores = [0.35, 0.42, 0.55, 0.61, 0.63, 0.64, 0.64, 0.63, 0.62]

# Yellowbrick treats silhouette scores as a concave, increasing curve
kl = KneeLocator(
    k_values,
    silhouette_scores,
    curve_nature="concave",
    curve_direction="increasing",
)

print("Optimal k:", kl.knee)         # e.g., 5

Internal Workflow

The __init__ method executes the full Kneedle algorithm upon construction:

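  1. Smoothing: Fits an interpolant to the input (x, y) data using scipy.interpolate.interp1d and evaluates it at the original x-values. With interp1d's default linear interpolation, this step reproduces the input y-values rather than smoothing them.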
  2. Normalization: Normalizes both x and y to the range [0, 1] via min-max scaling.
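  3. Curve Transformation: Transforms the normalized values based on curve_nature and curve_direction so that every curve can be searched in the standard concave-increasing form. Convex curves are reflected (x_max - x and y_max - y) and then reversed; decreasing curves additionally have their y-values reversed. For convex/decreasing curves (e.g., distortion), the two y reversals cancel, so the net effect on y is y_max - y. For convex/increasing curves, both reflection and reversal apply.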
  4. Difference Curve: Computes y_difference = y_normalized - x_normalized.
  5. Local Extrema: Identifies local maxima and minima of the difference curve using scipy.signal.argrelextrema.
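  6. Thresholding: Computes a detection threshold for each local maximum lmx of the difference curve: T_lmx = y_difference[lmx] - S * mean(diff(x_normalized)). Because the normalized x-values span [0, 1], the mean spacing is 1/(n - 1) for n points, so S sets the size of the drop in units of one average x-step.
  7. Knee Finding: Calls find_knee(), which traverses the difference curve starting at the first local maximum. When the next value of the difference curve drops below the active threshold, the x-value of the most recent local maximum (mapped back to the original x ordering) is recorded as a knee. In offline mode, the first knee is returned immediately.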
  8. Validation: If no knees are found, issues a YellowbrickWarning and sets knee to None.
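Steps 2 to 4, together with mapping a result back onto the original x ordering, can be illustrated with a short stdlib-only sketch. It is a simplification under stated assumptions: the helper names are hypothetical, the transformation follows the reflect-and-reverse rules of step 3, and instead of the threshold traversal of step 7 it takes the global maximum of the difference curve as the candidate knee.

```python
def normalize(values):
    """Min-max scale a list of numbers to [0, 1] (step 2)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def to_concave_increasing(xn, yn, curve_nature, curve_direction):
    """Reshape normalized values into concave/increasing form (step 3)."""
    if curve_nature == "convex":
        # Reflect: after min-max scaling the maximum of each axis is 1.0
        xn = [1.0 - v for v in xn]
        yn = [1.0 - v for v in yn]
    if curve_direction == "decreasing":
        yn = yn[::-1]
    if curve_nature == "convex":
        xn = xn[::-1]
        yn = yn[::-1]
    return xn, yn


def candidate_knee(x, y, curve_nature="concave", curve_direction="increasing"):
    """Hypothetical helper: original x-value at the peak of the difference curve."""
    xn, yn = to_concave_increasing(normalize(x), normalize(y), curve_nature, curve_direction)
    diff = [b - a for a, b in zip(xn, yn)]  # step 4: y_normalized - x_normalized
    peak = max(range(len(diff)), key=diff.__getitem__)
    # For convex/increasing and concave/decreasing curves the transformed y-values
    # run in reverse order relative to the input, so map the peak index back.
    reversed_order = (curve_nature == "convex") != (curve_direction == "decreasing")
    return x[len(x) - 1 - peak] if reversed_order else x[peak]
```

On the distortion data from the standalone example this returns 5, matching the knee reported there. Reversing those scores into a convex, increasing curve moves the candidate to k = 7, the mirror-image position.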

Related Pages

Implements Principle

Related Implementations

Requires Environment

Uses Heuristic
