Implementation:DistrictDataLabs Yellowbrick KneeLocator
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, Clustering, Visualization |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Concrete utility for automatically detecting the knee (or elbow) point on a curve using the Kneedle algorithm, provided by the Yellowbrick library.
Description
KneeLocator is a Yellowbrick utility class that implements the Kneedle algorithm for knee point detection. It is a port of the kneed package by Kevin Arvai, maintained with permission by the Yellowbrick contributors. The class takes a set of x-y data points representing a curve and, upon initialization, computes the knee point: the x-value at which the curve exhibits (approximately) maximum curvature.
The algorithm works by normalizing the input data, computing a difference curve between the normalized values and a diagonal reference line, finding local maxima of this difference curve, and applying a sensitivity-controlled threshold to identify the true knee. It supports all four combinations of curve nature (concave/convex) and direction (increasing/decreasing), making it applicable to a wide variety of elbow and knee curves.
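The four steps above can be illustrated with a minimal, self-contained sketch in plain Python. It is not the Yellowbrick implementation. The function name kneedle_concave_increasing is hypothetical, and the sketch assumes the curve is already concave and increasing, x is sorted ascending, and y is not constant. It also skips the interpolation, curve-transformation, and local-minimum handling that the library performs.

```python
def kneedle_concave_increasing(x, y, S=1.0):
    """Return the x-value of the knee of a concave, increasing curve, or None.

    Hypothetical, simplified offline Kneedle sketch (not Yellowbrick's code):
    assumes x is sorted ascending, y is not constant, and the curve is
    already concave and increasing, so no curve transformation is applied.
    """
    # 1. Min-max normalize both axes to [0, 1]
    x_lo, x_hi = min(x), max(x)
    y_lo, y_hi = min(y), max(y)
    xn = [(v - x_lo) / (x_hi - x_lo) for v in x]
    yn = [(v - y_lo) / (y_hi - y_lo) for v in y]

    # 2. Difference curve against the diagonal reference line y = x
    diff = [b - a for a, b in zip(xn, yn)]

    # Mean spacing between consecutive normalized x values
    mean_dx = (xn[-1] - xn[0]) / (len(xn) - 1)

    # 3. Interior local maxima of the difference curve (ties count)
    maxima = [
        i
        for i in range(1, len(diff) - 1)
        if diff[i] >= diff[i - 1] and diff[i] >= diff[i + 1]
    ]

    # 4. A local maximum is a knee if the difference curve drops below
    #    its threshold before the next local maximum is reached
    for k, m in enumerate(maxima):
        threshold = diff[m] - S * mean_dx
        stop = maxima[k + 1] if k + 1 < len(maxima) else len(diff)
        if any(diff[j] < threshold for j in range(m + 1, stop)):
            return x[m]  # offline behavior: report the first knee found
    return None


k_values = [2, 3, 4, 5, 6, 7, 8, 9, 10]
silhouette_scores = [0.35, 0.42, 0.55, 0.61, 0.63, 0.64, 0.64, 0.63, 0.62]
print(kneedle_concave_increasing(k_values, silhouette_scores))  # 5
```

A perfectly straight line yields a flat difference curve, so this sketch returns None for it, mirroring the "no knee found" case.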
Within Yellowbrick, KneeLocator is used internally by KElbowVisualizer to automatically identify the optimal number of clusters. The KElbowVisualizer.fit() method passes the k values and their corresponding scores to a KneeLocator instance, configuring the curve nature and direction based on the chosen metric (convex/decreasing for distortion, concave/increasing for silhouette and Calinski-Harabasz).
Usage
KneeLocator is primarily used as an internal dependency of KElbowVisualizer. However, it can also be imported and used directly for any knee or elbow detection task on arbitrary curves. Instantiate it with x and y arrays and the appropriate curve parameters; the knee attribute is immediately available after construction.
Code Reference
Source Location
- Repository: yellowbrick
- File: yellowbrick/utils/kneed.py
- Class Definition: Lines 50-260
- Key Methods: __init__ (L86-160), find_knee (L187-260)
Signature
```python
class KneeLocator(object):
    def __init__(
        self,
        x,
        y,
        S=1.0,
        curve_nature="concave",
        curve_direction="increasing",
        online=False,
    ):
```
Import
```python
from yellowbrick.utils import KneeLocator
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| x | list or array-like | Yes | The x-axis values (e.g., k values in cluster analysis). Must be sorted in increasing order. |
| y | list or array-like | Yes | The y-axis values (e.g., scores corresponding to each k value). |
| S | float | No | Sensitivity parameter controlling how aggressive knee detection is. Each local maximum's threshold is lowered by S times the mean spacing of the normalized x values, so lower values declare a knee after a smaller drop and higher values are more conservative. Default: 1.0. |
| curve_nature | str | No | The nature of the curve: "concave" or "convex". Default: "concave". |
| curve_direction | str | No | The direction of the curve: "increasing" or "decreasing". Default: "increasing". |
| online | bool | No | If True, traverses the whole difference curve and may correct earlier knee points as later ones are found. If False, returns the first detected knee. Default: False. |
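The effect of S and online can be seen on a hand-made difference curve. The sketch below is hypothetical (the function kneedle_knees is not part of Yellowbrick). It assumes the difference curve was computed from evenly spaced, normalized x values, so the mean x spacing is 1 / (len(diff) - 1), and it ignores the library's local-minimum handling.

```python
def kneedle_knees(diff, S=1.0, online=False):
    """Return the indices of the knees found on a Kneedle difference curve.

    Hypothetical sketch, not Yellowbrick's code: assumes diff comes from
    evenly spaced, normalized x values, so mean spacing is 1 / (n - 1).
    """
    mean_dx = 1.0 / (len(diff) - 1)

    # Interior local maxima (ties count, like a >= comparison)
    maxima = [
        i
        for i in range(1, len(diff) - 1)
        if diff[i] >= diff[i - 1] and diff[i] >= diff[i + 1]
    ]

    knees = []
    for k, m in enumerate(maxima):
        # A larger S lowers the threshold, so a bigger drop is required
        threshold = diff[m] - S * mean_dx
        stop = maxima[k + 1] if k + 1 < len(maxima) else len(diff)
        if any(diff[j] < threshold for j in range(m + 1, stop)):
            knees.append(m)
            if not online:
                break  # offline: keep only the first knee
    return knees


# A hand-made difference curve with local maxima at indices 1 and 4
diff = [0.0, 0.5, 0.2, 0.2, 0.6, 0.3, 0.0]
print(kneedle_knees(diff))               # [1]
print(kneedle_knees(diff, online=True))  # [1, 4]
print(kneedle_knees(diff, S=3.0))        # [4]
```

With S=3.0 the drop after the first maximum no longer crosses its threshold, so only the second maximum qualifies. In online mode every knee is collected, which corresponds to the all_knees attribute.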
Outputs
| Name | Type | Description |
|---|---|---|
| knee | int, float, or None | The x-value at the detected knee point, or None if no knee was found. |
| knee_y | float or None | The original y-value at the detected knee point, or None if no knee was found. |
| norm_knee | float or None | The normalized x-value at the knee point. |
| norm_knee_y | float or None | The normalized y-value at the knee point. |
| all_knees | set | A set of all detected knee x-values (relevant when online=True). |
| all_norm_knees | set | A set of all detected normalized knee x-values. |
| all_knees_y | list | The y-values at all detected knees. |
| all_norm_knees_y | list | The normalized y-values at all detected knees. |
| x_normalized | array | The normalized x-axis values (range [0, 1]). |
| y_normalized | array | The normalized and transformed y-axis values. |
| y_difference | array | The difference curve y_normalized - x_normalized. |
| Tmx | array | The computed thresholds for each local maximum of the difference curve. |
Usage Examples
Basic Usage (Standalone)
```python
from yellowbrick.utils import KneeLocator

# Example: distortion scores for k = 2 through 10
k_values = [2, 3, 4, 5, 6, 7, 8, 9, 10]
distortions = [890, 650, 420, 310, 280, 260, 250, 245, 242]

# Distortion is a convex, decreasing curve
kl = KneeLocator(
    k_values,
    distortions,
    curve_nature="convex",
    curve_direction="decreasing",
)
print("Optimal k:", kl.knee)         # e.g., 5
print("Score at knee:", kl.knee_y)  # e.g., 310
```
Internal Usage by KElbowVisualizer
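```python
# Inside KElbowVisualizer.fit() (simplified):
from yellowbrick.utils import KneeLocator

# Curve shape depends on the scoring metric
locator_kwargs = {
    "distortion": {"curve_nature": "convex", "curve_direction": "decreasing"},
    "silhouette": {"curve_nature": "concave", "curve_direction": "increasing"},
    "calinski_harabasz": {"curve_nature": "concave", "curve_direction": "increasing"},
}.get(self.metric, {})

# After computing k_values_ and k_scores_:
elbow_locator = KneeLocator(self.k_values_, self.k_scores_, **locator_kwargs)

if elbow_locator.knee is None:
    # No knee detected: list.index(None) would raise, so leave the elbow unset
    self.elbow_value_ = None
    self.elbow_score_ = 0
else:
    self.elbow_value_ = elbow_locator.knee
    self.elbow_score_ = self.k_scores_[self.k_values_.index(self.elbow_value_)]
```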
Silhouette Curve Detection
```python
from yellowbrick.utils import KneeLocator

k_values = [2, 3, 4, 5, 6, 7, 8, 9, 10]
silhouette_scores = [0.35, 0.42, 0.55, 0.61, 0.63, 0.64, 0.64, 0.63, 0.62]

# Silhouette is a concave, increasing curve
kl = KneeLocator(
    k_values,
    silhouette_scores,
    curve_nature="concave",
    curve_direction="increasing",
)
print("Optimal k:", kl.knee)
```
Internal Workflow
The __init__ method executes the full Kneedle algorithm upon construction:
- Smoothing: Fits an interpolating function to the input (x, y) data using scipy.interpolate.interp1d (linear interpolation by default).
- Normalization: Normalizes both x and y to the range [0, 1] via min-max scaling.
- Curve Transformation: Transforms the normalized values based on curve_nature and curve_direction to convert the problem to the standard concave, increasing form. For convex/decreasing curves (e.g., distortion), this involves computing y_max - y. For convex/increasing curves, both flipping and subtraction are applied.
- Difference Curve: Computes y_difference = y_normalized - x_normalized.
- Local Extrema: Identifies local maxima and minima of the difference curve using scipy.signal.argrelextrema.
- Thresholding: For each local maximum m, computes the threshold T_m = y_difference[m] - S * mean(Δx), where mean(Δx) is the average spacing between consecutive normalized x values. The thresholds are stored in Tmx.
- Knee Finding: Calls find_knee(), which traverses the difference curve starting from the first local maximum. If the difference curve drops below the current threshold before the next local maximum is reached, the x-value at that local maximum is identified as the knee. In offline mode (online=False), the first knee is returned immediately; in online mode, traversal continues and every knee is collected in all_knees.
- Validation: If no knees are found, issues a YellowbrickWarning and sets knee to None.
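The curve-transformation step, and the matching index lookup after a knee is found, can be sketched in plain Python. This is a hedged reconstruction under stated assumptions, not the code in kneed.py: the helper names to_concave_increasing and original_index are hypothetical, and the exact order of operations inside the library may differ. The sketch assumes convex curves are reflected (v becomes max - v) and then reversed, and decreasing curves have y reversed. Under that assumption y ends up reversed an odd number of times for concave/decreasing and convex/increasing curves, so indices on those curves must be mirrored back.

```python
def to_concave_increasing(xn, yn, curve_nature, curve_direction):
    """Map normalized (x, y) values onto a concave, increasing curve.

    Hypothetical sketch of the transformation step, not Yellowbrick's code.
    """
    xn, yn = list(xn), list(yn)
    if curve_nature == "convex":
        # Reflect both axes so the elbow becomes a knee
        x_max, y_max = max(xn), max(yn)
        xn = [x_max - v for v in xn]
        yn = [y_max - v for v in yn]
    if curve_direction == "decreasing":
        # Reverse y so the curve increases
        yn = yn[::-1]
    if curve_nature == "convex":
        # Restore ascending x order after the reflection
        xn = xn[::-1]
        yn = yn[::-1]
    return xn, yn


def original_index(i, n, curve_nature, curve_direction):
    """Map index i on the transformed curve back to the input arrays (length n)."""
    # y was reversed an odd number of times for these two combinations
    mirrored = (curve_nature == "convex") != (curve_direction == "decreasing")
    return n - 1 - i if mirrored else i


x = [0, 1, 2, 3, 4]
curves = {
    ("concave", "increasing"): [0, 7, 12, 15, 16],
    ("concave", "decreasing"): [16, 15, 12, 7, 0],
    ("convex", "increasing"): [0, 1, 4, 9, 16],
    ("convex", "decreasing"): [16, 9, 4, 1, 0],
}
xn = [v / 4 for v in x]  # min-max normalization (min 0, max 4)
for (nature, direction), y in curves.items():
    yn = [v / 16 for v in y]  # min-max normalization (min 0, max 16)
    print(nature, direction, to_concave_increasing(xn, yn, nature, direction)[1])
# each line ends with [0.0, 0.4375, 0.75, 0.9375, 1.0]
```

All four shapes collapse onto the same concave, increasing curve, so the difference-curve and threshold steps can run unchanged; only the final lookup of the knee's x-value needs to know which case applied.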