Implementation: DistrictDataLabs Yellowbrick KElbowVisualizer
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, Clustering, Visualization |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Concrete tool for determining the optimal number of clusters using the elbow method, provided by the Yellowbrick library.
Description
KElbowVisualizer is a Yellowbrick visualizer that implements the elbow method for selecting the optimal number of clusters in k-means clustering. It wraps a scikit-learn clustering estimator (typically KMeans or MiniBatchKMeans), fits it across a user-specified range of k values, and plots the resulting scores to produce the characteristic elbow curve.
The visualizer supports three scoring metrics: distortion (sum of squared distances to cluster centers), silhouette (mean silhouette coefficient), and calinski_harabasz (ratio of between-cluster to within-cluster dispersion). It can optionally display fit times on a secondary y-axis to help evaluate computational trade-offs. When locate_elbow=True, the visualizer automatically identifies the optimal k using the KneeLocator algorithm and annotates the plot with a dashed vertical line at that point.
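The three metrics correspond to quantities that can also be computed directly with scikit-learn. The sketch below illustrates them outside the visualizer; note that for the default Euclidean distance, distortion matches KMeans's inertia_ (KElbowVisualizer computes it itself via pairwise distances, so values may differ slightly for other distance metrics):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score, calinski_harabasz_score

# Toy data with a known cluster structure
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.8, random_state=42)
model = KMeans(n_clusters=4, n_init=10, random_state=42).fit(X)

# distortion: sum of squared distances to the nearest center
# (equals KMeans's inertia_ for the default Euclidean metric)
distortion = model.inertia_
# silhouette: mean silhouette coefficient, in [-1, 1]
silhouette = silhouette_score(X, model.labels_)
# calinski_harabasz: ratio of between- to within-cluster dispersion (higher is better)
ch = calinski_harabasz_score(X, model.labels_)
```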
The class extends ClusteringScoreVisualizer, inheriting Yellowbrick's standard visualizer API pattern of fit(), draw(), finalize(), and show().
Usage
Use KElbowVisualizer when you need to visually and programmatically determine the best k for k-means clustering. Import and instantiate it with a scikit-learn clusterer, call fit(X) with your feature matrix, and call show() to render the plot. The elbow_value_ and elbow_score_ attributes provide the detected optimal k and its associated score after fitting.
Code Reference
Source Location
- Repository: yellowbrick
- File: yellowbrick/cluster/elbow.py
- Class Definition: Lines 137-466
- Key Methods: __init__ (L256-266), fit (L297-383), draw (L385-412)
- Quick Method: kelbow_visualizer() (L474-577)
Signature
class KElbowVisualizer(ClusteringScoreVisualizer):
def __init__(
self,
estimator,
ax=None,
k=10,
metric="distortion",
distance_metric="euclidean",
timings=True,
locate_elbow=True,
**kwargs
):
Import
from yellowbrick.cluster import KElbowVisualizer
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| estimator | scikit-learn clusterer | Yes | An unfitted clustering estimator (e.g., KMeans or MiniBatchKMeans). Must support set_params(n_clusters=k). |
| ax | matplotlib Axes | No | The axes to plot the figure on. If None, the current axes are used or generated. |
| k | int, tuple, or iterable | No | The k values to evaluate. An integer specifies range(2, k+1); a 2-tuple specifies range(k[0], k[1]); an iterable provides explicit k values. Default: 10. |
| metric | str | No | Scoring metric: "distortion", "silhouette", or "calinski_harabasz". Default: "distortion". |
| distance_metric | str or callable | No | Distance metric for pairwise distance computation (e.g., "euclidean", "manhattan", "cosine"). Must be valid for sklearn.metrics.pairwise.pairwise_distances. Default: "euclidean". |
| timings | bool | No | Whether to plot fit time per k on a secondary y-axis. Default: True. |
| locate_elbow | bool | No | Whether to automatically detect and annotate the elbow point using the KneeLocator algorithm. Default: True. |
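The three accepted forms of k expand to lists of integers as described above. A small helper makes the semantics concrete (expand_k is a hypothetical illustration, not part of Yellowbrick's API):

```python
def expand_k(k):
    """Expand the k parameter the way the documented semantics describe."""
    if isinstance(k, int):
        # an integer k means range(2, k+1)
        return list(range(2, k + 1))
    if isinstance(k, tuple) and len(k) == 2:
        # a 2-tuple means range(k[0], k[1])
        return list(range(k[0], k[1]))
    # any other iterable supplies explicit k values
    return list(k)

print(expand_k(5))          # [2, 3, 4, 5]
print(expand_k((2, 6)))     # [2, 3, 4, 5]
print(expand_k([3, 5, 9]))  # [3, 5, 9]
```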
The fit() method accepts:
| Name | Type | Required | Description |
|---|---|---|---|
| X | array-like of shape (n_samples, n_features) | Yes | Feature matrix to cluster. |
| y | array-like of shape (n_samples,) | No | Ignored. Present for API consistency. |
Outputs
| Name | Type | Description |
|---|---|---|
| k_scores_ | array of shape (n_k_values,) | The scoring metric value for each tested k. |
| k_timers_ | array of shape (n_k_values,) | The time in seconds to fit the model for each tested k. |
| elbow_value_ | int or None | The optimal k detected by the KneeLocator, or None if no elbow was found. |
| elbow_score_ | float | The score at the detected elbow point, or 0 if no elbow was found. |
| k_values_ | list of int | The list of k values that were evaluated. |
Usage Examples
Basic Usage
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from yellowbrick.cluster import KElbowVisualizer
# Generate synthetic data
X, y = make_blobs(n_samples=1000, n_features=12, centers=6, random_state=42)
# Instantiate the clustering model and visualizer
model = KMeans(random_state=42)
visualizer = KElbowVisualizer(model, k=(2, 12), metric="distortion")
# Fit and show the elbow plot
visualizer.fit(X)
visualizer.show()
# Access the detected optimal k
print("Optimal k:", visualizer.elbow_value_)
print("Elbow score:", visualizer.elbow_score_)
Using Different Metrics
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from yellowbrick.cluster import KElbowVisualizer
# Generate synthetic data
X, y = make_blobs(n_samples=1000, n_features=12, centers=6, random_state=42)
model = KMeans(random_state=42)
# Use silhouette score instead of distortion
visualizer = KElbowVisualizer(model, k=10, metric="silhouette", timings=False)
visualizer.fit(X)
visualizer.show()
Quick Method
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from yellowbrick.cluster.elbow import kelbow_visualizer
# Generate synthetic data
X, y = make_blobs(n_samples=1000, n_features=12, centers=6, random_state=42)
# One-liner: creates, fits, and shows the visualizer
viz = kelbow_visualizer(KMeans(random_state=42), X, k=10, metric="distortion")
Internal Workflow
The fit() method executes the following steps:
- Converts the k parameter into a list of integer k values (k_values_).
- Iterates over each k value, setting the estimator's n_clusters parameter and fitting it to X.
- Records the scoring metric value and fit time for each k.
- If locate_elbow=True, passes the k values and scores to a KneeLocator instance configured with the appropriate curve nature and direction for the chosen metric (convex/decreasing for distortion, concave/increasing for silhouette and Calinski-Harabasz).
- Calls draw() to plot the elbow curve, optionally with timing information on a twin y-axis and a vertical dashed line at the detected elbow.
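These steps can be reproduced outside Yellowbrick. The sketch below runs the fit loop manually and locates the elbow with a simple max-distance-to-chord rule standing in for the KneeLocator algorithm (the geometry differs from KneeLocator's, but the idea, finding the point of maximum curvature on a convex decreasing curve, is the same):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Four well-separated clusters, so the elbow should land at k=4
X, _ = make_blobs(n_samples=400, centers=[[0, 0], [10, 10], [0, 10], [10, 0]],
                  cluster_std=0.5, random_state=0)

ks = list(range(2, 11))
scores = []
for k in ks:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    scores.append(model.inertia_)  # distortion score for this k

# Normalize both axes, then pick the point farthest below the chord
# joining the first and last points (convex/decreasing curve).
x = (np.array(ks, dtype=float) - ks[0]) / (ks[-1] - ks[0])
y = (np.array(scores) - min(scores)) / (max(scores) - min(scores))
elbow_k = ks[int(np.argmax((1 - x) - y))]
print("elbow at k =", elbow_k)
```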