Implementation:DistrictDataLabs Yellowbrick KElbowVisualizer

From Leeroopedia


Knowledge Sources
Domains Machine_Learning, Clustering, Visualization
Last Updated 2026-02-08 00:00 GMT

Overview

Concrete tool for determining the optimal number of clusters using the elbow method, provided by the Yellowbrick library.

Description

KElbowVisualizer is a Yellowbrick visualizer that implements the elbow method for selecting the optimal number of clusters in k-means clustering. It wraps a scikit-learn clustering estimator (typically KMeans or MiniBatchKMeans), fits it across a user-specified range of k values, and plots the resulting scores to produce the characteristic elbow curve.

The visualizer supports three scoring metrics: distortion (sum of squared distances from each point to its assigned cluster center), silhouette (mean silhouette coefficient over all samples), and calinski_harabasz (ratio of between-cluster to within-cluster dispersion). By default (timings=True) it also plots the fit time for each k on a secondary y-axis, which helps weigh score gains against computational cost. When locate_elbow=True (also the default), the visualizer estimates the elbow, the point of maximum curvature on the score curve, using KneeLocator and annotates the plot with a dashed vertical line at that k.
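
As a rough illustration of the distortion metric described above, the following sketch sums squared Euclidean distances from each point to its nearest center. The helper name distortion is hypothetical and is not Yellowbrick's own function; the sketch assumes Euclidean distance and that every point belongs to its closest center, whereas the library scores the labels produced by the fitted estimator and honors distance_metric.

```python
def distortion(points, centers):
    """Sum of squared Euclidean distances from each point to its nearest center.

    Hypothetical helper for illustration only; not part of Yellowbrick.
    """
    total = 0.0
    for point in points:
        # Squared distance from this point to every center; keep the smallest.
        total += min(
            sum((a - b) ** 2 for a, b in zip(point, center))
            for center in centers
        )
    return total


# Two tight pairs of points, ten units apart on the x-axis.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

one_center = distortion(points, [(5.0, 1.0)])               # k = 1
two_centers = distortion(points, [(0.0, 1.0), (10.0, 1.0)])  # k = 2

print("k=1:", one_center)
print("k=2:", two_centers)
```

Distortion can only stay flat or shrink as centers are added, which is why the elbow method looks for the k where the decrease levels off instead of simply minimizing the score.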

The class extends ClusteringScoreVisualizer, inheriting Yellowbrick's standard visualizer API pattern of fit(), draw(), finalize(), and show().

Usage

Use KElbowVisualizer when you need to determine a suitable k for k-means clustering both visually and programmatically. Import the class, instantiate it with a scikit-learn clusterer, call fit(X) with your feature matrix, and then call show() to render the plot. After fitting with locate_elbow=True, the elbow_value_ and elbow_score_ attributes hold the detected k and its associated score; elbow_value_ is None when no elbow is found.

Code Reference

Source Location

  • Repository: yellowbrick
  • File: yellowbrick/cluster/elbow.py
  • Class Definition: Lines 137-466
  • Key Methods: __init__ (L256-266), fit (L297-383), draw (L385-412)
  • Quick Method: kelbow_visualizer() (L474-577)

Signature

class KElbowVisualizer(ClusteringScoreVisualizer):

    def __init__(
        self,
        estimator,
        ax=None,
        k=10,
        metric="distortion",
        distance_metric="euclidean",
        timings=True,
        locate_elbow=True,
        **kwargs
    ):

Import

from yellowbrick.cluster import KElbowVisualizer

I/O Contract

Inputs

Name | Type | Required | Description
estimator | scikit-learn clusterer | Yes | An unfitted clustering estimator (e.g., KMeans or MiniBatchKMeans). Must support set_params(n_clusters=k).
ax | matplotlib Axes | No | The axes to plot the figure on. If None, the current axes are used or generated.
k | int, tuple, or iterable | No | The k values to evaluate. An integer specifies range(2, k+1), i.e. 2 through k inclusive; a 2-tuple specifies range(k[0], k[1]), which excludes k[1]; any other iterable of integers provides explicit k values. Default: 10.
metric | str | No | Scoring metric: "distortion", "silhouette", or "calinski_harabasz". Default: "distortion".
distance_metric | str or callable | No | Distance metric for pairwise distance computation (e.g., "euclidean", "manhattan", "cosine"). Must be valid for sklearn.metrics.pairwise.pairwise_distances. Default: "euclidean".
timings | bool | No | Whether to plot fit time per k on a secondary y-axis. Default: True.
locate_elbow | bool | No | Whether to automatically detect and annotate the elbow point using the KneeLocator algorithm. Default: True.
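
The three accepted forms of k can be easy to mix up, so here is a minimal sketch of how they expand into concrete k values according to the description in the table. The function normalize_k is a hypothetical helper written for this page, not a Yellowbrick API, and its error handling is an assumption.

```python
from collections.abc import Iterable


def normalize_k(k=10):
    """Expand k into an explicit list of cluster counts.

    Hypothetical helper mirroring the documented semantics of the k
    parameter; it is not part of Yellowbrick.
    """
    if isinstance(k, int):
        # Integer: evaluate 2 through k inclusive.
        return list(range(2, k + 1))
    if isinstance(k, tuple) and len(k) == 2 and all(isinstance(v, int) for v in k):
        # 2-tuple: half-open range, so the upper bound itself is excluded.
        return list(range(k[0], k[1]))
    if isinstance(k, Iterable):
        # Any other iterable of integers is taken as the explicit k values.
        values = list(k)
        if values and all(isinstance(v, int) for v in values):
            return values
    raise ValueError(f"cannot interpret k={k!r} as a set of cluster counts")


print(normalize_k(10))
print(normalize_k((2, 12)))
print(normalize_k([3, 5, 8]))
```

Note that a 2-tuple such as (4, 8) is read as a range and not as two explicit values; to evaluate exactly k=4 and k=8, pass a list such as [4, 8].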

The fit() method accepts:

Name | Type | Required | Description
X | array-like of shape (n_samples, n_features) | Yes | Feature matrix to cluster.
y | array-like of shape (n_samples,) | No | Ignored. Present for API consistency.

Outputs

Name | Type | Description
k_scores_ | array of shape (n_k_values,) | The scoring metric value for each tested k.
k_timers_ | array of shape (n_k_values,) | The time in seconds to fit the model for each tested k.
elbow_value_ | int or None | The optimal k detected by the KneeLocator, or None if no elbow was found.
elbow_score_ | float | The score at the detected elbow point, or 0 if no elbow was found.
k_values_ | list of int | The list of k values that were evaluated.

Usage Examples

Basic Usage

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from yellowbrick.cluster import KElbowVisualizer

# Generate synthetic data
X, y = make_blobs(n_samples=1000, n_features=12, centers=6, random_state=42)

# Instantiate the clustering model and visualizer
model = KMeans(random_state=42)
visualizer = KElbowVisualizer(model, k=(2, 12), metric="distortion")

# Fit and show the elbow plot
visualizer.fit(X)
visualizer.show()

# Access the detected optimal k
print("Optimal k:", visualizer.elbow_value_)
print("Elbow score:", visualizer.elbow_score_)

Using Different Metrics

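from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from yellowbrick.cluster import KElbowVisualizer

# Generate the same synthetic data as in the basic example
X, y = make_blobs(n_samples=1000, n_features=12, centers=6, random_state=42)

model = KMeans(random_state=42)

# Use silhouette score instead of distortion, and hide the fit-time axis
visualizer = KElbowVisualizer(model, k=10, metric="silhouette", timings=False)
visualizer.fit(X)
visualizer.show()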

Quick Method

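from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from yellowbrick.cluster.elbow import kelbow_visualizer

# Generate synthetic data
X, y = make_blobs(n_samples=1000, n_features=12, centers=6, random_state=42)

# One-liner: creates, fits, and shows the visualizer, then returns it
viz = kelbow_visualizer(KMeans(random_state=42), X, k=10, metric="distortion")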

Internal Workflow

The fit() method executes the following steps:

  1. Converts the k parameter into a list of integer k values (k_values_).
  2. Iterates over each k value, setting the estimator's n_clusters parameter and fitting it to X.
  3. Records the scoring metric value and fit time for each k.
  4. If locate_elbow=True, passes k values and scores to a KneeLocator instance configured with the appropriate curve nature and direction for the chosen metric (convex/decreasing for distortion, concave/increasing for silhouette and Calinski-Harabasz).
  5. Calls draw() to plot the elbow curve, optionally with timing information on a twin y-axis and a vertical dashed line at the detected elbow.
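
The steps above can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the library's code: sweep_k, locate_elbow_by_chord, and StubClusterer are hypothetical names, the scoring function is passed in by the caller, and the elbow is found with a basic "farthest point from the chord" heuristic that stands in for KneeLocator without reproducing its algorithm. Plotting (step 5) is omitted.

```python
import time


def sweep_k(estimator, X, k_values, score_fn):
    """Fit the estimator once per k, recording a score and a fit time for each."""
    scores, timers = [], []
    for k in k_values:
        start = time.perf_counter()
        estimator.set_params(n_clusters=k)  # step 2: reconfigure and refit
        estimator.fit(X)
        timers.append(time.perf_counter() - start)  # step 3: fit time
        scores.append(score_fn(X, estimator))       # step 3: metric value
    return scores, timers


def locate_elbow_by_chord(k_values, scores):
    """Return the k farthest from the line joining the curve's end points.

    Simplified stand-in for step 4 (not the KneeLocator algorithm).
    Returns None when the curve has fewer than three points or is a
    straight line, loosely mirroring elbow_value_ being None.
    """
    if len(k_values) < 3:
        return None
    x0, y0 = k_values[0], scores[0]
    dx, dy = k_values[-1] - x0, scores[-1] - y0
    best_k, best_gap = None, 0.0
    for x, y in zip(k_values, scores):
        # Cross-product magnitude is proportional to the perpendicular
        # distance from (x, y) to the chord, which is enough for an argmax.
        gap = abs(dy * (x - x0) - dx * (y - y0))
        if gap > best_gap:
            best_k, best_gap = x, gap
    return best_k


class StubClusterer:
    """Hypothetical stand-in exposing only set_params() and fit()."""

    def __init__(self):
        self.n_clusters = None

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self

    def fit(self, X):
        return self


k_values = [2, 3, 4, 5, 6, 7]
# Toy score that falls quickly and then flattens, like a distortion curve.
scores, timers = sweep_k(
    StubClusterer(), [[0.0]], k_values, lambda X, est: 100.0 / est.n_clusters
)
print("scores:", scores)
print("elbow:", locate_elbow_by_chord(k_values, scores))
```

With a real scikit-learn estimator, score_fn would compute one of the three metrics from the fitted model, for example by reading its labels.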
