Implementation: DistrictDataLabs Yellowbrick KElbowVisualizer
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, Clustering, Visualization |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Concrete tool for determining the optimal number of clusters using the elbow method, provided by the Yellowbrick library.
Description
KElbowVisualizer is a Yellowbrick visualizer that implements the elbow method for selecting the optimal number of clusters in k-means clustering. It wraps a scikit-learn clustering estimator (typically KMeans or MiniBatchKMeans), fits it across a user-specified range of k values, and plots the resulting scores to produce the characteristic elbow curve.
The visualizer supports three scoring metrics: distortion (sum of squared distances to cluster centers), silhouette (mean silhouette coefficient), and calinski_harabasz (ratio of between-cluster to within-cluster dispersion). It can optionally display fit times on a secondary y-axis to help evaluate computational trade-offs. When locate_elbow=True, the visualizer automatically identifies the optimal k using the KneeLocator algorithm and annotates the plot with a dashed vertical line at that point.
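The three metrics correspond to quantities that can also be computed directly with scikit-learn. The sketch below illustrates them outside the visualizer; note that for the default Euclidean distance, distortion matches KMeans's inertia_ (KElbowVisualizer computes it itself via pairwise distances, so values may differ slightly for other distance metrics):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score, calinski_harabasz_score

# Toy data with a known cluster structure
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.8, random_state=42)
model = KMeans(n_clusters=4, n_init=10, random_state=42).fit(X)

# distortion: sum of squared distances to the nearest center
# (equals KMeans's inertia_ for the default Euclidean metric)
distortion = model.inertia_
# silhouette: mean silhouette coefficient, in [-1, 1]
silhouette = silhouette_score(X, model.labels_)
# calinski_harabasz: ratio of between- to within-cluster dispersion (higher is better)
ch = calinski_harabasz_score(X, model.labels_)
```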
The class extends ClusteringScoreVisualizer, inheriting Yellowbrick's standard visualizer API pattern of fit(), draw(), finalize(), and show().
Usage
Use KElbowVisualizer when you need to visually and programmatically determine the best k for k-means clustering. Import and instantiate it with a scikit-learn clusterer, call fit(X) with your feature matrix, and call show() to render the plot. The elbow_value_ and elbow_score_ attributes provide the detected optimal k and its associated score after fitting.
Code Reference
Source Location
- Repository: yellowbrick
- File: yellowbrick/cluster/elbow.py
- Class Definition: Lines 137-466
- Key Methods: __init__ (L256-266), fit (L297-383), draw (L385-412)
- Quick Method: kelbow_visualizer() (L474-577)
Signature
class KElbowVisualizer(ClusteringScoreVisualizer):
def __init__(
self,
estimator,
ax=None,
k=10,
metric="distortion",
distance_metric="euclidean",
timings=True,
locate_elbow=True,
**kwargs
):
Import
from yellowbrick.cluster import KElbowVisualizer
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| estimator | scikit-learn clusterer | Yes | An unfitted clustering estimator (e.g., KMeans or MiniBatchKMeans). Must support set_params(n_clusters=k). |
| ax | matplotlib Axes | No | The axes to plot the figure on. If None, the current axes are used or generated. |
| k | int, tuple, or iterable | No | The k values to evaluate. An integer specifies range(2, k+1); a 2-tuple specifies range(k[0], k[1]); an iterable provides explicit k values. Default: 10. |
| metric | str | No | Scoring metric: "distortion", "silhouette", or "calinski_harabasz". Default: "distortion". |
| distance_metric | str or callable | No | Distance metric for pairwise distance computation (e.g., "euclidean", "manhattan", "cosine"). Must be valid for sklearn.metrics.pairwise.pairwise_distances. Default: "euclidean". |
| timings | bool | No | Whether to plot fit time per k on a secondary y-axis. Default: True. |
| locate_elbow | bool | No | Whether to automatically detect and annotate the elbow point using the KneeLocator algorithm. Default: True. |
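The three accepted forms of k expand to lists of integers as described above. A small helper makes the semantics concrete (expand_k is a hypothetical illustration, not part of Yellowbrick's API):

```python
def expand_k(k):
    """Expand the k parameter the way the documented semantics describe."""
    if isinstance(k, int):
        # an integer k means range(2, k+1)
        return list(range(2, k + 1))
    if isinstance(k, tuple) and len(k) == 2:
        # a 2-tuple means range(k[0], k[1])
        return list(range(k[0], k[1]))
    # any other iterable supplies explicit k values
    return list(k)

print(expand_k(5))          # [2, 3, 4, 5]
print(expand_k((2, 6)))     # [2, 3, 4, 5]
print(expand_k([3, 5, 9]))  # [3, 5, 9]
```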
The fit() method accepts:
| Name | Type | Required | Description |
|---|---|---|---|
| X | array-like of shape (n_samples, n_features) | Yes | Feature matrix to cluster. |
| y | array-like of shape (n_samples,) | No | Ignored. Present for API consistency. |
Outputs
| Name | Type | Description |
|---|---|---|
| k_scores_ | array of shape (n_k_values,) | The scoring metric value for each tested k. |
| k_timers_ | array of shape (n_k_values,) | The time in seconds to fit the model for each tested k. |
| elbow_value_ | int or None | The optimal k detected by the KneeLocator, or None if no elbow was found. |
| elbow_score_ | float | The score at the detected elbow point, or 0 if no elbow was found. |
| k_values_ | list of int | The list of k values that were evaluated. |
Usage Examples
Basic Usage
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from yellowbrick.cluster import KElbowVisualizer
# Generate synthetic data
X, y = make_blobs(n_samples=1000, n_features=12, centers=6, random_state=42)
# Instantiate the clustering model and visualizer
model = KMeans(random_state=42)
visualizer = KElbowVisualizer(model, k=(2, 12), metric="distortion")
# Fit and show the elbow plot
visualizer.fit(X)
visualizer.show()
# Access the detected optimal k
print("Optimal k:", visualizer.elbow_value_)
print("Elbow score:", visualizer.elbow_score_)
Using Different Metrics
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from yellowbrick.cluster import KElbowVisualizer
# Generate synthetic data
X, y = make_blobs(n_samples=1000, n_features=12, centers=6, random_state=42)
model = KMeans(random_state=42)
# Use silhouette score instead of distortion
visualizer = KElbowVisualizer(model, k=10, metric="silhouette", timings=False)
visualizer.fit(X)
visualizer.show()
Quick Method
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from yellowbrick.cluster.elbow import kelbow_visualizer
# Generate synthetic data
X, y = make_blobs(n_samples=1000, n_features=12, centers=6, random_state=42)
# One-liner: creates, fits, and shows the visualizer
viz = kelbow_visualizer(KMeans(random_state=42), X, k=10, metric="distortion")
Internal Workflow
The fit() method executes the following steps:
- Converts the k parameter into a list of integer k values (k_values_).
- Iterates over each k value, setting the estimator's n_clusters parameter and fitting it to X.
- Records the scoring metric value and fit time for each k.
- If locate_elbow=True, passes the k values and scores to a KneeLocator instance configured with the appropriate curve nature and direction for the chosen metric (convex/decreasing for distortion, concave/increasing for silhouette and Calinski-Harabasz).
- Calls draw() to plot the elbow curve, optionally with timing information on a twin y-axis and a vertical dashed line at the detected elbow.
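These steps can be reproduced outside Yellowbrick. The sketch below runs the fit loop manually and locates the elbow with a simple max-distance-to-chord rule standing in for the KneeLocator algorithm (the geometry differs from KneeLocator's, but the idea, finding the point of maximum curvature on a convex decreasing curve, is the same):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Four well-separated clusters, so the elbow should land at k=4
X, _ = make_blobs(n_samples=400, centers=[[0, 0], [10, 10], [0, 10], [10, 0]],
                  cluster_std=0.5, random_state=0)

ks = list(range(2, 11))
scores = []
for k in ks:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    scores.append(model.inertia_)  # distortion score for this k

# Normalize both axes, then pick the point farthest below the chord
# joining the first and last points (convex/decreasing curve).
x = (np.array(ks, dtype=float) - ks[0]) / (ks[-1] - ks[0])
y = (np.array(scores) - min(scores)) / (max(scores) - min(scores))
elbow_k = ks[int(np.argmax((1 - x) - y))]
print("elbow at k =", elbow_k)
```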