Implementation:DistrictDataLabs Yellowbrick SilhouetteVisualizer
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, Clustering, Visualization |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Concrete tool for evaluating cluster quality through per-sample silhouette coefficient visualization, provided by the Yellowbrick library.
Description
SilhouetteVisualizer is a Yellowbrick visualizer that displays the silhouette coefficient for each sample on a per-cluster basis, creating a diagnostic plot that reveals the density and separation of clusters. Each cluster is rendered as a sorted horizontal bar chart of individual sample silhouette values, and a vertical dashed red line marks the mean silhouette score across all samples.
The visualizer wraps a scikit-learn clustering estimator (typically KMeans or MiniBatchKMeans). If the estimator is not already fitted, the visualizer fits it during its own fit() call. It then computes both the per-sample silhouette coefficients (via sklearn.metrics.silhouette_samples) and the overall mean silhouette score (via sklearn.metrics.silhouette_score). The resulting plot enables quick visual assessment of cluster cohesion, separation, and balance.
The class extends ClusteringScoreVisualizer and follows Yellowbrick's standard fit() / draw() / finalize() / show() API pattern.
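The description above leans on two scikit-learn functions. To show what they compute, here is a from-scratch NumPy sketch of the per-sample silhouette coefficient s(i) = (b(i) - a(i)) / max(a(i), b(i)). Here a(i) is the mean distance from sample i to the other members of its own cluster, and b(i) is the smallest mean distance to the members of any other cluster. The sketch assumes Euclidean distance (scikit-learn's default metric) and gives samples in singleton clusters a score of 0, which is scikit-learn's convention. The name `silhouette_samples_sketch` is hypothetical. This is not scikit-learn's implementation, which processes pairwise distances in chunks.

```python
import numpy as np


def silhouette_samples_sketch(X, labels):
    """Hypothetical per-sample silhouette, s(i) = (b - a) / max(a, b), Euclidean distance.

    Requires at least two distinct labels (otherwise b(i) is undefined).
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Full pairwise Euclidean distance matrix (fine for small inputs only)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    unique = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            # Singleton cluster: score is defined as 0 (scikit-learn convention)
            continue
        # a(i): mean distance to the other members of the same cluster (self excluded)
        a = dist[i, same].sum() / (n_same - 1)
        # b(i): smallest mean distance to the members of any other cluster
        b = min(dist[i, labels == c].mean() for c in unique if c != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return scores
```

Averaging the returned array gives the overall mean silhouette score. This is what `sklearn.metrics.silhouette_score` reports when it is called without its `sample_size` option. Values near 1 indicate dense, well-separated clusters. Values near 0 or below indicate samples sitting between clusters or assigned to the wrong one.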
Usage
Use SilhouetteVisualizer when you want to visually diagnose the quality of a specific clustering configuration. It is commonly used in a loop over multiple values of k to compare silhouette plots side by side and select the best cluster count. Import it, wrap your scikit-learn clusterer, call fit(X), and then show().
Code Reference
Source Location
- Repository: yellowbrick
- File: yellowbrick/cluster/silhouette.py
- Class Definition: Lines 39-259
- Key Methods: __init__ (L115-126), fit (L128-153), draw (L155-219)
- Quick Method: silhouette_visualizer() (L267-334)
Signature
```python
class SilhouetteVisualizer(ClusteringScoreVisualizer):
    def __init__(self, estimator, ax=None, colors=None, is_fitted="auto", **kwargs):
```
Import
```python
from yellowbrick.cluster import SilhouetteVisualizer
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| estimator | scikit-learn clusterer | Yes | A centroidal clustering estimator (e.g., KMeans, MiniBatchKMeans). Must have n_clusters, predict(), and labels_. |
| ax | matplotlib Axes | No | The axes to plot the figure on. If None, the current axes are used or generated. |
| colors | iterable or str | No | Colors for each cluster group. Can be a list of colors or a colormap name string. If fewer colors than clusters are given, the colors repeat. Default: None (uses "Set1" colormap). |
| is_fitted | bool or str | No | Whether the estimator is already fitted. "auto" checks automatically; False forces re-fitting; True skips fitting. Default: "auto". |
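The `colors` rules in the table above can be sketched in plain Python. The helper `resolve_cluster_colors_sketch` is hypothetical and does not sample any real colormap; it only reports which branch applies. It assumes that an explicit color sequence is repeated, or trimmed, to give exactly one color per cluster. The trimming behavior is an assumption about Yellowbrick's color resolution, not something the table states.

```python
from itertools import cycle, islice


def resolve_cluster_colors_sketch(colors, n_clusters):
    """Hypothetical outline of how the `colors` argument is interpreted."""
    if colors is None:
        # Default: fall back to the "Set1" colormap
        return ("colormap", "Set1")
    if isinstance(colors, str):
        # A string is treated as a colormap or palette name, e.g. "yellowbrick"
        return ("colormap", colors)
    # An explicit sequence: cycle it, then cut it to one color per cluster
    # (repeating when short; trimming when long is an assumption)
    return ("colors", list(islice(cycle(colors), n_clusters)))
```

For example, `resolve_cluster_colors_sketch(["r", "g"], 5)` yields the list `["r", "g", "r", "g", "r"]`. Neighboring clusters can therefore share a color when too few colors are supplied.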
The fit() method accepts:
| Name | Type | Required | Description |
|---|---|---|---|
| X | array-like of shape (n_samples, n_features) | Yes | Feature matrix to cluster and evaluate. |
| y | array-like of shape (n_samples,) | No | Ignored. Present for API consistency. |
Outputs
| Name | Type | Description |
|---|---|---|
| silhouette_score_ | float | Mean silhouette coefficient across all samples. |
| silhouette_samples_ | array of shape (n_samples,) | Per-sample silhouette coefficient values. |
| n_samples_ | int | Total number of samples in the dataset (X.shape[0]). |
| n_clusters_ | int | Number of clusters from the estimator's n_clusters attribute. |
| y_tick_pos_ | list of length n_clusters | Center position of each cluster's band on the y-axis, used for placing the cluster labels. |
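The following is a NumPy sketch of how per-cluster bands and the `y_tick_pos_` centers can be laid out. Each cluster's scores are sorted and stacked vertically, and the tick sits at the middle of each band. The function name is hypothetical. The starting offset and the gap of 10 rows between clusters mirror the Yellowbrick source as we understand it, but treat them as assumptions and check your installed version. Labels are assumed to be the integers 0..n_clusters-1, as KMeans produces.

```python
import numpy as np


def silhouette_layout_sketch(sample_scores, labels, n_clusters, padding=10):
    """Hypothetical layout: returns (y_lower, y_upper, sorted_scores) per cluster and tick centers."""
    sample_scores = np.asarray(sample_scores, dtype=float)
    labels = np.asarray(labels)
    y_lower = padding  # assumed offset of the first band from the bottom of the axes
    bands, tick_pos = [], []
    for idx in range(n_clusters):
        # Sort this cluster's scores so the band has a smooth, ordered profile
        values = np.sort(sample_scores[labels == idx])
        size = values.shape[0]
        y_upper = y_lower + size
        bands.append((y_lower, y_upper, values))
        # The tick for the cluster label sits at the vertical center of its band
        tick_pos.append(y_lower + 0.5 * size)
        # Leave a gap before the next cluster's band
        y_lower = y_upper + padding
    return bands, tick_pos
```

Each band could then be drawn with `ax.fill_betweenx(np.arange(y_lower, y_upper), 0, values)`, which produces one filled row per sample.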
Usage Examples
Basic Usage
```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from yellowbrick.cluster import SilhouetteVisualizer

# Generate synthetic data
X, y = make_blobs(n_samples=500, n_features=5, centers=4, random_state=42)

# Instantiate the clustering model and visualizer
model = KMeans(n_clusters=4, random_state=42)
visualizer = SilhouetteVisualizer(model, colors="yellowbrick")

# Fit the visualizer and show the plot
visualizer.fit(X)
visualizer.show()

# Access the computed scores
print("Mean Silhouette Score:", visualizer.silhouette_score_)
```
Comparing Multiple k Values
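```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from yellowbrick.cluster import SilhouetteVisualizer

# Same synthetic data as in the basic example
X, y = make_blobs(n_samples=500, n_features=5, centers=4, random_state=42)

fig, axes = plt.subplots(2, 2, figsize=(15, 8))
for idx, k in enumerate([2, 3, 4, 5]):
    # Map the loop index onto the 2x2 grid of axes
    ax = axes[idx // 2, idx % 2]
    model = KMeans(n_clusters=k, random_state=42)
    visualizer = SilhouetteVisualizer(model, ax=ax)
    visualizer.fit(X)
    # Add titles and axis labels without showing each subplot separately
    visualizer.finalize()
plt.tight_layout()
plt.show()
```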
Quick Method
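```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from yellowbrick.cluster.silhouette import silhouette_visualizer

X, y = make_blobs(n_samples=500, n_features=5, centers=4, random_state=42)

# One-liner: creates, fits, and shows the visualizer
viz = silhouette_visualizer(KMeans(n_clusters=4, random_state=42), X)
```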
Internal Workflow
The fit() method executes the following steps:
- Checks whether the wrapped estimator is already fitted (controlled by is_fitted). If it is not fitted, calls estimator.fit(X, y).
- Records the number of samples (n_samples_) and clusters (n_clusters_).
- Calls estimator.predict(X) to obtain cluster labels.
- Computes the mean silhouette score via sklearn.metrics.silhouette_score.
- Computes per-sample silhouette coefficients via sklearn.metrics.silhouette_samples.
- Calls draw(labels), which, for each cluster, sorts the silhouette values and renders them as filled horizontal bars using ax.fill_betweenx(). A vertical dashed red line marks the mean silhouette score.
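The first three steps above can be outlined in plain Python. `ThresholdClusterer`, `is_fitted_sketch`, and `fit_flow_sketch` are all hypothetical names. The `"auto"` check approximates scikit-learn's default `check_is_fitted` heuristic, which looks for attributes ending in an underscore. That Yellowbrick resolves `"auto"` exactly this way is an assumption. Scoring and drawing are omitted.

```python
def is_fitted_sketch(estimator, is_fitted="auto"):
    """Hypothetical resolution of the is_fitted argument."""
    if is_fitted == "auto":
        # Heuristic similar to scikit-learn's check_is_fitted: fitted estimators
        # carry attributes with a trailing underscore (e.g. labels_, cluster_centers_)
        return any(k.endswith("_") and not k.startswith("__") for k in vars(estimator))
    # True skips fitting; False forces a re-fit
    return bool(is_fitted)


def fit_flow_sketch(estimator, X, y=None, is_fitted="auto"):
    """Hypothetical outline of the fit() order of operations, before scoring and drawing."""
    steps = []
    if not is_fitted_sketch(estimator, is_fitted):
        estimator.fit(X, y)
        steps.append("fit")
    n_samples = len(X)                # corresponds to n_samples_
    n_clusters = estimator.n_clusters # corresponds to n_clusters_
    labels = estimator.predict(X)     # labels later passed to the silhouette functions
    steps.append("predict")
    return steps, n_samples, n_clusters, labels


class ThresholdClusterer:
    """Hypothetical toy 1-D clusterer: label 1 above the training mean, else 0."""

    n_clusters = 2

    def fit(self, X, y=None):
        self.threshold_ = sum(X) / len(X)  # trailing underscore marks a fitted attribute
        return self

    def predict(self, X):
        return [int(x > self.threshold_) for x in X]
```

With a fresh `ThresholdClusterer()`, the first call to `fit_flow_sketch` fits and then predicts. A second call with the default `"auto"` only predicts, because the estimator now has a `threshold_` attribute. Passing `is_fitted=False` forces a re-fit.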