Implementation:DistrictDataLabs Yellowbrick SilhouetteVisualizer

Knowledge Sources	Yellowbrick Yellowbrick Docs
Domains	Machine_Learning, Clustering, Visualization
Last Updated	2026-02-08 00:00 GMT

Overview

Concrete tool for evaluating cluster quality through per-sample silhouette coefficient visualization, provided by the Yellowbrick library.

Description

SilhouetteVisualizer is a Yellowbrick visualizer that displays the silhouette coefficient for each sample on a per-cluster basis, creating a diagnostic plot that reveals the density and separation of clusters. Each cluster is rendered as a sorted horizontal bar chart of individual sample silhouette values, and a vertical dashed red line marks the mean silhouette score across all samples.

The visualizer wraps a scikit-learn clustering estimator (typically KMeans or MiniBatchKMeans). If the estimator is not already fitted, the visualizer fits it during its own fit() call. It then computes both the per-sample silhouette coefficients (via sklearn.metrics.silhouette_samples) and the overall mean silhouette score (via sklearn.metrics.silhouette_score). The resulting plot enables quick visual assessment of cluster cohesion, separation, and balance.
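
For readers unfamiliar with the quantity being plotted, the following is a minimal pure-Python sketch of the standard silhouette coefficient, s(i) = (b - a) / max(a, b), where a is the mean distance from a sample to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster. The function name `silhouette_values` and the toy data are illustrative assumptions; this is not the code used by scikit-learn or Yellowbrick, and it assumes Euclidean distance and at least two clusters.

```python
from math import dist


def silhouette_values(points, labels):
    """Illustrative helper (hypothetical name): per-sample silhouette values.

    Assumes Euclidean distance and at least two distinct cluster labels.
    """
    # Group sample indices by cluster label
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)

    values = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            # Common convention: a single-member cluster scores 0
            values.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values


# Two tight, well-separated 1-D clusters
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
labels = [0, 0, 1, 1]
scores = silhouette_values(points, labels)
mean_score = sum(scores) / len(scores)  # analogous to silhouette_score_
print(scores, mean_score)
```

Values close to 1 indicate a sample that sits well inside its own cluster, values near 0 indicate a sample on a boundary, and negative values suggest the sample may be closer to a neighboring cluster.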

The class extends ClusteringScoreVisualizer and follows Yellowbrick's standard fit() / draw() / finalize() / show() API pattern.

Usage

Use SilhouetteVisualizer when you want to visually diagnose the quality of a specific clustering configuration. It is commonly used in a loop over multiple values of k to compare silhouette plots side by side and select the best cluster count. Import it, wrap your scikit-learn clusterer, call fit(X), and then show().
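
As a numeric complement to comparing plots by eye, the loop over k can also be sketched with scikit-learn alone by recording the mean silhouette score for each candidate. This is a rough sketch rather than a recommended selection procedure: the range of k, the synthetic data, and the `n_init=10` setting are assumptions made for illustration, and the mean score does not show the per-cluster balance that the plot reveals.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data, assumed here only for illustration
X, _ = make_blobs(n_samples=500, n_features=5, centers=4, random_state=42)

# Mean silhouette score for each candidate cluster count
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# Candidate with the highest mean score; still inspect the plots before deciding
best_k = max(scores, key=scores.get)
for k, score in scores.items():
    print(f"k={k}: mean silhouette = {score:.3f}")
print("Highest mean score at k =", best_k)
```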

Code Reference

Source Location

Repository: yellowbrick
File: yellowbrick/cluster/silhouette.py
Class Definition: Lines 39-259
Key Methods: __init__ (L115-126), fit (L128-153), draw (L155-219)
Quick Method: silhouette_visualizer() (L267-334)

Signature

class SilhouetteVisualizer(ClusteringScoreVisualizer):

    def __init__(self, estimator, ax=None, colors=None, is_fitted="auto", **kwargs):

Import

from yellowbrick.cluster import SilhouetteVisualizer

I/O Contract

Inputs

Name	Type	Required	Description
estimator	scikit-learn clusterer	Yes	A centroidal clustering estimator (e.g., `KMeans`, `MiniBatchKMeans`). Must have `n_clusters`, `predict()`, and `labels_`.
ax	matplotlib Axes	No	The axes to plot the figure on. If `None`, the current axes are used or generated.
colors	iterable or str	No	Colors for each cluster group. Can be a list of colors or a colormap name string. If fewer colors than clusters, colors repeat. Default: `None` (uses `"Set1"` colormap).
is_fitted	bool or str	No	Whether the estimator is already fitted. `"auto"` checks automatically; `False` forces re-fitting; `True` skips fitting. Default: `"auto"`.

The fit() method accepts:

Name	Type	Required	Description
X	array-like of shape (n_samples, n_features)	Yes	Feature matrix to cluster and evaluate.
y	array-like of shape (n_samples,)	No	Ignored. Present for API consistency.

Outputs

Name	Type	Description
silhouette_score_	float	Mean silhouette coefficient across all samples.
silhouette_samples_	array of shape (n_samples,)	Per-sample silhouette coefficient values.
n_samples_	int	Total number of samples in the dataset (`X.shape[0]`).
n_clusters_	int	Number of clusters from the estimator's `n_clusters` attribute.
y_tick_pos_	array of shape (n_clusters,)	Computed center positions of each cluster on the y-axis for label placement.

Usage Examples

Basic Usage

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from yellowbrick.cluster import SilhouetteVisualizer

# Generate synthetic data
X, y = make_blobs(n_samples=500, n_features=5, centers=4, random_state=42)

# Instantiate the clustering model and visualizer
model = KMeans(n_clusters=4, random_state=42)
visualizer = SilhouetteVisualizer(model, colors="yellowbrick")

# Fit the visualizer and show the plot
visualizer.fit(X)
visualizer.show()

# Access the computed scores
print("Mean Silhouette Score:", visualizer.silhouette_score_)

Comparing Multiple k Values

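import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from yellowbrick.cluster import SilhouetteVisualizer

# Generate synthetic data (same settings as the basic example)
X, y = make_blobs(n_samples=500, n_features=5, centers=4, random_state=42)

# Draw one silhouette plot per candidate k on a 2x2 grid
fig, axes = plt.subplots(2, 2, figsize=(15, 8))
for idx, k in enumerate([2, 3, 4, 5]):
    ax = axes[idx // 2, idx % 2]
    model = KMeans(n_clusters=k, random_state=42)
    visualizer = SilhouetteVisualizer(model, ax=ax)
    visualizer.fit(X)
    # finalize() adds the title and labels without displaying the figure
    visualizer.finalize()
plt.tight_layout()
plt.show()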

Quick Method

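from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from yellowbrick.cluster.silhouette import silhouette_visualizer

# Generate synthetic data
X, y = make_blobs(n_samples=500, n_features=5, centers=4, random_state=42)

# One-liner: creates, fits, and shows the visualizer
viz = silhouette_visualizer(KMeans(n_clusters=4, random_state=42), X)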

Internal Workflow

The fit() method executes the following steps:

1. Checks whether the wrapped estimator is already fitted (controlled by is_fitted). If not fitted, calls estimator.fit(X, y).
2. Records the number of samples (n_samples_) and clusters (n_clusters_).
3. Calls estimator.predict(X) to obtain cluster labels.
4. Computes the mean silhouette score via sklearn.metrics.silhouette_score.
5. Computes per-sample silhouette coefficients via sklearn.metrics.silhouette_samples.
6. Calls draw(labels) which, for each cluster, sorts the silhouette values and renders them as filled horizontal bars using ax.fill_betweenx(). A vertical dashed red line marks the mean silhouette score.
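
The steps above can be approximated without Yellowbrick, using scikit-learn and matplotlib directly. The sketch below is an illustration under stated assumptions, not the library's source: the helper name `silhouette_plot`, the `gap` spacing between bands, and the transparency value are hypothetical choices, and the helper always fits the estimator instead of checking whether it is already fitted.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples, silhouette_score


def silhouette_plot(estimator, X, ax=None, gap=10):
    """Hypothetical helper that mirrors the workflow; not Yellowbrick code."""
    if ax is None:
        ax = plt.gca()

    # Fit the estimator (simplification: no is_fitted check here)
    estimator.fit(X)
    n_clusters = estimator.n_clusters

    # Cluster labels, mean score, and per-sample coefficients
    labels = estimator.predict(X)
    score = silhouette_score(X, labels)
    samples = silhouette_samples(X, labels)

    # One sorted, filled band per cluster, stacked along the y-axis
    y_lower = gap
    tick_positions = []
    for cluster in range(n_clusters):
        values = np.sort(samples[labels == cluster])
        y_upper = y_lower + values.shape[0]
        ax.fill_betweenx(np.arange(y_lower, y_upper), 0, values, alpha=0.5)
        tick_positions.append(y_lower + 0.5 * values.shape[0])
        y_lower = y_upper + gap  # blank space before the next band

    # Vertical dashed red line at the mean silhouette score
    ax.axvline(x=score, color="red", linestyle="--")
    ax.set_yticks(tick_positions)
    ax.set_yticklabels([str(c) for c in range(n_clusters)])
    return score, samples, tick_positions


X, _ = make_blobs(n_samples=300, n_features=5, centers=4, random_state=42)
fig, ax = plt.subplots()
model = KMeans(n_clusters=4, n_init=10, random_state=42)
score, samples, ticks = silhouette_plot(model, X, ax=ax)
print("Mean silhouette score:", score)
```

Because silhouette_score with its default arguments is the mean of the per-sample values, `samples.mean()` and `score` agree up to floating-point error, which is why the dashed line summarizes every band in the plot.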

Related Pages

Implements Principle

Principle:DistrictDataLabs_Yellowbrick_Silhouette_Analysis

Requires Environment

Environment:DistrictDataLabs_Yellowbrick_Python_Scikit_Learn_Environment
