Implementation:DistrictDataLabs Yellowbrick SilhouetteVisualizer

From Leeroopedia


Knowledge Sources
Domains: Machine_Learning, Clustering, Visualization
Last Updated: 2026-02-08 00:00 GMT

Overview

Concrete tool for evaluating cluster quality through per-sample silhouette coefficient visualization, provided by the Yellowbrick library.

Description

SilhouetteVisualizer is a Yellowbrick visualizer that displays the silhouette coefficient for each sample on a per-cluster basis, creating a diagnostic plot that reveals the density and separation of clusters. Each cluster is rendered as a sorted horizontal bar chart of individual sample silhouette values, and a vertical dashed red line marks the mean silhouette score across all samples.

The visualizer wraps a scikit-learn clustering estimator (typically KMeans or MiniBatchKMeans). If the estimator is not already fitted, the visualizer fits it during its own fit() call. It then computes both the per-sample silhouette coefficients (via sklearn.metrics.silhouette_samples) and the overall mean silhouette score (via sklearn.metrics.silhouette_score). The resulting plot enables quick visual assessment of cluster cohesion, separation, and balance.
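The two metric calls named above can be tried directly with scikit-learn, outside of Yellowbrick. The following is a minimal sketch, assuming synthetic data from make_blobs and illustrative sizes and seeds that are not taken from the Yellowbrick source. It shows that the mean score is the average of the per-sample values when no subsampling is requested.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples, silhouette_score

# Synthetic data; sample count, feature count, and seed are illustrative assumptions.
X, _ = make_blobs(n_samples=300, n_features=5, centers=4, random_state=42)

# Fit a clusterer and obtain one label per sample.
model = KMeans(n_clusters=4, n_init=10, random_state=42).fit(X)
labels = model.predict(X)

# Per-sample silhouette coefficients, one value in [-1, 1] per row of X.
per_sample = silhouette_samples(X, labels)

# Mean silhouette coefficient over all samples.
mean_score = silhouette_score(X, labels)

print("per-sample shape:", per_sample.shape)
print("mean score:", round(float(mean_score), 3))
```

Values near 1 indicate samples well inside their cluster, values near 0 indicate samples close to a boundary, and negative values suggest a sample may sit closer to a neighboring cluster.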

The class extends ClusteringScoreVisualizer and follows Yellowbrick's standard fit() / draw() / finalize() / show() API pattern.

Usage

Use SilhouetteVisualizer when you want to visually diagnose the quality of a specific clustering configuration. It is commonly used in a loop over multiple values of k to compare silhouette plots side by side and select the best cluster count. Import it, wrap your scikit-learn clusterer, call fit(X), and then show().
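As a numeric companion to comparing plots across k, the mean silhouette score can be tabulated for each candidate cluster count. This is only a sketch using scikit-learn directly: the candidate list, the synthetic data, and the scores and best_k names are assumptions made for illustration, and the highest mean score is one heuristic rather than a replacement for inspecting the plots.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data; settings are illustrative assumptions.
X, _ = make_blobs(n_samples=500, n_features=5, centers=4, random_state=42)

# Mean silhouette score for each candidate number of clusters.
scores = {}
for k in [2, 3, 4, 5]:
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    scores[k] = float(silhouette_score(X, labels))

# Candidate with the highest mean score (a heuristic, not a guarantee).
best_k = max(scores, key=scores.get)

for k, score in scores.items():
    print(f"k={k}: mean silhouette = {score:.3f}")
print("highest mean score at k =", best_k)
```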

Code Reference

Source Location

  • Repository: yellowbrick
  • File: yellowbrick/cluster/silhouette.py
  • Class Definition: Lines 39-259
  • Key Methods: __init__ (L115-126), fit (L128-153), draw (L155-219)
  • Quick Method: silhouette_visualizer() (L267-334)

Signature

class SilhouetteVisualizer(ClusteringScoreVisualizer):

    def __init__(self, estimator, ax=None, colors=None, is_fitted="auto", **kwargs):

Import

from yellowbrick.cluster import SilhouetteVisualizer

I/O Contract

Inputs

Name | Type | Required | Description
estimator | scikit-learn clusterer | Yes | A centroidal clustering estimator (e.g., KMeans, MiniBatchKMeans). Must have n_clusters, predict(), and labels_.
ax | matplotlib Axes | No | The axes to plot the figure on. If None, the current axes are used or generated.
colors | iterable or str | No | Colors for each cluster group. Can be a list of colors or a colormap name string. If fewer colors than clusters, colors repeat. Default: None (uses "Set1" colormap).
is_fitted | bool or str | No | Whether the estimator is already fitted. "auto" checks automatically; False forces re-fitting; True skips fitting. Default: "auto".

The fit() method accepts:

Name | Type | Required | Description
X | array-like of shape (n_samples, n_features) | Yes | Feature matrix to cluster and evaluate.
y | array-like of shape (n_samples,) | No | Ignored. Present for API consistency.

Outputs

Name | Type | Description
silhouette_score_ | float | Mean silhouette coefficient across all samples.
silhouette_samples_ | array of shape (n_samples,) | Per-sample silhouette coefficient values.
n_samples_ | int | Total number of samples in the dataset (X.shape[0]).
n_clusters_ | int | Number of clusters from the estimator's n_clusters attribute.
y_tick_pos_ | array of shape (n_clusters,) | Computed center positions of each cluster on the y-axis for label placement.
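Per-sample values like those exposed as silhouette_samples_ can be summarized per cluster to put numbers on what the plot shows. The sketch below is an illustration under assumptions: it recomputes equivalent arrays with scikit-learn instead of reading them from a fitted visualizer, and the summary dictionary with its keys is a hypothetical structure, not part of Yellowbrick.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples

# Synthetic data; settings are illustrative assumptions.
X, _ = make_blobs(n_samples=300, n_features=5, centers=4, random_state=0)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

# Stand-ins for the visualizer's silhouette_samples_ and silhouette_score_.
samples = silhouette_samples(X, labels)
overall = samples.mean()

# Hypothetical per-cluster summary: size, mean coefficient, and the share
# of samples that fall below the overall mean.
summary = {}
for cluster in np.unique(labels):
    values = samples[labels == cluster]
    summary[int(cluster)] = {
        "size": int(values.size),
        "mean": float(values.mean()),
        "share_below_overall": float((values < overall).mean()),
    }

for cluster, stats in summary.items():
    print(cluster, stats)
```

Uneven sizes point to unbalanced clusters, and a cluster whose samples mostly fall below the overall mean is a candidate for closer inspection.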

Usage Examples

Basic Usage

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from yellowbrick.cluster import SilhouetteVisualizer

# Generate synthetic data
X, y = make_blobs(n_samples=500, n_features=5, centers=4, random_state=42)

# Instantiate the clustering model and visualizer
model = KMeans(n_clusters=4, random_state=42)
visualizer = SilhouetteVisualizer(model, colors="yellowbrick")

# Fit the visualizer and show the plot
visualizer.fit(X)
visualizer.show()

# Access the computed scores
print("Mean Silhouette Score:", visualizer.silhouette_score_)

Comparing Multiple k Values

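import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from yellowbrick.cluster import SilhouetteVisualizer

# Generate synthetic data (same settings as the basic example)
X, y = make_blobs(n_samples=500, n_features=5, centers=4, random_state=42)

# One subplot per candidate cluster count
fig, axes = plt.subplots(2, 2, figsize=(15, 8))
for idx, k in enumerate([2, 3, 4, 5]):
    ax = axes[idx // 2, idx % 2]
    model = KMeans(n_clusters=k, random_state=42)
    visualizer = SilhouetteVisualizer(model, ax=ax)
    visualizer.fit(X)      # fits the model and draws the silhouettes on ax
    visualizer.finalize()  # adds titles and labels without showing the figure
plt.tight_layout()
plt.show()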

Quick Method

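from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from yellowbrick.cluster.silhouette import silhouette_visualizer

# Generate synthetic data
X, y = make_blobs(n_samples=500, n_features=5, centers=4, random_state=42)

# One-liner: creates, fits, and shows the visualizer
viz = silhouette_visualizer(KMeans(n_clusters=4, random_state=42), X)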

Internal Workflow

The fit() method executes the following steps:

  1. Checks whether the wrapped estimator is already fitted (controlled by is_fitted). If not fitted, calls estimator.fit(X, y).
  2. Records the number of samples (n_samples_) and clusters (n_clusters_).
  3. Calls estimator.predict(X) to obtain cluster labels.
  4. Computes the mean silhouette score via sklearn.metrics.silhouette_score.
  5. Computes per-sample silhouette coefficients via sklearn.metrics.silhouette_samples.
  6. Calls draw(labels) which, for each cluster, sorts the silhouette values and renders them as filled horizontal bars using ax.fill_betweenx(). A vertical dashed red line marks the mean silhouette score.
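
The steps above can be approximated with scikit-learn and matplotlib alone. This is a rough sketch of the described workflow, not Yellowbrick's actual source: the 10-unit gap between clusters, the alpha value, the tick labels, and the local variable names are assumptions chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples, silhouette_score

# Synthetic data; settings are illustrative assumptions.
X, _ = make_blobs(n_samples=300, n_features=5, centers=3, random_state=42)

# Step 1: fit the estimator (here it is always unfitted, so fit directly).
estimator = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

# Step 2: record sample and cluster counts.
n_samples, n_clusters = X.shape[0], estimator.n_clusters

# Step 3: predict cluster labels.
labels = estimator.predict(X)

# Steps 4 and 5: mean score and per-sample coefficients.
mean_score = silhouette_score(X, labels)
samples = silhouette_samples(X, labels)

# Step 6: one sorted, filled band per cluster, stacked along the y-axis.
fig, ax = plt.subplots()
y_lower = 10  # assumed vertical gap before the first cluster
y_tick_pos = []
for cluster in range(n_clusters):
    values = np.sort(samples[labels == cluster])
    y_upper = y_lower + values.size
    ax.fill_betweenx(np.arange(y_lower, y_upper), 0, values, alpha=0.5)
    y_tick_pos.append(y_lower + 0.5 * values.size)  # band center for the label
    y_lower = y_upper + 10  # assumed gap between clusters

# Vertical dashed red line at the mean silhouette score.
ax.axvline(x=mean_score, color="red", linestyle="--")
ax.set_yticks(y_tick_pos)
ax.set_yticklabels([str(c) for c in range(n_clusters)])
fig.savefig("silhouette_sketch.png")
```

Sorting each cluster's values before filling is what gives every band its characteristic knife-shaped profile.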

Related Pages

Implements Principle

Requires Environment
