
Implementation:DistrictDataLabs Yellowbrick InterclusterDistance Visualizer

From Leeroopedia


Knowledge Sources
Domains Machine_Learning, Clustering, Visualization
Last Updated 2026-02-08 00:00 GMT

Overview

A concrete tool from the Yellowbrick library for visualizing the relative positions and sizes of clusters via a 2D embedding of cluster centroids.

Description

InterclusterDistance is a Yellowbrick visualizer that creates an intercluster distance map by embedding high-dimensional cluster centers into a two-dimensional space and rendering each cluster as a circle whose size reflects a scoring metric (by default, cluster membership count). The embedding approximately preserves the relative distances between cluster centers, so the spatial layout of circles in the plot reflects the relationships between clusters in the original feature space, though some pairwise distances are inevitably distorted by the projection.

The visualizer supports two embedding algorithms: MDS (Multidimensional Scaling) and t-SNE (t-distributed Stochastic Neighbor Embedding). The only currently supported scoring metric is membership (the number of data points assigned to each cluster). Each cluster is drawn as a scatter point with an area proportional to its score, and a numeric label is placed at its center. An optional size legend displays reference circles at the 25th, 50th, and 75th percentile of scores.

The class extends ClusteringScoreVisualizer and follows Yellowbrick's standard fit() / draw() / finalize() / show() API pattern. It is also aliased as ICDM.

Usage

Use InterclusterDistance after you have chosen a value of k to visualize the resulting cluster structure. It is especially helpful for understanding whether clusters are well-separated and for identifying the relative population sizes of clusters. Import it, wrap your scikit-learn clusterer, call fit(X), and then show().

Code Reference

Source Location

  • Repository: yellowbrick
  • File: yellowbrick/cluster/icdm.py
  • Class Definition: Lines 61-425
  • Key Methods: __init__ (L164-206), fit (L279-299), draw (L301-326)
  • Quick Method: intercluster_distance() (L469-599)

Signature

class InterclusterDistance(ClusteringScoreVisualizer):

    def __init__(
        self,
        estimator,
        ax=None,
        min_size=400,
        max_size=25000,
        embedding="mds",
        scoring="membership",
        legend=True,
        legend_loc="lower left",
        legend_size=1.5,
        random_state=None,
        is_fitted="auto",
        **kwargs
    ):

Import

from yellowbrick.cluster import InterclusterDistance

I/O Contract

Inputs

  • estimator (scikit-learn clusterer, required): A centroidal clustering estimator with cluster_centers_ and labels_ attributes (e.g., KMeans, MiniBatchKMeans).
  • ax (matplotlib Axes, optional): The axes to plot the figure on. If None, the current axes are used or generated.
  • min_size (int, optional): Minimum marker size in points for the smallest cluster. Default: 400.
  • max_size (int, optional): Maximum marker size in points for the largest cluster. Default: 25000.
  • embedding (str, optional): Dimensionality reduction algorithm for embedding cluster centers: "mds" or "tsne". Default: "mds".
  • scoring (str, optional): Scoring metric for cluster sizes; currently only "membership" (count of assigned points). Default: "membership".
  • legend (bool, optional): Whether to draw a size legend showing reference cluster sizes. Default: True.
  • legend_loc (str, optional): Location of the size legend (any valid matplotlib legend location string). Default: "lower left".
  • legend_size (float, optional): Size of the inset legend axes in inches. Default: 1.5.
  • random_state (int or RandomState, optional): Random state for reproducibility of the embedding algorithm. Default: None.
  • is_fitted (bool or str, optional): Whether the estimator is already fitted; "auto" checks automatically. Default: "auto".

The fit() method accepts:

  • X (array-like of shape (n_samples, n_features), required): Feature matrix to cluster and visualize.
  • y (array-like of shape (n_samples,), optional): Ignored; present for API consistency.

Outputs

  • cluster_centers_ (array of shape (n_clusters, n_features)): The cluster centers retrieved from the fitted estimator.
  • embedded_centers_ (array of shape (n_clusters, 2)): The 2D positions of cluster centers after embedding.
  • scores_ (array of shape (n_clusters,)): The scoring metric values (e.g., membership counts) for each cluster.
  • fit_time_ (Timer): The elapsed time for fitting the clustering model and performing the embedding.

Usage Examples

Basic Usage

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from yellowbrick.cluster import InterclusterDistance

# Generate synthetic data
X, y = make_blobs(n_samples=1000, n_features=12, centers=6, random_state=42)

# Instantiate the clustering model and visualizer
model = KMeans(n_clusters=6, random_state=42)
visualizer = InterclusterDistance(model)

# Fit and show the intercluster distance map
visualizer.fit(X)
visualizer.show()

Customizing the Visualization

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from yellowbrick.cluster import InterclusterDistance

# Generate synthetic data to cluster
X, y = make_blobs(n_samples=1000, n_features=12, centers=8, random_state=42)

model = KMeans(n_clusters=8, random_state=42)
visualizer = InterclusterDistance(
    model,
    embedding="tsne",
    min_size=500,
    max_size=20000,
    legend_loc="upper right",
    random_state=42,
)
visualizer.fit(X)
visualizer.show()

Quick Method

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from yellowbrick.cluster.icdm import intercluster_distance

# Generate synthetic data
X, y = make_blobs(n_samples=1000, n_features=12, centers=6, random_state=42)

# One-liner: creates, fits, and shows the visualizer
viz = intercluster_distance(KMeans(n_clusters=6, random_state=42), X)

Internal Workflow

The fit() method executes the following steps:

  1. Checks whether the wrapped estimator is already fitted (controlled by is_fitted). If not fitted, calls estimator.fit(X, y) within a timer.
  2. Retrieves the cluster centers from the estimator's cluster_centers_ attribute.
  3. Applies the embedding algorithm (MDS or t-SNE) via fit_transform() on the cluster centers to obtain 2D coordinates (embedded_centers_).
  4. Computes the cluster scores using the specified scoring method (e.g., np.bincount(labels_) for membership).
  5. Calls draw(), which computes marker sizes from scores using prop_to_size(), draws scatter points at the embedded coordinates, and annotates each cluster with its numeric index.

The finalize() method sets the title, configures an origin-centered grid, and optionally draws an inset size legend showing reference circles at the 25th, 50th, and 75th percentile of scores.
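The fit() workflow can be approximated outside the visualizer with plain scikit-learn and NumPy. A sketch of steps 2 through 4 (MDS embedding of the centers and membership scoring); the data and parameters are illustrative, and Yellowbrick's drawing and sizing logic is omitted:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.manifold import MDS

# Step 1: fit a centroidal clusterer on synthetic data
X, _ = make_blobs(n_samples=1000, n_features=12, centers=6, random_state=42)
model = KMeans(n_clusters=6, random_state=42).fit(X)

# Steps 2-3: embed the high-dimensional centers into 2D with MDS,
# mirroring the visualizer's default embedding
embedded_centers = MDS(n_components=2, random_state=42).fit_transform(
    model.cluster_centers_
)

# Step 4: "membership" scoring is the count of points assigned to each cluster
scores = np.bincount(model.labels_)

print(embedded_centers.shape)  # (6, 2)
print(int(scores.sum()))       # 1000
```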

Related Pages

Implements Principle

Requires Environment
