Implementation:DistrictDataLabs Yellowbrick InterclusterDistance Visualizer

Knowledge Sources	Yellowbrick Yellowbrick Docs
Domains	Machine_Learning, Clustering, Visualization
Last Updated	2026-02-08 00:00 GMT

Overview

Concrete tool for visualizing the relative positions and sizes of clusters via 2D embedding of cluster centroids, provided by the Yellowbrick library.

Description

InterclusterDistance is a Yellowbrick visualizer that creates an intercluster distance map by embedding high-dimensional cluster centers into a two-dimensional space and rendering each cluster as a circle whose size reflects a scoring metric (by default, cluster membership count). The embedding aims to preserve the relative distances between cluster centers, so circles that sit close together in the plot generally correspond to clusters that are close in the original feature space. Because the projection is approximate, circles that overlap in 2D do not necessarily overlap in the original space.
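To get a feel for how closely a 2D embedding tracks the original distances, the following minimal sketch embeds a set of stand-in centers with scikit-learn's MDS and compares pairwise distances before and after. The random `centers` array is an assumption used in place of a real estimator's `cluster_centers_`, and the correlation check is an illustrative diagnostic, not something the visualizer reports.

```python
import numpy as np
from sklearn.manifold import MDS
from sklearn.metrics import pairwise_distances

# Stand-in for estimator.cluster_centers_: 6 centers in a 12-dimensional space
rng = np.random.default_rng(0)
centers = rng.normal(size=(6, 12))

# Embed the centers into two dimensions
embedded = MDS(n_components=2, random_state=0).fit_transform(centers)

# Compare each unique pair of centers before and after the embedding
iu = np.triu_indices(len(centers), k=1)
d_orig = pairwise_distances(centers)[iu]
d_embed = pairwise_distances(embedded)[iu]

# A correlation near 1 suggests the 2D layout reflects the original geometry
corr = np.corrcoef(d_orig, d_embed)[0, 1]
print(f"correlation of pairwise distances: {corr:.2f}")
```

A low correlation would be a hint to read the map's layout with more caution.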

The visualizer supports two embedding algorithms: MDS (Multidimensional Scaling) and t-SNE (t-distributed Stochastic Neighbor Embedding). The only currently supported scoring metric is membership (the number of data points assigned to each cluster). Each cluster is drawn as a scatter point whose area scales with its score, bounded by min_size and max_size, and a numeric label is placed at its center. An optional size legend displays reference circles at the 25th, 50th, and 75th percentile of scores.
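As a rough illustration of how scores can be mapped onto marker areas, the sketch below linearly rescales membership counts into the range given by min_size and max_size. The `scale_sizes` helper is hypothetical; Yellowbrick uses its own `prop_to_size()` utility, whose exact transform may differ from this linear version.

```python
import numpy as np

def scale_sizes(scores, min_size=400, max_size=25000):
    """Hypothetical helper: linearly rescale scores into [min_size, max_size]."""
    scores = np.asarray(scores, dtype=float)
    span = scores.max() - scores.min()
    if span == 0:
        # All clusters have the same score, so use a single mid-range size
        return np.full(scores.shape, (min_size + max_size) / 2.0)
    return min_size + (scores - scores.min()) / span * (max_size - min_size)

# Example membership counts for four clusters
sizes = scale_sizes([120, 300, 80, 500])
print(sizes)
```

The smallest cluster receives min_size and the largest receives max_size, which keeps very small clusters visible next to very large ones.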

The class extends ClusteringScoreVisualizer and follows Yellowbrick's standard fit() / draw() / finalize() / show() API pattern. It is also aliased as ICDM.

Usage

Use InterclusterDistance after you have chosen a value of k to visualize the resulting cluster structure. It is especially helpful for understanding whether clusters are well-separated and for identifying the relative population sizes of clusters. Import it, wrap your scikit-learn clusterer, call fit(X), and then show().

Code Reference

Source Location

Repository: yellowbrick
File: yellowbrick/cluster/icdm.py
Class Definition: Lines 61-425
Key Methods: __init__ (L164-206), fit (L279-299), draw (L301-326)
Quick Method: intercluster_distance() (L469-599)

Signature

class InterclusterDistance(ClusteringScoreVisualizer):

    def __init__(
        self,
        estimator,
        ax=None,
        min_size=400,
        max_size=25000,
        embedding="mds",
        scoring="membership",
        legend=True,
        legend_loc="lower left",
        legend_size=1.5,
        random_state=None,
        is_fitted="auto",
        **kwargs
    ):

Import

from yellowbrick.cluster import InterclusterDistance

I/O Contract

Inputs

Name	Type	Required	Description
estimator	scikit-learn clusterer	Yes	A centroidal clustering estimator with `cluster_centers_` and `labels_` attributes (e.g., `KMeans`, `MiniBatchKMeans`).
ax	matplotlib Axes	No	The axes to plot the figure on. If `None`, the current axes are used or generated.
min_size	int	No	Minimum marker size in points for the smallest cluster. Default: `400`.
max_size	int	No	Maximum marker size in points for the largest cluster. Default: `25000`.
embedding	str	No	Dimensionality reduction algorithm for embedding cluster centers: `"mds"` or `"tsne"`. Default: `"mds"`.
scoring	str	No	Scoring metric for cluster sizes: `"membership"` (count of assigned points). Default: `"membership"`.
legend	bool	No	Whether to draw a size legend showing reference cluster sizes. Default: `True`.
legend_loc	str	No	Location of the size legend (any valid matplotlib legend location string). Default: `"lower left"`.
legend_size	float	No	Size of the inset legend axes in inches. Default: `1.5`.
random_state	int or RandomState	No	Random state for reproducibility of the embedding algorithm. Default: `None`.
is_fitted	bool or str	No	Whether the estimator is already fitted. `"auto"` checks automatically. Default: `"auto"`.

The fit() method accepts:

Name	Type	Required	Description
X	array-like of shape (n_samples, n_features)	Yes	Feature matrix to cluster and visualize.
y	array-like of shape (n_samples,)	No	Ignored. Present for API consistency.
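The estimator requirements above can be checked before handing a model to the visualizer. The sketch below assumes two hypothetical helpers, `looks_fitted` and `is_centroidal`, built on scikit-learn's `check_is_fitted`. It illustrates the contract and is not how Yellowbrick performs its own `is_fitted="auto"` check.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.exceptions import NotFittedError
from sklearn.utils.validation import check_is_fitted

def looks_fitted(estimator):
    """Hypothetical helper: True if scikit-learn considers the estimator fitted."""
    try:
        check_is_fitted(estimator)
        return True
    except NotFittedError:
        return False

def is_centroidal(fitted_estimator):
    """Hypothetical helper: True if the fitted estimator exposes centers and labels."""
    return hasattr(fitted_estimator, "cluster_centers_") and hasattr(
        fitted_estimator, "labels_"
    )

X, _ = make_blobs(n_samples=200, n_features=4, centers=3, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
fitted_before = looks_fitted(kmeans)
kmeans.fit(X)
fitted_after = looks_fitted(kmeans)

# DBSCAN assigns labels but has no cluster centers, so it does not qualify
dbscan = DBSCAN().fit(X)

print(fitted_before, fitted_after, is_centroidal(kmeans), is_centroidal(dbscan))
```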

Outputs

Name	Type	Description
cluster_centers_	array of shape (n_clusters, n_features)	The cluster centers retrieved from the fitted estimator.
embedded_centers_	array of shape (n_clusters, 2)	The 2D positions of cluster centers after embedding.
scores_	array of shape (n_clusters,)	The scoring metric values (e.g., membership counts) for each cluster.
fit_time_	Timer	The elapsed time for fitting the clustering model and performing the embedding.

Usage Examples

Basic Usage

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from yellowbrick.cluster import InterclusterDistance

# Generate synthetic data
X, y = make_blobs(n_samples=1000, n_features=12, centers=6, random_state=42)

# Instantiate the clustering model and visualizer
model = KMeans(n_clusters=6, random_state=42)
visualizer = InterclusterDistance(model)

# Fit and show the intercluster distance map
visualizer.fit(X)
visualizer.show()

Customizing the Visualization

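from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from yellowbrick.cluster import InterclusterDistance

# Generate synthetic data (same recipe as the basic example)
X, y = make_blobs(n_samples=1000, n_features=12, centers=6, random_state=42)

# Use t-SNE for the embedding and adjust marker sizes and legend placement
model = KMeans(n_clusters=8, random_state=42)
visualizer = InterclusterDistance(
    model,
    embedding="tsne",
    min_size=500,
    max_size=20000,
    legend_loc="upper right",
    random_state=42,
)
visualizer.fit(X)
visualizer.show()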

Quick Method

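from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from yellowbrick.cluster.icdm import intercluster_distance

# Generate synthetic data
X, y = make_blobs(n_samples=1000, n_features=12, centers=6, random_state=42)

# One-liner: creates, fits, and shows the visualizer
viz = intercluster_distance(KMeans(n_clusters=6, random_state=42), X)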

Internal Workflow

The fit() method executes the following steps:

1. Checks whether the wrapped estimator is already fitted (controlled by is_fitted). If not fitted, calls estimator.fit(X, y) within a timer.
2. Retrieves the cluster centers from the estimator's cluster_centers_ attribute.
3. Applies the embedding algorithm (MDS or t-SNE) via fit_transform() on the cluster centers to obtain 2D coordinates (embedded_centers_).
4. Computes the cluster scores using the specified scoring method (e.g., np.bincount(labels_) for membership).
5. Calls draw(), which computes marker sizes from scores using prop_to_size(), draws scatter points at the embedded coordinates, and annotates each cluster with its numeric index.
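The data-preparation part of these steps can be approximated with plain scikit-learn and NumPy, without any plotting. This is a minimal sketch under the assumption that MDS is the chosen embedding. The variable names mirror the visualizer's output attributes, but the code is a stand-in and not the library's implementation.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.manifold import MDS

X, _ = make_blobs(n_samples=1000, n_features=12, centers=6, random_state=42)

# Fit the clusterer (what fit() does when the estimator is not yet fitted)
estimator = KMeans(n_clusters=6, n_init=10, random_state=42).fit(X)

# Retrieve the high-dimensional cluster centers
cluster_centers = estimator.cluster_centers_

# Embed the centers into two dimensions
embedded_centers = MDS(n_components=2, random_state=42).fit_transform(cluster_centers)

# Score each cluster by membership count
scores = np.bincount(estimator.labels_)

# Each cluster now has a 2D position and a score that a plot could use
for idx, ((cx, cy), count) in enumerate(zip(embedded_centers, scores)):
    print(f"cluster {idx}: position=({cx:.2f}, {cy:.2f}), members={count}")
```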

The finalize() method sets the title, configures an origin-centered grid, and optionally draws an inset size legend showing reference circles at the 25th, 50th, and 75th percentile of scores.
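For intuition about the legend's reference values, the 25th, 50th, and 75th percentiles of a score array can be computed with NumPy as sketched below. The membership counts are made up for illustration, and how Yellowbrick selects and draws the reference circles internally may differ from this direct computation.

```python
import numpy as np

# Hypothetical membership counts for six clusters
scores = np.array([80, 120, 150, 200, 300, 500])

# Reference values at the 25th, 50th, and 75th percentile of scores
q25, q50, q75 = np.percentile(scores, [25, 50, 75])
print(q25, q50, q75)
```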

Related Pages

Implements Principle

Principle:DistrictDataLabs_Yellowbrick_Intercluster_Distance_Mapping

Requires Environment

Environment:DistrictDataLabs_Yellowbrick_Python_Scikit_Learn_Environment
