Principle:Scikit learn Scikit learn Cluster Evaluation

Knowledge Sources	Scikit_learn Scikit-learn Docs
Domains	Model Evaluation, Unsupervised Learning
Last Updated	2026-02-08 15:00 GMT

Overview

Cluster evaluation metrics assess the quality of a clustering result, either by comparing it to known ground-truth labels (external metrics) or by measuring intrinsic properties of the clusters (internal metrics).

Description

Evaluating clustering quality is challenging because, unlike supervised learning, there is often no definitive ground truth. Cluster evaluation metrics address this by quantifying clustering quality from two perspectives: external evaluation (when reference labels exist) and internal evaluation (when they do not). They support three recurring tasks: selecting a clustering algorithm, choosing the number of clusters, and comparing different clustering results. No single metric settles these questions on its own, so the scores are best read as evidence alongside domain knowledge.

Usage

Use external (supervised) metrics when ground-truth labels are available, such as in benchmarking or when evaluating a clustering method against a known partition. Use Adjusted Rand Index (ARI) for a general-purpose comparison that is corrected for chance. Use Normalized Mutual Information (NMI) when you want an information-theoretic measure of agreement.

Use internal (unsupervised) metrics when no ground truth exists. Use the Silhouette Score for a general assessment of cluster cohesion and separation. Use the Calinski-Harabasz Index for a variance-ratio measure that is fast to compute. Use the Davies-Bouldin Index when you want a centroid-based measure that avoids computing distances between every pair of samples, which makes it simpler to compute than the Silhouette Score.
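As a rough illustration of the internal metrics in a model-selection loop, the sketch below scores KMeans results for several values of k. The synthetic dataset, the range of k, and the names `X`, `scores`, and `best_k` are assumptions made for this example, not part of the original page.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)

# Synthetic data (an assumption for illustration): three well-separated blobs.
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=1.0,
    random_state=0,
)

scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = {
        "silhouette": silhouette_score(X, labels),                # higher is better
        "calinski_harabasz": calinski_harabasz_score(X, labels),  # higher is better
        "davies_bouldin": davies_bouldin_score(X, labels),        # lower is better
    }

# Pick k by the silhouette score; the other two metrics can serve as a cross-check.
best_k = max(scores, key=lambda k: scores[k]["silhouette"])
print(best_k, scores[best_k])
```

On real data the three metrics may disagree, so it is usually worth inspecting all of them rather than relying on a single maximum.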

Theoretical Basis

External (Supervised) Metrics

Adjusted Rand Index (ARI): Measures the similarity between two clusterings, corrected for chance:

$ARI = \frac{RI - E [RI]}{\max (RI) - E [RI]}$

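where the Rand Index (RI) is the fraction of sample pairs on which the two partitions agree, that is, pairs placed in the same cluster in both partitions or in different clusters in both. RI itself lies in [0, 1]. ARI equals 1 for identical partitions (up to a renaming of labels), is close to 0 for random labeling, and can be negative for partitions that agree less than chance would predict; in scikit-learn it is bounded below by -0.5.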
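A minimal sketch of the chance correction using `sklearn.metrics.rand_score` and `adjusted_rand_score`. The toy label vectors are made up for illustration.

```python
from sklearn.metrics import adjusted_rand_score, rand_score

labels_true = [0, 0, 1, 1]
labels_perm = [1, 1, 0, 0]   # same partition, label names swapped
labels_split = [0, 0, 1, 2]  # second cluster split into two singletons

# Both scores ignore label names, so a pure renaming is a perfect match.
ari_perm = adjusted_rand_score(labels_true, labels_perm)

# 5 of the 6 sample pairs agree, so RI = 5/6; the chance-corrected ARI is 4/7.
ri_split = rand_score(labels_true, labels_split)
ari_split = adjusted_rand_score(labels_true, labels_split)

print(ari_perm, ri_split, ari_split)
```

The gap between RI (about 0.83) and ARI (about 0.57) on the same pair of labelings shows how much of the raw agreement is attributed to chance.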

Normalized Mutual Information (NMI):

$NMI (U, V) = \frac{2 \cdot I (U; V)}{H (U) + H (V)}$

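where $I (U; V)$ is the mutual information between clusterings $U$ and $V$, and $H$ denotes entropy. This form normalizes the mutual information by the arithmetic mean of the two entropies. NMI ranges from 0 (no mutual information) to 1 (perfect correlation). Unlike ARI, it is not adjusted for chance.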
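The formula can be checked against `sklearn.metrics.normalized_mutual_info_score`. This is a sketch under the assumption that the library's default `average_method="arithmetic"` is in effect; the `entropy` helper and the label vectors are hypothetical and exist only for this example.

```python
import numpy as np
from sklearn.metrics import mutual_info_score, normalized_mutual_info_score


def entropy(labels):
    """Shannon entropy (natural log) of a label vector; illustrative helper."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))


u = [0, 0, 1, 1, 2, 2]
v = [0, 0, 0, 1, 1, 1]

mi = mutual_info_score(u, v)  # I(U; V), computed with the natural log
nmi_manual = 2.0 * mi / (entropy(u) + entropy(v))
nmi = normalized_mutual_info_score(u, v)

print(nmi_manual, nmi)
```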

Homogeneity and Completeness:

Homogeneity: Each cluster contains only members of a single class.
Completeness: All members of a given class are assigned to the same cluster.
V-measure: The harmonic mean of homogeneity and completeness.
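A small sketch of the three scores using `sklearn.metrics.homogeneity_completeness_v_measure`. The labels are a made-up extreme case in which every sample is placed in its own cluster.

```python
from sklearn.metrics import homogeneity_completeness_v_measure

labels_true = [0, 0, 1, 1]
labels_pred = [0, 1, 2, 3]  # every sample is its own cluster

h, c, v = homogeneity_completeness_v_measure(labels_true, labels_pred)

# Each cluster is pure, so homogeneity is 1.0, but every class is scattered
# over two clusters, so completeness drops to 0.5.
v_manual = 2.0 * h * c / (h + c)  # harmonic mean of the two
print(h, c, v, v_manual)
```

Over-segmenting always yields perfect homogeneity, which is why the two scores are usually reported together or combined through the V-measure.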

Internal (Unsupervised) Metrics

Silhouette Score: For each sample $i$:

$s (i) = \frac{b (i) - a (i)}{\max (a (i), b (i))}$

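where $a (i)$ is the mean distance from sample $i$ to the other samples in its own cluster, and $b (i)$ is the mean distance from sample $i$ to the samples of the nearest cluster it does not belong to. The overall score is the mean of $s (i)$ over all samples. It ranges from -1 (samples tend to sit closer to another cluster than to their own) to 1 (dense, well-separated clusters); values near 0 indicate overlapping clusters.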
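To make the definition concrete, the sketch below computes $s(i)$ by hand for one sample and compares it with `sklearn.metrics.silhouette_samples`. The one-dimensional toy data is an assumption chosen so the arithmetic is easy to follow.

```python
import numpy as np
from sklearn.metrics import silhouette_samples, silhouette_score

# Toy data (illustrative): two tight groups on a line.
X = np.array([[0.0], [1.0], [10.0], [11.0]])
labels = np.array([0, 0, 1, 1])

i = 0
dists = np.abs(X - X[i]).ravel()  # distances from sample i to every sample

own = labels == labels[i]
own[i] = False                    # a(i) excludes the sample itself
a_i = dists[own].mean()           # 1.0

# b(i): smallest mean distance to the members of any other cluster.
b_i = min(
    dists[labels == c].mean() for c in np.unique(labels) if c != labels[i]
)                                 # (10 + 11) / 2 = 10.5

s_i = (b_i - a_i) / max(a_i, b_i)

per_sample = silhouette_samples(X, labels)
overall = silhouette_score(X, labels)  # mean of the per-sample values
print(s_i, per_sample[i], overall)
```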

Calinski-Harabasz Index (Variance Ratio Criterion):

$CH = \frac{tr (B_{k})}{tr (W_{k})} \cdot \frac{n - k}{k - 1}$

where $B_{k}$ is the between-cluster dispersion matrix, $W_{k}$ is the within-cluster dispersion matrix, $n$ is the number of samples, and $k$ is the number of clusters. Higher values indicate better-defined clusters.
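The traces in the formula reduce to sums of squared distances, which the following sketch computes directly and compares with `sklearn.metrics.calinski_harabasz_score`. The toy points and variable names (`tr_B`, `tr_W`) are assumptions made for this example.

```python
import numpy as np
from sklearn.metrics import calinski_harabasz_score

# Toy data (illustrative): two compact groups in the plane.
X = np.array(
    [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]
)
labels = np.array([0, 0, 0, 1, 1, 1])

n = X.shape[0]
clusters = np.unique(labels)
k = len(clusters)
overall_mean = X.mean(axis=0)

tr_B = 0.0  # trace of the between-cluster dispersion matrix
tr_W = 0.0  # trace of the within-cluster dispersion matrix
for c in clusters:
    members = X[labels == c]
    centroid = members.mean(axis=0)
    tr_B += len(members) * np.sum((centroid - overall_mean) ** 2)
    tr_W += np.sum((members - centroid) ** 2)

ch_manual = (tr_B / tr_W) * (n - k) / (k - 1)
ch = calinski_harabasz_score(X, labels)
print(ch_manual, ch)
```

The score has no fixed upper bound, so it is mainly useful for comparing clusterings of the same dataset.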

Davies-Bouldin Index:

$DB = \frac{1}{k} \sum_{i = 1}^{k} \max_{j \neq i} \frac{s_{i} + s_{j}}{d_{i j}}$

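where $s_{i}$ is the average distance between each point of cluster $i$ and the centroid of that cluster, and $d_{i j}$ is the distance between the centroids of clusters $i$ and $j$. The index averages, over all clusters, the similarity of each cluster to its most similar neighbor. Lower values indicate better clustering, and the minimum is 0.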
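The following sketch evaluates the formula term by term and compares the result with `sklearn.metrics.davies_bouldin_score`. The three-cluster toy dataset and the variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.metrics import davies_bouldin_score

# Toy data (illustrative): three compact groups in the plane.
X = np.array(
    [
        [0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
        [5.0, 5.0], [5.0, 6.0], [6.0, 5.0],
        [10.0, 0.0], [10.0, 1.0], [11.0, 0.0],
    ]
)
labels = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])

clusters = np.unique(labels)
k = len(clusters)
centroids = np.array([X[labels == c].mean(axis=0) for c in clusters])

# s[i]: mean distance from the points of cluster i to its centroid.
s = np.array(
    [
        np.linalg.norm(X[labels == c] - centroids[idx], axis=1).mean()
        for idx, c in enumerate(clusters)
    ]
)

# For each cluster, keep the worst (largest) similarity ratio to any other cluster.
worst = []
for i in range(k):
    worst.append(
        max(
            (s[i] + s[j]) / np.linalg.norm(centroids[i] - centroids[j])
            for j in range(k)
            if j != i
        )
    )

db_manual = float(np.mean(worst))
db = davies_bouldin_score(X, labels)
print(db_manual, db)
```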
