Principle:Scikit learn Scikit learn Cluster Evaluation
| Knowledge Sources | |
|---|---|
| Domains | Model Evaluation, Unsupervised Learning |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
Cluster evaluation metrics assess the quality of a clustering result, either by comparing it to known ground-truth labels (external metrics) or by measuring intrinsic properties of the clusters (internal metrics).
Description
Evaluating clustering quality is challenging because, unlike supervised learning, there is often no definitive ground truth. Cluster evaluation metrics address this by quantifying clustering quality from two perspectives: external evaluation (when reference labels exist) and internal evaluation (when they do not). They support three recurring tasks in unsupervised learning: selecting a clustering algorithm, choosing the number of clusters, and comparing different clustering results.
Usage
Use external (supervised) metrics when ground-truth labels are available, for example when benchmarking a clustering method against a known partition. Use the Adjusted Rand Index (ARI) for a general-purpose comparison that is corrected for chance. Use Normalized Mutual Information (NMI) when you want an information-theoretic measure of agreement.
Use internal (unsupervised) metrics when no ground truth exists. Use the Silhouette Score for a general assessment of cluster cohesion and separation. Use the Calinski-Harabasz Index for a variance-ratio measure that is fast to compute. Use the Davies-Bouldin Index when you want a metric built from distances to and between cluster centroids rather than from pairwise distances between all samples.
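As a rough illustration of how these choices map onto scikit-learn, the sketch below calls the corresponding functions in `sklearn.metrics` on a tiny hand-made dataset. The data values and variable names are invented for illustration and do not come from any benchmark.

```python
import numpy as np
from sklearn import metrics

# Toy data (illustrative values): two well-separated groups of 2-D points.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels_true = [0, 0, 0, 1, 1, 1]  # reference partition
labels_pred = [1, 1, 1, 0, 0, 0]  # same partition, cluster ids permuted

# External metrics: compare predicted labels with reference labels.
ari = metrics.adjusted_rand_score(labels_true, labels_pred)
nmi = metrics.normalized_mutual_info_score(labels_true, labels_pred)

# Internal metrics: use only the data and the predicted labels.
sil = metrics.silhouette_score(X, labels_pred)
ch = metrics.calinski_harabasz_score(X, labels_pred)
db = metrics.davies_bouldin_score(X, labels_pred)

print(f"ARI={ari:.3f} NMI={nmi:.3f} silhouette={sil:.3f} CH={ch:.1f} DB={db:.3f}")
```

The external metrics depend only on the partition, not on the integer ids used for the clusters, which is why the permuted labels still count as a perfect match.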
Theoretical Basis
External (Supervised) Metrics
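Adjusted Rand Index (ARI): Measures the similarity between two clusterings, corrected for chance:
$$\mathrm{ARI} = \frac{\mathrm{RI} - \mathbb{E}[\mathrm{RI}]}{\max(\mathrm{RI}) - \mathbb{E}[\mathrm{RI}]}$$
where the Rand Index (RI) is the fraction of sample pairs that are either in the same cluster in both partitions or in different clusters in both partitions, and $\mathbb{E}[\mathrm{RI}]$ is its expected value under random labeling. ARI equals 1 for perfect agreement and is close to 0 for random labeling. It can be negative when two partitions agree less than chance would predict, and is bounded below by -0.5.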
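The chance correction can be sketched with pair counts taken from the contingency table of the two labelings. This is a minimal standard-library illustration, not scikit-learn's implementation. The function name `adjusted_rand_index` is hypothetical, and the code assumes at least two samples.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Pair-counting form of (index - expected) / (max index - expected)."""
    n = len(labels_a)
    # Pairs placed together in both partitions, in partition A, and in partition B.
    pairs_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = pairs_a * pairs_b / comb(n, 2)  # expected agreement under chance
    max_index = 0.5 * (pairs_a + pairs_b)      # agreement of identical partitions
    if max_index == expected:                  # degenerate case: trivial partitions
        return 1.0
    return (pairs_both - expected) / (max_index - expected)

print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # identical up to renaming
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))  # maximally discordant split
```

The second call shows a negative value: every pair grouped together in one labeling is split in the other.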
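Normalized Mutual Information (NMI):
$$\mathrm{NMI}(U, V) = \frac{2\, I(U; V)}{H(U) + H(V)}$$
where $I(U; V)$ is the mutual information between clusterings $U$ and $V$, and $H$ denotes entropy. This form normalizes by the arithmetic mean of the two entropies; other normalizations, such as the geometric mean, are also in use. NMI ranges from 0 (independent) to 1 (perfect correlation).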
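A small standard-library sketch of NMI follows, assuming normalization by the arithmetic mean of the two entropies (other normalizations exist). The helper names are hypothetical, and natural logarithms are used throughout.

```python
from collections import Counter
from math import log

def entropy(labels):
    """Shannon entropy H of a labeling, in nats."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())

def mutual_information(labels_u, labels_v):
    """Mutual information I(U; V) computed from joint and marginal counts."""
    n = len(labels_u)
    count_u, count_v = Counter(labels_u), Counter(labels_v)
    joint = Counter(zip(labels_u, labels_v))
    return sum((c / n) * log(n * c / (count_u[u] * count_v[v]))
               for (u, v), c in joint.items())

def normalized_mutual_information(labels_u, labels_v):
    """2 * I(U; V) / (H(U) + H(V)); returns 1.0 if both labelings are trivial."""
    h_u, h_v = entropy(labels_u), entropy(labels_v)
    if h_u + h_v == 0.0:
        return 1.0
    return 2.0 * mutual_information(labels_u, labels_v) / (h_u + h_v)

print(normalized_mutual_information([0, 0, 1, 1], [1, 1, 0, 0]))  # same partition
print(normalized_mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))  # independent
```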
Homogeneity and Completeness:
- Homogeneity: Each cluster contains only members of a single class.
- Completeness: All members of a given class are assigned to the same cluster.
- V-measure: The harmonic mean of homogeneity and completeness.
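These three quantities are commonly formalized with conditional entropy, a formulation the text above does not spell out: homogeneity is 1 - H(C|K)/H(C) and completeness is 1 - H(K|C)/H(K), for classes C and clusters K. The sketch below assumes that formulation; the function names are hypothetical.

```python
from collections import Counter
from math import log

def entropy(labels):
    """Shannon entropy of a labeling, in nats."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())

def conditional_entropy(target, given):
    """H(target | given), in nats."""
    n = len(target)
    joint = Counter(zip(target, given))
    given_count = Counter(given)
    return -sum((c / n) * log(c / given_count[g]) for (_, g), c in joint.items())

def homogeneity_completeness_v(classes, clusters):
    h_c, h_k = entropy(classes), entropy(clusters)
    # A labeling with zero entropy is treated as trivially satisfied.
    homogeneity = 1.0 if h_c == 0.0 else 1.0 - conditional_entropy(classes, clusters) / h_c
    completeness = 1.0 if h_k == 0.0 else 1.0 - conditional_entropy(clusters, classes) / h_k
    total = homogeneity + completeness
    v_measure = 0.0 if total == 0.0 else 2.0 * homogeneity * completeness / total
    return homogeneity, completeness, v_measure

# Every sample in its own cluster: perfectly homogeneous, but each class is split.
print(homogeneity_completeness_v([0, 0, 1, 1], [0, 1, 2, 3]))
```

Over-splitting keeps homogeneity at its maximum while lowering completeness, which is why the V-measure combines the two.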
Internal (Unsupervised) Metrics
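Silhouette Score: For each sample $i$:
$$s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}$$
where $a(i)$ is the mean distance from sample $i$ to the other samples in its own cluster (mean intra-cluster distance) and $b(i)$ is the mean distance from sample $i$ to the samples of the nearest other cluster (mean nearest-cluster distance). The overall score is the mean over all samples, ranging from -1 (poor clustering) to 1 (dense, well-separated clusters).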
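The per-sample computation can be sketched in plain Python. This is an illustrative version, not scikit-learn's implementation. It assumes Euclidean distance, at least two clusters, and samples that are not all identical; it also scores singleton clusters as 0 by convention. The names `euclidean` and `mean_silhouette` are hypothetical.

```python
from math import sqrt

def euclidean(p, q):
    return sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

def mean_silhouette(points, labels):
    """Average of (b - a) / max(a, b) over all samples."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention for a cluster with a single sample
            continue
        # a: mean distance to the other members of the sample's own cluster
        # (the distance of the sample to itself contributes 0 to the sum).
        a = sum(euclidean(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(euclidean(point, q) for q in other) / len(other)
                for key, other in clusters.items() if key != label)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(mean_silhouette(points, [0, 0, 0, 1, 1, 1]))  # compact, well-separated groups
print(mean_silhouette(points, [0, 1, 0, 1, 0, 1]))  # labels that mix the two groups
```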
Calinski-Harabasz Index (Variance Ratio Criterion):
$$\mathrm{CH} = \frac{\operatorname{tr}(B_k)}{\operatorname{tr}(W_k)} \times \frac{n - k}{k - 1}$$
where $B_k$ is the between-cluster dispersion matrix, $W_k$ is the within-cluster dispersion matrix, $n$ is the number of samples, and $k$ is the number of clusters. Higher values indicate better-defined clusters.
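Because only the traces of the two dispersion matrices are needed, the index reduces to sums of squared distances to centroids. The sketch below is a minimal version of that reduction, assuming more than one cluster, fewer clusters than samples, and nonzero within-cluster dispersion. The function names are hypothetical.

```python
def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]

def squared_distance(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))

def calinski_harabasz(points, labels):
    """Ratio of between- to within-cluster dispersion, scaled by (n - k) / (k - 1)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    n, k = len(points), len(clusters)
    overall = centroid(points)

    between = 0.0  # trace of the between-cluster dispersion matrix
    within = 0.0   # trace of the within-cluster dispersion matrix
    for members in clusters.values():
        center = centroid(members)
        between += len(members) * squared_distance(center, overall)
        within += sum(squared_distance(p, center) for p in members)
    return (between / (k - 1)) / (within / (n - k))

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(calinski_harabasz(points, [0, 0, 0, 1, 1, 1]))
```

For this toy data the between-cluster term is 300 and the within-cluster term is 8/3, so the index is 300 / ((8/3) / 4) = 450.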
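Davies-Bouldin Index:
$$\mathrm{DB} = \frac{1}{k} \sum_{i=1}^{k} \max_{j \neq i} \frac{s_i + s_j}{d_{ij}}$$
where $s_i$ is the average distance between each sample of cluster $i$ and that cluster's centroid, $d_{ij}$ is the distance between the centroids of clusters $i$ and $j$, and $k$ is the number of clusters. Lower values indicate better clustering.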
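A plain-Python sketch of the index follows: for each cluster, take the worst-case ratio of combined within-cluster scatter to centroid separation, then average over clusters. It assumes Euclidean distance, at least two clusters, and distinct centroids. The function names are hypothetical.

```python
from math import sqrt

def euclidean(p, q):
    return sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

def davies_bouldin(points, labels):
    """Mean over clusters of the largest (s_i + s_j) / d_ij ratio."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    centroids, scatter = {}, {}
    for label, members in clusters.items():
        center = [sum(col) / len(members) for col in zip(*members)]
        centroids[label] = center
        # s_i: average distance from the cluster's samples to its centroid.
        scatter[label] = sum(euclidean(p, center) for p in members) / len(members)

    total = 0.0
    for i in clusters:
        total += max((scatter[i] + scatter[j]) / euclidean(centroids[i], centroids[j])
                     for j in clusters if j != i)
    return total / len(clusters)

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(davies_bouldin(points, [0, 0, 0, 1, 1, 1]))  # compact, well-separated groups
print(davies_bouldin(points, [0, 1, 0, 1, 0, 1]))  # labels that mix the two groups
```

Only distances to and between centroids are computed, so the cost grows linearly with the number of samples for a fixed number of clusters.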