Implementation:Scikit learn Scikit learn SupervisedClusterMetrics
| Knowledge Sources | |
|---|---|
| Domains | Machine Learning, Clustering |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
Concrete tool, provided by scikit-learn, for evaluating clustering results against ground truth labels.
Description
The supervised clustering metrics module provides functions to evaluate clustering quality when ground truth labels are available. It includes information-theoretic measures (mutual information, normalized mutual information, adjusted mutual information), pair-counting measures (Rand index, adjusted Rand index, Fowlkes-Mallows index), and the homogeneity, completeness, and V-measure scores. Each metric compares predicted cluster labels with the known true labels.
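As a rough illustration of the pair-counting family, the sketch below rebuilds the Rand index from pair_confusion_matrix. The toy labels are invented for the example; the identity used (agreeing pairs divided by all pairs) is the standard definition of the Rand index.

```python
import numpy as np
from sklearn.metrics.cluster import pair_confusion_matrix, rand_score

# Toy labels, invented for illustration only.
labels_true = [0, 0, 0, 1, 1, 1]
labels_pred = [0, 0, 1, 1, 2, 2]

# 2x2 matrix over ordered sample pairs:
# C[0, 0] = apart in both labelings, C[1, 1] = together in both,
# C[0, 1] and C[1, 0] = pairs on which the two labelings disagree.
C = pair_confusion_matrix(labels_true, labels_pred)

# Rand index = fraction of pairs on which both labelings agree.
manual_rand = (C[0, 0] + C[1, 1]) / C.sum()

print(C)
print(manual_rand, rand_score(labels_true, labels_pred))
```

The adjusted Rand index starts from the same pair counts and subtracts the agreement expected by chance.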
Usage
Use these metrics when you have ground truth cluster labels and need to evaluate how well a clustering algorithm has recovered the true grouping structure. These are commonly used for benchmarking clustering algorithms on labeled datasets.
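A minimal benchmarking sketch along these lines, assuming the Iris dataset and KMeans as stand-ins for a labeled dataset and a clustering algorithm (neither is prescribed by this page):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics.cluster import adjusted_rand_score, normalized_mutual_info_score

# Labeled dataset: y holds the ground truth classes (used for evaluation only).
X, y = load_iris(return_X_y=True)

# Cluster without looking at y; n_clusters=3 is an assumption that matches
# the three Iris species.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels_pred = kmeans.fit_predict(X)

# Compare the recovered grouping with the known classes.
ari = adjusted_rand_score(y, labels_pred)
nmi = normalized_mutual_info_score(y, labels_pred)
print(f"ARI: {ari:.3f}, NMI: {nmi:.3f}")
```

Because the metrics only compare partitions, the cluster ids returned by KMeans do not need to match the class ids in y.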
Code Reference
Source Location
- Repository: scikit-learn
- File: sklearn/metrics/cluster/_supervised.py
Signature
def check_clusterings(labels_true, labels_pred)
def contingency_matrix(labels_true, labels_pred, *, eps=None, sparse=False, dtype=np.int64)
def pair_confusion_matrix(labels_true, labels_pred)
def rand_score(labels_true, labels_pred)
def adjusted_rand_score(labels_true, labels_pred)
def homogeneity_completeness_v_measure(labels_true, labels_pred, *, beta=1.0)
def homogeneity_score(labels_true, labels_pred)
def completeness_score(labels_true, labels_pred)
def v_measure_score(labels_true, labels_pred, *, beta=1.0)
def mutual_info_score(labels_true, labels_pred, *, contingency=None)
def adjusted_mutual_info_score(labels_true, labels_pred, *, average_method="arithmetic")
def normalized_mutual_info_score(labels_true, labels_pred, *, average_method="arithmetic")
def fowlkes_mallows_score(labels_true, labels_pred, *, sparse="deprecated")
def entropy(labels)
Import
from sklearn.metrics.cluster import adjusted_rand_score, normalized_mutual_info_score
from sklearn.metrics.cluster import homogeneity_score, completeness_score, v_measure_score
from sklearn.metrics.cluster import fowlkes_mallows_score, mutual_info_score
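The lower-level helpers from the signature list can be imported from the same module. The sketch below, on invented toy labels, builds a contingency matrix, reuses it for mutual_info_score through the contingency parameter, and checks that the arithmetic-mean NMI equals MI divided by the mean of the two label entropies. That identity follows from the definition of NMI and is stated here as an assumption, not taken from this page.

```python
import numpy as np
from sklearn.metrics.cluster import (
    contingency_matrix,
    entropy,
    mutual_info_score,
    normalized_mutual_info_score,
)

# Toy labels, invented for illustration only.
labels_true = np.array([0, 0, 0, 1, 1, 1])
labels_pred = np.array([0, 0, 1, 1, 2, 2])

# Rows index true classes, columns index predicted clusters.
cm = contingency_matrix(labels_true, labels_pred)
print(cm)

# Reuse the precomputed matrix instead of recounting label co-occurrences.
mi = mutual_info_score(labels_true, labels_pred, contingency=cm)

# Arithmetic-mean normalization: MI / mean(H(true), H(pred)).
nmi_manual = mi / np.mean([entropy(labels_true), entropy(labels_pred)])
nmi = normalized_mutual_info_score(labels_true, labels_pred)
print(mi, nmi_manual, nmi)
```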
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| labels_true | array-like of shape (n_samples,) | Yes | Ground truth class labels as integers or strings |
| labels_pred | array-like of shape (n_samples,) | Yes | Cluster labels to evaluate |
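| beta | float | No | Ratio of weight given to homogeneity vs completeness in V-measure; beta > 1 favors completeness, beta < 1 favors homogeneity (default 1.0) |
| average_method | str | No | How the two label entropies are combined into the normalizer for NMI/AMI: "min", "geometric", "arithmetic" (default), or "max" |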
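| eps | float | No | Value added to every entry of the contingency matrix to stop NaN propagation; None (default) leaves the counts unchanged. Cannot be combined with sparse=True |
| sparse | bool | No | Whether contingency_matrix returns a sparse matrix (default False). The sparse argument of fowlkes_mallows_score is deprecated |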
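To make the beta and average_method rows concrete, here is a small sketch on invented toy labels. It checks v_measure_score against the weighted harmonic mean v = (1 + beta) * h * c / (beta * h + c), the formula given in the scikit-learn documentation, and compares the four NMI normalizers.

```python
import numpy as np
from sklearn.metrics.cluster import (
    homogeneity_completeness_v_measure,
    normalized_mutual_info_score,
    v_measure_score,
)

# Toy labels, invented for illustration only.
labels_true = [0, 0, 0, 1, 1, 1]
labels_pred = [0, 0, 1, 1, 2, 2]

# beta > 1 weights completeness more; beta < 1 weights homogeneity more.
beta = 2.0
h, c, _ = homogeneity_completeness_v_measure(labels_true, labels_pred)
v_manual = (1 + beta) * h * c / (beta * h + c)
v = v_measure_score(labels_true, labels_pred, beta=beta)
print(v_manual, v)

# The normalizer is a generalized mean of the two label entropies, so a
# larger normalizer ("max") gives a score no higher than a smaller one ("min").
nmi = {
    method: normalized_mutual_info_score(
        labels_true, labels_pred, average_method=method
    )
    for method in ("min", "geometric", "arithmetic", "max")
}
print(nmi)
```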
Outputs
| Name | Type | Description |
|---|---|---|
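| score | float | Scalar metric value. Most scores lie in [0, 1]; adjusted_rand_score ranges from -0.5 to 1, adjusted_mutual_info_score can be negative, and mutual_info_score is non-negative but not capped at 1 (it is measured in nats) |
| contingency | ndarray or sparse matrix | Contingency matrix of shape (n_classes_true, n_classes_pred), returned by contingency_matrix |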
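A few boundary cases, sketched on invented labels, may help when reading the score row: a clustering that matches the ground truth up to a renaming of cluster ids gets a perfect score, while a labeling that splits every true class evenly can push the adjusted Rand index below zero.

```python
import numpy as np
from sklearn.metrics.cluster import adjusted_rand_score, v_measure_score

# Toy ground truth, invented for illustration only.
labels_true = [0, 0, 1, 1]

# Same partition, different cluster ids: the metrics ignore label names.
renamed = [1, 1, 0, 0]
ari_perfect = adjusted_rand_score(labels_true, renamed)
v_perfect = v_measure_score(labels_true, renamed)
print(ari_perfect, v_perfect)  # 1.0 for both, up to floating-point rounding

# Every true class is split evenly across the two predicted clusters.
crossed = [0, 1, 0, 1]
ari_crossed = adjusted_rand_score(labels_true, crossed)
print(ari_crossed)  # -0.5
```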
Usage Examples
Basic Usage
from sklearn.metrics.cluster import adjusted_rand_score, normalized_mutual_info_score
from sklearn.metrics.cluster import homogeneity_score, completeness_score, v_measure_score
# Toy data: two true classes, three predicted clusters
labels_true = [0, 0, 0, 1, 1, 1]
labels_pred = [0, 0, 1, 1, 2, 2]
# Pair-counting agreement, corrected for chance
ari = adjusted_rand_score(labels_true, labels_pred)
print(f"Adjusted Rand Index: {ari:.3f}")
# Mutual information normalized by the label entropies
nmi = normalized_mutual_info_score(labels_true, labels_pred)
print(f"Normalized Mutual Information: {nmi:.3f}")
# Homogeneity, completeness, and their harmonic mean (V-measure)
h = homogeneity_score(labels_true, labels_pred)
c = completeness_score(labels_true, labels_pred)
v = v_measure_score(labels_true, labels_pred)
print(f"Homogeneity: {h:.3f}, Completeness: {c:.3f}, V-measure: {v:.3f}")
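As a possible extension of the example above, homogeneity_completeness_v_measure returns the three related scores in a single call, and fowlkes_mallows_score adds a pair-counting view. The labels are the same toy values; the printed numbers are not reproduced here.

```python
from sklearn.metrics.cluster import (
    fowlkes_mallows_score,
    homogeneity_completeness_v_measure,
)

labels_true = [0, 0, 0, 1, 1, 1]
labels_pred = [0, 0, 1, 1, 2, 2]

# One call computes homogeneity, completeness, and V-measure together.
h, c, v = homogeneity_completeness_v_measure(labels_true, labels_pred)
print(f"Homogeneity: {h:.3f}, Completeness: {c:.3f}, V-measure: {v:.3f}")

# Fowlkes-Mallows: geometric mean of pairwise precision and recall.
fmi = fowlkes_mallows_score(labels_true, labels_pred)
print(f"Fowlkes-Mallows: {fmi:.3f}")
```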