Implementation:Rapidsai Cuml Cluster Evaluation Metrics
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, Clustering, Evaluation |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Concrete tool for evaluating clustering quality using GPU-accelerated metrics: KMeans score (negative inertia), adjusted Rand index, and silhouette score.
Description
Three primary evaluation methods for clustering results:
- KMeans.score returns the negative inertia: the negated sum of squared distances from each sample to its assigned cluster center. The value is never positive; scores closer to zero indicate tighter clusters, and more negative scores indicate looser ones.
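- adjusted_rand_score computes the Adjusted Rand Index (ARI) between two label assignments, measuring their similarity corrected for chance. The score is 1.0 when the two partitions agree perfectly (label names do not need to match), is close to 0.0 for random labelings, and can be negative when agreement is worse than chance.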
- silhouette_score measures how similar each sample is to its own cluster compared to other clusters. Range [-1.0, 1.0] with higher values indicating better-defined clusters.
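To make "corrected for chance" concrete, the following is a minimal CPU-only sketch of the standard pair-counting definition of ARI. It is not the cuML implementation; the function name `adjusted_rand_index` and the handling of degenerate inputs are assumptions made for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Reference ARI computed from pair counts (illustrative, not cuML's code)."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")

    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        # Fewer than two samples: treat as trivially identical (assumption).
        return 1.0

    # Pairs of samples placed together in both labelings.
    together_both = sum(
        comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values()
    )
    # Pairs placed together in each labeling separately.
    together_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    together_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    # Agreement expected from random labelings with the same cluster sizes.
    expected = together_true * together_pred / total_pairs
    max_index = (together_true + together_pred) / 2

    if max_index == expected:
        # Degenerate case, e.g. both labelings use a single cluster (assumption).
        return 1.0
    return (together_both - expected) / (max_index - expected)


# Same partition with swapped label names: perfect agreement.
same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# Every pair grouped in one labeling is split in the other: below chance.
opposed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

Subtracting the expected agreement is what pulls random labelings toward 0.0 instead of leaving them with a positive raw Rand index.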
Usage
Use `KMeans.score(X)` for within-cluster compactness, `adjusted_rand_score(labels_true, labels_pred)` when ground truth is available, and `silhouette_score(X, labels)` for unsupervised evaluation.
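As a rough illustration of what "within-cluster compactness" means numerically, here is a minimal pure-Python sketch of negative inertia. It assumes each sample is assigned to its nearest center under squared Euclidean distance; the helper name `negative_inertia` is hypothetical and this is not how cuML computes the value on the GPU.

```python
def negative_inertia(X, centers):
    """Negated sum of squared distances from each sample to its nearest center."""
    total = 0.0
    for x in X:
        # Squared Euclidean distance to the closest center (assumed assignment rule).
        total += min(
            sum((a - b) ** 2 for a, b in zip(x, c)) for c in centers
        )
    return -total


X = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

# Centers placed at the middle of each pair of points.
tight = negative_inertia(X, [(0.0, 1.0), (10.0, 1.0)])
# Second center shifted away from its points, so the fit is looser.
loose = negative_inertia(X, [(0.0, 1.0), (10.0, 3.0)])
```

Here `tight` is closer to zero than `loose`, which is the direction to look for when comparing fits with the same number of clusters.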
Code Reference
KMeans.score
Source Location
- Repository: cuML
- File: python/cuml/cuml/cluster/kmeans.pyx
- Lines: 778-787
Signature
def score(self, X, y=None, sample_weight=None, *, convert_dtype=True):
adjusted_rand_score
Source Location
- Repository: cuML
- File: python/cuml/cuml/metrics/cluster/adjusted_rand_index.pyx
- Lines: 22-56
Signature
def adjusted_rand_score(labels_true, labels_pred, convert_dtype=True):
silhouette_score
Source Location
- Repository: cuML
- File: python/cuml/cuml/metrics/cluster/silhouette_score.pyx
- Lines: 42-100
Signature
def silhouette_score(X, labels, metric='euclidean', sil_scores=None, chunksize=None, convert_dtype=True):
Import
from cuml.cluster import KMeans
from cuml.metrics import adjusted_rand_score
from cuml.metrics.cluster import silhouette_score
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| X | array-like | Yes (score, silhouette) | Feature matrix (n_samples, n_features). |
| labels_true | array-like | Yes (ARI) | Ground truth cluster labels. |
| labels_pred | array-like | Yes (ARI) | Predicted cluster labels. |
| labels | array-like | Yes (silhouette) | Cluster assignments for each sample. |
| metric | str | No (default 'euclidean') | Distance metric for silhouette score. |
| chunksize | int or None | No | Number of samples per batch for silhouette computation. |
Outputs
| Name | Type | Description |
|---|---|---|
| KMeans.score | float | Negative inertia: the negated sum of squared distances from samples to their cluster centers. Closer to zero = tighter clusters. |
| adjusted_rand_score | float | ARI value, at most 1.0. 1.0 = perfect agreement; values near 0.0 = agreement no better than chance. |
| silhouette_score | float | Mean silhouette coefficient in [-1.0, 1.0]. Higher = better separation. |
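For a sense of how the mean silhouette coefficient behaves at the two ends of its range, the sketch below computes it on the CPU from the textbook definition with Euclidean distance. The function name `mean_silhouette`, the zero score for singleton clusters, and the zero score when all distances vanish are assumptions for illustration; cuML's GPU implementation and its chunking are not reproduced here.

```python
from math import dist


def mean_silhouette(X, labels):
    """Mean of (b - a) / max(a, b) over all samples, Euclidean distance."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton cluster contributes 0 (assumed convention)
        # a: mean distance to the other members of the sample's own cluster.
        a = sum(dist(X[i], X[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(X[i], X[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        if denom > 0.0:
            total += (b - a) / denom
    return total / len(X)


X = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]

# Labels that follow the two obvious groups.
good = mean_silhouette(X, [0, 0, 1, 1])
# Labels that pair each point with a far-away point.
bad = mean_silhouette(X, [0, 1, 0, 1])
```

With these points, `good` lands close to the upper end of the range and `bad` is negative, matching the "higher = better separation" reading in the table.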
Usage Examples
import cupy as cp
from cuml.cluster import KMeans
from cuml.metrics import adjusted_rand_score
from cuml.metrics.cluster import silhouette_score
X = cp.random.rand(5000, 20, dtype=cp.float32)
# Synthetic "ground truth": random labels, so the ARI below should be close to 0
true_labels = cp.random.randint(0, 5, 5000)
# Fit and evaluate
kmeans = KMeans(n_clusters=5).fit(X)
# Negative inertia
score = kmeans.score(X)
# ARI against ground truth
ari = adjusted_rand_score(true_labels, kmeans.labels_)
# Silhouette score (unsupervised)
sil = silhouette_score(X, kmeans.labels_)
Related Pages
Implements Principle
Requires Environment
Uses Heuristic