Principle:Online ml River Streaming Adjusted Rand
| Knowledge Sources | Domains | Last Updated |
|---|---|---|
| River River Docs Objective criteria for the evaluation of clustering methods (Rand, 1971) | Cluster Evaluation, Streaming Metrics, Supervised Evaluation | 2026-02-08 16:00 GMT |
Overview
Streaming Adjusted Rand Index is an incremental computation of the Adjusted Rand Index for comparing predicted cluster assignments against ground truth labels in a streaming fashion.
Description
The Rand Index (RI) measures the agreement between two partitions (clusterings) of a set of elements by counting the fraction of pairs of elements that are either in the same subset in both partitions or in different subsets in both partitions. However, the raw Rand Index does not account for chance agreement -- even two random partitions can have a non-trivial RI.
The Adjusted Rand Index (ARI) corrects for chance by establishing a baseline using the expected similarity under a random model. The adjustment ensures that:
- ARI = 1.0 indicates perfect agreement between the two clusterings.
- ARI = 0.0 indicates agreement no better than random chance.
- ARI < 0.0 indicates agreement worse than random chance.
In River's streaming implementation, the ARI is computed incrementally using a contingency table (confusion matrix) that is updated with each new observation. The contingency table tracks co-occurrence counts between true labels and predicted labels. From this table, a pair confusion matrix is derived, containing four counts: true positives, true negatives, false positives, and false negatives at the pair level.
This approach is efficient because updating the contingency table with one new observation is O(1) in the number of observations, while recomputing the pair confusion from the contingency table is O(k^2) where k is the number of distinct labels.
Usage
Use Streaming Adjusted Rand when:
- You have ground truth labels available and want to evaluate how well your clustering matches them.
- You need an evaluation metric that corrects for chance agreement (unlike the raw Rand Index).
- You want to evaluate in streaming fashion without storing all past predictions.
- You want a metric compatible with River's multi-class evaluation framework.
The ARI is a supervised clustering metric -- it requires both true labels and predicted labels, unlike the Silhouette coefficient which is unsupervised.
Theoretical Basis
Pair Confusion Matrix:
Given a contingency table C where C[i][j] is the number of observations with true label i and predicted label j:
sum_squares = SUM_ij C[i][j]^2
false_positives = SUM_ij C[i][j] * col_sum[j] - sum_squares
false_negatives = SUM_ij C[j][i] * row_sum[j] - sum_squares
true_positives = sum_squares - n
true_negatives = n^2 - false_positives - false_negatives - sum_squares
Where n is the total number of observations, row_sum and col_sum are the marginal sums.
Adjusted Rand Index:
ARI = 2 * (TP * TN - FN * FP) / ((TP + FN)(FN + TN) + (TP + FP)(FP + TN))
This is equivalent to the standard formulation:
ARI = (RI - Expected_RI) / (max_RI - Expected_RI)
Where the expected RI is computed under a hypergeometric (permutation) model.
Streaming computation:
INITIALIZE: contingency_table = empty confusion matrix
FOR each new observation (y_true, y_pred):
contingency_table.update(y_true, y_pred)
ON get():
pair_confusion = compute_pair_confusion(contingency_table)
RETURN adjusted_rand_from_pair_confusion(pair_confusion)
The contingency table update is O(1) per observation. The ARI computation from the pair confusion is O(k^2) where k is the number of unique labels but is only computed when get() is called.