Implementation:Online ml River Metrics AdjustedRand
| Knowledge Sources | Domains | Last Updated |
|---|---|---|
| River; River Docs; Objective criteria for the evaluation of clustering methods (Rand, 1971) | Cluster Evaluation, Streaming Metrics | 2026-02-08 16:00 GMT |
Overview
Concrete tool for incrementally computing the Adjusted Rand Index to compare predicted cluster assignments against ground truth labels in a streaming fashion using an updatable contingency table.
Description
The metrics.AdjustedRand class computes the Adjusted Rand Index (ARI) incrementally. It inherits from metrics.base.MultiClassMetric and uses River's confusion matrix infrastructure to maintain a contingency table that is updated with each (y_true, y_pred) pair. When get() is called, the pair confusion matrix is derived from the contingency table, and the ARI is computed from the true positives, true negatives, false positives, and false negatives at the pair level.
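The mechanism described above can be sketched in pure Python. This is a minimal illustration, not River's source: the class name `StreamingARI` and its attributes are hypothetical, and the pair-count formulas are assumed to follow the common pair confusion matrix convention (ordered pairs, as in scikit-learn). It keeps a contingency table that is updated per observation and derives the pair-level counts only when `get()` is called.

```python
from collections import defaultdict


class StreamingARI:
    """Hypothetical stand-in for illustration; not River's actual class."""

    def __init__(self):
        self.table = defaultdict(int)  # (y_true, y_pred) -> count
        self.row = defaultdict(int)    # y_true -> count
        self.col = defaultdict(int)    # y_pred -> count
        self.n = 0                     # number of observations seen

    def update(self, y_true, y_pred):
        # Constant-time update of the contingency table and its margins.
        self.table[(y_true, y_pred)] += 1
        self.row[y_true] += 1
        self.col[y_pred] += 1
        self.n += 1

    def get(self):
        # Pair-level counts over ordered pairs of distinct observations.
        sum_sq = sum(c * c for c in self.table.values())
        tp = sum_sq - self.n  # same true class, same predicted cluster
        fp = sum(c * self.col[p] for (_, p), c in self.table.items()) - sum_sq
        fn = sum(c * self.row[t] for (t, _), c in self.table.items()) - sum_sq
        tn = self.n ** 2 - fp - fn - sum_sq
        denom = (tp + fn) * (fn + tn) + (tp + fp) * (fp + tn)
        if denom == 0:
            return 1.0  # zero-denominator case described in the text
        return 2.0 * (tp * tn - fn * fp) / denom


metric = StreamingARI()
scores = []
for yt, yp in zip([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2]):
    metric.update(yt, yp)
    scores.append(metric.get())

print(scores[-1])  # approximately 0.2424 for this toy sequence
```

Because only the table and its row and column totals are stored, memory grows with the number of distinct (true label, predicted label) combinations rather than with the number of observations.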
The metric returns 1.0 for perfect agreement, 0.0 for chance-level agreement, and negative values for agreement worse than chance. It returns 1.0 when there are no pairs to compare (zero denominator).
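As a worked illustration of the negative case, assume the pair-level expression commonly used for the ARI (the form used by scikit-learn's `adjusted_rand_score`; that River evaluates exactly this expression is an assumption here), with TP, FP, FN, and TN counted over ordered pairs of observations. The labels in the comment are a made-up toy case in which every pair grouped together by the ground truth is split by the prediction, and vice versa.

```latex
\mathrm{ARI} = \frac{2\,(TP \cdot TN - FN \cdot FP)}
                    {(TP + FN)(FN + TN) + (TP + FP)(FP + TN)}

% Toy case: y_true = [0, 0, 1, 1], y_pred = [0, 1, 0, 1]
% gives TP = 0, FP = 4, FN = 4, TN = 4 (12 ordered pairs in total).
\mathrm{ARI} = \frac{2\,(0 \cdot 4 - 4 \cdot 4)}
                    {(0 + 4)(4 + 4) + (0 + 4)(4 + 4)}
             = \frac{-32}{64} = -0.5
```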
Usage
Use metrics.AdjustedRand when ground truth labels are available and you want to evaluate online clustering quality with a chance-corrected metric. Call update(y_true, y_pred) after each prediction, and get() whenever you need the current score.
Code Reference
Source Location
river/metrics/rand.py:L117-L195
Signature
class AdjustedRand(metrics.base.MultiClassMetric):
    def __init__(self, cm=None)
Import
from river import metrics
Key Parameters
| Parameter | Default | Description |
|---|---|---|
| cm | None | Optional shared confusion matrix. If provided, allows sharing the same confusion matrix between multiple metrics to reduce storage and computation time. |
Methods
| Method | Signature | Description |
|---|---|---|
| update | update(y_true, y_pred, w=1.0) -> None | Updates the internal contingency table with one new observation's true and predicted labels. |
| get | get() -> float | Computes and returns the current Adjusted Rand Index from the contingency table. Returns 1.0 on zero-division. |
I/O Contract
Inputs
| Parameter | Type | Description |
|---|---|---|
| y_true | any hashable | The ground truth cluster label for the observation. |
| y_pred | any hashable | The predicted cluster label for the observation. |
| w | float | Optional sample weight (default 1.0). Note: works_with_weights = False for this metric. |
Outputs
| Output | Type | Description |
|---|---|---|
| get() return | float | The Adjusted Rand Index. 1.0 = perfect agreement; 0.0 = chance agreement; negative = worse than chance. |
Usage Examples
from river import metrics
y_true = [0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 1, 2, 2]
metric = metrics.AdjustedRand()
for yt, yp in zip(y_true, y_pred):
    metric.update(yt, yp)
    print(metric.get())
# 1.0
# 1.0
# 0.0
# 0.0
# 0.09090909090909091
# 0.24242424242424243
print(metric)
# AdjustedRand: 0.242424
Using with an online clustering model:
from river import cluster, metrics
model = cluster.KMeans(n_clusters=2, halflife=0.5, seed=42)
metric = metrics.AdjustedRand()
# Labeled data for evaluation
data = [
({'x': 1, 'y': 2}, 0),
({'x': 1.5, 'y': 1.8}, 0),
({'x': 5, 'y': 8}, 1),
({'x': 8, 'y': 8}, 1),
]
for x, y_true in data:
    model.learn_one(x)             # update the cluster centers
    y_pred = model.predict_one(x)  # assign x to a cluster
    metric.update(y_true, y_pred)  # add the pair to the contingency table

print(f'ARI: {metric.get():.4f}')