Implementation:Online ml River Metrics Silhouette
| Knowledge Sources | Domains | Last Updated |
|---|---|---|
| River; River Docs; Silhouettes: a graphical aid to the interpretation and validation of cluster analysis (Rousseeuw, 1987); Machine Learning for Data Streams (Bifet et al., 2018) | Cluster Evaluation, Streaming Metrics | 2026-02-08 16:00 GMT |
Overview
Concrete tool for incrementally computing the Silhouette coefficient to evaluate online clustering quality using distances to cluster centroids rather than pairwise point distances.
Description
The metrics.Silhouette class provides an incremental Silhouette coefficient for evaluating clustering results in a streaming context. It maintains two running sums: the cumulative distance from each point to its assigned cluster center, and the cumulative distance from each point to its second-closest cluster center. The get() method returns the first sum divided by the second.
Unlike the classical batch Silhouette, this implementation uses centroid-based distances and has a different interpretation: lower values indicate better clustering (bigger_is_better = False). A value close to 0 means excellent cohesion relative to separation, while values approaching 1 or higher indicate poor clustering.
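Written out, the quantity described above can be sketched as follows. The notation is introduced here for illustration only, and the assumption that the sample weight simply scales each distance is not confirmed by the source:

```latex
S \;=\; \frac{\sum_{i} w_i \, d\!\left(x_i,\, c_{\hat{y}_i}\right)}
             {\sum_{i} w_i \, d\!\left(x_i,\, c_{(2)}(x_i)\right)}
```

Here d is the point-to-center distance, c_{\hat{y}_i} is the center of the cluster predicted for x_i, and c_{(2)}(x_i) is the second-closest center to x_i. If every point is assigned to its nearest center, each numerator term is at most the matching denominator term, so S stays at or below 1. Under the same reading, values above 1 can only arise when some points are assigned to a center that is not their nearest one.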
The metric requires the caller to pass the current cluster centers on each update, making it suitable for use with algorithms that expose a centers attribute (such as cluster.KMeans).
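To make the two running sums concrete, here is a minimal pure-Python sketch that does not use River. The helper names (`euclidean`, `assigned_and_second_closest`) are hypothetical, and Euclidean distance is an assumption, since the text above only says "distance". The `centers` argument mirrors the shape described above: a dictionary from cluster ID to a centroid, itself a feature dictionary.

```python
import math


def euclidean(a: dict, b: dict) -> float:
    """Euclidean distance between two feature dicts (missing keys count as 0)."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


def assigned_and_second_closest(x: dict, y_pred: int, centers: dict):
    """Return (distance to the assigned center, second-smallest center distance)."""
    assigned = euclidean(x, centers[y_pred])
    ranked = sorted(euclidean(x, c) for c in centers.values())
    return assigned, ranked[1]  # needs at least two centers


# Hypothetical centers and (point, predicted cluster) pairs.
centers = {
    0: {0: 0.0, 1: 0.0},
    1: {0: 4.0, 1: 0.0},
    2: {0: 0.0, 1: 10.0},
}
points = [({0: 1.0, 1: 0.0}, 0), ({0: 4.0, 1: 3.0}, 1)]

sum_assigned = 0.0
sum_second = 0.0
for x, y_pred in points:
    d_assigned, d_second = assigned_and_second_closest(x, y_pred, centers)
    sum_assigned += d_assigned  # 1.0, then 3.0
    sum_second += d_second      # 3.0, then 5.0

ratio = sum_assigned / sum_second
print(ratio)  # 0.5
```

Because only the two sums are stored, each update costs one distance computation per center, regardless of how many points have been seen.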
Usage
Import metrics.Silhouette when you need an unsupervised streaming evaluation metric for clustering. Call update after each learn/predict cycle with the current cluster centers.
Code Reference
Source Location
river/metrics/silhouette.py:L8-L93
Signature
class Silhouette(metrics.base.ClusteringMetric):
    def __init__(self)
Import
from river import metrics
Methods
| Method | Signature | Description |
|---|---|---|
| update | update(x: dict, y_pred: int, centers: dict, w=1.0) -> None | Updates the running sums with the distances from x to its assigned center and to the second-closest center. |
| revert | revert(x: dict, y_pred: int, centers: dict, w=1.0) -> None | Reverts a previous update by subtracting the corresponding distances. |
| get | get() -> float | Returns the current Silhouette coefficient (ratio of closest to second-closest center distances). Returns math.inf on zero-division. |
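The interplay of the three methods can be illustrated with a small stand-in class. This is a sketch under stated assumptions, not River's implementation: `RunningRatioSketch` is a hypothetical name, points and centers are plain tuples rather than feature dictionaries, the distance is Euclidean, and the weight `w` is assumed to scale each distance.

```python
import math


class RunningRatioSketch:
    """Hypothetical stand-in for illustration; not River's implementation."""

    def __init__(self):
        self.sum_assigned = 0.0
        self.sum_second = 0.0

    @staticmethod
    def _distances(x, y_pred, centers):
        # Assumption: Euclidean distance over tuples; needs at least two centers.
        ranked = sorted(math.dist(x, c) for c in centers.values())
        return math.dist(x, centers[y_pred]), ranked[1]

    def update(self, x, y_pred, centers, w=1.0):
        d_assigned, d_second = self._distances(x, y_pred, centers)
        self.sum_assigned += w * d_assigned
        self.sum_second += w * d_second

    def revert(self, x, y_pred, centers, w=1.0):
        # Mirror of update: subtract the same two distances.
        d_assigned, d_second = self._distances(x, y_pred, centers)
        self.sum_assigned -= w * d_assigned
        self.sum_second -= w * d_second

    def get(self):
        try:
            return self.sum_assigned / self.sum_second
        except ZeroDivisionError:
            return math.inf


centers = {0: (0.0, 0.0), 1: (4.0, 0.0)}
sketch = RunningRatioSketch()
empty_value = sketch.get()             # no updates yet: division by zero -> math.inf

sketch.update((1.0, 0.0), 0, centers)
after_first = sketch.get()             # 1 / 3

sketch.update((4.0, 3.0), 1, centers)
after_second = sketch.get()            # (1 + 3) / (3 + 5) = 0.5

sketch.revert((4.0, 3.0), 1, centers)
after_revert = sketch.get()            # back to 1 / 3
```

In this sketch, a revert only cancels an update exactly when it receives the same x, y_pred, centers, and w. If the centers have moved in between, the subtracted distances differ from the ones that were added.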
I/O Contract
Inputs
| Parameter | Type | Description |
|---|---|---|
| x | dict | Feature dictionary for the current observation. |
| y_pred | int | The predicted cluster index for x. |
| centers | dict | A dictionary mapping cluster IDs to centroid positions (e.g., model.centers). |
| w | float | Optional sample weight (default 1.0). |
Outputs
| Output | Type | Description |
|---|---|---|
| get() return | float | The streaming Silhouette coefficient. Lower values indicate better clustering (bigger_is_better = False). |
Usage Examples
from river import cluster
from river import stream
from river import metrics
X = [
[1, 2],
[1, 4],
[1, 0],
[4, 2],
[4, 4],
[4, 0],
[-2, 2],
[-2, 4],
[-2, 0]
]
k_means = cluster.KMeans(n_clusters=3, halflife=0.4, sigma=3, seed=0)
metric = metrics.Silhouette()
for x, _ in stream.iter_array(X):
    k_means.learn_one(x)
    y_pred = k_means.predict_one(x)
    # Pass the model's current centers so the metric can measure distances to them.
    metric.update(x, y_pred, k_means.centers)
print(metric)
# Silhouette: 0.32145