Implementation:Online ml River Metrics VBeta
| Knowledge Sources | |
|---|---|
| Domains | Online_Learning, Evaluation_Metrics, Clustering |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
The V-Measure family of entropy-based clustering evaluation metrics: Homogeneity, Completeness, and VBeta.
Description
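This module provides three related metrics. Homogeneity measures whether each cluster contains only members of a single class (cluster purity). Completeness measures whether all members of a class are assigned to the same cluster (class coverage). VBeta (V-Measure) combines the two as a weighted harmonic mean: V_β = ((1+β)×h×c)/(β×h+c), where h is homogeneity, c is completeness, and β (the beta parameter) controls their relative importance: β > 1 weights completeness more heavily and β < 1 weights homogeneity more heavily. All three metrics are entropy-based and invariant to permutations of the class or cluster labels. Homogeneity and Completeness are not individually symmetric: swapping the roles of the true and predicted labels turns one into the other. VBeta is symmetric only when β = 1.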
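To make the weighting concrete, the formula can be evaluated directly. The following is a minimal sketch, not River's implementation: `v_beta` is a hypothetical helper, the zero-denominator fallback is an assumption, and the inputs are the final homogeneity and completeness scores printed in the Usage Examples section.

```python
def v_beta(h: float, c: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of homogeneity h and completeness c."""
    denominator = beta * h + c
    if denominator == 0.0:
        # Assumption for this sketch: report 0.0 when both scores are zero.
        return 0.0
    return (1.0 + beta) * h * c / denominator


# Final scores taken from the usage example on this page.
h, c = 0.42062, 0.6666666666666667

for beta in (0.5, 1.0, 2.0):
    print(f"beta={beta}: {v_beta(h, c, beta):.4f}")
```

Because completeness is the larger of the two scores here, raising beta pulls the result toward it and lowering beta pulls the result toward homogeneity. With beta=1.0 the sketch gives about 0.5158, in line with the VBeta output shown in the usage example.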
Usage
Use Homogeneity when cluster purity is most important (each cluster should be homogeneous). Use Completeness when class coverage matters most (all members of a class should cluster together). Use VBeta to balance both aspects, with beta=1 giving equal weight. These metrics are valuable for evaluating clustering when you have ground truth labels but don't know the cluster-to-class mapping, as they're insensitive to label permutations.
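The insensitivity to label permutations can be checked with a small batch computation. This is a standard-library sketch under the usual entropy definitions (homogeneity = 1 − H(C|K)/H(C), completeness = 1 − H(K|C)/H(K)), not River's code; all function names are hypothetical, and returning 1.0 when the reference entropy is zero is an assumption of the sketch.

```python
from collections import Counter
from math import isclose, log


def entropy(labels):
    """Shannon entropy (in nats) of a label sequence."""
    n = len(labels)
    return -sum(k / n * log(k / n) for k in Counter(labels).values())


def conditional_entropy(targets, given):
    """H(targets | given), estimated from label co-occurrence counts."""
    n = len(targets)
    joint = Counter(zip(given, targets))
    marginal = Counter(given)
    return -sum(k / n * log(k / marginal[g]) for (g, _), k in joint.items())


def homogeneity(y_true, y_pred):
    h_classes = entropy(y_true)
    if h_classes == 0.0:
        return 1.0  # assumption: a single class is trivially homogeneous
    return 1.0 - conditional_entropy(y_true, y_pred) / h_classes


def completeness(y_true, y_pred):
    # Completeness is homogeneity with the two labelings swapped.
    return homogeneity(y_pred, y_true)


y_true = [1, 1, 2, 2, 3, 3]
y_pred = [1, 1, 1, 2, 2, 2]
# Same partition, but with arbitrary new cluster names.
renamed = ["b" if k == 1 else "a" for k in y_pred]

same_h = isclose(homogeneity(y_true, y_pred), homogeneity(y_true, renamed))
same_c = isclose(completeness(y_true, y_pred), completeness(y_true, renamed))
print(same_h, same_c)
```

Renaming the clusters changes neither score, because only the co-occurrence counts between classes and clusters enter the computation.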
Code Reference
Source Location
- Repository: Online_ml_River
- File: river/metrics/vbeta.py
Signature
class Homogeneity(metrics.base.MultiClassMetric):
    def __init__(self, cm=None):
        pass

class Completeness(metrics.base.MultiClassMetric):
    def __init__(self, cm=None):
        pass

class VBeta(metrics.base.MultiClassMetric):
    def __init__(self, beta: float = 1.0, cm=None):
        pass
Import
from river import metrics
I/O Contract
| Method | Parameters | Returns | Description |
|---|---|---|---|
| update | y_true, y_pred | None | Updates the metric with one true class label and one predicted cluster label |
| get | - | float | Returns metric score (0.0 to 1.0) |
Note: These metrics do not support sample weights (works_with_weights returns False).
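The update/get contract can be illustrated with a small incremental counter-based metric. This is a hypothetical stand-in written for illustration, not River's implementation: the class name `StreamingHomogeneity` and its internals are assumptions, and it relies only on the standard library. Each `update` call adds one unweighted observation, and `get` recomputes the score from the running counts.

```python
from collections import Counter
from math import log


class StreamingHomogeneity:
    """Hypothetical sketch of the update/get protocol (not River's code)."""

    def __init__(self):
        self.n = 0
        self.joint = Counter()     # (class, cluster) -> count
        self.classes = Counter()   # class -> count
        self.clusters = Counter()  # cluster -> count

    def update(self, y_true, y_pred):
        # One unweighted observation per call; sample weights are not modeled.
        self.n += 1
        self.joint[(y_true, y_pred)] += 1
        self.classes[y_true] += 1
        self.clusters[y_pred] += 1

    def get(self):
        n = self.n
        h_classes = -sum(k / n * log(k / n) for k in self.classes.values())
        if h_classes == 0.0:
            return 1.0  # assumption: zero or one class seen so far
        h_classes_given_clusters = -sum(
            k / n * log(k / self.clusters[cluster])
            for (_, cluster), k in self.joint.items()
        )
        return 1.0 - h_classes_given_clusters / h_classes


metric = StreamingHomogeneity()
scores = []
for yt, yp in zip([1, 1, 2, 2, 3, 3], [1, 1, 1, 2, 2, 2]):
    metric.update(yt, yp)
    scores.append(metric.get())

print(scores)
```

Keeping only counts means the score can be read at any point in the stream without storing past observations. On this data the running scores should agree with the Homogeneity values listed in the Usage Examples section.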
Usage Examples
from river import metrics
y_true = [1, 1, 2, 2, 3, 3]
y_pred = [1, 1, 1, 2, 2, 2]
# Homogeneity (cluster purity)
metric_h = metrics.Homogeneity()
for yt, yp in zip(y_true, y_pred):
    metric_h.update(yt, yp)
    print(metric_h.get())
# 1.0
# 1.0
# 0.0
# 0.311278
# 0.37515
# 0.42062
print(metric_h)
# Homogeneity: 42.06%
# Moderate homogeneity: clusters have mixed classes
# Completeness (class coverage)
metric_c = metrics.Completeness()
for yt, yp in zip(y_true, y_pred):
    metric_c.update(yt, yp)
    print(metric_c.get())
# 1.0
# 1.0
# 1.0
# 0.3836885465963443
# 0.5880325916843805
# 0.6666666666666667
print(metric_c)
# Completeness: 66.67%
# Better completeness: classes are more consolidated
# V-Measure (balanced combination)
metric_v = metrics.VBeta(beta=1.0)
for yt, yp in zip(y_true, y_pred):
    metric_v.update(yt, yp)
    print(metric_v.get())
# 1.0
# 1.0
# 0.0
# 0.3437110184854507
# 0.4580652856440158
# 0.5158037429793888
print(metric_v)
# VBeta: 51.58%
# V-Measure balances homogeneity and completeness
# Adjust beta to weight homogeneity or completeness
metric_v2 = metrics.VBeta(beta=2.0) # Weight completeness more
for yt, yp in zip(y_true, y_pred):
    metric_v2.update(yt, yp)
print(f"VBeta (beta=2): {metric_v2.get():.2%}")
# Higher beta emphasizes completeness over homogeneity