Implementation:Online ml River Metrics VBeta

Knowledge Sources	Online_ml_River
Domains	Online_Learning, Evaluation_Metrics, Clustering
Last Updated	2026-02-08 16:00 GMT

Overview

V-Measure family of entropy-based cluster evaluation metrics including Homogeneity, Completeness, and VBeta.

Description

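This module provides three related metrics. Homogeneity measures whether each cluster contains only members of a single class (cluster purity). Completeness measures whether all members of a class are assigned to the same cluster (class coverage). VBeta (V-Measure) combines the two as a weighted harmonic mean: V_β = ((1+β)×h×c)/(β×h+c), where h is homogeneity, c is completeness, and β (the beta parameter) controls their relative weight: β > 1 favors completeness and β < 1 favors homogeneity. All three metrics are entropy-based and invariant to permutations of the label values. Homogeneity and Completeness are not symmetric: swapping the true and predicted labels exchanges one for the other. VBeta is symmetric only when β = 1.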
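To make the entropy-based definitions concrete, the following is a minimal batch sketch in plain Python, not River's incremental implementation. It assumes the usual definitions h = 1 − H(C|K)/H(C) and c = 1 − H(K|C)/H(K), where C is the class labeling and K the cluster labeling. The helper names (`_entropy`, `_conditional_entropy`, `v_measure_batch`) and the handling of zero-entropy denominators are illustrative assumptions, not part of the library.

```python
import math
from collections import Counter


def _entropy(counts, n):
    """Shannon entropy (natural log) of a label distribution given raw counts."""
    return -sum((c / n) * math.log(c / n) for c in counts)


def _conditional_entropy(joint, given_counts, n):
    """H(X | Y) from joint counts keyed by (x, y) and marginal counts of y."""
    return -sum(
        (n_xy / n) * math.log(n_xy / given_counts[y])
        for (_, y), n_xy in joint.items()
    )


def v_measure_batch(y_true, y_pred, beta=1.0):
    """Return (homogeneity, completeness, v_beta) for two label sequences."""
    n = len(y_true)
    classes = Counter(y_true)
    clusters = Counter(y_pred)
    class_cluster = Counter(zip(y_true, y_pred))  # keyed by (class, cluster)
    cluster_class = Counter(zip(y_pred, y_true))  # keyed by (cluster, class)

    h_c = _entropy(classes.values(), n)
    h_k = _entropy(clusters.values(), n)
    # Assumed convention: a zero-entropy denominator yields a perfect score.
    h = 1.0 if h_c == 0 else 1.0 - _conditional_entropy(class_cluster, clusters, n) / h_c
    c = 1.0 if h_k == 0 else 1.0 - _conditional_entropy(cluster_class, classes, n) / h_k

    # Weighted harmonic mean: V_beta = (1 + beta) * h * c / (beta * h + c)
    denom = beta * h + c
    v = 0.0 if denom == 0 else (1 + beta) * h * c / denom
    return h, c, v


y_true = [1, 1, 2, 2, 3, 3]
y_pred = [1, 1, 1, 2, 2, 2]
h, c, v = v_measure_batch(y_true, y_pred)

# Renaming cluster ids leaves all three scores unchanged.
relabeled = ["b" if k == 1 else "a" for k in y_pred]
h2, c2, v2 = v_measure_batch(y_true, relabeled)

# Swapping the roles of the two labelings exchanges homogeneity and completeness.
h_swapped, c_swapped, _ = v_measure_batch(y_pred, y_true)
```

On this data the sketch gives h ≈ 0.4206, c ≈ 0.6667, and V_1 ≈ 0.5158. The relabeled run shows the permutation invariance, and the swapped run shows why Homogeneity and Completeness are mirror images of each other rather than symmetric.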

Usage

Use Homogeneity when cluster purity matters most (each cluster should contain members of a single class). Use Completeness when class coverage matters most (all members of a class should land in the same cluster). Use VBeta to balance both, with beta=1 giving them equal weight. These metrics are useful for evaluating a clustering when ground truth labels are available but the cluster-to-class mapping is unknown, because the scores do not depend on which label value names which cluster.

Code Reference

Source Location

Repository: Online_ml_River
File: river/metrics/vbeta.py

Signature

class Homogeneity(metrics.base.MultiClassMetric):
    def __init__(self, cm=None):
        pass

class Completeness(metrics.base.MultiClassMetric):
    def __init__(self, cm=None):
        pass

class VBeta(metrics.base.MultiClassMetric):
    def __init__(self, beta: float = 1.0, cm=None):
        pass

Import

from river import metrics

I/O Contract

Method	Parameters	Returns	Description
update	y_true, y_pred	None	Updates metric with true and predicted cluster labels
get	-	float	Returns metric score (0.0 to 1.0)

Note: These metrics do not support sample weights (works_with_weights returns False).

Usage Examples

from river import metrics

y_true = [1, 1, 2, 2, 3, 3]
y_pred = [1, 1, 1, 2, 2, 2]

# Homogeneity (cluster purity)
metric_h = metrics.Homogeneity()
for yt, yp in zip(y_true, y_pred):
    metric_h.update(yt, yp)
    print(metric_h.get())
# Output after each update (values truncated):
# 1.0
# 1.0
# 0.0
# 0.311278
# 0.37515
# 0.42062

print(metric_h)
# Homogeneity: 42.06%
# Interpretation: moderate homogeneity, because both clusters mix classes

# Completeness (class coverage)
metric_c = metrics.Completeness()
for yt, yp in zip(y_true, y_pred):
    metric_c.update(yt, yp)
    print(metric_c.get())
# 1.0
# 1.0
# 1.0
# 0.3836885465963443
# 0.5880325916843805
# 0.6666666666666667

print(metric_c)
# Completeness: 66.67%
# Interpretation: completeness is higher, because classes 1 and 3 each fall entirely in one cluster

# V-Measure (balanced combination)
metric_v = metrics.VBeta(beta=1.0)
for yt, yp in zip(y_true, y_pred):
    metric_v.update(yt, yp)
    print(metric_v.get())
# 1.0
# 1.0
# 0.0
# 0.3437110184854507
# 0.4580652856440158
# 0.5158037429793888

print(metric_v)
# VBeta: 51.58%
# V-Measure balances homogeneity and completeness

# Adjust beta to weight homogeneity or completeness
metric_v2 = metrics.VBeta(beta=2.0)  # Weight completeness more
for yt, yp in zip(y_true, y_pred):
    metric_v2.update(yt, yp)

print(f"VBeta (beta=2): {metric_v2.get():.2%}")
# Higher beta emphasizes completeness over homogeneity
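
The effect of beta can also be checked by hand from the formula in the Description. The sketch below applies V_β = ((1+β)×h×c)/(β×h+c) to the final homogeneity and completeness values from the examples above. The helper `v_beta` is a hypothetical stand-alone function written for illustration, not a River API, and returning 0.0 for a zero denominator is an assumption.

```python
def v_beta(h, c, beta=1.0):
    """Weighted harmonic mean of homogeneity h and completeness c."""
    denom = beta * h + c
    # Assumed convention: return 0.0 when both scores are zero.
    return 0.0 if denom == 0 else (1 + beta) * h * c / denom


# Final homogeneity (truncated) and completeness from the examples above.
h, c = 0.42062, 0.6666666666666667

scores = {beta: v_beta(h, c, beta) for beta in (0.5, 1.0, 2.0)}
for beta, score in scores.items():
    print(f"beta={beta}: {score:.2%}")
```

Because completeness exceeds homogeneity on this data, the score rises as beta grows: beta=2 pulls the result toward completeness, while beta=0.5 pulls it toward homogeneity.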
