Implementation:Online ml River Metrics SampleAverage
| Knowledge Sources | |
|---|---|
| Domains | Online_Learning, Evaluation_Metrics, Multi_Output_Learning |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
A sample-average wrapper that evaluates a metric on each sample independently and averages the results.
Description
SampleAverage wraps a single-output metric for multi-output problems. For each sample, it creates a fresh copy of the wrapped metric, updates that copy with every (true, predicted) output pair from the sample, and folds the resulting value into a running average. The final score is the arithmetic mean of the per-sample scores, weighted if sample weights are supplied. This corresponds to scikit-learn's average='samples' option.
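The per-sample procedure can be sketched in plain Python without River. This is a minimal illustration under stated assumptions, not River's implementation: `jaccard` and `sample_average` are hypothetical helper names, outputs are assumed to be boolean labels keyed by output id, and an undefined score is assumed to count as 0.

```python
def jaccard(pairs):
    """Binary Jaccard over (truth, prediction) pairs: TP / (TP + FP + FN)."""
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    denom = tp + fp + fn
    return tp / denom if denom else 0.0  # assumption: score 0 when undefined


def sample_average(y_true, y_pred, metric=jaccard):
    """Score each sample on its own outputs, then take the arithmetic mean."""
    scores = []
    for yt, yp in zip(y_true, y_pred):
        # Fresh evaluation per sample: only this sample's outputs are seen.
        scores.append(metric([(yt[k], yp[k]) for k in yt]))
    return sum(scores) / len(scores), scores


y_true = [{0: False, 1: True, 2: True}, {0: True, 1: True, 2: False}]
y_pred = [{0: True, 1: True, 2: True}, {0: True, 1: False, 2: False}]

mean_score, per_sample = sample_average(y_true, y_pred)
print(per_sample)            # per-sample Jaccard scores: 2/3 and 1/2
print(f"{mean_score:.2%}")   # 58.33%
```

Because each sample is scored in isolation, counts from one sample never leak into another; only the final per-sample scores are combined.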
Usage
Use SampleAverage when you want to evaluate performance sample-by-sample rather than output-by-output or globally. Unless a sample weight is passed to update, each sample contributes equally to the score regardless of how many outputs it has, which makes this wrapper suitable when sample-level performance matters more than output-level or global performance. It is particularly useful in multi-label classification, where the question is how well the model performs on a typical individual instance.
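To make the equal-weight-per-sample property concrete, the sketch below contrasts a per-sample mean with a pooled (micro-style) score on two samples that have different numbers of outputs. It uses plain accuracy for simplicity and does not call River; `accuracy`, `as_pairs`, `per_sample_mean`, and `pooled` are hypothetical names chosen for illustration.

```python
def accuracy(pairs):
    """Fraction of (truth, prediction) pairs that agree."""
    return sum(t == p for t, p in pairs) / len(pairs)


def as_pairs(yt, yp):
    """Align one sample's true and predicted outputs by key."""
    return [(yt[k], yp[k]) for k in yt]


# Sample A has one output (predicted wrong); sample B has four (all correct).
y_true = [{"a": True}, {"a": True, "b": False, "c": True, "d": False}]
y_pred = [{"a": False}, {"a": True, "b": False, "c": True, "d": False}]

# Per-sample mean: each sample contributes one score, whatever its size.
per_sample_mean = sum(
    accuracy(as_pairs(t, p)) for t, p in zip(y_true, y_pred)
) / len(y_true)

# Pooled score: every output counts once, so larger samples dominate.
all_pairs = [pair for t, p in zip(y_true, y_pred) for pair in as_pairs(t, p)]
pooled = accuracy(all_pairs)

print(f"per-sample mean: {per_sample_mean:.2f}")  # 0.50
print(f"pooled:          {pooled:.2f}")           # 0.80
```

The single-output sample pulls the per-sample mean down to 0.50, while in the pooled score it is only one of five outputs.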
Code Reference
Source Location
- Repository: Online_ml_River
- File: river/metrics/multioutput/sample_average.py
Signature
class SampleAverage(MultiOutputMetric, metrics.base.WrapperMetric):
    def __init__(self, metric):
        # metric: any classification or regression metric
        pass
Import
from river import metrics
I/O Contract
| Method | Parameters | Returns | Description |
|---|---|---|---|
| update | y_true (dict), y_pred (dict), w (optional sample weight) | None | Evaluates the metric on one sample and adds the result to the running average |
| get | - | float | Returns average metric value across all samples |
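The optional w argument can be read as a weight on the sample's score in the running average. The sketch below shows one common incremental weighted-mean update that needs no stored history. The class name `WeightedRunningMean` is hypothetical, and whether River uses exactly this update internally is an assumption.

```python
class WeightedRunningMean:
    """Incremental weighted mean: no per-sample scores are stored."""

    def __init__(self):
        self.total_weight = 0.0
        self.mean = 0.0

    def update(self, score, w=1.0):
        # Assumes w > 0; a zero first weight would divide by zero.
        self.total_weight += w
        # Move the mean toward the new score in proportion to its weight.
        self.mean += (w / self.total_weight) * (score - self.mean)

    def get(self):
        return self.mean


avg = WeightedRunningMean()
avg.update(1.0, w=2.0)  # a sample scored 1.0, counted twice
avg.update(0.4, w=1.0)  # a sample scored 0.4, counted once
print(round(avg.get(), 4))  # (2 * 1.0 + 1 * 0.4) / 3 = 0.8
```

With the default w=1.0 on every call, this reduces to the plain arithmetic mean of the per-sample scores.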
Usage Examples
from river import metrics
# Sample-average Jaccard score
sample_jaccard = metrics.multioutput.SampleAverage(metrics.Jaccard())
y_true = [
    {0: False, 1: True, 2: True},
    {0: True, 1: True, 2: False},
]
y_pred = [
    {0: True, 1: True, 2: True},
    {0: True, 1: False, 2: False},
]
for yt, yp in zip(y_true, y_pred):
    sample_jaccard.update(yt, yp)
print(sample_jaccard)
# SampleAverage(Jaccard): 58.33%
# How it's computed:
# Sample 1: Jaccard across outputs {0,1,2} = 66.67%
# Sample 2: Jaccard across outputs {0,1,2} = 50.00%
# Average: (66.67 + 50.00) / 2 = 58.33%
# Compare with other averaging methods
macro_jaccard = metrics.multioutput.MacroAverage(metrics.Jaccard())
micro_jaccard = metrics.multioutput.MicroAverage(metrics.Jaccard())
for yt, yp in zip(y_true, y_pred):
    macro_jaccard.update(yt, yp)
    micro_jaccard.update(yt, yp)
print(f"Sample: {sample_jaccard.get():.2%}")
print(f"Macro: {macro_jaccard.get():.2%}")
print(f"Micro: {micro_jaccard.get():.2%}")
# Sample: Averages per-sample performance
# Macro: Averages per-output performance
# Micro: Global performance across all predictions
# Example showing sample weighting
sample_f1_weighted = metrics.multioutput.SampleAverage(metrics.F1())
y_true_w = [
    {0: True, 1: True},
    {0: False, 1: False},
]
y_pred_w = [
    {0: True, 1: True},    # Matches y_true exactly
    {0: False, 1: False},  # Also matches, but with no positive labels F1 has no true positives to score
]
# Weight samples differently
sample_f1_weighted.update(y_true_w[0], y_pred_w[0], w=2.0)
sample_f1_weighted.update(y_true_w[1], y_pred_w[1], w=1.0)
print(f"Weighted sample F1: {sample_f1_weighted.get():.2%}")
# First sample has twice the influence on final score