Implementation:Online ml River Metrics SampleAverage

From Leeroopedia


Knowledge Sources
Domains: Online_Learning, Evaluation_Metrics, Multi_Output_Learning
Last Updated: 2026-02-08 16:00 GMT

Overview

Sample-average wrapper that evaluates a metric on each sample independently and averages the per-sample results.

Description

SampleAverage wraps a single-output metric for multi-output problems. For each sample, it creates a fresh instance of the wrapped metric, updates it with every output value of that sample, and records the resulting score. The final score is the arithmetic mean of the per-sample scores, weighted by the optional sample weight w when one is passed to update. This is equivalent to passing average='samples' to a scikit-learn metric.
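The procedure described above can be sketched in plain Python without River. This is an illustrative approximation, not River's source: JaccardSketch and SampleAverageSketch are hypothetical names, weights are left out, and the sketch assumes boolean labels and a score of 0 when a sample has no positive label or prediction.

```python
import copy


class JaccardSketch:
    """Toy single-output metric: Jaccard index over boolean labels."""

    def __init__(self):
        self.tp = self.fp = self.fn = 0

    def update(self, y_true: bool, y_pred: bool) -> None:
        self.tp += int(y_true and y_pred)
        self.fp += int(not y_true and y_pred)
        self.fn += int(y_true and not y_pred)

    def get(self) -> float:
        denom = self.tp + self.fp + self.fn
        # Assumption: score 0 when there is no positive label or prediction.
        return self.tp / denom if denom else 0.0


class SampleAverageSketch:
    """Evaluate a fresh copy of `metric` on each sample, then average."""

    def __init__(self, metric):
        self.metric = metric  # template only, never updated directly
        self.total = 0.0
        self.n = 0

    def update(self, y_true: dict, y_pred: dict) -> None:
        fresh = copy.deepcopy(self.metric)  # new instance for this sample
        for key in y_true:
            fresh.update(y_true[key], y_pred[key])
        self.total += fresh.get()
        self.n += 1

    def get(self) -> float:
        return self.total / self.n if self.n else 0.0


y_true = [{0: False, 1: True, 2: True}, {0: True, 1: True, 2: False}]
y_pred = [{0: True, 1: True, 2: True}, {0: True, 1: False, 2: False}]

avg = SampleAverageSketch(JaccardSketch())
for yt, yp in zip(y_true, y_pred):
    avg.update(yt, yp)

print(f"{avg.get():.2%}")  # 58.33%
```

Copying the template metric for every sample keeps the per-sample scores independent: counts from one sample never leak into the next.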

Usage

Use SampleAverage when you want to evaluate performance sample-by-sample rather than output-by-output or globally. This approach gives equal weight to each sample regardless of how many outputs it has, making it suitable when sample-level performance is more important than output-level or global performance. It's particularly useful in multi-label classification when you want to know how well the model performs on average across individual instances.
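The claim that each sample counts equally, however many outputs it has, can be checked by hand. The sketch below uses made-up data and a hypothetical jaccard helper in plain Python; it compares the sample average with a score pooled over all labels, which is how a global (micro-style) average behaves.

```python
def jaccard(pairs) -> float:
    """Jaccard index TP / (TP + FP + FN) over (true, predicted) boolean pairs."""
    pairs = list(pairs)
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    denom = tp + fp + fn
    return tp / denom if denom else 0.0  # assumption: empty union scores 0


# Hypothetical data: the second sample has four labels, the first only one.
y_true = [{"a": True}, {"a": True, "b": True, "c": True, "d": True}]
y_pred = [{"a": True}, {"a": True, "b": False, "c": False, "d": False}]

# Sample average: score each sample separately, then average the scores.
per_sample = [
    jaccard((yt[k], yp[k]) for k in yt) for yt, yp in zip(y_true, y_pred)
]
sample_avg = sum(per_sample) / len(per_sample)

# Pooled score: every label counts once, so the larger sample dominates.
pooled = jaccard((yt[k], yp[k]) for yt, yp in zip(y_true, y_pred) for k in yt)

print(per_sample)          # [1.0, 0.25]
print(sample_avg, pooled)  # 0.625 0.4
```

The single-label sample lifts the sample average to 0.625, while the pooled score of 0.4 mostly reflects the four-label sample.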

Code Reference

Source Location

  • Repository: Online_ml_River
  • File: river/metrics/multioutput/sample_average.py

Signature

class SampleAverage(MultiOutputMetric, metrics.base.WrapperMetric):
    def __init__(self, metric):
        # metric: Any classification or regression metric
        pass

Import

from river import metrics

# The wrapper is then available as metrics.multioutput.SampleAverage

I/O Contract

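Method | Parameters | Returns | Description
update | y_true (dict), y_pred (dict), [w] | None | Evaluates a fresh copy of the metric on the sample's outputs and adds the result to the running average (w is an optional sample weight)
get | - | float | Returns the average metric value across all samples seen so far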
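The effect of the optional w argument on the running average can be illustrated with a small weighted-mean accumulator. This page does not show how River keeps its running mean, so the WeightedMean class below is a hypothetical stand-in, written under the assumption that w acts as a plain sample weight.

```python
class WeightedMean:
    """Running weighted mean that stores only the mean and the weight total."""

    def __init__(self):
        self.weight_sum = 0.0
        self.mean = 0.0

    def update(self, x: float, w: float = 1.0) -> None:
        self.weight_sum += w
        if self.weight_sum > 0:
            # Shift the mean toward x in proportion to the new sample's weight.
            self.mean += (w / self.weight_sum) * (x - self.mean)

    def get(self) -> float:
        return self.mean


running = WeightedMean()
running.update(1.0, w=2.0)  # per-sample score 1.0, counted twice
running.update(0.0, w=1.0)  # per-sample score 0.0, counted once

print(round(running.get(), 4))  # 0.6667
```

With every weight left at 1.0 this reduces to the ordinary arithmetic mean of the per-sample scores.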

Usage Examples

from river import metrics

# Sample-average Jaccard score
sample_jaccard = metrics.multioutput.SampleAverage(metrics.Jaccard())

y_true = [
    {0: False, 1: True, 2: True},
    {0: True, 1: True, 2: False}
]
y_pred = [
    {0: True, 1: True, 2: True},
    {0: True, 1: False, 2: False}
]

for yt, yp in zip(y_true, y_pred):
    sample_jaccard.update(yt, yp)

print(sample_jaccard)
# SampleAverage(Jaccard): 58.33%

# How it's computed (Jaccard = TP / (TP + FP + FN) over a sample's outputs):
# Sample 1: TP=2, FP=1, FN=0 -> 2/3 = 66.67%
# Sample 2: TP=1, FP=0, FN=1 -> 1/2 = 50.00%
# Average: (66.67 + 50.00) / 2 = 58.33%

# Compare with other averaging methods
macro_jaccard = metrics.multioutput.MacroAverage(metrics.Jaccard())
micro_jaccard = metrics.multioutput.MicroAverage(metrics.Jaccard())

for yt, yp in zip(y_true, y_pred):
    macro_jaccard.update(yt, yp)
    micro_jaccard.update(yt, yp)

print(f"Sample: {sample_jaccard.get():.2%}")
print(f"Macro:  {macro_jaccard.get():.2%}")
print(f"Micro:  {micro_jaccard.get():.2%}")
# Sample: Averages per-sample performance
# Macro:  Averages per-output performance
# Micro:  Global performance across all predictions

# Example showing sample weighting
sample_f1_weighted = metrics.multioutput.SampleAverage(metrics.F1())

y_true_w = [
    {0: True, 1: True},
    {0: False, 1: False},
]
y_pred_w = [
    {0: True, 1: True},   # Every label matches: per-sample F1 = 1
    {0: False, 1: False}, # Every label matches, but with no positives F1 is undefined (expected to fall back to 0)
]

# Weight samples differently
sample_f1_weighted.update(y_true_w[0], y_pred_w[0], w=2.0)
sample_f1_weighted.update(y_true_w[1], y_pred_w[1], w=1.0)

print(f"Weighted sample F1: {sample_f1_weighted.get():.2%}")
# The first sample carries twice the weight of the second in the running average.
# If the second sample scores 0 as noted above, the result is (2 * 1.0 + 1 * 0.0) / 3 = 66.67%.
