Implementation:Online ml River Metrics PerOutput
| Knowledge Sources | |
|---|---|
| Domains | Online_Learning, Evaluation_Metrics, Multi_Output_Learning |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
Per-output wrapper maintaining separate metric instances for each output without aggregation.
Description
PerOutput adapts any single-output metric to multi-output problems by maintaining an independent copy of the metric for each output. Unlike MacroAverage, which returns the mean of the per-output values, or MicroAverage, which updates one shared metric with the values of every output, PerOutput returns a dictionary mapping output IDs to their individual metric instances. This allows per-output performance to be inspected in detail, with no aggregation or averaging.
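The wrapping idea can be illustrated without River itself. The following is a minimal sketch, assuming the wrapper keeps one deep copy of a template metric per output ID; `RunningAccuracy` and `PerOutputSketch` are hypothetical names invented for this illustration and are not part of the library.

```python
import collections
import copy
import functools


class RunningAccuracy:
    """Hypothetical single-output metric: weighted fraction of correct predictions."""

    def __init__(self):
        self.correct = 0.0
        self.total = 0.0

    def update(self, y_true, y_pred, w=1.0):
        self.correct += w * (y_true == y_pred)
        self.total += w

    def get(self):
        return self.correct / self.total if self.total else 0.0


class PerOutputSketch:
    """Illustrative stand-in: one independent copy of `metric` per output ID."""

    def __init__(self, metric):
        # Each output ID seen for the first time gets a fresh deep copy
        # of the template metric, so outputs never share state.
        self.metrics = collections.defaultdict(
            functools.partial(copy.deepcopy, metric)
        )

    def update(self, y_true, y_pred, w=1.0):
        for output_id in y_pred:
            self.metrics[output_id].update(y_true[output_id], y_pred[output_id], w)

    def get(self):
        # Return the metric objects themselves, not an aggregated number.
        return dict(self.metrics)


per_output = PerOutputSketch(RunningAccuracy())
per_output.update({"a": True, "b": False}, {"a": True, "b": True})
per_output.update({"a": False, "b": False}, {"a": False, "b": False})

scores = {output_id: m.get() for output_id, m in per_output.get().items()}
print(scores)  # {'a': 1.0, 'b': 0.5}
```

Because every output owns its own metric object, an error on output `b` leaves the score of output `a` untouched.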
Usage
Use PerOutput when you need to inspect the performance of each output individually. This is valuable for debugging, identifying problematic outputs, comparing outputs against one another, or when you need the full metric object (not just its value) for each output. Note that get() returns a dictionary of metric objects rather than a single number.
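A small worked example shows why per-output values can matter more than an average. The scores below are hypothetical numbers chosen for illustration, standing in for the values one might read from each per-output metric.

```python
from statistics import mean

# Hypothetical per-output scores (illustrative values, not real results).
per_output_scores = {0: 0.95, 1: 0.40, 2: 0.93}

# A macro-style average collapses the three scores into one number.
macro_average = mean(list(per_output_scores.values()))

# Keeping the scores separate makes the weak output easy to find.
worst_id = min(per_output_scores, key=per_output_scores.get)

print(f"macro average: {macro_average:.2f}")
print(f"weakest output: {worst_id} ({per_output_scores[worst_id]:.2f})")
```

The average of 0.76 looks acceptable on its own, while the per-output view reveals that output 1 is performing far worse than the other two.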
Code Reference
Source Location
- Repository: Online_ml_River
- File: river/metrics/multioutput/per_output.py
Signature
class PerOutput(MultiOutputMetric, metrics.base.WrapperMetric):
    def __init__(self, metric):
        # metric: any classification or regression metric
        pass
Import
from river import metrics
I/O Contract
| Method | Parameters | Returns | Description |
|---|---|---|---|
| update | y_true (dict), y_pred (dict), w (optional sample weight) | None | Updates the metric copy of each output present in y_pred |
| get | - | dict | Returns dictionary mapping output IDs to metric instances |
Usage Examples
from river import metrics
# Track F1 score separately for each output
per_output_f1 = metrics.multioutput.PerOutput(metrics.F1())
y_true = [
    {0: False, 1: True, 2: True},
    {0: True, 1: True, 2: False},
    {0: True, 1: False, 2: True},
]
y_pred = [
    {0: False, 1: True, 2: True},   # All correct
    {0: True, 1: False, 2: False},  # Label 1 wrong
    {0: False, 1: False, 2: True},  # Label 0 wrong
]
for yt, yp in zip(y_true, y_pred):
    per_output_f1.update(yt, yp)
# Display all outputs
print(per_output_f1)
# 0 - F1: 66.67%
# 1 - F1: 66.67%
# 2 - F1: 100.00%
# Access individual metrics
metrics_dict = per_output_f1.get()
for output_id, metric in metrics_dict.items():
    print(f"Output {output_id}: {metric.get():.2%}")
# Get specific output's metric
output_0_f1 = per_output_f1.metrics[0]
# The metric's own string form already includes its name, so no extra "F1" label is needed
print(f"Output 0: {output_0_f1}")
# Identify best and worst performing outputs
best_output = max(metrics_dict.items(), key=lambda x: x[1].get())
worst_output = min(metrics_dict.items(), key=lambda x: x[1].get())
print(f"Best: Output {best_output[0]} with {best_output[1].get():.2%}")
print(f"Worst: Output {worst_output[0]} with {worst_output[1].get():.2%}")
# Regression example with MAE
per_output_mae = metrics.multioutput.PerOutput(metrics.MAE())
y_true_reg = [
    {0: 1.0, 1: 2.0, 2: 3.0},
    {0: 2.0, 1: 3.0, 2: 4.0},
]
y_pred_reg = [
    {0: 1.1, 1: 2.5, 2: 2.9},  # Output 1 has larger error
    {0: 2.05, 1: 3.3, 2: 4.1},
]
for yt, yp in zip(y_true_reg, y_pred_reg):
    per_output_mae.update(yt, yp)
print("\nPer-output MAE:")
print(per_output_mae)
# Shows individual MAE for each output
# Useful for identifying which outputs are harder to predict