Principle:Online ml River Horizon Metric
| Knowledge Sources | Domains | Last Updated |
|---|---|---|
| River, River Docs | Online Machine Learning, Time Series Forecasting, Model Evaluation | 2026-02-08 16:00 GMT |
Overview
Metric wrapper that maintains separate evaluation metric instances for each step in a multi-step forecast horizon.
Description
When evaluating multi-step forecasting models, a single aggregate metric value obscures important information about how accuracy varies with forecast distance. The Horizon Metric principle addresses this by maintaining separate, independent copies of a base regression metric for each step of the forecast horizon.
Given a base metric (e.g., MAE) and a forecast horizon h, HorizonMetric maintains h independent metric instances, one per forecast step. When update(y_true, y_pred) is called with parallel lists of true and predicted values, the i-th metric is updated with the i-th pair (y_true[i], y_pred[i]). The result is h separate performance measurements, one for each forecast distance.
For cases where a single summary value is needed (e.g., model selection or hyperparameter tuning), HorizonAggMetric extends HorizonMetric by applying an aggregation function (such as mean, max, or median) to the per-step metric values.
Usage
Use HorizonMetric when:
- You need to understand how forecaster accuracy varies across different prediction horizons
- You want detailed per-step performance breakdown for diagnostic purposes
- You are comparing multiple models and need to identify which performs better at specific horizons
Use HorizonAggMetric when:
- You need a single scalar metric for model selection or hyperparameter optimization
- You want to summarize overall multi-step performance with a simple aggregate
Theoretical Basis
Per-Step Metric Decomposition
Given a base regression metric M (e.g., MAE, MSE, RMSE) and horizon h, the HorizonMetric maintains:
M_1, M_2, ..., M_h (h independent copies of M)
At each evaluation step t, the update processes parallel lists:
y_true = [y_{t+1}, y_{t+2}, ..., y_{t+h}]
y_pred = [\hat{y}_{t+1}, \hat{y}_{t+2}, ..., \hat{y}_{t+h}]
For i = 1, 2, ..., h:
M_i.update(y_true[i], y_pred[i])
The get() method returns:
[M_1.get(), M_2.get(), ..., M_h.get()]
Each M_i accumulates results across all evaluation steps, so M_i.get() reports the base metric's running value (for MAE, the mean absolute error) computed only from forecasts made i steps ahead.
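
The decomposition above can be illustrated with a minimal, self-contained sketch. It does not use River itself: RunningMAE and PerStepMetric are hypothetical stand-ins written for this example, and the only behavior assumed of the base metric is update, get, and clone.

```python
class RunningMAE:
    """Toy base metric (hypothetical stand-in, not River's MAE)."""

    def __init__(self):
        self.total = 0.0
        self.n = 0

    def update(self, y_true, y_pred):
        self.total += abs(y_true - y_pred)
        self.n += 1

    def get(self):
        return self.total / self.n if self.n else 0.0

    def clone(self):
        # A fresh, empty copy with no accumulated state.
        return RunningMAE()


class PerStepMetric:
    """Sketch of the per-step decomposition: one metric copy per horizon step."""

    def __init__(self, metric):
        self.metric = metric
        self.metrics = []

    def update(self, y_true, y_pred):
        # Position i in the lists corresponds to the (i + 1)-th step ahead.
        for i, (yt, yp) in enumerate(zip(y_true, y_pred)):
            if i == len(self.metrics):
                self.metrics.append(self.metric.clone())
            self.metrics[i].update(yt, yp)

    def get(self):
        return [m.get() for m in self.metrics]


horizon_metric = PerStepMetric(RunningMAE())
horizon_metric.update([10.0, 12.0, 14.0], [11.0, 14.0, 18.0])  # abs errors: 1, 2, 4
horizon_metric.update([20.0, 22.0, 24.0], [21.0, 26.0, 24.0])  # abs errors: 1, 4, 0
per_step = horizon_metric.get()
print(per_step)  # [1.0, 3.0, 2.0]
```

Each entry of the result averages only the errors observed at that forecast distance, so the step-2 value (3.0) is unaffected by what happened at steps 1 and 3.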
Aggregation
HorizonAggMetric applies a function f to the per-step values:
result = f([M_1.get(), M_2.get(), ..., M_h.get()])
Common choices for f:
- statistics.mean: Average performance across all horizon steps
- max: Worst-case performance across all horizon steps
- statistics.median: Median performance, robust to outlier steps
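
To make the effect of each choice concrete, the short sketch below applies the three aggregation functions to a list of per-step values. The numbers are invented for illustration (a 4-step horizon whose last step is much worse than the others) and do not come from any real model.

```python
import statistics

# Hypothetical per-step MAE values for a 4-step horizon (illustrative only).
per_step_values = [1.0, 2.0, 3.0, 10.0]

mean_error = statistics.mean(per_step_values)      # pulled upward by the last step
worst_error = max(per_step_values)                 # dominated entirely by the last step
median_error = statistics.median(per_step_values)  # largely ignores the outlier step

print(mean_error, worst_error, median_error)  # 4.0 10.0 2.5
```

With an error metric such as MAE, where lower is better, max is the natural worst-case aggregate; for a metric where higher is better, min would play that role instead.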
Lazy Initialization
Metric copies are created lazily via metric.clone() on first use. This means the number of internal metrics grows up to h as updates are received, rather than being pre-allocated. This design accommodates variable-length forecast horizons.
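
A minimal sketch of this lazy growth follows, assuming only that the base metric exposes update, get, and clone. CountingMetric and LazyHorizon are hypothetical names used for illustration and are not River's classes; the counting metric simply records how many updates each step has received.

```python
class CountingMetric:
    """Hypothetical base metric that only counts updates, to make cloning visible."""

    def __init__(self):
        self.n = 0

    def update(self, y_true, y_pred):
        self.n += 1

    def get(self):
        return self.n

    def clone(self):
        return CountingMetric()


class LazyHorizon:
    """Sketch of lazy initialization: a copy is cloned the first time a step is seen."""

    def __init__(self, metric):
        self.metric = metric
        self.metrics = []  # starts empty; nothing is pre-allocated

    def update(self, y_true, y_pred):
        for i, (yt, yp) in enumerate(zip(y_true, y_pred)):
            try:
                step_metric = self.metrics[i]
            except IndexError:
                # First time step i is seen: clone the base metric for it.
                step_metric = self.metric.clone()
                self.metrics.append(step_metric)
            step_metric.update(yt, yp)


lazy = LazyHorizon(CountingMetric())
sizes = [len(lazy.metrics)]                    # before any update
lazy.update([1.0, 2.0], [1.0, 2.0])            # 2-step forecast
sizes.append(len(lazy.metrics))
lazy.update([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])  # 3-step forecast
sizes.append(len(lazy.metrics))
lazy.update([1.0], [1.0])                      # shorter 1-step forecast
sizes.append(len(lazy.metrics))
counts = [m.get() for m in lazy.metrics]

print(sizes)   # [0, 2, 3, 3]
print(counts)  # [3, 2, 1]
```

The list of copies only ever grows, and a shorter forecast updates just the leading steps, so later steps can end up with fewer observations than earlier ones.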