# Implementation: Online ML / River `evaluate.progressive_val_score`
| Knowledge Sources | River docs; "Beating the Hold-Out: Bounds for K-fold and Progressive Cross-Validation" |
|---|---|
| Domains | Online_Learning, Evaluation, Classification |
| Last Updated | 2026-02-08 16:00 GMT |
## Overview
A concrete tool for evaluating online learning models using the progressive validation (test-then-train) protocol, returning a single metric result after processing the entire dataset.
## Description
The `evaluate.progressive_val_score` function is the canonical way to evaluate an online learning model in River. It implements the predict-then-learn protocol: for each observation in the dataset, the model first makes a prediction, the metric is updated with the prediction and the true target, and only then is the model trained on the observation. This ensures that the model is always evaluated on unseen data.
Under the hood, the function is implemented on top of `evaluate.iter_progressive_val_score`. It consumes the entire generator, optionally printing intermediate results at intervals specified by `print_every`, and returns the final metric object.
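The predict-then-learn loop can be sketched in plain Python, independent of River. This is a minimal illustration of the protocol, not River's implementation; `RunningMean` and `MAE` are toy stand-ins for a model and a metric.

```python
class RunningMean:
    """Toy model: predicts the running mean of all targets seen so far."""
    def __init__(self):
        self.n = 0
        self.total = 0.0

    def predict_one(self, x):
        return self.total / self.n if self.n else 0.0

    def learn_one(self, x, y):
        self.n += 1
        self.total += y


class MAE:
    """Toy metric: mean absolute error, updated one observation at a time."""
    def __init__(self):
        self.n = 0
        self.total = 0.0

    def update(self, y_true, y_pred):
        self.n += 1
        self.total += abs(y_true - y_pred)

    def get(self):
        return self.total / self.n if self.n else 0.0


def progressive_val(dataset, model, metric):
    for x, y in dataset:
        y_pred = model.predict_one(x)  # 1. predict on the unseen observation
        metric.update(y, y_pred)       # 2. score the prediction
        model.learn_one(x, y)          # 3. only then train on it
    return metric


stream = [({}, 1.0), ({}, 3.0), ({}, 5.0)]
result = progressive_val(stream, RunningMean(), MAE())
print(result.get())  # 2.0: errors are 1.0, 2.0, 3.0 against the running mean
```

Because every prediction is made before the model sees the label, the final score reflects out-of-sample performance over the whole stream.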
Key capabilities:
- Delayed feedback: The `moment` and `delay` parameters simulate real-world scenarios where labels arrive after predictions. When specified, `stream.simulate_qa` is used to reorder observations into a question-and-answer sequence.
- Progress printing: Setting `print_every=N` prints the metric state every N observations. The `show_time` and `show_memory` flags add elapsed time and memory usage to the output.
- Active learning support: When the model is an active learner, the function tracks how many labels were actually used for training.
- Automatic prediction method selection: The function automatically chooses between `predict_one`, `predict_proba_one`, and `score_one` depending on the model type and metric requirements.
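The question-and-answer reordering behind delayed feedback can be sketched with a priority queue. This is a simplified pure-Python illustration of the idea, not River's `stream.simulate_qa`: each observation is emitted once as a question (label hidden) at its `moment`, and once as an answer (label revealed) `delay` time units later.

```python
import heapq


def simulate_qa(stream, moment, delay):
    """Yield (index, x, y) events: y is None for questions, the label for answers."""
    answers = []  # min-heap of (reveal_time, index, x, y)
    for i, (x, y) in enumerate(stream):
        t = moment(x)
        # Reveal every pending answer whose time has come before asking a new question.
        while answers and answers[0][0] <= t:
            _, j, xj, yj = heapq.heappop(answers)
            yield j, xj, yj      # answer: label now available, model may train
        yield i, x, None         # question: model must predict without the label
        heapq.heappush(answers, (t + delay, i, x, y))
    while answers:               # flush remaining answers at the end of the stream
        _, j, xj, yj = heapq.heappop(answers)
        yield j, xj, yj


stream = [({'t': 0}, 'a'), ({'t': 1}, 'b'), ({'t': 2}, 'c')]
events = list(simulate_qa(stream, moment=lambda x: x['t'], delay=2))
# Each observation appears twice: first as a question (y=None), later with its label.
```

With `delay=2`, the label for the observation at `t=0` is only revealed at `t=2`, after the model has already had to answer two more questions.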
## Usage
Import this function when you need to:
- Evaluate an online model on a streaming dataset and obtain a single metric result.
- Get the standard, canonical evaluation for an online learning experiment.
- Monitor progress during evaluation with periodic printing.
- Simulate delayed feedback scenarios.
## Code Reference
### Source Location
| File | Lines |
|---|---|
| `river/evaluate/progressive_validation.py` | L231-L409 |
### Signature
```python
def progressive_val_score(
    dataset: base.typing.Dataset,
    model,
    metric: metrics.base.Metric,
    moment: str | typing.Callable | None = None,
    delay: str | int | dt.timedelta | typing.Callable | None = None,
    print_every=0,
    show_time=False,
    show_memory=False,
    **print_kwargs,
) -> metrics.base.Metric
```
### Import
```python
from river import evaluate

result = evaluate.progressive_val_score(dataset, model, metric)
```
## I/O Contract
### Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| `dataset` | `base.typing.Dataset` | (required) | Iterable stream of `(x, y)` tuples or `(x, y, kwargs)` tuples. |
| `model` | `Estimator` | (required) | The model to evaluate. Must support `learn_one` and at least one prediction method. |
| `metric` | `metrics.base.Metric` | (required) | The metric used to evaluate predictions. Updated in-place. |
| `moment` | `str \| Callable \| None` | `None` | Attribute or function for measuring time (for delayed feedback). If `None`, observations are processed in order. |
| `delay` | `str \| int \| dt.timedelta \| Callable \| None` | `None` | Amount to wait before revealing labels. If `None`, no delay (standard progressive validation). |
| `print_every` | `int` | `0` | Print metric state every N observations. `0` disables printing. |
| `show_time` | `bool` | `False` | Whether to display elapsed time in progress output. |
| `show_memory` | `bool` | `False` | Whether to display model memory usage in progress output. |
| `**print_kwargs` | | | Additional keyword arguments passed to Python's `print` function (e.g., `file=f` for file output). |
### Outputs
| Output | Type | Description |
|---|---|---|
| Return value | `metrics.base.Metric` | The metric object, updated with all observations from the dataset. Call `metric.get()` for the numeric value or `str(metric)` for formatted output. |
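The return contract is worth spelling out: the same metric object that was passed in is mutated in place and returned, so `metric.get()` and the return value's `get()` agree. A toy sketch, using a hypothetical `Accuracy` class that mimics the `update`/`get`/`__str__` shape of River metrics:

```python
class Accuracy:
    """Toy stand-in for a River metric: updated in place, one observation at a time."""
    def __init__(self):
        self.correct = 0
        self.n = 0

    def update(self, y_true, y_pred):
        self.n += 1
        self.correct += int(y_true == y_pred)

    def get(self):
        return self.correct / self.n if self.n else 0.0

    def __str__(self):
        return f"Accuracy: {self.get():.2%}"


metric = Accuracy()
for y_true, y_pred in [(1, 1), (0, 1), (1, 1), (0, 0)]:
    metric.update(y_true, y_pred)

print(metric.get())  # numeric value: 0.75
print(metric)        # formatted: Accuracy: 75.00%
```

This is why the examples below can either use the return value or keep a reference to the metric they passed in; both name the same object.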
## Usage Examples
Basic progressive validation:
```python
from river import datasets, evaluate, linear_model, metrics, preprocessing

dataset = datasets.Phishing()
model = preprocessing.StandardScaler() | linear_model.LogisticRegression()
metric = metrics.Accuracy()

result = evaluate.progressive_val_score(dataset, model, metric)
print(result)
# Accuracy: 88.96%
```
With progress printing:
```python
from river import datasets, evaluate, linear_model, metrics, preprocessing

model = preprocessing.StandardScaler() | linear_model.LogisticRegression()

evaluate.progressive_val_score(
    model=model,
    dataset=datasets.Phishing(),
    metric=metrics.ROCAUC(),
    print_every=200,
)
# [200] ROCAUC: 90.20%
# [400] ROCAUC: 92.25%
# [600] ROCAUC: 93.23%
# [800] ROCAUC: 94.05%
# [1,000] ROCAUC: 94.79%
# [1,200] ROCAUC: 95.07%
# [1,250] ROCAUC: 95.07%
# ROCAUC: 95.07%
```
Equivalent manual loop:
```python
from river import datasets, linear_model, metrics, preprocessing

model = preprocessing.StandardScaler() | linear_model.LogisticRegression()
metric = metrics.ROCAUC()

for x, y in datasets.Phishing():
    y_pred = model.predict_proba_one(x)  # probabilities, as required by ROC AUC
    metric.update(y, y_pred)             # score before training (test-then-train)
    model.learn_one(x, y)

print(metric)
# ROCAUC: 95.07%
```
Logging progress to a file:
```python
from river import datasets, evaluate, linear_model, metrics, preprocessing

model = preprocessing.StandardScaler() | linear_model.LogisticRegression()

with open('progress.log', 'w') as f:
    evaluate.progressive_val_score(
        model=model,
        dataset=datasets.Phishing(),
        metric=metrics.ROCAUC(),
        print_every=200,
        file=f,
    )
```