# Implementation: Online ML / River `evaluate.progressive_val_score`
| Knowledge Sources | River docs; "Beating the Hold-Out: Bounds for K-fold and Progressive Cross-Validation" |
|---|---|
| Domains | Online_Learning, Evaluation, Classification |
| Last Updated | 2026-02-08 16:00 GMT |
## Overview
A concrete tool for evaluating online learning models using the progressive validation (test-then-train) protocol, returning a single metric result after processing the entire dataset.
## Description
The `evaluate.progressive_val_score` function is the canonical way to evaluate an online learning model in River. It implements the predict-then-learn protocol: for each observation in the dataset, the model first makes a prediction, the metric is updated with the prediction and the true target, and only then is the model trained on the observation. This ensures that the model is always evaluated on unseen data.
Under the hood, the function is implemented on top of `evaluate.iter_progressive_val_score`. It consumes the entire generator, optionally printing intermediate results at intervals specified by `print_every`, and returns the final metric object.
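The predict-then-learn loop can be sketched in plain Python, independent of River. This is a minimal illustration of the protocol, not River's implementation; `RunningMean` and `MAE` are toy stand-ins for a model and a metric.

```python
class RunningMean:
    """Toy model: predicts the running mean of all targets seen so far."""
    def __init__(self):
        self.n = 0
        self.total = 0.0

    def predict_one(self, x):
        return self.total / self.n if self.n else 0.0

    def learn_one(self, x, y):
        self.n += 1
        self.total += y


class MAE:
    """Toy metric: mean absolute error, updated one observation at a time."""
    def __init__(self):
        self.n = 0
        self.total = 0.0

    def update(self, y_true, y_pred):
        self.n += 1
        self.total += abs(y_true - y_pred)

    def get(self):
        return self.total / self.n if self.n else 0.0


def progressive_val(dataset, model, metric):
    for x, y in dataset:
        y_pred = model.predict_one(x)  # 1. predict on the unseen observation
        metric.update(y, y_pred)       # 2. score the prediction
        model.learn_one(x, y)          # 3. only then train on it
    return metric


stream = [({}, 1.0), ({}, 3.0), ({}, 5.0)]
result = progressive_val(stream, RunningMean(), MAE())
print(result.get())  # 2.0: errors are 1.0, 2.0, 3.0 against the running mean
```

Because every prediction is made before the model sees the label, the final score reflects out-of-sample performance over the whole stream.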
Key capabilities:
- Delayed feedback: The `moment` and `delay` parameters simulate real-world scenarios where labels arrive after predictions. When specified, `stream.simulate_qa` is used to reorder observations into a question-and-answer sequence.
- Progress printing: Setting `print_every=N` prints the metric state every N observations. The `show_time` and `show_memory` flags add elapsed time and memory usage to the output.
- Active learning support: When the model is an active learner, the function tracks how many labels were actually used for training.
- Automatic prediction method selection: The function automatically chooses between `predict_one`, `predict_proba_one`, and `score_one` depending on the model type and metric requirements.
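The question-and-answer reordering behind delayed feedback can be sketched with a priority queue. This is a simplified pure-Python illustration of the idea, not River's `stream.simulate_qa`: each observation is emitted once as a question (label hidden) at its `moment`, and once as an answer (label revealed) `delay` time units later.

```python
import heapq


def simulate_qa(stream, moment, delay):
    """Yield (index, x, y) events: y is None for questions, the label for answers."""
    answers = []  # min-heap of (reveal_time, index, x, y)
    for i, (x, y) in enumerate(stream):
        t = moment(x)
        # Reveal every pending answer whose time has come before asking a new question.
        while answers and answers[0][0] <= t:
            _, j, xj, yj = heapq.heappop(answers)
            yield j, xj, yj      # answer: label now available, model may train
        yield i, x, None         # question: model must predict without the label
        heapq.heappush(answers, (t + delay, i, x, y))
    while answers:               # flush remaining answers at the end of the stream
        _, j, xj, yj = heapq.heappop(answers)
        yield j, xj, yj


stream = [({'t': 0}, 'a'), ({'t': 1}, 'b'), ({'t': 2}, 'c')]
events = list(simulate_qa(stream, moment=lambda x: x['t'], delay=2))
# Each observation appears twice: first as a question (y=None), later with its label.
```

With `delay=2`, the label for the observation at `t=0` is only revealed at `t=2`, after the model has already had to answer two more questions.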
## Usage
Import this function when you need to:
- Evaluate an online model on a streaming dataset and obtain a single metric result.
- Get the standard, canonical evaluation for an online learning experiment.
- Monitor progress during evaluation with periodic printing.
- Simulate delayed feedback scenarios.
## Code Reference
### Source Location
| File | Lines |
|---|---|
| `river/evaluate/progressive_validation.py` | L231-L409 |
### Signature
```python
def progressive_val_score(
    dataset: base.typing.Dataset,
    model,
    metric: metrics.base.Metric,
    moment: str | typing.Callable | None = None,
    delay: str | int | dt.timedelta | typing.Callable | None = None,
    print_every=0,
    show_time=False,
    show_memory=False,
    **print_kwargs,
) -> metrics.base.Metric
```
### Import
```python
from river import evaluate

result = evaluate.progressive_val_score(dataset, model, metric)
```
## I/O Contract
### Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| `dataset` | `base.typing.Dataset` | (required) | Iterable stream of `(x, y)` tuples or `(x, y, kwargs)` tuples. |
| `model` | `Estimator` | (required) | The model to evaluate. Must support `learn_one` and at least one prediction method. |
| `metric` | `metrics.base.Metric` | (required) | The metric used to evaluate predictions. Updated in-place. |
| `moment` | `str \| Callable \| None` | `None` | Attribute or function for measuring time (for delayed feedback). If `None`, observations are processed in order. |
| `delay` | `str \| int \| dt.timedelta \| Callable \| None` | `None` | Amount to wait before revealing labels. If `None`, no delay (standard progressive validation). |
| `print_every` | `int` | `0` | Print metric state every N observations. `0` disables printing. |
| `show_time` | `bool` | `False` | Whether to display elapsed time in progress output. |
| `show_memory` | `bool` | `False` | Whether to display model memory usage in progress output. |
| `**print_kwargs` | | | Additional keyword arguments passed to Python's `print` function (e.g., `file=f` for file output). |
### Outputs
| Output | Type | Description |
|---|---|---|
| Return value | `metrics.base.Metric` | The metric object, updated with all observations from the dataset. Call `metric.get()` for the numeric value or `str(metric)` for formatted output. |
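The return contract is worth spelling out: the same metric object that was passed in is mutated in place and returned, so `metric.get()` and the return value's `get()` agree. A toy sketch, using a hypothetical `Accuracy` class that mimics the `update`/`get`/`__str__` shape of River metrics:

```python
class Accuracy:
    """Toy stand-in for a River metric: updated in place, one observation at a time."""
    def __init__(self):
        self.correct = 0
        self.n = 0

    def update(self, y_true, y_pred):
        self.n += 1
        self.correct += int(y_true == y_pred)

    def get(self):
        return self.correct / self.n if self.n else 0.0

    def __str__(self):
        return f"Accuracy: {self.get():.2%}"


metric = Accuracy()
for y_true, y_pred in [(1, 1), (0, 1), (1, 1), (0, 0)]:
    metric.update(y_true, y_pred)

print(metric.get())  # numeric value: 0.75
print(metric)        # formatted: Accuracy: 75.00%
```

This is why the examples below can either use the return value or keep a reference to the metric they passed in; both name the same object.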
## Usage Examples
Basic progressive validation:
```python
from river import datasets, evaluate, linear_model, metrics, preprocessing

dataset = datasets.Phishing()
model = preprocessing.StandardScaler() | linear_model.LogisticRegression()
metric = metrics.Accuracy()

result = evaluate.progressive_val_score(dataset, model, metric)
print(result)
# Accuracy: 88.96%
```
With progress printing:
```python
from river import datasets, evaluate, linear_model, metrics, preprocessing

model = preprocessing.StandardScaler() | linear_model.LogisticRegression()

evaluate.progressive_val_score(
    model=model,
    dataset=datasets.Phishing(),
    metric=metrics.ROCAUC(),
    print_every=200,
)
# [200] ROCAUC: 90.20%
# [400] ROCAUC: 92.25%
# [600] ROCAUC: 93.23%
# [800] ROCAUC: 94.05%
# [1,000] ROCAUC: 94.79%
# [1,200] ROCAUC: 95.07%
# [1,250] ROCAUC: 95.07%
# ROCAUC: 95.07%
```
Equivalent manual loop:
```python
from river import datasets, linear_model, metrics, preprocessing

model = preprocessing.StandardScaler() | linear_model.LogisticRegression()
metric = metrics.ROCAUC()

for x, y in datasets.Phishing():
    y_pred = model.predict_proba_one(x)  # probabilities, as required by ROC AUC
    metric.update(y, y_pred)             # score before training (test-then-train)
    model.learn_one(x, y)

print(metric)
# ROCAUC: 95.07%
```
Logging progress to a file:
```python
from river import datasets, evaluate, linear_model, metrics, preprocessing

model = preprocessing.StandardScaler() | linear_model.LogisticRegression()

with open('progress.log', 'w') as f:
    evaluate.progressive_val_score(
        model=model,
        dataset=datasets.Phishing(),
        metric=metrics.ROCAUC(),
        print_every=200,
        file=f,
    )
```