
Principle:Online ml River Progressive Validation

From Leeroopedia


Knowledge Sources: River, River Docs, "Beating the Hold-Out: Bounds for K-fold and Progressive Cross-Validation"
Domains: Online_Learning, Evaluation, Classification
Last Updated 2026-02-08 16:00 GMT

Overview

Progressive validation (also called prequential evaluation) is an evaluation protocol for streaming models where each observation is first used for prediction, then for metric update, then for model update, ensuring an honest assessment without data leakage.

Description

In batch machine learning, models are evaluated using hold-out sets or cross-validation. These techniques require the entire dataset to be available upfront and involve a clear separation between training and test data. In online learning, data arrives as a stream, and the model is continuously learning. This raises a fundamental question: how do you evaluate a model that is always being updated?

Progressive validation (also known as prequential evaluation or test-then-train) solves this by establishing a strict protocol for each observation:

  1. Predict: Use the current model to predict the target for the incoming observation.
  2. Evaluate: Compare the prediction against the true target and update the metric.
  3. Learn: Update the model with the observation.

This ordering is critical: the model is always evaluated on data it has not yet seen. This eliminates the need for a separate test set and avoids the data leakage that would occur if the model were updated before being evaluated on the same observation.

Progressive validation also supports delayed feedback via the moment and delay parameters, which simulate real-world scenarios where the ground truth becomes available only some time after the prediction is made. In this mode, progressive_val_score uses stream.simulate_qa to reorder observations into an interleaved question-and-answer sequence that respects temporal ordering.

The paper "Beating the Hold-Out: Bounds for K-fold and Progressive Cross-Validation" by Blum, Kalai, and Langford provides theoretical guarantees showing that progressive validation yields error estimates that are at least as reliable as hold-out evaluation, while using all available data for both training and evaluation.

Usage

Use progressive validation when:

  • You are evaluating an online learning model on a streaming dataset.
  • You want the most honest evaluation possible without needing a separate held-out test set.
  • You want to simulate a production deployment scenario, including delayed feedback.
  • You need a single function call that handles the entire predict-evaluate-learn loop.

Theoretical Basis

Protocol (no delay):

function progressive_val_score(dataset, model, metric):
    for (x, y) in dataset:
        y_pred = model.predict_one(x)    # Step 1: predict
        metric.update(y, y_pred)         # Step 2: evaluate
        model.learn_one(x, y)            # Step 3: learn
    return metric
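The protocol above can be made concrete with a small self-contained sketch. The `RunningMean` model, `MAE` metric, and toy stream below are illustrative stand-ins invented for this example, not part of River's API:

```python
def progressive_val_score(stream, model, metric):
    """Test-then-train: predict, update the metric, then learn."""
    for x, y in stream:
        y_pred = model.predict_one(x)   # Step 1: predict
        metric.update(y, y_pred)        # Step 2: evaluate
        model.learn_one(x, y)           # Step 3: learn
    return metric


class RunningMean:
    """Toy regressor: always predicts the mean of the targets seen so far."""
    def __init__(self):
        self.n = 0
        self.total = 0.0

    def predict_one(self, x):
        return self.total / self.n if self.n else 0.0

    def learn_one(self, x, y):
        self.n += 1
        self.total += y


class MAE:
    """Mean absolute error, updatable one observation at a time."""
    def __init__(self):
        self.n = 0
        self.total = 0.0

    def update(self, y_true, y_pred):
        self.n += 1
        self.total += abs(y_true - y_pred)

    def get(self):
        return self.total / self.n


stream = [({}, y) for y in [1.0, 3.0, 2.0, 2.0]]
mae = progressive_val_score(stream, RunningMean(), MAE())
# mae.get() == 0.75: each target was scored before the model learned from it.
```

Because each prediction is made before the corresponding update, the resulting MAE is an out-of-sample estimate even though every observation is also used for training.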

Protocol (with delay):

When a delay is specified, the protocol becomes:

function progressive_val_score(dataset, model, metric, moment, delay):
    pending = {}
    for (i, x, y) in simulate_qa(dataset, moment, delay):
        if y is None:
            # Question: no ground truth yet, just predict
            pending[i] = model.predict_one(x)
        else:
            # Answer: ground truth available
            y_pred = pending.pop(i)
            metric.update(y, y_pred)
            model.learn_one(x, y)
    return metric
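The delayed protocol can also be sketched in plain Python. The `simulate_qa` below is a simplified, fixed-delay stand-in for River's stream.simulate_qa (which accepts general moment and delay specifications), and `LastValue` is a toy model invented for the example:

```python
def simulate_qa(dataset, delay):
    """Emit each observation first as a question (y=None), then `delay`
    steps later as an answer carrying the ground truth."""
    buffer = []  # (reveal_time, index, x, y), in increasing reveal_time
    for t, (x, y) in enumerate(dataset):
        # Release every answer whose reveal time has arrived.
        while buffer and buffer[0][0] <= t:
            _, j, xj, yj = buffer.pop(0)
            yield j, xj, yj
        yield t, x, None                  # question: features only
        buffer.append((t + delay, t, x, y))
    for _, j, xj, yj in buffer:           # flush remaining answers
        yield j, xj, yj


def progressive_val_score_delayed(dataset, model, metric, delay):
    pending = {}
    for i, x, y in simulate_qa(dataset, delay):
        if y is None:
            pending[i] = model.predict_one(x)   # predict now, score later
        else:
            metric.update(y, pending.pop(i))    # ground truth has arrived
            model.learn_one(x, y)
    return metric


class LastValue:
    """Toy model: predicts the last target it learned."""
    last = 0.0
    def predict_one(self, x): return self.last
    def learn_one(self, x, y): self.last = y


errors = []

class AbsErrors:
    """Minimal metric: records each absolute error."""
    def update(self, y_true, y_pred): errors.append(abs(y_true - y_pred))


stream = [({}, y) for y in [1.0, 2.0, 3.0, 4.0]]
progressive_val_score_delayed(stream, LastValue(), AbsErrors(), delay=2)
# errors == [1.0, 2.0, 2.0, 2.0]: each prediction was made before the
# corresponding label arrived, two steps after the question.
```

Because predictions are frozen in `pending` at question time, a model update triggered by one answer can never leak into the score of an observation whose question was already asked.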

Theoretical guarantee (Blum et al.): Let e_i be the loss on the i-th observation under progressive validation. Then with probability at least 1 − δ:

(1/n) * sum(e_i) <= E[loss on fresh data] + O(sqrt(log(1/delta) / n))

This means the progressive validation estimate converges to the true expected loss at a rate of O(1/sqrt(n)), which is the same rate as hold-out evaluation.
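The 1/sqrt(n) rate can be checked numerically: quadrupling the stream length halves the width of the confidence term (the constant factor hidden in the O(·) is ignored here):

```python
import math

def bound_width(n, delta=0.05):
    """Width of the confidence term sqrt(log(1/delta) / n) from the
    bound above, with the suppressed constant factor set to 1."""
    return math.sqrt(math.log(1 / delta) / n)

w1 = bound_width(1_000)
w2 = bound_width(4_000)
# Quadrupling n halves the width: w1 / w2 == 2 (up to float rounding).
```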

Key properties:

  • No data waste: Every observation is used for both evaluation and training.
  • No data leakage: The model is always evaluated before learning from each observation.
  • Supports delayed feedback: Correctly handles scenarios where labels arrive after predictions.
  • Anytime evaluation: The metric can be queried at any point during the stream.
