Principle:Online ml River Streaming ROCAUC

Knowledge Sources	River River Docs
Domains	Online_Learning Evaluation Classification
Last Updated	2026-02-08 16:00 GMT

Overview

Streaming ROCAUC is an incremental approximation of the Area Under the Receiver Operating Characteristic Curve for streaming binary classification, computed using discretized thresholds and trapezoidal integration.

Description

The Receiver Operating Characteristic (ROC) curve plots the True Positive Rate (TPR) against the False Positive Rate (FPR) at various classification thresholds. The Area Under the Curve (AUC) summarizes this curve into a single scalar that measures a classifier's ability to discriminate between positive and negative classes, regardless of the specific decision threshold chosen.

Computing the exact ROC AUC requires access to all predictions and ground truths simultaneously, which is infeasible in a streaming setting. River's streaming ROCAUC addresses this by discretizing the ROC curve: it maintains a fixed number of threshold-specific confusion matrices, each tracking the performance of the classifier at a particular probability threshold. When the AUC is requested, it computes the TPR and FPR at each threshold and integrates the resulting curve using the trapezoidal rule.

The approximation quality depends on the number of thresholds (n_thresholds), which defaults to 10. That is generally sufficient for model comparison. Increasing n_thresholds refines the curve, with memory and per-update computation growing linearly in the number of thresholds.

Key properties:

Threshold-independent: Unlike accuracy, ROC AUC evaluates the model's ranking quality across all possible thresholds.
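Robust to class imbalance: TPR and FPR are each computed within a single class, so the ROC curve does not depend on the ratio of positives to negatives, which makes the metric suitable for imbalanced datasets.
Incremental computation: Each confusion matrix is updated in O(1), so one observation costs O(n_thresholds) across all thresholds; the AUC is computed in O(n_thresholds) on demand. Memory is O(n_thresholds), independent of the stream length.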

Usage

Use streaming ROCAUC when:

You need a threshold-independent metric for binary classification in a streaming setting.
The dataset is imbalanced and accuracy would be misleading.
You want to compare classifiers based on their ranking quality rather than at a fixed decision boundary.
You are using models that output probability estimates (e.g., logistic regression, Naive Bayes).

Theoretical Basis

ROC curve construction (discretized):

Given $k$ = n_thresholds evenly spaced thresholds $t_{1}, t_{2}, \dots, t_{k}$ in $[0, 1]$, listed in increasing order:

For each threshold $t_{i}$ and each new observation $(y_{true}, y_{pred})$:

p_true = y_pred[True]  (probability of positive class)
predicted_positive = (p_true > t_i)
confusion_matrix_i.update(y_true == pos_val, predicted_positive)
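
The update step above can be sketched in plain Python. This is a minimal illustration, not River's source: the helper names (`make_thresholds`, `new_counts`, `update`) are hypothetical, the confusion matrices are reduced to four counters each, and nudging the two end thresholds outward is an assumption of the sketch so that the lowest threshold flags every probability as positive and the highest flags none.

```python
def make_thresholds(n_thresholds):
    """Return n_thresholds evenly spaced thresholds covering [0, 1]."""
    thresholds = [i / (n_thresholds - 1) for i in range(n_thresholds)]
    # Sketch assumption: nudge the end points outward so the lowest threshold
    # flags every probability as positive and the highest flags none.
    thresholds[0] -= 1e-7
    thresholds[-1] += 1e-7
    return thresholds


def new_counts(n_thresholds):
    """One set of confusion counts per threshold."""
    return [{"tp": 0, "fp": 0, "tn": 0, "fn": 0} for _ in range(n_thresholds)]


def update(counts, thresholds, y_true, p_true, pos_val=True):
    """Fold one observation into every per-threshold set of counts."""
    is_positive = (y_true == pos_val)
    for t, c in zip(thresholds, counts):
        predicted_positive = p_true > t
        if is_positive:
            c["tp" if predicted_positive else "fn"] += 1
        else:
            c["fp" if predicted_positive else "tn"] += 1


# Illustrative stream of (label, probability of the positive class) pairs.
thresholds = make_thresholds(5)
counts = new_counts(5)
stream = [(True, 0.8), (False, 0.3), (True, 0.6), (False, 0.1)]
for y_true, p_true in stream:
    update(counts, thresholds, y_true, p_true)
```

After the four observations, the counts at the middle threshold (0.5) hold two true positives and two true negatives, while the lowest threshold records every observation as a predicted positive.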

Computing TPR and FPR at each threshold:

TPR_i = TP_i / (TP_i + FN_i)
FPR_i = FP_i / (FP_i + TN_i)

Where TP, TN, FP, FN are read from the i-th confusion matrix.
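
Early in a stream one of the classes may not have appeared yet, which makes a denominator zero. The sketch below shows one way to evaluate the two rates defensively; the function names are hypothetical, and falling back to 0.0 on an empty denominator is an assumption of the sketch rather than something this page specifies.

```python
def safe_div(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero (sketch assumption)."""
    return numerator / denominator if denominator else 0.0


def tpr_fpr(tp, fp, tn, fn):
    """Return (TPR, FPR) from the four counts of one confusion matrix."""
    tpr = safe_div(tp, tp + fn)  # share of actual positives flagged positive
    fpr = safe_div(fp, fp + tn)  # share of actual negatives flagged positive
    return tpr, fpr


rates = tpr_fpr(tp=8, fp=3, tn=7, fn=2)            # 8/10 and 3/10
no_positives_yet = tpr_fpr(tp=0, fp=1, tn=4, fn=0)  # TPR falls back to 0.0
```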

AUC via trapezoidal integration:

The thresholds are visited in increasing order, so the FPR values decrease from 1 toward 0 along the list and the trapezoidal integral over them comes out negative. The AUC is therefore the negated integral:

AUC = -trapezoid(x=FPRs, y=TPRs)

This uses scipy.integrate.trapezoid to compute:

$\mathrm{AUC} = -\sum_{i=1}^{k-1} (\mathrm{FPR}_{i+1} - \mathrm{FPR}_{i}) \cdot \frac{\mathrm{TPR}_{i} + \mathrm{TPR}_{i+1}}{2}$
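
To make the sign convention concrete, the sum can be evaluated by hand on a small curve. The sketch below uses a pure-Python stand-in for the trapezoidal rule rather than the SciPy call, and the FPR and TPR values are made up for illustration.

```python
def trapezoid(x, y):
    """Trapezoidal rule over the points (x[i], y[i]), in the order given."""
    return sum(
        (x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2
        for i in range(len(x) - 1)
    )


# Illustrative rates, ordered by increasing threshold, so FPR decreases.
fprs = [1.0, 0.6, 0.2, 0.0]
tprs = [1.0, 0.9, 0.5, 0.0]

raw_integral = trapezoid(fprs, tprs)  # negative, because x runs right to left
auc = -raw_integral                   # about 0.71 for these values
```

The three segments contribute -0.38, -0.28 and -0.05 to the raw integral, so negating the total yields an AUC of roughly 0.71.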

Interpretation:

AUC = 1.0: perfect classifier (all positives ranked above all negatives)
AUC = 0.5: random classifier (no discrimination)
AUC < 0.5: worse than random (predictions are inversely correlated with truth)
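
These three reference points follow from the ranking view of the exact (non-streaming) AUC: it equals the fraction of positive/negative pairs in which the positive receives the higher score, with ties counted as half. The sketch below computes that exact quantity on tiny hand-made examples; it is not the discretized streaming estimate, and the function name is hypothetical.

```python
def pairwise_auc(y_true, scores):
    """Exact AUC as the share of correctly ordered positive/negative pairs."""
    positives = [s for y, s in zip(y_true, scores) if y]
    negatives = [s for y, s in zip(y_true, scores) if not y]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


labels = [True, True, False, False]
perfect = pairwise_auc(labels, [0.9, 0.8, 0.2, 0.1])        # every positive on top
uninformative = pairwise_auc(labels, [0.5, 0.5, 0.5, 0.5])  # all pairs tied
inverted = pairwise_auc(labels, [0.1, 0.2, 0.8, 0.9])       # every pair reversed
```

The three calls return 1.0, 0.5 and 0.0 respectively. Flipping the scores of the inverted model (using 1 - p) would turn it into the perfect one.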

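Approximation error: Discretization introduces an error that generally shrinks as n_thresholds increases, although not necessarily monotonically. The error is governed by where the predicted probabilities fall relative to the threshold grid. When they spread across $[0, 1]$, 10 to 20 thresholds usually give a reliable estimate for model comparison. When they concentrate in a narrow band, only a few thresholds separate them and a finer grid is needed.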
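
The effect of the threshold grid can be seen on a small hand-made dataset whose scores all lie between 0.42 and 0.58. The sketch below is a batch re-implementation of the discretized estimate for illustration only (the counts are additive, so a streaming update would reach the same totals); the function name, the data, and the nudged end thresholds are assumptions of the sketch. For this data, 14 of the 16 positive/negative pairs are ordered correctly, so the exact AUC is 0.875.

```python
def discretized_auc(y_true, scores, n_thresholds):
    """Discretized ROC AUC: per-threshold rates plus trapezoidal integration."""
    thresholds = [i / (n_thresholds - 1) for i in range(n_thresholds)]
    thresholds[0] -= 1e-7   # sketch assumption: lowest threshold flags everything
    thresholds[-1] += 1e-7  # sketch assumption: highest threshold flags nothing
    n_pos = sum(1 for y in y_true if y)
    n_neg = len(y_true) - n_pos
    tprs, fprs = [], []
    for t in thresholds:
        tp = sum(1 for y, s in zip(y_true, scores) if y and s > t)
        fp = sum(1 for y, s in zip(y_true, scores) if not y and s > t)
        tprs.append(tp / n_pos)
        fprs.append(fp / n_neg)
    area = sum(
        (fprs[i + 1] - fprs[i]) * (tprs[i] + tprs[i + 1]) / 2
        for i in range(len(fprs) - 1)
    )
    return -area  # FPR decreases along the list, so the raw area is negative


# Scores clustered in a narrow band around 0.5 (illustrative data).
y_true = [True, True, True, True, False, False, False, False]
scores = [0.52, 0.58, 0.47, 0.55, 0.45, 0.51, 0.42, 0.49]
EXACT = 14 / 16  # exact pairwise AUC for this data

coarse = discretized_auc(y_true, scores, n_thresholds=10)
fine = discretized_auc(y_true, scores, n_thresholds=101)
```

With 10 thresholds only two grid points fall inside the band where the scores live, and the estimate (0.71875) is well below the exact value. With 101 thresholds every score is separated from its neighbours and the estimate matches 0.875.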
