Principle: Online ML / River Streaming Accuracy Measurement
| Knowledge Sources | River Docs |
|---|---|
| Domains | Online_Learning Evaluation Classification |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
Streaming accuracy measurement is the incremental computation of classification accuracy as the ratio of correct predictions to total predictions, updated one observation at a time via a streaming confusion matrix.
Description
Accuracy is the most intuitive classification metric: it measures the fraction of predictions that exactly match the true labels. In batch machine learning, accuracy is computed after all predictions are made. In online (streaming) machine learning, accuracy must be computed incrementally, updating the running score as each new prediction-label pair arrives.
River implements streaming accuracy through an incrementally updated confusion matrix. Rather than storing all predictions and labels, the confusion matrix maintains running counts of true positives, true negatives, false positives, and false negatives for each class. When a new prediction-label pair arrives, only the relevant cell of the confusion matrix is incremented. The accuracy is then computed on demand by dividing the total number of correct predictions (sum of the diagonal) by the total weight of all observations.
This approach has several advantages:
- O(1) memory per class: only the confusion matrix cells are stored.
- O(1) update time: each new observation updates a single cell.
- O(k) query time: computing accuracy requires summing the diagonal (k classes), but for binary classification this is O(1).
- Support for weighted observations: the confusion matrix can accumulate weights rather than counts, enabling importance-weighted accuracy.
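The properties above can be sketched with a minimal pure-Python streaming confusion matrix. This is a simplified stand-in for River's internal data structure; the class and method names are illustrative, not River's API:

```python
from collections import defaultdict

class StreamingConfusionMatrix:
    """Minimal sketch of a streaming confusion matrix.

    Stores only per-(true, predicted) weight totals, so memory grows
    with the number of classes, not the number of observations.
    """

    def __init__(self):
        # counts[y_true][y_pred] accumulates observation weights
        self.counts = defaultdict(lambda: defaultdict(float))
        self.total_weight = 0.0

    def update(self, y_true, y_pred, w=1.0):
        # O(1) update: one cell plus the running total are touched
        self.counts[y_true][y_pred] += w
        self.total_weight += w

    def accuracy(self):
        # O(k) query: sum the diagonal (correct predictions over all
        # classes), then divide by the total observation weight
        correct = sum(self.counts[c][c] for c in self.counts)
        return correct / self.total_weight if self.total_weight else 0.0

cm = StreamingConfusionMatrix()
for y_true, y_pred in [(1, 1), (0, 0), (1, 0), (1, 1)]:
    cm.update(y_true, y_pred)
print(cm.accuracy())  # 3 of 4 correct -> 0.75
```

Passing `w` other than 1.0 yields the importance-weighted accuracy mentioned above: each cell accumulates weight rather than a raw count.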
Usage
Use streaming accuracy measurement when:
- You need a simple, interpretable metric for classification performance in an online learning setting.
- You are using `evaluate.progressive_val_score` to evaluate a model and need to pass a metric object.
- Class distributions are roughly balanced (accuracy can be misleading for imbalanced datasets).
- You want to monitor model performance in real time as data arrives.
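The real-time monitoring workflow can be sketched as a predict-then-learn loop. The hand-rolled `Accuracy` class below mimics the `update()`/`get()` pattern of River-style streaming metrics but is not River's class, and the "model" (always predict the previous label) and the label stream are made up for illustration:

```python
class Accuracy:
    """Running accuracy with a River-style update()/get() interface
    (hand-rolled sketch, not River's metrics.Accuracy)."""

    def __init__(self):
        self.correct = 0.0
        self.total = 0.0

    def update(self, y_true, y_pred, w=1.0):
        self.correct += w * (y_true == y_pred)
        self.total += w

    def get(self):
        return self.correct / self.total if self.total else 0.0

# Simulated label stream; the toy "model" predicts the previous label.
stream = [1, 1, 0, 1, 1, 1, 0, 0]
metric = Accuracy()
prev = 0
for y in stream:
    y_pred = prev            # predict before the true label is revealed
    metric.update(y, y_pred)
    prev = y                 # then "learn" from the revealed label
print(metric.get())  # 4 of 8 correct -> 0.5
```

Evaluating each observation before learning from it is the same progressive-validation scheme that `evaluate.progressive_val_score` automates.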
Theoretical Basis
Definition:
Accuracy = total_true_positives / total_weight
Where:
- `total_true_positives` is the sum of the diagonal of the confusion matrix, i.e., the total weight of all correctly classified observations across all classes.
- `total_weight` is the total weight of all observations processed so far.
Incremental update: When a new observation arrives:
confusion_matrix[y_true][y_pred] += w
The accuracy is not recomputed from scratch; instead, the `get()` method reads directly from the confusion matrix's cached totals.
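One way to realize cached totals is to maintain the diagonal sum alongside the cell counts, so the query needs no summation at all. The sketch below illustrates that idea under stated assumptions; it is not necessarily how River implements it internally:

```python
from collections import defaultdict

class CachedAccuracy:
    """Accuracy with O(1) get() regardless of the number of classes.

    Cached-totals sketch for illustration, not River's actual internals.
    """

    def __init__(self):
        self.cells = defaultdict(float)   # (y_true, y_pred) -> weight
        self.correct_weight = 0.0         # cached diagonal total
        self.total_weight = 0.0

    def update(self, y_true, y_pred, w=1.0):
        self.cells[(y_true, y_pred)] += w
        if y_true == y_pred:
            self.correct_weight += w      # maintain the cache incrementally
        self.total_weight += w

    def get(self):
        # No diagonal sum needed: read the cached totals directly.
        return self.correct_weight / self.total_weight if self.total_weight else 0.0

m = CachedAccuracy()
for y_true, y_pred in [(0, 0), (1, 0), (1, 1)]:
    m.update(y_true, y_pred)
print(m.get())  # 2 of 3 correct
```

The trade-off is a few extra counters per metric in exchange for a constant-time query, which matters when accuracy is polled after every observation.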
For binary classification:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Where TP = true positives, TN = true negatives, FP = false positives, FN = false negatives.
Relationship to error rate:
Error Rate = 1 - Accuracy
Limitation: Accuracy treats all misclassifications equally and can be misleading for imbalanced datasets. For example, a dataset with 95% negative samples achieves 95% accuracy with a trivial "always predict negative" classifier. In such cases, metrics like ROCAUC, F1-score, or balanced accuracy are more informative.
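The 95% figure can be checked directly. The label counts and the "always predict negative" baseline below are made up to mirror the example:

```python
# 95 negative (0) and 5 positive (1) labels.
labels = [0] * 95 + [1] * 5

# Trivial classifier: always predict the majority class (0).
correct = sum(1 for y in labels if y == 0)
accuracy = correct / len(labels)
print(accuracy)  # 0.95, despite never detecting a single positive
```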