Principle:Online ml River Streaming Classification Metrics
| Knowledge Sources | Domains | Last Updated |
|---|---|---|
| Machine Learning Statistics | Online_Learning, Evaluation, Classification | 2026-02-08 18:00 GMT |
Overview
Streaming classification metrics are evaluation measures computed incrementally as predictions and ground-truth labels arrive one at a time. Batch metrics require the full set of predictions. Streaming metrics instead maintain sufficient statistics that are updated with each observation, so the current metric value can be queried at any time.
Description
Evaluating classification models in the online setting requires metrics that can be updated incrementally. Each streaming metric maintains a compact internal state (typically a confusion matrix or running counts) and updates this state with each new (prediction, ground truth) pair.
Key families of streaming classification metrics include:
Confusion matrix-derived metrics: The streaming confusion matrix is the foundational structure from which many metrics are derived. It counts how often each (true class, predicted class) pair has occurred. The per-class true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) follow directly from these counts. From them, one can compute:
- Precision: TP / (TP + FP) -- the fraction of positive predictions that are correct.
- Recall: TP / (TP + FN) -- the fraction of actual positives that are correctly identified.
- F-beta score: A weighted harmonic mean of precision and recall. The parameter beta controls the emphasis: beta > 1 weights recall more heavily, beta < 1 weights precision more heavily, and beta = 1 gives the F1 score.
- Balanced Accuracy: The average of per-class recall values, correcting for class imbalance.
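- Matthews Correlation Coefficient (MCC): A balanced measure that accounts for all four cells of the binary confusion matrix and generalizes to the multiclass case. It produces a value in [-1, +1], where +1 means perfect prediction, 0 means no better than random, and -1 means total disagreement.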
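A minimal sketch of how these per-class quantities can be derived on demand from a streaming confusion matrix. The class `StreamingPRF` and its method names are hypothetical, chosen for illustration; River's own `Precision`, `Recall`, `FBeta` and `BalancedAccuracy` classes may be organized differently. Mapping zero denominators to 0.0 and averaging balanced accuracy over the classes seen as ground truth are assumed conventions.

```python
from collections import Counter


class StreamingPRF:
    """Per-class precision, recall, F-beta and balanced accuracy from a streaming confusion matrix."""

    def __init__(self):
        self.cm = Counter()        # (y_true, y_pred) -> count; missing keys read as 0
        self.classes = set()       # every label seen as truth or prediction
        self.true_classes = set()  # labels seen as ground truth

    def update(self, y_true, y_pred):
        self.cm[(y_true, y_pred)] += 1
        self.classes.update((y_true, y_pred))
        self.true_classes.add(y_true)

    def _counts(self, c):
        tp = self.cm[(c, c)]
        fp = sum(self.cm[(t, c)] for t in self.classes) - tp  # column c minus diagonal
        fn = sum(self.cm[(c, p)] for p in self.classes) - tp  # row c minus diagonal
        return tp, fp, fn

    def precision(self, c):
        tp, fp, _ = self._counts(c)
        return tp / (tp + fp) if tp + fp else 0.0  # 0.0 when c was never predicted

    def recall(self, c):
        tp, _, fn = self._counts(c)
        return tp / (tp + fn) if tp + fn else 0.0  # 0.0 when c never appeared as truth

    def fbeta(self, c, beta=1.0):
        p, r = self.precision(c), self.recall(c)
        b2 = beta ** 2
        denom = b2 * p + r
        return (1 + b2) * p * r / denom if denom else 0.0

    def balanced_accuracy(self):
        # Macro-average of recall over classes that have appeared as ground truth.
        if not self.true_classes:
            return 0.0
        return sum(self.recall(c) for c in self.true_classes) / len(self.true_classes)


m = StreamingPRF()
for y_true, y_pred in [("a", "a"), ("a", "b"), ("b", "b"), ("b", "b"), ("c", "a")]:
    m.update(y_true, y_pred)
print(m.precision("b"), m.recall("b"), m.balanced_accuracy())  # 0.6666666666666666 1.0 0.5
```

Every metric is recomputed from the counts at query time. The state therefore stays a single table no matter how many metrics are read from it.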
Agreement and overlap metrics:
- Cohen's Kappa: Measures agreement between predictions and ground truth, corrected for agreement expected by chance.
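- Jaccard Index: The size of the intersection divided by the size of the union of the predicted-positive and actually-positive sets. For a binary problem this is TP / (TP + FP + FN).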
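Both measures can be maintained with a handful of counters. In the hypothetical sketch below (the class names are illustrative, not River's API), Cohen's kappa needs only the number of agreeing samples plus the true and predicted label frequencies. The binary Jaccard index needs only TP, FP and FN for an assumed positive label.

```python
from collections import Counter


class StreamingKappa:
    """Cohen's kappa from the diagonal count and the label marginals."""

    def __init__(self):
        self.n = 0
        self.agree = 0                 # number of samples with y_true == y_pred
        self.true_counts = Counter()   # row marginals
        self.pred_counts = Counter()   # column marginals

    def update(self, y_true, y_pred):
        self.n += 1
        self.agree += y_true == y_pred
        self.true_counts[y_true] += 1
        self.pred_counts[y_pred] += 1

    def get(self):
        if self.n == 0:
            return 0.0
        p_o = self.agree / self.n
        # Chance agreement: sum over classes of P(true = c) * P(pred = c).
        p_e = sum(self.true_counts[c] * self.pred_counts[c] for c in self.true_counts) / self.n ** 2
        return (p_o - p_e) / (1 - p_e) if p_e < 1 else 0.0  # p_e == 1 is undefined; 0.0 assumed


class StreamingJaccard:
    """Binary Jaccard index TP / (TP + FP + FN) for an assumed positive label."""

    def __init__(self, pos_label=True):
        self.pos_label = pos_label
        self.tp = self.fp = self.fn = 0

    def update(self, y_true, y_pred):
        t, p = y_true == self.pos_label, y_pred == self.pos_label
        self.tp += t and p
        self.fp += (not t) and p
        self.fn += t and not p

    def get(self):
        denom = self.tp + self.fp + self.fn
        return self.tp / denom if denom else 0.0


kappa, jaccard = StreamingKappa(), StreamingJaccard(pos_label=1)
for y_true, y_pred in zip([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]):
    kappa.update(y_true, y_pred)
    jaccard.update(y_true, y_pred)
```

Here p_o = 3/5 and p_e = (3*3 + 2*2)/25 = 13/25, so kappa = (0.6 - 0.52)/(1 - 0.52) = 1/6. The Jaccard index is 2/(2 + 1 + 1) = 0.5.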
Information-theoretic metrics:
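- Cross-entropy / Log Loss: The average negative log-probability the model assigns to the true label. It measures the divergence between the predicted probability distribution and the true label distribution. Log loss is the binary special case. Unlike the metrics above, it requires predicted probabilities rather than hard labels.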
- Mutual Information: Quantifies the information shared between predicted and true labels.
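- V-beta score: The weighted harmonic mean of homogeneity and completeness, two entropy-based measures. With beta = 1 (the V-measure) it coincides with mutual information normalized by the arithmetic mean of the two label entropies.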
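Mutual information and the V-beta score can both be recomputed from a joint count table of (true, predicted) label pairs plus its marginals. This hypothetical sketch uses natural logarithms. When a label entropy is zero, it sets the corresponding homogeneity or completeness to 1, following the usual convention. River's `MutualInfo` and `VBeta` may differ in log base and edge-case handling.

```python
import math
from collections import Counter


class StreamingMutualInfo:
    """Mutual information and V-beta from incrementally maintained contingency counts."""

    def __init__(self):
        self.n = 0
        self.joint = Counter()        # (y_true, y_pred) -> count
        self.true_counts = Counter()  # y_true -> count
        self.pred_counts = Counter()  # y_pred -> count

    def update(self, y_true, y_pred):
        self.n += 1
        self.joint[(y_true, y_pred)] += 1
        self.true_counts[y_true] += 1
        self.pred_counts[y_pred] += 1

    def _entropy(self, counts):
        # Entropy (in nats) of the empirical distribution given by `counts`.
        return -sum(k / self.n * math.log(k / self.n) for k in counts.values())

    def mutual_info(self):
        # I(T; P) = sum_{t,p} P(t,p) * log(P(t,p) / (P(t) * P(p))), written in counts.
        n = self.n
        return sum(
            k / n * math.log(n * k / (self.true_counts[t] * self.pred_counts[p]))
            for (t, p), k in self.joint.items()
        )

    def v_beta(self, beta=1.0):
        mi = self.mutual_info()
        h_true = self._entropy(self.true_counts)
        h_pred = self._entropy(self.pred_counts)
        homogeneity = mi / h_true if h_true else 1.0
        completeness = mi / h_pred if h_pred else 1.0
        denom = beta * homogeneity + completeness
        return (1 + beta) * homogeneity * completeness / denom if denom else 0.0
```

The predicted labels need not share names with the true ones. A prediction stream that is a pure relabeling of the truth (for example 0 -> "x", 1 -> "y") gets a V-measure of 1.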
Geometric and specialized metrics:
- Geometric Mean: The geometric mean of per-class recall values, especially useful for imbalanced problems.
- Fowlkes-Mallows Index: The geometric mean of precision and recall.
- Rolling ROC AUC: A sliding-window version of ROC AUC. It is computed only over the most recent observations, so it reflects current rather than cumulative performance.
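One straightforward way to realize a rolling ROC AUC is to keep the last W (label, score) pairs in a bounded deque. On each query, the AUC is the fraction of (positive, negative) pairs whose scores are correctly ordered (the Mann-Whitney formulation, with ties counted as one half). This is an illustrative sketch under assumed names. It costs O(W^2) per query, and a production implementation would more likely use a sorted structure to avoid that.

```python
from collections import deque


class RollingAUC:
    """ROC AUC over the most recent `window_size` (label, score) pairs."""

    def __init__(self, window_size=1000, pos_label=True):
        self.window = deque(maxlen=window_size)  # oldest pair is dropped automatically
        self.pos_label = pos_label

    def update(self, y_true, score):
        self.window.append((y_true == self.pos_label, score))

    def get(self):
        pos = [s for is_pos, s in self.window if is_pos]
        neg = [s for is_pos, s in self.window if not is_pos]
        if not pos or not neg:
            return 0.0  # undefined with a single class in the window; 0.0 is an assumed convention
        # Mann-Whitney form: fraction of (positive, negative) pairs ranked correctly, ties count 1/2.
        wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
        return wins / (len(pos) * len(neg))


auc = RollingAUC(window_size=4, pos_label=1)
for y_true, score in [(1, 0.9), (0, 0.1), (1, 0.4), (0, 0.6)]:
    auc.update(y_true, score)
```

Three of the four (positive, negative) pairs are ordered correctly here, giving 0.75. A fifth update evicts the oldest pair, so the value tracks only the recent window.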
Usage
Use streaming classification metrics when:
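- You are evaluating an online classifier in a test-then-train (prequential) loop, where each sample is first used for prediction and evaluation and only then for training.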
- You need anytime access to the current evaluation score.
- You want to monitor model performance over time and detect degradation.
- You need to compare multiple online models on the same data stream.
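A test-then-train loop evaluates each sample before the model learns from it, so the metric only ever reflects predictions on data the model has not yet seen. In the sketch below, `MajorityClassifier` is a hypothetical toy model whose `predict_one`/`learn_one` method names follow River's convention. The running accuracy and its recorded history stand in for any streaming metric and show how degradation could be monitored over time.

```python
from collections import Counter


class MajorityClassifier:
    """Hypothetical toy model: predicts the most frequent label seen so far (features are ignored)."""

    def __init__(self):
        self.counts = Counter()

    def predict_one(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None

    def learn_one(self, x, y):
        self.counts[y] += 1


def prequential_accuracy(stream, model):
    """Run test-then-train over `stream` and return the metric value after every sample."""
    correct = 0
    history = []
    for i, (x, y) in enumerate(stream, start=1):
        y_pred = model.predict_one(x)  # 1. predict on the new sample first...
        correct += y_pred == y         # 2. ...update the metric with (y, y_pred)...
        model.learn_one(x, y)          # 3. ...and only then train on it
        history.append(correct / i)    # anytime value of the metric
    return history


stream = [({}, "a"), ({}, "a"), ({}, "b"), ({}, "a")]
print(prequential_accuracy(stream, MajorityClassifier()))  # [0.0, 0.5, 0.3333333333333333, 0.5]
```

Comparing several models on the same stream amounts to running this loop once per model and comparing the resulting histories.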
Theoretical Basis
Incremental Confusion Matrix
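Initialize: CM[i][j] = 0 for all class pairs (i, j)   (row i = true class, column j = predicted class)
update(y_true, y_pred):
    CM[y_true][y_pred] += 1
Derived statistics (per class c):
    TP_c = CM[c][c]
    FP_c = sum_j CM[j][c] - CM[c][c]    (column c minus the diagonal)
    FN_c = sum_j CM[c][j] - CM[c][c]    (row c minus the diagonal)
    TN_c = total - TP_c - FP_c - FN_c   (total = sum_{i,j} CM[i][j], the number of observations)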
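The pseudocode above translates into a few lines of Python. This sketch uses hypothetical class and method names and stores the matrix sparsely as a `Counter` keyed by (y_true, y_pred). It also keeps row and column totals, so each per-class statistic is available in O(1) instead of requiring a sum over a row or column.

```python
from collections import Counter


class ConfusionMatrix:
    """Sparse incremental confusion matrix with cached row and column totals."""

    def __init__(self):
        self.cm = Counter()       # (y_true, y_pred) -> count
        self.row_sum = Counter()  # y_true -> count (row totals)
        self.col_sum = Counter()  # y_pred -> count (column totals)
        self.total = 0

    def update(self, y_true, y_pred):
        self.cm[(y_true, y_pred)] += 1
        self.row_sum[y_true] += 1
        self.col_sum[y_pred] += 1
        self.total += 1

    def stats(self, c):
        tp = self.cm[(c, c)]
        fp = self.col_sum[c] - tp  # predicted c, but truly another class
        fn = self.row_sum[c] - tp  # truly c, but predicted as another class
        tn = self.total - tp - fp - fn
        return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


cm = ConfusionMatrix()
for y_true, y_pred in [("a", "a"), ("a", "b"), ("b", "b"), ("b", "b"), ("c", "a")]:
    cm.update(y_true, y_pred)
print(cm.stats("a"))  # {'TP': 1, 'FP': 1, 'FN': 1, 'TN': 2}
```

A sparse dictionary avoids fixing the class set in advance, which matters in streams where new labels can appear at any time.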
Key Metric Formulas
F_beta = (1 + beta^2) * (precision * recall) / (beta^2 * precision + recall)
Cohen's Kappa = (p_o - p_e) / (1 - p_e)
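where p_o = sum_c CM[c][c] / N is the observed agreement and p_e = sum_c (row_c / N) * (col_c / N) is the agreement expected by chance, with row_c and col_c the row and column totals of class c
MCC (binary case) = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
Cross-Entropy = -(1/N) * sum_i sum_c y_{i,c} * log(p_{i,c}), where y_{i,c} = 1 if sample i belongs to class c and 0 otherwise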
Balanced Accuracy = (1/K) * sum_c Recall_c
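The binary MCC formula above generalizes to K classes (Gorodkin's R_K statistic). With c the number of correct predictions, s the total count, and t_k and p_k the true and predicted frequencies of class k, MCC = (c*s - sum_k p_k*t_k) / sqrt((s^2 - sum_k p_k^2)(s^2 - sum_k t_k^2)). The hypothetical sketch below keeps only those counters, so each query costs O(K). For two classes it reduces to the binary formula.

```python
import math
from collections import Counter


class StreamingMCC:
    """Multiclass MCC recomputed in O(K) from the correct count and the label marginals."""

    def __init__(self):
        self.s = 0                     # number of observations
        self.correct = 0               # trace of the confusion matrix
        self.true_counts = Counter()   # t_k: row totals
        self.pred_counts = Counter()   # p_k: column totals

    def update(self, y_true, y_pred):
        self.s += 1
        self.correct += y_true == y_pred
        self.true_counts[y_true] += 1
        self.pred_counts[y_pred] += 1

    def get(self):
        classes = self.true_counts.keys() | self.pred_counts.keys()
        cov_tp = self.correct * self.s - sum(self.pred_counts[k] * self.true_counts[k] for k in classes)
        cov_pp = self.s ** 2 - sum(self.pred_counts[k] ** 2 for k in classes)
        cov_tt = self.s ** 2 - sum(self.true_counts[k] ** 2 for k in classes)
        denom = math.sqrt(cov_pp * cov_tt)
        return cov_tp / denom if denom else 0.0  # undefined if either marginal is constant; 0.0 assumed


mcc = StreamingMCC()
for y_true, y_pred in zip([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]):
    mcc.update(y_true, y_pred)
```

For this stream TP = 2, TN = 1, FP = 1 and FN = 1. The binary formula gives (2*1 - 1*1) / sqrt(3*3*2*2) = 1/6. The marginal-based version gives (3*5 - 13) / sqrt(12*12) = 1/6 as well.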
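Most of these formulas operate on counts that are maintained incrementally. If the row and column totals of the confusion matrix are stored alongside it, a count-based metric can be recomputed in O(K) time (where K is the number of classes) after each update. Cross-entropy is the exception: it is a running mean of per-sample losses and needs only a running sum and a count.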
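Cross-entropy does not need a confusion matrix. With one-hot targets, the inner sum over classes reduces to -log p_{i,y_i}, so a running sum of per-sample losses and a count suffice. In this hypothetical sketch, predicted probabilities are passed as a dict mapping labels to probabilities, the shape returned by River's `predict_proba_one`. Probabilities are clipped to [eps, 1 - eps] to avoid log(0); the clipping constant is an assumed choice.

```python
import math


class StreamingCrossEntropy:
    """Running mean of -log p(true label), updated one sample at a time."""

    def __init__(self, eps=1e-15):
        self.eps = eps          # assumed clipping constant to keep log() finite
        self.total_loss = 0.0
        self.n = 0

    def update(self, y_true, y_proba):
        # y_proba: dict mapping class label -> predicted probability (missing label -> 0).
        p = min(max(y_proba.get(y_true, 0.0), self.eps), 1 - self.eps)
        self.total_loss += -math.log(p)
        self.n += 1

    def get(self):
        return self.total_loss / self.n if self.n else 0.0


ce = StreamingCrossEntropy()
ce.update("a", {"a": 0.5, "b": 0.5})
ce.update("b", {"a": 0.2, "b": 0.8})
```

The state is two numbers regardless of the number of classes, so both updates and queries are O(1).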
Related Pages
- Implementation:Online_ml_River_Metrics_Base
- Implementation:Online_ml_River_Metrics_BalancedAccuracy
- Implementation:Online_ml_River_Metrics_ClassificationReport
- Implementation:Online_ml_River_Metrics_CohenKappa
- Implementation:Online_ml_River_Metrics_ConfusionMatrix
- Implementation:Online_ml_River_Metrics_CrossEntropy
- Implementation:Online_ml_River_Metrics_FBeta
- Implementation:Online_ml_River_Metrics_FowlkesMallows
- Implementation:Online_ml_River_Metrics_GeometricMean
- Implementation:Online_ml_River_Metrics_Jaccard
- Implementation:Online_ml_River_Metrics_LogLoss
- Implementation:Online_ml_River_Metrics_MCC
- Implementation:Online_ml_River_Metrics_Precision
- Implementation:Online_ml_River_Metrics_Recall
- Implementation:Online_ml_River_Metrics_RollingROCAUC
- Implementation:Online_ml_River_Metrics_VBeta
- Implementation:Online_ml_River_Metrics_MutualInfo
- Principle:Online_ml_River_Streaming_Accuracy_Measurement
- Principle:Online_ml_River_Streaming_ROCAUC