Principle:Online ml River Streaming Classification Metrics

Knowledge Sources	Domains	Last Updated
Machine Learning Statistics	Online_Learning, Evaluation, Classification	2026-02-08 18:00 GMT

Overview

Streaming classification metrics are evaluation measures that are computed incrementally as predictions and ground truth labels arrive one at a time. Unlike batch metrics that require the full set of predictions, streaming metrics maintain sufficient statistics that are updated with each observation, allowing anytime querying of the current metric value.

Description

Evaluating classification models in the online setting requires metrics that can be updated incrementally. Each streaming metric maintains a compact internal state (typically a confusion matrix or running counts) and updates this state with each new (prediction, ground truth) pair.

Key families of streaming classification metrics include:

Confusion matrix-derived metrics: The streaming confusion matrix is the foundational structure from which many metrics are derived. It maintains counts of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) for each class, updated with each observation. From this, one can compute:

Precision: TP / (TP + FP) -- the fraction of positive predictions that are correct.
Recall: TP / (TP + FN) -- the fraction of actual positives that are correctly identified.
F-beta score: A weighted harmonic mean of precision and recall, with parameter $β$ controlling the emphasis: $β > 1$ weights recall more heavily, $β < 1$ weights precision more heavily, and $β = 1$ gives the familiar F1 score.
Balanced Accuracy: The average of per-class recall values, correcting for class imbalance.
Matthews Correlation Coefficient (MCC): A balanced measure that accounts for all four quadrants of the confusion matrix, producing a value in [-1, +1].

Agreement and correlation metrics:

Cohen's Kappa: Measures agreement between predictions and ground truth, corrected for agreement expected by chance.
Jaccard Index: The intersection over union of the predicted-positive and actual-positive sets, which in terms of counts is TP / (TP + FP + FN).
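
Both agreement metrics can be kept up to date from a handful of running counts. The following is a minimal sketch in plain Python, not any library's implementation; the class name `StreamingKappa` and the helper `jaccard` are hypothetical names chosen for illustration, and returning 0.0 for undefined cases is an assumed convention.

```python
from collections import defaultdict


class StreamingKappa:
    """Cohen's kappa from running counts (illustrative sketch only)."""

    def __init__(self):
        self.n = 0                            # observations seen so far
        self.agree = 0                        # exact matches between truth and prediction
        self.true_counts = defaultdict(int)   # how often each class is the true label
        self.pred_counts = defaultdict(int)   # how often each class is predicted

    def update(self, y_true, y_pred):
        self.n += 1
        self.agree += int(y_true == y_pred)
        self.true_counts[y_true] += 1
        self.pred_counts[y_pred] += 1

    def get(self):
        if self.n == 0:
            return 0.0
        p_o = self.agree / self.n
        # Chance agreement: product of the two marginal frequencies, summed over classes.
        p_e = sum(
            self.true_counts[c] * self.pred_counts.get(c, 0)
            for c in self.true_counts
        ) / self.n ** 2
        if p_e == 1.0:
            return 0.0  # kappa is undefined here; 0.0 is an assumed fallback
        return (p_o - p_e) / (1 - p_e)


def jaccard(tp, fp, fn):
    """Binary Jaccard index from counts; 0.0 when no positives are involved."""
    denom = tp + fp + fn
    return tp / denom if denom else 0.0
```

Tracking the two marginal count tables alongside the agreement count means `get()` only loops over the classes seen so far, rather than over every cell of a confusion matrix.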

Information-theoretic metrics:

Cross-entropy / Log Loss: Measures the divergence between the predicted probability distribution and the true label distribution.
Mutual Information: Quantifies the information shared between predicted and true labels.
V-beta score: A normalized mutual information measure that balances homogeneity and completeness.
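
Mutual information between true and predicted labels depends only on the joint and marginal label counts, so it fits the same incremental pattern. The sketch below is an illustration under that assumption; `StreamingMutualInfo` is a hypothetical name, and the result is expressed in nats because it uses the natural logarithm.

```python
import math
from collections import Counter


class StreamingMutualInfo:
    """Mutual information (in nats) between true and predicted labels."""

    def __init__(self):
        self.n = 0
        self.joint = Counter()   # (y_true, y_pred) -> count
        self.true = Counter()    # y_true -> count
        self.pred = Counter()    # y_pred -> count

    def update(self, y_true, y_pred):
        self.n += 1
        self.joint[(y_true, y_pred)] += 1
        self.true[y_true] += 1
        self.pred[y_pred] += 1

    def get(self):
        mi = 0.0
        # Only label pairs that actually occurred contribute to the sum.
        for (t, p), n_tp in self.joint.items():
            ratio = n_tp * self.n / (self.true[t] * self.pred[p])
            mi += (n_tp / self.n) * math.log(ratio)
        return mi
```

With perfectly aligned labels on two balanced classes the value equals log 2; with predictions independent of the labels it is 0.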

Geometric and specialized metrics:

Geometric Mean: The geometric mean of per-class recall values, especially useful for imbalanced problems.
Fowlkes-Mallows Index: The geometric mean of precision and recall.
Rolling ROC AUC: A windowed version of ROC AUC computed over the most recent observations.
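
A windowed metric keeps only the most recent observations instead of counts over the whole stream. As a rough sketch of the idea, the hypothetical `RollingAUC` class below stores the last `window_size` (label, score) pairs and computes AUC as the fraction of positive/negative pairs ranked correctly, with ties counting half. This pairwise form is quadratic in the window size and is meant to show the definition, not an efficient implementation; returning 0.5 when only one class is present is an assumed convention.

```python
from collections import deque


class RollingAUC:
    """ROC AUC over the most recent `window_size` (label, score) pairs."""

    def __init__(self, window_size):
        # A deque with maxlen discards the oldest pair automatically.
        self.window = deque(maxlen=window_size)

    def update(self, y_true, score):
        self.window.append((bool(y_true), float(score)))

    def get(self):
        pos = [s for y, s in self.window if y]
        neg = [s for y, s in self.window if not y]
        if not pos or not neg:
            return 0.5  # AUC is undefined with a single class; placeholder value
        # Count positive/negative pairs where the positive is scored higher.
        wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
        return wins / (len(pos) * len(neg))
```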

Usage

Use streaming classification metrics when:

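You are evaluating an online classifier in a test-then-train (prequential) loop, where each example is first used to make a prediction and only then used for training.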
You need anytime access to the current evaluation score.
You want to monitor model performance over time and detect degradation.
You need to compare multiple online models on the same data stream.
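
The test-then-train pattern can be sketched in a few lines. Everything below is a self-contained illustration: `MajorityClassifier`, `Accuracy`, and `progressive_validation` are hypothetical stand-ins written for this example, and the `predict_one` / `learn_one` / `update` / `get` method names are an assumed interface, not a reference to a specific library's classes.

```python
from collections import Counter


class MajorityClassifier:
    """Toy online model: always predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict_one(self, x):
        if not self.counts:
            return None  # nothing learned yet
        return self.counts.most_common(1)[0][0]

    def learn_one(self, x, y):
        self.counts[y] += 1


class Accuracy:
    """Streaming accuracy kept as two running counts."""

    def __init__(self):
        self.correct = 0
        self.n = 0

    def update(self, y_true, y_pred):
        self.correct += int(y_true == y_pred)
        self.n += 1

    def get(self):
        return self.correct / self.n if self.n else 0.0


def progressive_validation(stream, model, metric):
    """Run a test-then-train loop and record the metric after every example."""
    history = []
    for x, y in stream:
        y_pred = model.predict_one(x)   # 1. test on the unseen example
        metric.update(y, y_pred)        # 2. score the prediction
        model.learn_one(x, y)           # 3. train on the same example
        history.append(metric.get())    # the metric can be read at any time
    return history
```

Because the prediction is made before the label is used for learning, every example acts as held-out data exactly once, and the recorded history shows how performance evolves over the stream.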

Theoretical Basis

Incremental Confusion Matrix

Initialize: CM[i][j] = 0 for all class pairs (i, j)

update(y_true, y_pred):
    CM[y_true][y_pred] += 1

Derived statistics (per class c):
    TP_c = CM[c][c]
    FP_c = sum_j CM[j][c] - CM[c][c]
    FN_c = sum_j CM[c][j] - CM[c][c]
    TN_c = total - TP_c - FP_c - FN_c
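
The pseudocode above translates fairly directly into Python. The following is a minimal sketch, assuming labels are hashable and that classes are discovered as they appear in the stream; `StreamingConfusionMatrix` is a hypothetical name used for illustration.

```python
from collections import defaultdict


class StreamingConfusionMatrix:
    """Nested counts: cm[y_true][y_pred] is incremented on each observation."""

    def __init__(self):
        self.cm = defaultdict(lambda: defaultdict(int))
        self.total = 0

    def update(self, y_true, y_pred):
        self.cm[y_true][y_pred] += 1
        self.total += 1

    def classes(self):
        """All labels seen so far, either as ground truth or as a prediction."""
        seen = set(self.cm)
        for row in self.cm.values():
            seen.update(row)
        return seen

    def stats(self, c):
        """Return (TP, FP, FN, TN) for class c, following the definitions above."""
        tp = self.cm.get(c, {}).get(c, 0)
        fp = sum(row.get(c, 0) for row in self.cm.values()) - tp   # column sum minus TP
        fn = sum(self.cm.get(c, {}).values()) - tp                 # row sum minus TP
        tn = self.total - tp - fp - fn
        return tp, fp, fn, tn
```

Reading with `.get` rather than indexing keeps queries from creating empty rows for classes that have not been observed.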

Key Metric Formulas

F_beta = (1 + beta^2) * (precision * recall) / (beta^2 * precision + recall)

Cohen's Kappa = (p_o - p_e) / (1 - p_e)
    where p_o = observed agreement, p_e = expected agreement by chance

MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))

Cross-Entropy = -(1/N) * sum_i sum_c y_{i,c} * log(p_{i,c})

Balanced Accuracy = (1/K) * sum_c Recall_c
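
As a worked illustration of the count-based formulas, the sketch below evaluates F-beta, MCC, and balanced accuracy on plain integer counts. The function names are chosen for this example, and returning 0.0 when a denominator is zero is an assumed convention rather than part of the formulas.

```python
import math


def precision(tp, fp):
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    return tp / (tp + fn) if tp + fn else 0.0


def f_beta(tp, fp, fn, beta=1.0):
    p, r = precision(tp, fp), recall(tp, fn)
    denom = beta**2 * p + r
    return (1 + beta**2) * p * r / denom if denom else 0.0


def mcc(tp, fp, fn, tn):
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def balanced_accuracy(recalls):
    """Mean of per-class recall values, passed as a list."""
    return sum(recalls) / len(recalls) if recalls else 0.0
```

For example, with TP = 8, FP = 2, FN = 4, TN = 6: precision is 0.8, recall is 2/3, F1 is 8/11, MCC is 1/sqrt(6), and balanced accuracy (averaging the recall of the positive class, 2/3, and of the negative class, 6/8) is 17/24.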

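Except for cross-entropy, which is accumulated as a running sum of per-observation log losses, these formulas operate on counts that are maintained incrementally. Each update costs O(1). The metric value can be recomputed in O(K) time (where K is the number of classes) after each update, provided the row and column totals of the confusion matrix are tracked alongside its cells; recomputing those totals from the full matrix on demand costs O(K^2).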
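
Cross-entropy needs predicted probabilities rather than hard labels, so it is tracked as a running sum instead of being read off the confusion matrix. Since y_{i,c} is one-hot, the inner sum over classes reduces to the log-probability assigned to the true class. The sketch below assumes predictions arrive as a dict mapping each class to its probability; `StreamingLogLoss` and the clipping constant `eps` are illustrative choices.

```python
import math


class StreamingLogLoss:
    """Running mean of -log(probability assigned to the true class)."""

    def __init__(self, eps=1e-15):
        self.eps = eps     # clipping bound to avoid log(0); an assumed default
        self.total = 0.0
        self.n = 0

    def update(self, y_true, probs):
        p = probs.get(y_true, 0.0)
        p = min(max(p, self.eps), 1 - self.eps)
        self.total -= math.log(p)
        self.n += 1

    def get(self):
        return self.total / self.n if self.n else 0.0
```

Both `update` and `get` run in constant time, independent of the number of classes.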
