Principle:Online ml River Multiclass Decomposition
| Knowledge Sources | |
|---|---|
| Domains | Online_Learning, Classification |
| Last Updated | 2026-02-08 18:00 GMT |
Overview
Multiclass decomposition is a family of strategies that reduce a multiclass classification problem into a collection of binary classification sub-problems. Since many powerful classifiers are inherently binary (e.g., logistic regression, support vector machines), decomposition techniques allow practitioners to leverage these algorithms for problems with three or more classes without modifying the base learner.
In the online learning setting, decomposition is particularly valuable because each binary sub-learner can be updated independently and incrementally as new observations arrive, preserving the one-pass, bounded-memory guarantees of the streaming paradigm.
Theoretical Basis
One-vs-Rest (OvR)
For a problem with K classes, OvR trains K binary classifiers. Classifier k treats class k as positive and all other classes as negative. At prediction time the class whose classifier produces the highest confidence score is selected:
y_hat = argmax_k f_k(x), k = 1, ..., K
OvR requires O(K) model updates per observation and O(K) predictions.
One-vs-One (OvO)
OvO trains one binary classifier for every pair of classes, yielding K(K-1)/2 classifiers. Each classifier is trained only on examples belonging to its two classes. Prediction is performed by majority voting across all pairwise classifiers.
OvO can be more accurate when decision boundaries between specific class pairs are simpler, but the number of classifiers grows quadratically with K.
Error-Correcting Output Codes (ECOC)
ECOC assigns each class a binary codeword of length L and trains L binary classifiers, one per bit position. At prediction time the codeword produced by the ensemble is compared against all class codewords, and the class with the smallest Hamming distance is chosen. ECOC provides built-in error correction: if fewer than d/2 bits are wrong (where d is the minimum distance between codewords), the correct class is still recovered.
Trade-offs
| Strategy | Number of Classifiers | Training Cost per Example | Error Correction |
|---|---|---|---|
| One-vs-Rest | K | O(K) | None |
| One-vs-One | K(K-1)/2 | O(K) relevant pairs | None |
| Output Codes | L (configurable) | O(L) | Yes (Hamming distance) |