Principle:Online ml River Multiclass Decomposition

Knowledge Sources	Machine Learning Pattern Recognition and Machine Learning
Domains	Online_Learning, Classification
Last Updated	2026-02-08 18:00 GMT

Overview

Multiclass decomposition is a family of strategies that reduce a multiclass classification problem into a collection of binary classification sub-problems. Since many powerful classifiers are inherently binary (e.g., logistic regression, support vector machines), decomposition techniques allow practitioners to leverage these algorithms for problems with three or more classes without modifying the base learner.

In the online learning setting, decomposition is particularly valuable because each binary sub-learner can be updated independently and incrementally as new observations arrive, preserving the one-pass, bounded-memory guarantees of the streaming paradigm.

Theoretical Basis

One-vs-Rest (OvR)

For a problem with K classes, OvR trains K binary classifiers. Classifier k treats class k as positive and all other classes as negative. At prediction time the class whose classifier produces the highest confidence score is selected:

y_hat = argmax_k  f_k(x),   k = 1, ..., K

OvR requires O(K) model updates per observation and O(K) predictions.

One-vs-One (OvO)

OvO trains one binary classifier for every pair of classes, yielding K(K-1)/2 classifiers. Each classifier is trained only on examples belonging to its two classes. Prediction is performed by majority voting across all pairwise classifiers.

OvO can be more accurate when decision boundaries between specific class pairs are simpler, but the number of classifiers grows quadratically with K.

Error-Correcting Output Codes (ECOC)

ECOC assigns each class a binary codeword of length L and trains L binary classifiers, one per bit position. At prediction time the codeword produced by the ensemble is compared against all class codewords, and the class with the smallest Hamming distance is chosen. ECOC provides built-in error correction: if fewer than d/2 bits are wrong (where d is the minimum distance between codewords), the correct class is still recovered.

Trade-offs

Strategy	Number of Classifiers	Training Cost per Example	Error Correction
One-vs-Rest	K	O(K)	None
One-vs-One	K(K-1)/2	O(K) relevant pairs	None
Output Codes	L (configurable)	O(L)	Yes (Hamming distance)

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment