Principle:Online ml River Online Decision Trees
| Knowledge Sources | |
|---|---|
| Domains | Online_Learning, Decision_Trees |
| Last Updated | 2026-02-08 18:00 GMT |
Overview
Online decision trees are incremental tree-learning algorithms that grow a decision tree one example at a time, without ever needing to re-examine past data. The foundational algorithm is the Hoeffding tree (also called Very Fast Decision Tree, VFDT), which uses a statistical bound to decide when enough evidence has been accumulated at a leaf to justify a split. Numerous extensions add concept drift adaptation, regression support, random forests, and alternative split criteria.
Theoretical Basis
Hoeffding Bound
The Hoeffding bound states that for n independent observations of a random variable with range R, the true mean differs from the observed mean by at most:
epsilon = sqrt(R^2 * ln(1/delta) / (2n))
with probability 1 - delta. In the Hoeffding tree, this bound is applied to the difference in split quality (e.g., information gain) between the best and second-best candidate attributes. A split is made when this difference exceeds epsilon, guaranteeing with high probability that the chosen attribute is the same one a batch learner would choose given infinite data.
Hoeffding Tree (VFDT)
The algorithm maintains sufficient statistics at each leaf for computing split criteria (e.g., class counts for classification, running mean/variance for regression). When a new example arrives:
- Route it to the appropriate leaf.
- Update the sufficient statistics.
- Periodically evaluate whether the Hoeffding bound justifies a split.
- If so, replace the leaf with an internal node and two new leaves.
Concept Drift Adaptations
- Hoeffding Adaptive Tree (HAT): Attaches ADWIN drift detectors to each node. When drift is detected, the subtree rooted at that node is replaced with an alternate subtree that was grown in parallel on recent data.
- Extremely Fast Decision Tree (EFDT): Splits as soon as there is any evidence that a split is beneficial, then revisits and potentially re-splits later. This trades optimality for faster tree growth.
- LAST Classifier: Focuses on the most recent data for split decisions, improving adaptation to recent drift.
Regression Variants
- Hoeffding Tree Regressor: Replaces class counts with running mean/variance and uses variance reduction as the split criterion.
- iSOUP-Tree Regressor: Extends Hoeffding regression trees for structured output prediction.
Mondrian Trees
Mondrian trees use a different splitting mechanism based on Mondrian processes (random recursive partitions). They provide well-calibrated uncertainty estimates and naturally support online updates by extending the Mondrian process with each new observation.
Stochastic Gradient Trees (SGT)
SGTs combine gradient-boosting ideas with decision trees. Each leaf maintains a gradient-based model, and splits are evaluated based on gradient statistics rather than information gain.
Complexity
- Per-example update: O(tree depth) to route plus O(attributes) to update statistics.
- Memory: O(leaves * attributes * classes) for sufficient statistics.
- Split decision: O(attributes) to evaluate candidates.
Related Pages
- Implementation:Online_ml_River_Tree_Base_Nodes
- Implementation:Online_ml_River_Tree_ExtremelyFastDecisionTree
- Implementation:Online_ml_River_Tree_HoeffdingAdaptiveTreeRegressor
- Implementation:Online_ml_River_Tree_HoeffdingTree
- Implementation:Online_ml_River_Tree_HoeffdingTreeRegressor
- Implementation:Online_ml_River_Tree_LASTClassifier
- Implementation:Online_ml_River_Tree_MondrianTree
- Implementation:Online_ml_River_Tree_MondrianTreeClassifier
- Implementation:Online_ml_River_Tree_MondrianTreeNodes
- Implementation:Online_ml_River_Tree_MondrianTreeRegressor
- Implementation:Online_ml_River_Tree_StochasticGradientTree
- Implementation:Online_ml_River_Tree_iSOUPTreeRegressor
- Implementation:Online_ml_River_Tree_Viz
- Implementation:Online_ml_River_Tree_Utils
- Implementation:Online_ml_River_Tree_SGT_Losses
- Implementation:Online_ml_River_Tree_VarianceRatioSplitCriterion