Principle:Online ml River Tree Node Architecture
| Knowledge Sources | |
|---|---|
| Domains | Online_Learning, Decision_Trees |
| Last Updated | 2026-02-08 18:00 GMT |
Overview
Tree node architecture defines the internal data structures that compose an online decision tree. Each node type encapsulates specific responsibilities: leaves accumulate sufficient statistics and make predictions, while branch (internal) nodes route instances to the appropriate child based on a split condition. The design of node types directly determines the tree's memory footprint, update speed, and predictive capabilities.
Different online tree algorithms require specialized node types to support their particular learning mechanisms (e.g., drift detection, alternate subtrees, gradient-based models).
Theoretical Basis
Leaf Nodes
Leaf nodes are the terminal nodes where predictions are made. They maintain:
- Sufficient statistics: Class distribution (classification) or running mean/variance (regression) for the examples routed to this leaf.
- Per-attribute statistics: For each candidate split attribute, the leaf tracks the statistics needed to evaluate splits (e.g., class counts per attribute value for nominal features, Gaussian estimators for numeric features).
- Example counter: The number of examples seen since the last split evaluation, used to determine when to re-evaluate.
class Leaf:
stats: ClassDistribution or RunningMoments
splitters: dict[attribute -> AttributeObserver]
n_since_last_eval: int
Branch Nodes
Branch nodes route an instance to one of their children based on a split condition:
- Numeric splits: Threshold-based: go left if x_j <= threshold, right otherwise.
- Nominal splits: One child per category value, or binary grouping.
class Branch:
split_attribute: int
split_value: threshold or set
children: dict[outcome -> Node]
Algorithm-Specific Node Types
- HTC nodes (Hoeffding Tree Classifier): Standard leaf and branch nodes with class count statistics.
- HTR nodes (Hoeffding Tree Regressor): Leaves maintain running mean, variance, and per-attribute regression statistics.
- HATC nodes (Hoeffding Adaptive Tree Classifier): Augmented with ADWIN drift detectors and alternate subtree pointers for drift adaptation.
- HATR nodes (Hoeffding Adaptive Tree Regressor): Regression variant with drift detection.
- EFDTC nodes (Extremely Fast Decision Tree): Support re-splitting, where an existing internal node can change its split attribute.
- SGT nodes (Stochastic Gradient Tree): Leaves contain gradient-based models and track gradient statistics for split evaluation.
Prediction at Leaves
Different strategies for leaf prediction include:
- Majority class: Return the most frequent class in the leaf's distribution.
- Naive Bayes: Use per-attribute class-conditional statistics for a Naive Bayes prediction at the leaf.
- Adaptive (NB or MC): Monitor which strategy has lower error and switch dynamically.
- Model-based: Fit a local model (e.g., perceptron, linear regression) at each leaf.
Related Pages
- Implementation:Online_ml_River_Tree_Nodes_Branch
- Implementation:Online_ml_River_Tree_Nodes_EFDTC
- Implementation:Online_ml_River_Tree_Nodes_HATC
- Implementation:Online_ml_River_Tree_Nodes_HATR
- Implementation:Online_ml_River_Tree_Nodes_HTC
- Implementation:Online_ml_River_Tree_Nodes_HTLeaf
- Implementation:Online_ml_River_Tree_Nodes_HTR
- Implementation:Online_ml_River_Tree_Nodes_SGT