Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Principle:Online ml River Tree Node Architecture

From Leeroopedia


Knowledge Sources
Domains Online_Learning, Decision_Trees
Last Updated 2026-02-08 18:00 GMT

Overview

Tree node architecture defines the internal data structures that compose an online decision tree. Each node type encapsulates specific responsibilities: leaves accumulate sufficient statistics and make predictions, while branch (internal) nodes route instances to the appropriate child based on a split condition. The design of node types directly determines the tree's memory footprint, update speed, and predictive capabilities.

Different online tree algorithms require specialized node types to support their particular learning mechanisms (e.g., drift detection, alternate subtrees, gradient-based models).

Theoretical Basis

Leaf Nodes

Leaf nodes are the terminal nodes where predictions are made. They maintain:

  • Sufficient statistics: Class distribution (classification) or running mean/variance (regression) for the examples routed to this leaf.
  • Per-attribute statistics: For each candidate split attribute, the leaf tracks the statistics needed to evaluate splits (e.g., class counts per attribute value for nominal features, Gaussian estimators for numeric features).
  • Example counter: The number of examples seen since the last split evaluation, used to determine when to re-evaluate.
class Leaf:
    stats: ClassDistribution or RunningMoments
    splitters: dict[attribute -> AttributeObserver]
    n_since_last_eval: int

Branch Nodes

Branch nodes route an instance to one of their children based on a split condition:

  • Numeric splits: Threshold-based: go left if x_j <= threshold, right otherwise.
  • Nominal splits: One child per category value, or binary grouping.
class Branch:
    split_attribute: int
    split_value: threshold or set
    children: dict[outcome -> Node]

Algorithm-Specific Node Types

  • HTC nodes (Hoeffding Tree Classifier): Standard leaf and branch nodes with class count statistics.
  • HTR nodes (Hoeffding Tree Regressor): Leaves maintain running mean, variance, and per-attribute regression statistics.
  • HATC nodes (Hoeffding Adaptive Tree Classifier): Augmented with ADWIN drift detectors and alternate subtree pointers for drift adaptation.
  • HATR nodes (Hoeffding Adaptive Tree Regressor): Regression variant with drift detection.
  • EFDTC nodes (Extremely Fast Decision Tree): Support re-splitting, where an existing internal node can change its split attribute.
  • SGT nodes (Stochastic Gradient Tree): Leaves contain gradient-based models and track gradient statistics for split evaluation.

Prediction at Leaves

Different strategies for leaf prediction include:

  • Majority class: Return the most frequent class in the leaf's distribution.
  • Naive Bayes: Use per-attribute class-conditional statistics for a Naive Bayes prediction at the leaf.
  • Adaptive (NB or MC): Monitor which strategy has lower error and switch dynamically.
  • Model-based: Fit a local model (e.g., perceptron, linear regression) at each leaf.

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment