Implementation:Online ml River Tree HoeffdingAdaptiveTreeRegressor

From Leeroopedia


Knowledge Sources
Domains: Online_Learning, Decision_Trees, Regression, Concept_Drift
Last Updated: 2026-02-08 16:00 GMT

Overview

Hoeffding Adaptive Tree Regressor (HATR) is the regression variant of the Hoeffding Adaptive Tree. It places an ADWIN drift detector at each node to monitor concept drift. When drift is detected, an alternate subtree is grown in the background and swapped in once it demonstrates superior performance.

Description

HATR extends the standard Hoeffding Tree Regressor with adaptive mechanisms to handle non-stationary data streams. Each decision node maintains an ADWIN drift detector that monitors prediction errors. When drift is detected at a node, an alternate tree begins growing in parallel. The algorithm periodically evaluates whether the alternate tree significantly outperforms the current subtree using statistical tests, and swaps them when appropriate.
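The replacement decision described above can be sketched in plain Python. This is a minimal illustration and not River's actual code: the function name, its inputs, and the exact form of the bound are assumptions, modeled on the Hoeffding-style test commonly used in adaptive trees. Errors are assumed to be normalized to [0, 1].

```python
import math


def subtree_decision(old_error, alt_error, n_old, n_alt,
                     switch_significance=0.05, drift_window_threshold=300):
    """Return 'switch', 'prune', or 'keep' for an alternate subtree.

    old_error / alt_error: mean normalized errors in [0, 1] (hypothetical
    inputs, e.g. the running estimates of each subtree's drift detector).
    n_old / n_alt: number of observations behind each estimate.
    """
    # Wait until both estimates rest on enough observations.
    if n_old <= drift_window_threshold or n_alt <= drift_window_threshold:
        return "keep"

    # Hoeffding-style bound on the difference between the two estimates
    # (assumed form, not taken from the library source).
    f_n = 1.0 / n_old + 1.0 / n_alt
    bound = math.sqrt(
        2.0 * old_error * (1.0 - old_error)
        * math.log(2.0 / switch_significance) * f_n
    )

    if old_error - alt_error > bound:
        return "switch"  # alternate is significantly better
    if alt_error - old_error > bound:
        return "prune"   # alternate is significantly worse
    return "keep"        # no significant difference yet
```

With this sketch, a large and well-supported gap in favor of the alternate subtree yields 'switch', the reverse gap yields 'prune', and anything inside the bound leaves both subtrees in place.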

Key features:

  • ADWIN-based drift detection at each node
  • Background growth of alternate trees
  • Statistical significance testing for tree replacement
  • Bootstrap sampling support for improved performance
  • Error normalization based on the empirical rule of Gaussian distributions

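ADWIN expects bounded input, so HATR normalizes prediction errors before passing them to the detectors. The strategy assumes prediction deviations follow a normal distribution and, following the empirical rule, applies min-max normalization over the range [-3σ, 3σ], where σ is the standard deviation of the observed errors.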
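A minimal sketch of this normalization, assuming a running variance estimate (Welford's algorithm) and a mapping of [-3σ, 3σ] onto [0, 1]. The class name `ErrorNormalizer` is hypothetical, and the final clamping step is an added safeguard for this illustration, not something the source states.

```python
import math


class ErrorNormalizer:
    """Running mean/variance of prediction errors (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, error):
        self.n += 1
        delta = error - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (error - self.mean)

    @property
    def std(self):
        # Sample standard deviation; 0.0 until two errors have been seen.
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0

    def normalize(self, y_true, y_pred):
        """Map the signed error from [-3*sd, 3*sd] onto [0, 1]."""
        error = y_true - y_pred
        self.update(error)
        sd = self.std
        if sd == 0.0:
            return 0.5  # no spread observed yet: return the midpoint
        scaled = (error + 3.0 * sd) / (6.0 * sd)
        # Clamp rare values outside the 3-sigma range (assumption).
        return min(1.0, max(0.0, scaled))


normalizer = ErrorNormalizer()
for y_true, y_pred in [(1.0, 1.0), (2.0, 1.0), (0.5, 1.5)]:
    bounded_error = normalizer.normalize(y_true, y_pred)
    # bounded_error is the value a drift detector would receive
```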

Usage

from river import datasets
from river import evaluate
from river import metrics
from river import preprocessing
from river import tree

# Regression dataset bundled with River
dataset = datasets.TrumpApproval()

# Scale the features, then feed them to the adaptive tree
model = (
    preprocessing.StandardScaler() |
    tree.HoeffdingAdaptiveTreeRegressor(
        grace_period=50,
        model_selector_decay=0.3,
        seed=0
    )
)

metric = metrics.MAE()

# Progressive validation: predict on each sample, then learn from it
evaluate.progressive_val_score(dataset, model, metric)
# MAE: 0.917576

Code Reference

Source Location: river/tree/hoeffding_adaptive_tree_regressor.py

Signature:

class HoeffdingAdaptiveTreeRegressor(HoeffdingTreeRegressor):
    def __init__(
        self,
        grace_period: int = 200,
        max_depth: int | None = None,
        delta: float = 1e-7,
        tau: float = 0.05,
        leaf_prediction: str = "adaptive",
        leaf_model: base.Regressor | None = None,
        model_selector_decay: float = 0.95,
        nominal_attributes: list | None = None,
        splitter: Splitter | None = None,
        min_samples_split: int = 5,
        bootstrap_sampling: bool = True,
        drift_window_threshold: int = 300,
        drift_detector: base.DriftDetector | None = None,
        switch_significance: float = 0.05,
        binary_split: bool = False,
        max_size: float = 500.0,
        memory_estimate_period: int = 1000000,
        stop_mem_management: bool = False,
        remove_poor_attrs: bool = False,
        merit_preprune: bool = True,
        seed: int | None = None,
    )
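The `delta` and `tau` arguments in the signature above relate to the Hoeffding split test inherited from the base tree. The sketch below shows the textbook form of that test on merits scaled to [0, 1]; the function names are hypothetical, and the library may apply the bound differently (for instance to a ratio of merits), so treat this as an illustration only.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Hoeffding bound: with probability 1 - delta, the observed mean of n
    samples lies within this distance of the true mean."""
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_merit, second_merit, n, delta=1e-7, tau=0.05):
    """Textbook split rule (assumed form).

    Split when the best candidate beats the runner-up by more than the
    bound, or when the bound has shrunk below the tie threshold tau.
    """
    eps = hoeffding_bound(1.0, delta, n)
    return (best_merit - second_merit) > eps or eps < tau


# The bound shrinks as a leaf observes more instances.
for n in (200, 1000, 5000):
    print(n, round(hoeffding_bound(1.0, 1e-7, n), 4))
```

A smaller `delta` demands more evidence before splitting, while `tau` forces a decision once two candidates are too close to separate.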

Import:

from river.tree import HoeffdingAdaptiveTreeRegressor

I/O Contract

Input:

  • x (dict): Feature dictionary with attribute names as keys
  • y (float): Target regression value
  • w (float, optional): Sample weight (default: 1.0)

Output:

  • predict_one(x): Predicted regression value (float)

Key Parameters

  • grace_period (int): Number of instances a leaf observes between split attempts
  • leaf_prediction (str): Prediction mechanism, one of 'mean', 'model', or 'adaptive'
  • leaf_model (Regressor): Base model for leaf predictions (default: LinearRegression)
  • model_selector_decay (float): Exponential decay factor, between 0 and 1, applied to the errors tracked for choosing between the mean and the model in 'adaptive' leaves
  • bootstrap_sampling (bool): Enable bootstrap sampling in leaves
  • drift_window_threshold (int): Minimum number of observations an alternate tree must see before it is considered as a replacement
  • drift_detector (DriftDetector): Drift detection algorithm (default: ADWIN)
  • switch_significance (float): Significance level for subtree replacement tests
  • seed (int): Random seed for reproducibility
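To make `bootstrap_sampling` more concrete, the sketch below shows online bootstrap resampling in the usual streaming style: each instance's weight is scaled by a draw from a Poisson(1) distribution. The helper names are hypothetical, and leaving the weight unchanged on a zero draw is an assumption of this illustration rather than a documented guarantee.

```python
import math
import random


def poisson(rate, rng):
    """Draw from a Poisson distribution using Knuth's algorithm."""
    limit = math.exp(-rate)
    k = 0
    p = rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def bootstrap_weight(w, rng):
    """Scale a sample weight by k ~ Poisson(1).

    A zero draw leaves the weight unchanged (assumption).
    """
    k = poisson(1.0, rng)
    return w * k if k > 0 else w


rng = random.Random(0)  # seeded for reproducibility, like the seed parameter
weights = [bootstrap_weight(1.0, rng) for _ in range(5)]
```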

Implementation Details

Key Methods:

  • learn_one(x, y, w=1.0): Train on one instance with drift detection
  • predict_one(x): Predict by averaging predictions from reached leaves
  • _new_leaf(initial_stats, parent, is_active): Create adaptive leaf nodes
  • _branch_selector(numerical_feature, multiway_split): Select appropriate branch type

Node Types:

  • AdaLeafRegMean: Leaf predicting target mean with drift detection
  • AdaLeafRegModel: Leaf using learned model with drift detection
  • AdaLeafRegAdaptive: Leaf adaptively choosing between mean and model
  • AdaNumBinaryBranchReg/AdaNumMultiwayBranchReg: Numeric branch nodes
  • AdaNomBinaryBranchReg/AdaNomMultiwayBranchReg: Nominal branch nodes
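The adaptive leaf's choice between the target mean and a learned model can be sketched as follows. This is an illustrative stand-in, not the library's node class: `AdaptiveLeafSketch` is a hypothetical name, the model's prediction is supplied by the caller, and the use of exponentially faded squared errors weighted by `model_selector_decay` is an assumption about how the selection works.

```python
class AdaptiveLeafSketch:
    """Chooses between the running target mean and an external model."""

    def __init__(self, decay=0.95):
        self.decay = decay     # plays the role of model_selector_decay
        self.n = 0
        self.mean = 0.0        # running mean of the targets seen so far
        self.fmse_mean = 0.0   # faded squared error of the mean predictor
        self.fmse_model = 0.0  # faded squared error of the model predictor

    def predict(self, model_pred):
        # Use whichever predictor has the lower faded error (ties: mean).
        return self.mean if self.fmse_mean <= self.fmse_model else model_pred

    def learn(self, y, model_pred):
        # Score both predictors on y before updating the mean.
        self.fmse_mean = self.decay * self.fmse_mean + (y - self.mean) ** 2
        self.fmse_model = self.decay * self.fmse_model + (y - model_pred) ** 2
        self.n += 1
        self.mean += (y - self.mean) / self.n


leaf = AdaptiveLeafSketch(decay=0.3)
for x in range(10):
    y = 2.0 * x
    leaf.learn(y, model_pred=2.0 * x)  # a model that fits the trend exactly

chosen = leaf.predict(model_pred=20.0)  # the model wins on a trending target
```

A decay closer to 1 gives past errors more influence, so the leaf switches predictors more slowly.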

Properties:

  • n_alternate_trees: Number of alternate trees currently growing
  • n_pruned_alternate_trees: Count of pruned alternate trees
  • n_switch_alternate_trees: Count of successful tree replacements

References

Bifet, Albert, and Ricard Gavaldà. "Adaptive learning from evolving data streams." In International Symposium on Intelligent Data Analysis, pp. 249-260. Springer, Berlin, Heidelberg, 2009.
