Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Online ml River Tree HoeffdingTreeRegressor

From Leeroopedia


Knowledge Sources
Domains Online_Learning, Decision_Trees, Regression
Last Updated 2026-02-08 16:00 GMT

Overview

Hoeffding Tree Regressor (HTR) is an incremental decision tree for regression tasks that uses the Hoeffding bound to make split decisions based on variance reduction. It fits linear models or uses target means at leaf nodes to provide predictions.

Description

HTR adapts the Hoeffding Tree algorithm to regression by using variance reduction as the split criterion instead of information gain. The algorithm monitors the variance of target values at each leaf and attempts splits when sufficient samples have been observed. The Hoeffding bound ensures that split decisions are made with high confidence despite using only partial data.

Key features:

  • Variance reduction-based split criterion
  • Three leaf prediction strategies: mean, model, and adaptive
  • Linear models at leaves for improved prediction accuracy
  • Adaptive strategy that dynamically chooses between mean and model
  • Support for nominal and numeric features
  • Memory-efficient design with configurable size limits

The adaptive leaf prediction strategy monitors the squared errors of both mean and model predictions using exponential smoothing, automatically selecting the better performer for each prediction.

Usage

from river import datasets
from river import evaluate
from river import metrics
from river import tree
from river import preprocessing

dataset = datasets.TrumpApproval()

model = (
    preprocessing.StandardScaler() |
    tree.HoeffdingTreeRegressor(
        grace_period=100,
        model_selector_decay=0.9
    )
)

metric = metrics.MAE()

evaluate.progressive_val_score(dataset, model, metric)
# MAE: 0.793345

Code Reference

Source Location: /tmp/kapso_repo_178qi9vb/river/tree/hoeffding_tree_regressor.py

Signature:

class HoeffdingTreeRegressor(HoeffdingTree, base.Regressor):
    def __init__(
        self,
        grace_period: int = 200,
        max_depth: int | None = None,
        delta: float = 1e-7,
        tau: float = 0.05,
        leaf_prediction: str = "adaptive",
        leaf_model: base.Regressor | None = None,
        model_selector_decay: float = 0.95,
        nominal_attributes: list | None = None,
        splitter: Splitter | None = None,
        min_samples_split: int = 5,
        binary_split: bool = False,
        max_size: float = 500.0,
        memory_estimate_period: int = 1000000,
        stop_mem_management: bool = False,
        remove_poor_attrs: bool = False,
        merit_preprune: bool = True,
    )

Import:

from river.tree import HoeffdingTreeRegressor

I/O Contract

Input:

  • x (dict): Feature dictionary with attribute names as keys
  • y (float): Target regression value
  • w (float, optional): Sample weight (default: 1.0)

Output:

  • predict_one(x): Predicted regression value (float)

Key Parameters

  • grace_period (int): Number of instances between split attempts
  • delta (float): Significance level for Hoeffding bound (1-δ is confidence)
  • tau (float): Tie-breaking threshold
  • leaf_prediction (str): Prediction strategy ('mean', 'model', 'adaptive')
  • leaf_model (Regressor): Model for leaf predictions (default: LinearRegression)
  • model_selector_decay (float): Exponential decay for adaptive strategy (0-1)
  • nominal_attributes (list): List of nominal attribute identifiers
  • splitter (Splitter): Attribute observer (default: TEBSTSplitter)
  • min_samples_split (int): Minimum samples per branch after split
  • max_depth (int): Maximum tree depth

Implementation Details

Key Methods:

  • learn_one(x, y, w=1.0): Train on one instance
  • predict_one(x): Predict target value
  • _new_leaf(initial_stats, parent): Create appropriate leaf node type
  • _attempt_to_split(leaf, parent, parent_branch): Evaluate and execute split
  • _new_split_criterion(): Create VarianceReductionSplitCriterion instance

Node Types:

  • LeafMean: Leaf node predicting target mean
  • LeafModel: Leaf node using linear regression model
  • LeafAdaptive: Leaf node adaptively choosing between mean and model

Split Criterion:

Uses VarianceReductionSplitCriterion which computes merit as:

  • Pre-split variance - weighted post-split variance
  • Higher merit indicates better splits
  • Minimum samples per branch enforced via min_samples_split

Attribute Observer:

Default TEBSTSplitter (Tree E-BST Splitter):

  • Maintains binary search tree for efficient quantile computation
  • Tracks sufficient statistics for variance calculations
  • Supports both numeric and nominal features

Learning Algorithm

Learning Process:

1. Route instance to appropriate leaf via tree traversal 2. Update leaf statistics and model (if applicable) 3. Check if grace period elapsed since last split attempt 4. If yes, evaluate split candidates:

  * Compute variance reduction for each feature
  * Select top two candidates
  * Calculate Hoeffding bound
  * Split if merit difference exceeds bound or tau threshold

5. Handle missing features by following most common path 6. Manage memory periodically

Split Decision:

Split occurs when:

  • merit(best) / merit(second) < 1 - hoeffding_bound, OR
  • hoeffding_bound < tau (tie-breaking)

Poor Attribute Removal:

When enabled, attributes with merit significantly below the best are disabled to reduce memory usage.

Related Pages

References

The algorithm adapts Hoeffding Tree principles to regression tasks using variance reduction as the split criterion.

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment