Implementation:Online ml River Tree HoeffdingTreeRegressor
| Knowledge Sources | |
|---|---|
| Domains | Online_Learning, Decision_Trees, Regression |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
Hoeffding Tree Regressor (HTR) is an incremental decision tree for regression tasks that uses the Hoeffding bound to make split decisions based on variance reduction. It fits linear models or uses target means at leaf nodes to provide predictions.
Description
HTR adapts the Hoeffding Tree algorithm to regression by using variance reduction as the split criterion instead of information gain. The algorithm monitors the variance of target values at each leaf and attempts splits when sufficient samples have been observed. The Hoeffding bound ensures that split decisions are made with high confidence despite using only partial data.
Key features:
- Variance reduction-based split criterion
- Three leaf prediction strategies: mean, model, and adaptive
- Linear models at leaves for improved prediction accuracy
- Adaptive strategy that dynamically chooses between mean and model
- Support for nominal and numeric features
- Memory-efficient design with configurable size limits
The adaptive leaf prediction strategy monitors the squared errors of both mean and model predictions using exponential smoothing, automatically selecting the better performer for each prediction.
Usage
from river import datasets
from river import evaluate
from river import metrics
from river import tree
from river import preprocessing
dataset = datasets.TrumpApproval()
model = (
preprocessing.StandardScaler() |
tree.HoeffdingTreeRegressor(
grace_period=100,
model_selector_decay=0.9
)
)
metric = metrics.MAE()
evaluate.progressive_val_score(dataset, model, metric)
# MAE: 0.793345
Code Reference
Source Location:
/tmp/kapso_repo_178qi9vb/river/tree/hoeffding_tree_regressor.py
Signature:
class HoeffdingTreeRegressor(HoeffdingTree, base.Regressor):
def __init__(
self,
grace_period: int = 200,
max_depth: int | None = None,
delta: float = 1e-7,
tau: float = 0.05,
leaf_prediction: str = "adaptive",
leaf_model: base.Regressor | None = None,
model_selector_decay: float = 0.95,
nominal_attributes: list | None = None,
splitter: Splitter | None = None,
min_samples_split: int = 5,
binary_split: bool = False,
max_size: float = 500.0,
memory_estimate_period: int = 1000000,
stop_mem_management: bool = False,
remove_poor_attrs: bool = False,
merit_preprune: bool = True,
)
Import:
from river.tree import HoeffdingTreeRegressor
I/O Contract
Input:
- x (dict): Feature dictionary with attribute names as keys
- y (float): Target regression value
- w (float, optional): Sample weight (default: 1.0)
Output:
- predict_one(x): Predicted regression value (float)
Key Parameters
- grace_period (int): Number of instances between split attempts
- delta (float): Significance level for Hoeffding bound (1-δ is confidence)
- tau (float): Tie-breaking threshold
- leaf_prediction (str): Prediction strategy ('mean', 'model', 'adaptive')
- leaf_model (Regressor): Model for leaf predictions (default: LinearRegression)
- model_selector_decay (float): Exponential decay for adaptive strategy (0-1)
- nominal_attributes (list): List of nominal attribute identifiers
- splitter (Splitter): Attribute observer (default: TEBSTSplitter)
- min_samples_split (int): Minimum samples per branch after split
- max_depth (int): Maximum tree depth
Implementation Details
Key Methods:
- learn_one(x, y, w=1.0): Train on one instance
- predict_one(x): Predict target value
- _new_leaf(initial_stats, parent): Create appropriate leaf node type
- _attempt_to_split(leaf, parent, parent_branch): Evaluate and execute split
- _new_split_criterion(): Create VarianceReductionSplitCriterion instance
Node Types:
- LeafMean: Leaf node predicting target mean
- LeafModel: Leaf node using linear regression model
- LeafAdaptive: Leaf node adaptively choosing between mean and model
Split Criterion:
Uses VarianceReductionSplitCriterion which computes merit as:
- Pre-split variance - weighted post-split variance
- Higher merit indicates better splits
- Minimum samples per branch enforced via min_samples_split
Attribute Observer:
Default TEBSTSplitter (Tree E-BST Splitter):
- Maintains binary search tree for efficient quantile computation
- Tracks sufficient statistics for variance calculations
- Supports both numeric and nominal features
Learning Algorithm
Learning Process:
1. Route instance to appropriate leaf via tree traversal 2. Update leaf statistics and model (if applicable) 3. Check if grace period elapsed since last split attempt 4. If yes, evaluate split candidates:
* Compute variance reduction for each feature * Select top two candidates * Calculate Hoeffding bound * Split if merit difference exceeds bound or tau threshold
5. Handle missing features by following most common path 6. Manage memory periodically
Split Decision:
Split occurs when:
- merit(best) / merit(second) < 1 - hoeffding_bound, OR
- hoeffding_bound < tau (tie-breaking)
Poor Attribute Removal:
When enabled, attributes with merit significantly below the best are disabled to reduce memory usage.
Related Pages
- Online_ml_River_Tree_HoeffdingTree
- Online_ml_River_Tree_HoeffdingAdaptiveTreeRegressor
- Online_ml_River_Tree_iSOUPTreeRegressor
- Online_ml_River_Tree_Base_Nodes
- Online_ml_River_Tree_Splitter
References
The algorithm adapts Hoeffding Tree principles to regression tasks using variance reduction as the split criterion.