Implementation:Online ml River Tree HoeffdingTree
| Knowledge Sources | |
|---|---|
| Domains | Online_Learning, Decision_Trees, Abstract_Base_Class |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
HoeffdingTree is an abstract base class that defines the core operations and properties shared by all Hoeffding decision tree implementations in River. It provides memory management, tree navigation, and fundamental decision tree operations without implementing specific learning algorithms.
Description
This abstract base class establishes the foundation for incremental decision trees based on the Hoeffding bound principle. The Hoeffding bound (ε = √((R²ln(1/δ))/(2n))) determines how many samples are necessary before making confident split decisions, where R is the range, δ is the significance level, and n is the sample count.
Key responsibilities:
- Memory management with configurable size limits
- Tree structure maintenance (nodes, branches, leaves)
- Navigation and traversal operations
- Size estimation and enforcement
- Visualization support (DataFrame export, graphviz drawing, HTML representation)
- Debugging utilities
The class implements sophisticated memory management based on Kirkby's dissertation, tracking active and inactive leaves separately, estimating their sizes, and deactivating leaves when memory limits are approached.
Usage
This is an abstract class and cannot be instantiated directly. Concrete implementations include:
from river import tree
# Use concrete implementations
model = tree.HoeffdingTreeClassifier()
model = tree.HoeffdingTreeRegressor()
model = tree.HoeffdingAdaptiveTreeClassifier()
Code Reference
Source Location:
/tmp/kapso_repo_178qi9vb/river/tree/hoeffding_tree.py
Signature:
class HoeffdingTree(ABC):
def __init__(
self,
max_depth: int | None = None,
binary_split: bool = False,
max_size: float = 100.0,
memory_estimate_period: int = 1000000,
stop_mem_management: bool = False,
remove_poor_attrs: bool = False,
merit_preprune: bool = True,
)
Import:
from river.tree.hoeffding_tree import HoeffdingTree
I/O Contract
Abstract Methods (must be implemented by subclasses):
- _new_leaf(initial_stats, parent): Create a new leaf node
- split_criterion (property): Define the split criterion
- leaf_prediction (property): Define the leaf prediction strategy
Key Parameters
- max_depth (int): Maximum tree depth (None uses system recursion limit - 20)
- binary_split (bool): Restrict to binary splits only
- max_size (float): Maximum tree size in mebibytes (MiB)
- memory_estimate_period (int): Interval between memory checks
- stop_mem_management (bool): Stop growing when memory limit is hit
- remove_poor_attrs (bool): Disable poor attributes to reduce memory
- merit_preprune (bool): Enable merit-based tree pre-pruning
Implementation Details
Key Methods:
- _hoeffding_bound(range_val, confidence, n): Compute Hoeffding bound
- _enforce_size_limit(): Deactivate leaves to maintain size constraints
- _estimate_model_size(): Calculate current model size and trigger enforcement
- _find_leaves(): Return list of all leaf nodes
- _branch_selector(numerical_feature, multiway_split): Select branch node type
- debug_one(x): Generate explanation of prediction path
- draw(max_depth): Create graphviz visualization
- to_dataframe(): Export tree structure to pandas DataFrame
Properties:
- height: Distance to deepest leaf
- n_nodes: Total node count
- n_branches: Branch node count
- n_leaves: Leaf node count
- n_active_leaves: Active leaf count
- n_inactive_leaves: Inactive leaf count
- max_size: Maximum allowed size (MiB)
- summary: Dictionary of tree metrics
Branch Selection:
Returns appropriate branch type based on feature type and split mode:
- NumericBinaryBranch: Binary split on numeric feature
- NumericMultiwayBranch: Multiway split on numeric feature
- NominalBinaryBranch: Binary split on nominal feature
- NominalMultiwayBranch: Multiway split on nominal feature
Memory Management
The tree implements a sophisticated memory management system:
1. Size Estimation: Periodically calculates actual vs. estimated tree size 2. Overhead Tracking: Maintains size_estimate_overhead_fraction to correct estimates 3. Leaf Prioritization: Sorts leaves by promise (potential for further splits) 4. Adaptive Deactivation: Deactivates least promising leaves when approaching limits 5. Reactivation: Reactivates leaves when memory becomes available
Visualization
DataFrame Export:
df = model.to_dataframe()
# Returns pandas DataFrame with columns: node, parent, is_leaf, depth, and node attributes
Graphviz Drawing:
dot = model.draw(max_depth=5)
# Returns graphviz.Digraph object for visualization
Debug Path:
explanation = model.debug_one(x)
# Returns string showing path from root to prediction
Related Pages
- Online_ml_River_Tree_HoeffdingTreeClassifier
- Online_ml_River_Tree_HoeffdingTreeRegressor
- Online_ml_River_Tree_ExtremelyFastDecisionTree
- Online_ml_River_Tree_Base_Nodes
- Online_ml_River_Tree_Utils
- Online_ml_River_Tree_Viz
References
Kirkby, R.B., 2007. "Improving hoeffding trees." Doctoral dissertation, The University of Waikato.