Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Online ml River Tree HoeffdingTree

From Leeroopedia


Knowledge Sources
Domains Online_Learning, Decision_Trees, Abstract_Base_Class
Last Updated 2026-02-08 16:00 GMT

Overview

HoeffdingTree is an abstract base class that defines the core operations and properties shared by all Hoeffding decision tree implementations in River. It provides memory management, tree navigation, and fundamental decision tree operations without implementing specific learning algorithms.

Description

This abstract base class establishes the foundation for incremental decision trees based on the Hoeffding bound principle. The Hoeffding bound (ε = √((R²ln(1/δ))/(2n))) determines how many samples are necessary before making confident split decisions, where R is the range, δ is the significance level, and n is the sample count.

Key responsibilities:

  • Memory management with configurable size limits
  • Tree structure maintenance (nodes, branches, leaves)
  • Navigation and traversal operations
  • Size estimation and enforcement
  • Visualization support (DataFrame export, graphviz drawing, HTML representation)
  • Debugging utilities

The class implements sophisticated memory management based on Kirkby's dissertation, tracking active and inactive leaves separately, estimating their sizes, and deactivating leaves when memory limits are approached.

Usage

This is an abstract class and cannot be instantiated directly. Concrete implementations include:

from river import tree

# Use concrete implementations
model = tree.HoeffdingTreeClassifier()
model = tree.HoeffdingTreeRegressor()
model = tree.HoeffdingAdaptiveTreeClassifier()

Code Reference

Source Location: /tmp/kapso_repo_178qi9vb/river/tree/hoeffding_tree.py

Signature:

class HoeffdingTree(ABC):
    def __init__(
        self,
        max_depth: int | None = None,
        binary_split: bool = False,
        max_size: float = 100.0,
        memory_estimate_period: int = 1000000,
        stop_mem_management: bool = False,
        remove_poor_attrs: bool = False,
        merit_preprune: bool = True,
    )

Import:

from river.tree.hoeffding_tree import HoeffdingTree

I/O Contract

Abstract Methods (must be implemented by subclasses):

  • _new_leaf(initial_stats, parent): Create a new leaf node
  • split_criterion (property): Define the split criterion
  • leaf_prediction (property): Define the leaf prediction strategy

Key Parameters

  • max_depth (int): Maximum tree depth (None uses system recursion limit - 20)
  • binary_split (bool): Restrict to binary splits only
  • max_size (float): Maximum tree size in mebibytes (MiB)
  • memory_estimate_period (int): Interval between memory checks
  • stop_mem_management (bool): Stop growing when memory limit is hit
  • remove_poor_attrs (bool): Disable poor attributes to reduce memory
  • merit_preprune (bool): Enable merit-based tree pre-pruning

Implementation Details

Key Methods:

  • _hoeffding_bound(range_val, confidence, n): Compute Hoeffding bound
  • _enforce_size_limit(): Deactivate leaves to maintain size constraints
  • _estimate_model_size(): Calculate current model size and trigger enforcement
  • _find_leaves(): Return list of all leaf nodes
  • _branch_selector(numerical_feature, multiway_split): Select branch node type
  • debug_one(x): Generate explanation of prediction path
  • draw(max_depth): Create graphviz visualization
  • to_dataframe(): Export tree structure to pandas DataFrame

Properties:

  • height: Distance to deepest leaf
  • n_nodes: Total node count
  • n_branches: Branch node count
  • n_leaves: Leaf node count
  • n_active_leaves: Active leaf count
  • n_inactive_leaves: Inactive leaf count
  • max_size: Maximum allowed size (MiB)
  • summary: Dictionary of tree metrics

Branch Selection:

Returns appropriate branch type based on feature type and split mode:

  • NumericBinaryBranch: Binary split on numeric feature
  • NumericMultiwayBranch: Multiway split on numeric feature
  • NominalBinaryBranch: Binary split on nominal feature
  • NominalMultiwayBranch: Multiway split on nominal feature

Memory Management

The tree implements a sophisticated memory management system:

1. Size Estimation: Periodically calculates actual vs. estimated tree size 2. Overhead Tracking: Maintains size_estimate_overhead_fraction to correct estimates 3. Leaf Prioritization: Sorts leaves by promise (potential for further splits) 4. Adaptive Deactivation: Deactivates least promising leaves when approaching limits 5. Reactivation: Reactivates leaves when memory becomes available

Visualization

DataFrame Export:

df = model.to_dataframe()
# Returns pandas DataFrame with columns: node, parent, is_leaf, depth, and node attributes

Graphviz Drawing:

dot = model.draw(max_depth=5)
# Returns graphviz.Digraph object for visualization

Debug Path:

explanation = model.debug_one(x)
# Returns string showing path from root to prediction

Related Pages

References

Kirkby, R.B., 2007. "Improving hoeffding trees." Doctoral dissertation, The University of Waikato.

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment