Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Online ml River Tree ExtremelyFastDecisionTree

From Leeroopedia


Knowledge Sources
Domains Online_Learning, Decision_Trees, Classification
Last Updated 2026-02-08 16:00 GMT

Overview

The Extremely Fast Decision Tree (EFDT) classifier, also known as Hoeffding AnyTime Tree (HATT), is an incremental decision tree that continually revisits and reevaluates split decisions to converge faster to optimal tree structures. Unlike standard Hoeffding Trees, EFDT can replace existing splits when better alternatives are discovered.

Description

EFDT extends the Hoeffding Tree by implementing a split reevaluation mechanism. At each internal node, after observing a configurable number of instances (`min_samples_reevaluate`), the algorithm reassesses whether the current split is still optimal. If a better split is found or if a null split becomes preferable, the subtree is modified accordingly. This makes EFDT adaptive to stationary data streams and provides theoretical guarantees about convergence to batch decision tree structures.

The algorithm processes samples by:

  1. Sorting instances to leaves and updating statistics
  2. Processing nodes from root to leaf, reevaluating splits at internal nodes
  3. Attempting splits at leaves when grace period conditions are met
  4. Replacing splits or pruning subtrees when reevaluation suggests improvements

Key distinguishing features:

  • Split reevaluation at internal nodes
  • Subtree pruning and replacement
  • Faster convergence to optimal structure
  • Support for three leaf prediction strategies: Majority Class, Naive Bayes, and Naive Bayes Adaptive

Usage

from river.datasets import synth
from river import evaluate
from river import metrics
from river import tree

gen = synth.Agrawal(classification_function=0, seed=42)
dataset = iter(gen.take(1000))

model = tree.ExtremelyFastDecisionTreeClassifier(
    grace_period=100,
    delta=1e-5,
    nominal_attributes=['elevel', 'car', 'zipcode'],
    min_samples_reevaluate=100
)

metric = metrics.Accuracy()

evaluate.progressive_val_score(dataset, model, metric)
# Accuracy: 87.29%

Code Reference

Source Location: /tmp/kapso_repo_178qi9vb/river/tree/extremely_fast_decision_tree.py

Signature:

class ExtremelyFastDecisionTreeClassifier(HoeffdingTreeClassifier):
    def __init__(
        self,
        grace_period: int = 200,
        max_depth: int | None = None,
        min_samples_reevaluate: int = 20,
        split_criterion: str = "info_gain",
        delta: float = 1e-7,
        tau: float = 0.05,
        leaf_prediction: str = "nba",
        nb_threshold: int = 0,
        nominal_attributes: list | None = None,
        splitter: Splitter | None = None,
        binary_split: bool = False,
        min_branch_fraction: float = 0.01,
        max_share_to_split: float = 0.99,
        max_size: float = 100.0,
        memory_estimate_period: int = 1000000,
        stop_mem_management: bool = False,
        remove_poor_attrs: bool = False,
        merit_preprune: bool = True,
    )

Import:

from river.tree import ExtremelyFastDecisionTreeClassifier

I/O Contract

Input:

  • x (dict): Feature dictionary with attribute names as keys
  • y (int/str): Class label
  • w (float, optional): Sample weight (default: 1.0)

Output:

  • predict_proba_one(x): Dictionary mapping class labels to probabilities
  • predict_one(x): Predicted class label

Key Parameters

  • grace_period (int): Number of instances between split attempts at leaves
  • min_samples_reevaluate (int): Number of instances before reevaluating splits at internal nodes
  • split_criterion (str): Split criterion ('gini', 'info_gain', 'hellinger')
  • delta (float): Significance level for Hoeffding bound calculation
  • leaf_prediction (str): Prediction mechanism ('mc', 'nb', 'nba')
  • max_depth (int): Maximum tree depth (None for unlimited)
  • binary_split (bool): Whether to restrict to binary splits

Implementation Details

Key Methods:

  • learn_one(x, y, w=1.0): Incrementally train on one sample
  • _sort_to_leaf(x, y, w): Route instance to appropriate leaf
  • _process_nodes(x, y, w, node, parent, branch_index): Process nodes from root to leaf
  • _reevaluate_best_split(node, parent, branch_index): Reevaluate and potentially replace splits
  • _attempt_to_split(node, parent, branch_index): Attempt to split a leaf node
  • _kill_subtree(node): Replace a branch with a leaf

Node Types:

  • EFDTLeafMajorityClass: Leaf using majority class prediction
  • EFDTLeafNaiveBayes: Leaf using Naive Bayes prediction
  • EFDTLeafNaiveBayesAdaptive: Leaf adaptively choosing between MC and NB
  • EFDTNumericBinaryBranch/EFDTNumericMultiwayBranch: Numeric split nodes
  • EFDTNominalBinaryBranch/EFDTNominalMultiwayBranch: Nominal split nodes

Related Pages

References

C. Manapragada, G. Webb, and M. Salehi. "Extremely Fast Decision Tree." In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD '18), 1953-1962. DOI: https://doi.org/10.1145/3219819.3220005

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment