Implementation:Online ml River Tree ExtremelyFastDecisionTree

Knowledge Sources	Online_ml_River
Domains	Online_Learning, Decision_Trees, Classification
Last Updated	2026-02-08 16:00 GMT

Overview

The Extremely Fast Decision Tree (EFDT) classifier, also known as Hoeffding AnyTime Tree (HATT), is an incremental decision tree that continually revisits and reevaluates split decisions to converge faster to optimal tree structures. Unlike standard Hoeffding Trees, EFDT can replace existing splits when better alternatives are discovered.

Description

EFDT extends the Hoeffding Tree by implementing a split reevaluation mechanism. At each internal node, after observing a configurable number of instances (`min_samples_reevaluate`), the algorithm reassesses whether the current split is still optimal. If a better split is found or if a null split becomes preferable, the subtree is modified accordingly. This makes EFDT adaptive to stationary data streams and provides theoretical guarantees about convergence to batch decision tree structures.

The algorithm processes samples by:

Sorting instances to leaves and updating statistics
Processing nodes from root to leaf, reevaluating splits at internal nodes
Attempting splits at leaves when grace period conditions are met
Replacing splits or pruning subtrees when reevaluation suggests improvements

Key distinguishing features:

Split reevaluation at internal nodes
Subtree pruning and replacement
Faster convergence to optimal structure
Support for three leaf prediction strategies: Majority Class, Naive Bayes, and Naive Bayes Adaptive

Usage

from river.datasets import synth
from river import evaluate
from river import metrics
from river import tree

gen = synth.Agrawal(classification_function=0, seed=42)
dataset = iter(gen.take(1000))

model = tree.ExtremelyFastDecisionTreeClassifier(
    grace_period=100,
    delta=1e-5,
    nominal_attributes=['elevel', 'car', 'zipcode'],
    min_samples_reevaluate=100
)

metric = metrics.Accuracy()

evaluate.progressive_val_score(dataset, model, metric)
# Accuracy: 87.29%

Code Reference

Source Location: /tmp/kapso_repo_178qi9vb/river/tree/extremely_fast_decision_tree.py

Signature:

class ExtremelyFastDecisionTreeClassifier(HoeffdingTreeClassifier):
    def __init__(
        self,
        grace_period: int = 200,
        max_depth: int | None = None,
        min_samples_reevaluate: int = 20,
        split_criterion: str = "info_gain",
        delta: float = 1e-7,
        tau: float = 0.05,
        leaf_prediction: str = "nba",
        nb_threshold: int = 0,
        nominal_attributes: list | None = None,
        splitter: Splitter | None = None,
        binary_split: bool = False,
        min_branch_fraction: float = 0.01,
        max_share_to_split: float = 0.99,
        max_size: float = 100.0,
        memory_estimate_period: int = 1000000,
        stop_mem_management: bool = False,
        remove_poor_attrs: bool = False,
        merit_preprune: bool = True,
    )

Import:

from river.tree import ExtremelyFastDecisionTreeClassifier

I/O Contract

Input:

x (dict): Feature dictionary with attribute names as keys
y (int/str): Class label
w (float, optional): Sample weight (default: 1.0)

Output:

predict_proba_one(x): Dictionary mapping class labels to probabilities
predict_one(x): Predicted class label

Key Parameters

grace_period (int): Number of instances between split attempts at leaves
min_samples_reevaluate (int): Number of instances before reevaluating splits at internal nodes
split_criterion (str): Split criterion ('gini', 'info_gain', 'hellinger')
delta (float): Significance level for Hoeffding bound calculation
leaf_prediction (str): Prediction mechanism ('mc', 'nb', 'nba')
max_depth (int): Maximum tree depth (None for unlimited)
binary_split (bool): Whether to restrict to binary splits

Implementation Details

Key Methods:

learn_one(x, y, w=1.0): Incrementally train on one sample
_sort_to_leaf(x, y, w): Route instance to appropriate leaf
_process_nodes(x, y, w, node, parent, branch_index): Process nodes from root to leaf
_reevaluate_best_split(node, parent, branch_index): Reevaluate and potentially replace splits
_attempt_to_split(node, parent, branch_index): Attempt to split a leaf node
_kill_subtree(node): Replace a branch with a leaf

Node Types:

EFDTLeafMajorityClass: Leaf using majority class prediction
EFDTLeafNaiveBayes: Leaf using Naive Bayes prediction
EFDTLeafNaiveBayesAdaptive: Leaf adaptively choosing between MC and NB
EFDTNumericBinaryBranch/EFDTNumericMultiwayBranch: Numeric split nodes
EFDTNominalBinaryBranch/EFDTNominalMultiwayBranch: Nominal split nodes

Related Pages

References

C. Manapragada, G. Webb, and M. Salehi. "Extremely Fast Decision Tree." In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD '18), 1953-1962. DOI: https://doi.org/10.1145/3219819.3220005

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment