Implementation:Online ml River Tree ExtremelyFastDecisionTree
| Knowledge Sources | |
|---|---|
| Domains | Online_Learning, Decision_Trees, Classification |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
The Extremely Fast Decision Tree (EFDT) classifier, also known as Hoeffding AnyTime Tree (HATT), is an incremental decision tree that continually revisits and reevaluates split decisions to converge faster to optimal tree structures. Unlike standard Hoeffding Trees, EFDT can replace existing splits when better alternatives are discovered.
Description
EFDT extends the Hoeffding Tree by implementing a split reevaluation mechanism. At each internal node, after observing a configurable number of instances (`min_samples_reevaluate`), the algorithm reassesses whether the current split is still optimal. If a better split is found or if a null split becomes preferable, the subtree is modified accordingly. This makes EFDT adaptive to stationary data streams and provides theoretical guarantees about convergence to batch decision tree structures.
The algorithm processes samples by:
- Sorting instances to leaves and updating statistics
- Processing nodes from root to leaf, reevaluating splits at internal nodes
- Attempting splits at leaves when grace period conditions are met
- Replacing splits or pruning subtrees when reevaluation suggests improvements
Key distinguishing features:
- Split reevaluation at internal nodes
- Subtree pruning and replacement
- Faster convergence to optimal structure
- Support for three leaf prediction strategies: Majority Class, Naive Bayes, and Naive Bayes Adaptive
Usage
from river.datasets import synth
from river import evaluate
from river import metrics
from river import tree
gen = synth.Agrawal(classification_function=0, seed=42)
dataset = iter(gen.take(1000))
model = tree.ExtremelyFastDecisionTreeClassifier(
grace_period=100,
delta=1e-5,
nominal_attributes=['elevel', 'car', 'zipcode'],
min_samples_reevaluate=100
)
metric = metrics.Accuracy()
evaluate.progressive_val_score(dataset, model, metric)
# Accuracy: 87.29%
Code Reference
Source Location:
/tmp/kapso_repo_178qi9vb/river/tree/extremely_fast_decision_tree.py
Signature:
class ExtremelyFastDecisionTreeClassifier(HoeffdingTreeClassifier):
def __init__(
self,
grace_period: int = 200,
max_depth: int | None = None,
min_samples_reevaluate: int = 20,
split_criterion: str = "info_gain",
delta: float = 1e-7,
tau: float = 0.05,
leaf_prediction: str = "nba",
nb_threshold: int = 0,
nominal_attributes: list | None = None,
splitter: Splitter | None = None,
binary_split: bool = False,
min_branch_fraction: float = 0.01,
max_share_to_split: float = 0.99,
max_size: float = 100.0,
memory_estimate_period: int = 1000000,
stop_mem_management: bool = False,
remove_poor_attrs: bool = False,
merit_preprune: bool = True,
)
Import:
from river.tree import ExtremelyFastDecisionTreeClassifier
I/O Contract
Input:
- x (dict): Feature dictionary with attribute names as keys
- y (int/str): Class label
- w (float, optional): Sample weight (default: 1.0)
Output:
- predict_proba_one(x): Dictionary mapping class labels to probabilities
- predict_one(x): Predicted class label
Key Parameters
- grace_period (int): Number of instances between split attempts at leaves
- min_samples_reevaluate (int): Number of instances before reevaluating splits at internal nodes
- split_criterion (str): Split criterion ('gini', 'info_gain', 'hellinger')
- delta (float): Significance level for Hoeffding bound calculation
- leaf_prediction (str): Prediction mechanism ('mc', 'nb', 'nba')
- max_depth (int): Maximum tree depth (None for unlimited)
- binary_split (bool): Whether to restrict to binary splits
Implementation Details
Key Methods:
- learn_one(x, y, w=1.0): Incrementally train on one sample
- _sort_to_leaf(x, y, w): Route instance to appropriate leaf
- _process_nodes(x, y, w, node, parent, branch_index): Process nodes from root to leaf
- _reevaluate_best_split(node, parent, branch_index): Reevaluate and potentially replace splits
- _attempt_to_split(node, parent, branch_index): Attempt to split a leaf node
- _kill_subtree(node): Replace a branch with a leaf
Node Types:
- EFDTLeafMajorityClass: Leaf using majority class prediction
- EFDTLeafNaiveBayes: Leaf using Naive Bayes prediction
- EFDTLeafNaiveBayesAdaptive: Leaf adaptively choosing between MC and NB
- EFDTNumericBinaryBranch/EFDTNumericMultiwayBranch: Numeric split nodes
- EFDTNominalBinaryBranch/EFDTNominalMultiwayBranch: Nominal split nodes
Related Pages
- Online_ml_River_Tree_HoeffdingTreeClassifier
- Online_ml_River_Tree_HoeffdingAdaptiveTreeClassifier
- Online_ml_River_Tree_Base_Nodes
- Online_ml_River_Tree_Utils
References
C. Manapragada, G. Webb, and M. Salehi. "Extremely Fast Decision Tree." In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD '18), 1953-1962. DOI: https://doi.org/10.1145/3219819.3220005