Principle:Online ml River Online Random Forest Variants

Knowledge Sources	Domains	Last Updated
Machine Learning Mondrian Forests	Online_Learning, Ensemble_Learning, Decision_Trees	2026-02-08 18:00 GMT

Overview

Online random forest variants are ensemble methods that grow collections of decision trees incrementally from streaming data, adapting the classical random forest paradigm to the online learning setting. These variants differ in their tree construction mechanisms, split criteria, and strategies for maintaining diversity and handling concept drift.

Description

The classical Random Forest algorithm (Breiman, 2001) builds an ensemble of decision trees on bootstrap samples of the training data, with random feature subsets considered at each split. This batch construction is impossible in the online setting where data arrives sequentially. Online random forest variants adapt this paradigm through several mechanisms:

Aggregated Mondrian Forest (AMF): Based on the Mondrian process, AMF classifiers grow trees using a stochastic process that partitions the feature space. Unlike Hoeffding trees that wait for statistical confidence before splitting, Mondrian trees can split at any time based on the Mondrian process rate parameter. The Mondrian process ensures that splits are consistent regardless of data arrival order, providing theoretical guarantees about tree structure.

Online Extra Trees (OXT): An online adaptation of the Extremely Randomized Trees (Extra-Trees) algorithm. Rather than searching for optimal splits, OXT selects split points randomly within the range of each candidate feature. This extreme randomization reduces variance and computational cost. In the online setting, the split ranges are updated incrementally as new data is observed.

Both variants maintain the core random forest properties of:

Ensemble diversity through randomization (feature subsampling, random splits, or stochastic processes).
Parallel tree growth where each tree processes each instance independently.
Aggregation via majority voting (classification) or averaging (regression).

Usage

Use online random forest variants when:

You need an ensemble of trees that can learn incrementally from a data stream.
You want strong predictive performance without extensive hyperparameter tuning.
You need models that naturally handle mixed feature types and nonlinear relationships.
You want theoretical guarantees about the consistency of the learned partitions.

Theoretical Basis

Mondrian Process

The Mondrian process defines a distribution over recursive binary partitions of a feature space:

Mondrian Process on region [l_1, u_1] x ... x [l_d, u_d]:
    1. Sample split time E ~ Exponential(sum_j (u_j - l_j))
    2. If E > lifetime budget lambda: stop (create leaf)
    3. Select split dimension j with probability proportional to (u_j - l_j)
    4. Sample split point s ~ Uniform(l_j, u_j)
    5. Recurse on left child [l_j, s] and right child [s, u_j]
       with remaining budget lambda - E

The key property is projectivity: restricting a Mondrian partition to a subset of the data yields the same distribution as building a Mondrian on that subset directly. This makes the Mondrian process naturally suited to online learning.

Online Extra Trees

For each tree in the ensemble:
    For each incoming instance (x, y):
        Route x to the appropriate leaf
        Update leaf statistics
        If split condition met:
            Select k random features
            For each selected feature j:
                Draw random split point s_j ~ Uniform(min_j, max_j)
                Compute score (e.g., Gini reduction) for split at s_j
            Apply the best random split

The extreme randomization in split selection eliminates the need for sorting or histogram construction at each split, reducing per-instance computation.

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment