
Principle: Online ML / River / Adaptive Random Forest

From Leeroopedia


Knowledge Sources: River · River Docs · Adaptive Random Forests for Evolving Data Stream Classification
Domains: Online Machine Learning, Concept Drift, Ensemble Learning
Last Updated: 2026-02-08 16:00 GMT

Overview

The Adaptive Random Forest (ARF) is an online random forest ensemble that handles concept drift through per-tree drift detection, automatic tree replacement, and online bagging with Poisson resampling.

Description

The Adaptive Random Forest extends the classical random forest paradigm to the online learning setting with explicit mechanisms for handling concept drift. It combines three key strategies for building a diverse, adaptive ensemble:

1. Diversity through resampling: Instead of bootstrap aggregating (bagging) on stored datasets, ARF uses online bagging where each training instance is weighted by a value drawn from a Poisson(λ) distribution. A higher λ (default 6, corresponding to "Leveraging Bagging") increases the diversity among ensemble members.

2. Diversity through random feature subsets: Each base tree considers only a random subset of features at each split node, analogous to the feature randomization in classical random forests. The number of features is controlled by max_features (default "sqrt").

3. Per-tree drift and warning detection: Each tree in the ensemble is monitored by its own drift detector and warning detector (both defaulting to ADWIN). When a warning is detected for a tree, a background tree begins training on the incoming data. If the warning escalates to a confirmed drift, the background tree replaces the affected tree in the ensemble. This provides selective, localized adaptation without resetting the entire ensemble.

The combination of these three mechanisms produces an ensemble that is simultaneously diverse (reducing variance), adaptive (handling concept drift), and stable (localizing resets to individual trees rather than the full ensemble).

Usage

Use the Adaptive Random Forest when:

  • You need a high-performing ensemble classifier for non-stationary data streams.
  • You want built-in drift detection and automatic adaptation without manual intervention.
  • You need robustness against both gradual and abrupt concept drift.
  • You want the flexibility of configurable drift and warning detectors per tree.
  • You are willing to accept higher computational cost in exchange for superior accuracy.

Theoretical Basis

Online Bagging with Poisson Resampling:

In traditional bagging, each base model is trained on a bootstrap sample. In the online setting, this is simulated by weighting each instance (xₜ, yₜ) with a random weight k ~ Poisson(λ). The base tree processes the instance k times (or, equivalently, with weight k):

kᵢ ~ Poisson(λ),  for each tree i

Higher λ values increase resampling variability and thus ensemble diversity. The default λ = 6 corresponds to the Leveraging Bagging strategy, which has been shown to outperform standard online bagging (λ = 1).
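
The resampling step can be sketched with a standard-library Poisson draw (Knuth's product-of-uniforms algorithm); `poisson` here is an illustrative helper, not River's API:

```python
import math
import random

def poisson(lam: float, rng: random.Random) -> int:
    """Draw k ~ Poisson(lam) via Knuth's algorithm."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while p > threshold:
        k += 1
        p *= rng.random()
    return k - 1

rng = random.Random(42)
# Each tree i would weight the incoming instance by its own draw k_i
draws = [poisson(6, rng) for _ in range(10_000)]
print(sum(draws) / len(draws))  # close to lambda = 6
```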

Random Feature Subsets:

At each internal node of each base tree, only a random subset of m ≤ d features (by default m = ⌊√d⌋, where d is the total number of features) is considered for splitting, injecting further diversity.
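
A per-node feature draw under the "sqrt" default might look like the following sketch (`candidate_features` is an illustrative helper):

```python
import math
import random

def candidate_features(features: list, rng: random.Random) -> list:
    """Pick the m = floor(sqrt(d)) features this split node may test."""
    m = max(1, int(math.sqrt(len(features))))
    return rng.sample(features, m)

rng = random.Random(0)
features = [f"f{i}" for i in range(16)]  # d = 16 features
subset = candidate_features(features, rng)
print(subset)  # 4 of the 16 features
```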

Per-Tree Drift Detection:

Each tree i maintains two ADWIN detectors:

  • Warning detector (higher δ, e.g., 0.01): Detects early signs of drift.
  • Drift detector (lower δ, e.g., 0.001): Confirms drift.

The input to both detectors is the binary error indicator eᵢ,ₜ = 𝟙[yₜ ≠ ŷᵢ,ₜ], which is 1 when tree i misclassifies instance t and 0 otherwise.

ARF Learn Step for tree i:
    1. y_pred_i = tree_i.predict_one(x)
    2. Update performance metric for tree i
    3. k = Poisson(lambda)
    4. If k > 0:
       a. tree_i.learn_one(x, y, weight=k)
       b. If background_i exists: background_i.learn_one(x, y, weight=k)
       c. Feed error to warning_detector_i
          If warning detected: background_i = new_tree()
       d. Feed error to drift_detector_i
          If drift detected:
             If background_i exists:
                tree_i = background_i  (replace with background)
             Else:
                tree_i = new_tree()    (reset to fresh tree)
             Reset detectors and metrics for tree i

Weighted Voting:

Predictions are aggregated across trees using weighted soft voting. Each tree's vote is weighted by its performance metric value (default: Accuracy). The final probability for each class is the normalized weighted sum across all trees:

P(y = c | x) = ( Σᵢ₌₁ᴹ wᵢ Pᵢ(y = c | x) ) / ( Σᵢ₌₁ᴹ wᵢ )

where wᵢ is the metric value for tree i and M is the number of trees.
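
The aggregation formula can be sketched directly (a toy helper; `weighted_soft_vote` and its inputs are illustrative, with each tree's class distribution assumed to sum to 1):

```python
from collections import defaultdict

def weighted_soft_vote(per_tree_probas, weights):
    """Combine per-tree class distributions, weighting tree i by w_i."""
    scores = defaultdict(float)
    for probas, w in zip(per_tree_probas, weights):
        for c, p in probas.items():
            scores[c] += w * p
    # The total equals sum(w_i) when each tree's distribution sums to 1
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

# Two trees with accuracies 0.9 and 0.6 disagreeing on classes "a" / "b"
combined = weighted_soft_vote(
    [{"a": 0.8, "b": 0.2}, {"a": 0.2, "b": 0.8}],
    [0.9, 0.6],
)
print(combined)  # approximately {"a": 0.56, "b": 0.44}
```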

Related Pages
