Implementation:Online ml River Forest OXTRegressor
| Knowledge Sources | |
|---|---|
| Domains | Online_Learning, Random_Forests, Regression, Concept_Drift |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
Online Extra Trees (OXT) is an ensemble regressor that extends randomization beyond feature subsampling to include random split candidates, providing fast adaptation to concept drift.
Description
OXT builds on Adaptive Random Forests by adding further randomization: each leaf tests a single random split point per numerical feature instead of evaluating multiple candidates, the maximum depth can be randomized across trees (randomize_tree_depth), and the feature subspace size can vary per leaf. Each tree carries drift detection mechanisms that trigger tree replacement when the concept changes. Optional resampling (bagging or subbagging) adds diversity among the trees. Random splits make training significantly faster, but they can cause a cold-start problem in which early predictions are worse than those of methods that choose splits deterministically. Each tree predicts with the target mean, a linear model, or an adaptive choice between the two (leaf_prediction set to "mean", "model", or "adaptive").
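The single-random-split idea and subbagging can be illustrated with a small standalone sketch. This is not River's implementation: the class `RandomSplitSketch` and the function `subbagging_weight` are hypothetical names, and the sketch assumes that a leaf buffers a few observed values, takes their range, and draws one threshold uniformly inside it.

```python
import random
from typing import List, Optional


class RandomSplitSketch:
    """Hypothetical helper: buffer a few values, then draw ONE random threshold."""

    def __init__(self, buffer_size: int = 5, seed: Optional[int] = None):
        self.buffer_size = buffer_size
        self.buffer: List[float] = []
        self.threshold: Optional[float] = None
        self._rng = random.Random(seed)

    def update(self, value: float) -> None:
        if self.threshold is not None:
            return  # the threshold stays fixed once it has been drawn
        self.buffer.append(value)
        if len(self.buffer) >= self.buffer_size:
            low, high = min(self.buffer), max(self.buffer)
            # A single candidate split, instead of scoring many candidates
            self.threshold = self._rng.uniform(low, high)


def subbagging_weight(rng: random.Random, rate: float) -> int:
    """Assumed subbagging rule: keep a sample with probability `rate`, else skip it."""
    return 1 if rng.random() <= rate else 0


splitter = RandomSplitSketch(buffer_size=5, seed=1)
for value in [3.0, 1.0, 4.0, 1.5, 9.0, 100.0]:
    splitter.update(value)
print("threshold drawn from the buffered range:", splitter.threshold)

rng = random.Random(0)
weights = [subbagging_weight(rng, rate=0.5) for _ in range(10)]
print("per-sample weights for one tree:", weights)
```

A larger buffer gives a better estimate of the feature range before the threshold is fixed, which is the trade-off that split_buffer_size controls.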
Usage
Use OXT for regression on data streams with potential concept drift where prediction speed is critical. It trades some initial accuracy for faster computation and better adaptation. The algorithm works well for large-scale problems and when concept drifts are frequent. Set detection_mode to "all" to use drift warnings with background trees, "drop" to replace a drifted tree outright, or "off" to disable drift detection on stationary streams. Increase split_buffer_size for more stable splits at the cost of memory.
Code Reference
Source Location
- Repository: Online_ml_River
- File: river/forest/online_extra_trees.py
Signature
class OXTRegressor(
    n_models: int = 10,
    max_features: bool | str | int = "sqrt",
    resampling_strategy: str | None = "subbagging",
    resampling_rate: int | float = 0.5,
    detection_mode: str = "all",
    warning_detector: base.DriftDetector | None = None,
    drift_detector: base.DriftDetector | None = None,
    max_depth: int | None = None,
    randomize_tree_depth: bool = False,
    track_metric: metrics.base.RegressionMetric | None = None,
    disable_weighted_vote: bool = True,
    split_buffer_size: int = 5,
    seed: int | None = None,
    grace_period: int = 50,
    delta: float = 0.01,
    tau: float = 0.05,
    leaf_prediction: str = "adaptive",
    leaf_model: base.Regressor | None = None,
    model_selector_decay: float = 0.95,
    nominal_attributes: list | None = None,
    min_samples_split: int = 5,
    binary_split: bool = False,
    max_size: int = 500,
    memory_estimate_period: int = 2_000_000,
    stop_mem_management: bool = False,
    remove_poor_attrs: bool = False,
    merit_preprune: bool = True,
)
Import
from river import forest
I/O Contract
| Parameter | Type | Description |
|---|---|---|
| x | dict | Feature dictionary with numerical or categorical features |
| y | float | Target value for regression |

| Method | Return Type | Description |
|---|---|---|
| predict_one(x) | float | Predicted regression value (average or weighted) |
| learn_one(x, y) | None | Updates trees with drift detection and resampling |
Usage Examples
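from river import datasets
from river import evaluate
from river import forest
from river import metrics

# Synthetic Friedman regression stream, truncated to 5,000 samples
dataset = datasets.synth.Friedman(seed=42).take(5000)

# Small forest with a fixed seed for reproducibility
model = forest.OXTRegressor(n_models=3, seed=42)
metric = metrics.RMSE()

# Test-then-train evaluation: predict each sample, then learn from it
result = evaluate.progressive_val_score(dataset, model, metric)
print(result)  # RMSE: 3.16212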