Implementation:Online ml River Tree Splitter Random
| Knowledge Sources | |
|---|---|
| Domains | Online_Learning, Decision_Trees, Random_Forests |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
A numeric-feature splitter that evaluates a single randomly drawn threshold per feature instead of searching over many candidate split points. It is used in extremely randomized trees and random forests.
Description
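RandomSplitter implements random threshold selection for numeric features in decision trees. It first buffers up to buffer_size observations to estimate the feature's range, then draws one threshold uniformly at random within that range. Once the threshold is set, each new observation updates the statistics of the left or right branch, depending on which side of the threshold its feature value falls. RandomSplitter is abstract: subclasses supply _update_stats, and cond_proba raises NotImplementedError. The RegRandomSplitter variant targets regression and tracks variance statistics per branch. Cloning creates a new instance with a different random seed.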
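The buffer-then-threshold behaviour described above can be sketched in plain Python. This is a minimal illustration, not River's implementation: the class name SketchRandomSplitter, the dictionary-based branch totals, the "<= threshold goes to branch 0" convention, and the choice to draw the threshold as soon as the buffer holds buffer_size items are all assumptions made for the example.

```python
import random


class SketchRandomSplitter:
    """Hypothetical stand-in (not River's class): buffer, draw a threshold, then route."""

    def __init__(self, seed, buffer_size):
        self.buffer_size = buffer_size
        self.threshold = None  # unset until the buffer is full
        self._rng = random.Random(seed)
        self._buffer = []
        # Assumed convention: branch 0 holds values <= threshold, branch 1 the rest.
        self.stats = {0: {"w": 0.0, "sum": 0.0}, 1: {"w": 0.0, "sum": 0.0}}

    def _update_stats(self, branch, target_val, w):
        # Accumulate total weight and weighted target sum for one branch.
        self.stats[branch]["w"] += w
        self.stats[branch]["sum"] += w * target_val

    def update(self, att_val, target_val, w=1.0):
        if self.threshold is None:
            self._buffer.append((att_val, target_val, w))
            if len(self._buffer) < self.buffer_size:
                return
            # Buffer is full: estimate the feature range and draw the threshold.
            lo = min(a for a, _, _ in self._buffer)
            hi = max(a for a, _, _ in self._buffer)
            self.threshold = self._rng.uniform(lo, hi)
            # Replay the buffered observations into the branch statistics.
            for a, t, bw in self._buffer:
                self._update_stats(0 if a <= self.threshold else 1, t, bw)
            self._buffer = []
            return
        # Threshold already fixed: route the observation directly.
        self._update_stats(0 if att_val <= self.threshold else 1, target_val, w)


splitter = SketchRandomSplitter(seed=42, buffer_size=10)
for i in range(15):
    splitter.update(float(i), float(2 * i))
```

In this sketch the threshold is fixed from the first ten feature values only, so later observations outside that range simply fall into one of the two existing branches.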
Usage
Use a RandomSplitter subclass (RegRandomSplitter for regression) when building extremely randomized trees or random forests, where randomly chosen thresholds add diversity among ensemble members.
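To illustrate where that diversity comes from, the sketch below draws thresholds for the same buffered values under different seeds, and derives fresh seeds for clones from a parent generator. The helper names draw_threshold and spawn_seed are hypothetical, and the seed range used in spawn_seed is an assumption rather than what River does.

```python
import random


def draw_threshold(values, seed):
    """Draw one threshold uniformly between the min and max of the buffered values."""
    rng = random.Random(seed)
    return rng.uniform(min(values), max(values))


def spawn_seed(parent_rng):
    """Hypothetical helper: derive a fresh integer seed for a cloned splitter."""
    return parent_rng.randint(0, 2**32 - 1)


# The same ten buffered feature values, seen by five differently seeded splitters.
buffered = [float(i) for i in range(10)]
thresholds = [draw_threshold(buffered, seed) for seed in range(5)]

# Seeds for three clones, drawn from a single parent generator.
parent_rng = random.Random(42)
child_seeds = [spawn_seed(parent_rng) for _ in range(3)]
```

Each seed is reproducible on its own, so an ensemble stays deterministic overall while its members still split the same feature at different points.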
Code Reference
Source Location
- Repository: Online_ml_River
- File: river/tree/splitter/random_splitter.py
Signature
class RandomSplitter(Splitter):
    def __init__(self, seed, buffer_size):
        ...

    def clone(self, new_params: dict | None = None, include_attributes=False):
        ...

    @abc.abstractmethod
    def _update_stats(self, branch, target_val, w):
        pass

    def cond_proba(self, att_val, class_val) -> float:
        raise NotImplementedError

    def update(self, att_val, target_val, w) -> None:
        ...

    def best_evaluated_split_suggestion(self, criterion, pre_split_dist, att_idx, binary_only):
        ...


class RegRandomSplitter(RandomSplitter):
    def __init__(self, seed, buffer_size):
        ...

    def _update_stats(self, branch, target_val, w):
        ...

    @property
    def is_target_class(self) -> bool:
        return False
Import
from river.tree.splitter.random_splitter import RegRandomSplitter
I/O Contract
| Input | Type | Description |
|---|---|---|
| seed | int | Random seed for threshold selection |
| buffer_size | int | Number of samples to buffer before setting threshold |
| att_val | float | Numerical feature value |
| target_val | float | Target value (regression) |
| w | float | Sample weight |

| Output | Type | Description |
|---|---|---|
| threshold | float | Randomly selected split threshold |
| split_suggestion | BranchFactory | Split with random threshold and merit |
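As a rough illustration of how a merit can be derived from per-branch variance statistics, the sketch below keeps a weighted running variance for each side of the threshold and scores the split by plain variance reduction. It is an assumption-laden stand-in: RunningVar and variance_reduction are hypothetical names, the variance is the population variance, and the scoring rule is simple variance reduction rather than River's VarianceRatioSplitCriterion.

```python
class RunningVar:
    """Hypothetical weighted running mean/variance (incremental update)."""

    def __init__(self):
        self.n = 0.0      # total weight seen so far
        self.mean = 0.0
        self._m2 = 0.0    # weighted sum of squared deviations from the mean

    def update(self, x, w=1.0):
        if w <= 0:
            return  # ignore non-positive weights in this sketch
        self.n += w
        delta = x - self.mean
        self.mean += (w / self.n) * delta
        self._m2 += w * delta * (x - self.mean)

    def get(self):
        # Population variance; 0.0 when nothing has been observed.
        return self._m2 / self.n if self.n > 0 else 0.0


def variance_reduction(pre, left, right):
    """Merit = variance before the split minus the weighted variance after it."""
    total = left.n + right.n
    if total == 0:
        return 0.0
    after = (left.n / total) * left.get() + (right.n / total) * right.get()
    return pre.get() - after


# Toy data: the target jumps from 0 to 10 once the feature exceeds 4.
threshold = 4.5
pre, left, right = RunningVar(), RunningVar(), RunningVar()
for i in range(10):
    att_val, target_val = float(i), (0.0 if i <= 4 else 10.0)
    pre.update(target_val)
    (left if att_val <= threshold else right).update(target_val)

merit = variance_reduction(pre, left, right)
```

A threshold that separates the two target levels cleanly removes all within-branch variance, so the merit equals the pre-split variance; a poorly placed threshold would leave most of it in one branch.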
Usage Examples
from river.tree.splitter.random_splitter import RegRandomSplitter
from river.tree.split_criterion import VarianceRatioSplitCriterion
from river.stats import Var

# Create random splitter
splitter = RegRandomSplitter(seed=42, buffer_size=10)

# Update - fills buffer first
for i in range(15):
    splitter.update(float(i), float(i * 2), w=1.0)

# After buffer_size samples, threshold is set
print(f"Threshold: {splitter.threshold}")

# Get split suggestion
criterion = VarianceRatioSplitCriterion()
pre_split = Var()
for i in range(15):
    pre_split.update(float(i * 2), 1.0)

suggestion = splitter.best_evaluated_split_suggestion(
    criterion=criterion,
    pre_split_dist=pre_split,
    att_idx='feature1',
    binary_only=True,
)

# Clone creates new instance with different seed
splitter2 = splitter.clone()
assert splitter2.seed != splitter.seed