Implementation:Online ml River Tree Splitter Random

From Leeroopedia


Knowledge Sources
Domains Online_Learning, Decision_Trees, Random_Forests
Last Updated 2026-02-08 16:00 GMT

Overview

Random splitter that selects a single random threshold for split evaluation, used in extremely randomized trees and random forests.

Description

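RandomSplitter implements random threshold selection for numerical features in decision trees. It buffers the first buffer_size observations to determine the feature's range, then draws a threshold uniformly between the smallest and largest buffered values. Once the threshold is set, each sample is routed to the left or right branch and that branch's statistics are updated. RandomSplitter itself is abstract: subclasses implement _update_stats to choose which per-branch statistics are kept. The RegRandomSplitter variant is designed for regression and tracks the variance of the target in each branch. Cloning creates a new instance with a different random seed, so each clone draws its own threshold.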
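The buffering and threshold-drawing steps described above can be sketched in plain Python. This is a minimal sketch, not River's implementation: the class names ToyRandomSplitter and BranchVar are hypothetical, the left branch is assumed to receive values less than or equal to the threshold, and the buffered samples are assumed to be replayed into the branch statistics once the threshold has been drawn.

```python
import random


class BranchVar:
    """Weighted running mean and variance for one branch (weighted Welford update)."""

    def __init__(self):
        self.n = 0.0    # total weight seen so far
        self.mean = 0.0
        self._m2 = 0.0  # weighted sum of squared deviations from the mean

    def update(self, x, w=1.0):
        self.n += w
        delta = x - self.mean
        self.mean += (w / self.n) * delta
        self._m2 += w * delta * (x - self.mean)

    @property
    def var(self):
        # Population variance; 0.0 before any weight has been seen.
        return self._m2 / self.n if self.n > 0 else 0.0


class ToyRandomSplitter:
    """Hypothetical buffer-then-draw splitter; a sketch, not River's code."""

    def __init__(self, seed, buffer_size):
        self.seed = seed
        self.buffer_size = buffer_size
        self.threshold = None
        # Branch 0 = left (att_val <= threshold), branch 1 = right (assumed convention).
        self.stats = {0: BranchVar(), 1: BranchVar()}
        self._rng = random.Random(seed)
        self._buffer = []

    def update(self, att_val, target_val, w=1.0):
        if self.threshold is None:
            self._buffer.append((att_val, target_val, w))
            if len(self._buffer) < self.buffer_size:
                return  # still collecting the feature range
            lo = min(a for a, _, _ in self._buffer)
            hi = max(a for a, _, _ in self._buffer)
            self.threshold = self._rng.uniform(lo, hi)
            # Replay the buffered samples into the branch statistics, then drop the buffer.
            for a, t, wt in self._buffer:
                self._route(a, t, wt)
            self._buffer = None
            return
        self._route(att_val, target_val, w)

    def _route(self, att_val, target_val, w):
        branch = 0 if att_val <= self.threshold else 1
        self.stats[branch].update(target_val, w)


splitter = ToyRandomSplitter(seed=42, buffer_size=10)
for i in range(15):
    splitter.update(float(i), float(i * 2))
print(f"Threshold: {splitter.threshold:.3f}")
print(f"Left n={splitter.stats[0].n}, right n={splitter.stats[1].n}")
```

Here the buffer holds the feature values 0.0 to 9.0, so the drawn threshold lies in that interval, and all 15 samples end up counted in one of the two branches.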

Usage

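Use RandomSplitter when building extremely randomized trees or random forests, where drawing thresholds at random adds diversity between the trees. Because RandomSplitter is abstract, instantiate a concrete subclass such as RegRandomSplitter for regression. It suits ensemble methods that benefit from randomness, and it scores a single random threshold per feature instead of searching over many candidates.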
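To illustrate why random thresholds add diversity, the sketch below gives each ensemble member its own seed drawn from a parent generator, similar in spirit to cloning handing out new seeds, and lets each member draw one threshold over the same buffered range. The function draw_thresholds and its seeding scheme are hypothetical illustrations, not part of River.

```python
import random


def draw_thresholds(parent_seed, n_members, buffered_values):
    """Hypothetical: one uniform threshold per ensemble member over a shared range."""
    parent_rng = random.Random(parent_seed)
    lo, hi = min(buffered_values), max(buffered_values)
    thresholds = []
    for _ in range(n_members):
        # Each member gets its own seed from the parent generator (assumed scheme).
        member_rng = random.Random(parent_rng.randint(0, 2**32))
        thresholds.append(member_rng.uniform(lo, hi))
    return thresholds


values = [float(i) for i in range(10)]
thresholds = draw_thresholds(parent_seed=42, n_members=5, buffered_values=values)
print([round(t, 2) for t in thresholds])
```

Every member scores a different split point on the same data, so their trees grow differently; re-running with the same parent seed reproduces the same thresholds.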

Code Reference

Source Location

Signature

from __future__ import annotations

import abc

from river.tree.splitter.base import Splitter


class RandomSplitter(Splitter):
    def __init__(self, seed, buffer_size):
        ...

    def clone(self, new_params: dict | None = None, include_attributes=False):
        ...

    @abc.abstractmethod
    def _update_stats(self, branch, target_val, w):
        pass

    def cond_proba(self, att_val, class_val) -> float:
        raise NotImplementedError

    def update(self, att_val, target_val, w) -> None:
        ...

    def best_evaluated_split_suggestion(self, criterion, pre_split_dist, att_idx, binary_only):
        ...


class RegRandomSplitter(RandomSplitter):
    def __init__(self, seed, buffer_size):
        ...

    def _update_stats(self, branch, target_val, w):
        ...

    @property
    def is_target_class(self) -> bool:
        return False
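The signature marks _update_stats as abstract, and RegRandomSplitter overrides it to keep regression statistics. The sketch below shows the same pattern in plain Python with a hypothetical classification variant that keeps weighted class counts per branch. ThresholdBranches and ClassCountBranches are illustrative names, not River classes; this page documents only the regression variant.

```python
import abc
import collections


class ThresholdBranches(abc.ABC):
    """Hypothetical base: routes samples by a fixed threshold; subclasses choose the stats."""

    def __init__(self, threshold):
        self.threshold = threshold

    def route(self, att_val, target_val, w=1.0):
        branch = 0 if att_val <= self.threshold else 1  # 0 = left, 1 = right
        self._update_stats(branch, target_val, w)

    @abc.abstractmethod
    def _update_stats(self, branch, target_val, w):
        """Record one weighted target value in the given branch."""


class ClassCountBranches(ThresholdBranches):
    """Hypothetical classification variant: weighted class counts per branch."""

    def __init__(self, threshold):
        super().__init__(threshold)
        self.stats = {0: collections.Counter(), 1: collections.Counter()}

    def _update_stats(self, branch, target_val, w):
        self.stats[branch][target_val] += w

    @property
    def is_target_class(self) -> bool:
        return True


branches = ClassCountBranches(threshold=2.5)
for x, y in [(1.0, "a"), (2.0, "a"), (3.0, "b"), (4.0, "a")]:
    branches.route(x, y)
print(dict(branches.stats[0]), dict(branches.stats[1]))
```

The routing logic never needs to know what the statistics are, which mirrors how RegRandomSplitter in the signature overrides only __init__, _update_stats and is_target_class.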

Import

from river.tree.splitter.random_splitter import RegRandomSplitter

I/O Contract

Input         Type           Description
seed          int            Random seed for threshold selection
buffer_size   int            Number of samples to buffer before setting the threshold
att_val       float          Numerical feature value
target_val    float          Target value (regression)
w             float          Sample weight

Output             Type           Description
threshold          float          Randomly selected split threshold (not set until buffer_size samples have been buffered)
split_suggestion   BranchFactory  Split suggestion carrying the random threshold and its merit
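The merit attached to a split suggestion comes from the split criterion. As a worked example, the sketch below computes a variance-ratio merit, assumed here to be 1 minus the sum over children of (n_i / n) * var_i / var_parent, for the data used in the usage example with a fixed stand-in threshold of 4.5. The helper names are hypothetical, and the exact formula and edge-case handling in River's VarianceRatioSplitCriterion may differ.

```python
def population_var(values):
    """Population variance of unit-weight target values (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def variance_ratio_merit(parent, children):
    """Assumed variance-ratio merit: 1 - sum_i (n_i / n) * var_i / var_parent."""
    parent_var = population_var(parent)
    if parent_var == 0.0:
        return 0.0  # nothing to reduce
    n = len(parent)
    return 1.0 - sum(len(c) / n * population_var(c) / parent_var for c in children)


# Same data as the usage example: feature i, target 2 * i, for i in 0..14.
samples = [(float(i), float(2 * i)) for i in range(15)]
threshold = 4.5  # fixed stand-in for a randomly drawn threshold
left = [t for x, t in samples if x <= threshold]
right = [t for x, t in samples if x > threshold]
merit = variance_ratio_merit([t for _, t in samples], [left, right])
print(round(merit, 4))  # prints 0.6696
```

With this threshold, five targets go left (variance 8) and ten go right (variance 33), against a parent variance of 224/3, giving a merit of 75/112 ≈ 0.670. A different random threshold would give a different merit on the same data.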

Usage Examples

from river.tree.splitter.random_splitter import RegRandomSplitter
from river.tree.split_criterion import VarianceRatioSplitCriterion
from river.stats import Var

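# Create a regression random splitter that buffers 10 samples before drawing its threshold
splitter = RegRandomSplitter(seed=42, buffer_size=10)

# The first buffer_size updates only fill the buffer; after that, samples are routed
# to the left or right branch of the drawn threshold
for i in range(15):
    splitter.update(float(i), float(i * 2), w=1.0)

# Once buffer_size samples have been seen, the threshold is set to a value between
# the smallest and largest buffered feature values
print(f"Threshold: {splitter.threshold}")

# Build the pre-split distribution of the targets seen by the splitter
criterion = VarianceRatioSplitCriterion()
pre_split = Var()
for i in range(15):
    pre_split.update(float(i * 2), 1.0)

# Score the single random split; the result is a BranchFactory
suggestion = splitter.best_evaluated_split_suggestion(
    criterion=criterion,
    pre_split_dist=pre_split,
    att_idx='feature1',
    binary_only=True,
)
print(f"Merit: {suggestion.merit}")

# clone() gives the copy a different random seed, so it will draw its own threshold
splitter2 = splitter.clone()
assert splitter2.seed != splitter.seed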
