Implementation:Online ml River Tree VarianceRatioSplitCriterion
| Knowledge Sources | |
|---|---|
| Domains | Online_Learning, Decision_Trees, Regression |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
A split criterion for regression trees that scores candidate splits by their variance reduction ratio.
Description
VarianceRatioSplitCriterion computes the merit of a split as the variance reduction ratio (VR): the weighted reduction in variance after the split, expressed relative to the pre-split variance. Merit is reported on a 0-to-1 scale (range_of_merit returns 1.0), and higher values indicate better splits. The criterion also requires a minimum number of samples (min_samples_split) in every post-split branch so that each variance estimate is statistically reliable. The select_best_branch method returns the index of the branch with the lowest variance.
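As a rough illustration of the ratio described above, the following standalone sketch uses only the Python standard library rather than River. The formula `1 - sum((n_i / n) * var_i / var_pre)`, the choice to return 0 when a branch is too small or the pre-split variance is zero, and the name `variance_ratio` are assumptions made for illustration, not a copy of the library code.

```python
from statistics import variance  # sample variance (divides by n - 1)


def variance_ratio(pre_split, post_split, min_samples_split=5):
    """Hypothetical helper: variance reduction ratio of one candidate split.

    pre_split is a list of target values, post_split a list of lists
    (one per branch). Assumes every list holds at least two values.
    """
    # Assumed behaviour: a split with an undersized branch earns no merit.
    if any(len(branch) < min_samples_split for branch in post_split):
        return 0.0

    pre_var = variance(pre_split)
    if pre_var == 0:
        # Nothing left to reduce (guard added for this sketch).
        return 0.0

    n = len(pre_split)
    weighted = sum(len(b) / n * variance(b) / pre_var for b in post_split)
    return 1.0 - weighted


targets = [10, 12, 11, 13, 14, 9, 15]
left, right = [10, 11, 9], [12, 13, 14, 15]

print(variance_ratio(targets, [left, right], min_samples_split=3))
print(variance_ratio(targets, [left, right], min_samples_split=10))
```

With these values the first call gives 69/98 (about 0.70), while the second gives 0.0 because neither branch reaches 10 samples.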
Usage
Use VarianceRatioSplitCriterion as the splitting criterion when building regression trees. It provides a normalized measure of split quality based on variance reduction.
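To show how a normalized merit might be used while growing a tree, the sketch below ranks candidate thresholds on a single numeric feature and keeps the one with the highest ratio. It is a standalone illustration in plain Python: `variance_ratio`, `best_threshold`, the merit formula, and the toy data are all hypothetical and do not reflect how River's trees enumerate split candidates.

```python
from statistics import variance


def variance_ratio(pre_split, post_split, min_samples_split):
    # Assumed merit: 1 - sum_i (n_i / n) * var_i / var_pre, or 0 when any
    # branch has fewer than min_samples_split (at least 2) samples.
    if any(len(b) < max(min_samples_split, 2) for b in post_split):
        return 0.0
    pre_var = variance(pre_split)
    if pre_var == 0:
        return 0.0
    n = len(pre_split)
    return 1.0 - sum(len(b) / n * variance(b) / pre_var for b in post_split)


def best_threshold(xs, ys, min_samples_split=2):
    """Return (merit, threshold) of the best binary split `x <= threshold`."""
    best_merit, best_t = 0.0, None
    for t in sorted(set(xs))[:-1]:  # the largest value leaves no right branch
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        merit = variance_ratio(ys, [left, right], min_samples_split)
        if merit > best_merit:
            best_merit, best_t = merit, t
    return best_merit, best_t


xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 5.0, 5.1, 4.9]
merit, threshold = best_threshold(xs, ys)
print(threshold, round(merit, 3))
```

Here the split at `x <= 3` wins because it separates the low targets from the high ones, leaving little variance in either branch.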
Code Reference
Source Location
- Repository: Online_ml_River
- File: river/tree/split_criterion/variance_ratio_split_criterion.py
Signature
class VarianceRatioSplitCriterion(SplitCriterion):
    def __init__(self, min_samples_split: int = 5):
        ...

    def merit_of_split(self, pre_split_dist, post_split_dist):
        ...

    def current_merit(self, dist):
        ...

    @staticmethod
    def compute_var(dist):
        return dist.get()

    @staticmethod
    def range_of_merit(pre_split_dist):
        return 1.0

    @staticmethod
    def select_best_branch(children_stats):
        ...
Import
from river.tree.split_criterion import VarianceRatioSplitCriterion
I/O Contract
| Input | Type | Description |
|---|---|---|
| pre_split_dist | Var | Pre-split variance statistics |
| post_split_dist | list[Var] | Post-split variance statistics per branch |
| min_samples_split | int | Constructor argument: minimum samples required per branch (default 5) |

| Output | Type | Description |
|---|---|---|
| merit | float | Variance reduction ratio (0 to 1) |
| best_branch | int | Branch index with minimum variance |
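The `best_branch` output can be pictured with a small standalone sketch. It follows the description in this page (the index of the branch with the lowest variance) and uses plain lists with the standard library; the helper name is hypothetical, and the library may weight branches or break ties differently.

```python
from statistics import variance


def select_lowest_variance_branch(branches):
    """Hypothetical helper: index of the branch with the smallest variance."""
    variances = [variance(branch) for branch in branches]
    # min() returns the first index on ties (a choice made for this sketch).
    return min(range(len(variances)), key=variances.__getitem__)


left, right = [10, 11, 9], [12, 13, 14, 15]
print(select_lowest_variance_branch([left, right]))  # → 0
```

The left branch has a sample variance of 1.0 against roughly 1.67 for the right branch, so index 0 is returned.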
Usage Examples
from river.tree.split_criterion import VarianceRatioSplitCriterion
from river.stats import Var
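# Create the criterion. Each post-split branch must hold at least
# min_samples_split samples, so use a value no larger than the
# smallest branch below (3 samples)
criterion = VarianceRatioSplitCriterion(min_samples_split=3)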
# Pre-split statistics
pre_split = Var()
for val in [10, 12, 11, 13, 14, 9, 15]:
    pre_split.update(val)
# Post-split statistics (two branches)
left = Var()
for val in [10, 11, 9]:
    left.update(val)
right = Var()
for val in [12, 13, 14, 15]:
    right.update(val)
post_split = [left, right]
# Calculate merit (variance reduction ratio)
merit = criterion.merit_of_split(pre_split, post_split)
print(f"Variance reduction ratio: {merit}")
# Current merit (pre-split variance)
current = criterion.current_merit(pre_split)
print(f"Pre-split variance: {current}")
# Select best branch (minimum variance)
best_branch = criterion.select_best_branch(post_split)
print(f"Best branch: {best_branch}")
# Merit range is always 1.0
print(f"Merit range: {criterion.range_of_merit(pre_split)}")