

Implementation:Online ml River Tree VarianceRatioSplitCriterion

From Leeroopedia


Knowledge Sources
Domains Online_Learning, Decision_Trees, Regression
Last Updated 2026-02-08 16:00 GMT

Overview

Split criterion for regression trees that evaluates splits based on the variance reduction ratio.

Description

VarianceRatioSplitCriterion scores a split by its variance reduction ratio (VR): one minus the sample-weighted average of the post-split branch variances, taken relative to the pre-split variance. Higher values indicate better splits. The merit normally lies between 0 and 1, and range_of_merit always reports 1.0. The criterion also enforces a minimum number of samples in every post-split branch (min_samples_split) for statistical reliability: a split that leaves any branch with fewer samples receives a merit of 0. The select_best_branch method returns the index of the branch with the smallest variance.
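In symbols (the notation here is ours, not River's): let n be the number of samples seen before the split, σ² the pre-split variance, k the number of branches, n_i and σ_i² the sample count and variance of branch i, and m = min_samples_split. The second line works through the two branches from the usage example below. It assumes that both branches satisfy the sample minimum and that the variances are sample variances (ddof = 1, the default of river.stats.Var). The pre-split data are the seven values 9 through 15, the left branch is {9, 10, 11}, and the right branch is {12, 13, 14, 15}.

$$
\mathrm{VR} =
\begin{cases}
1 - \displaystyle\sum_{i=1}^{k} \frac{n_i}{n}\,\frac{\sigma_i^2}{\sigma^2} & \text{if } n_i \ge m \text{ for every } i,\\[4pt]
0 & \text{otherwise.}
\end{cases}
$$

$$
\sigma^2 = \tfrac{28}{6} = \tfrac{14}{3},\quad \sigma_1^2 = 1,\quad \sigma_2^2 = \tfrac{5}{3}
\;\Rightarrow\;
\mathrm{VR} = 1 - \frac{3}{7}\cdot\frac{1}{14/3} - \frac{4}{7}\cdot\frac{5/3}{14/3} = 1 - \frac{9}{98} - \frac{20}{98} = \frac{69}{98} \approx 0.704
$$

With population variances (ddof = 0), the same split would score 1 − 7/28 = 0.75. The exact value therefore depends on the variance convention, while the ranking of clearly different splits usually does not.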

Usage

Use VarianceRatioSplitCriterion as the splitting criterion when building regression trees. It provides a normalized measure of split quality based on variance reduction.
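The sketch below illustrates how a tree might use this measure. It scores every candidate threshold on a single numeric feature and keeps the one with the highest variance reduction ratio. It is plain Python over in-memory lists, not River's incremental splitter. The helper names (sample_var, variance_ratio, best_threshold), the midpoint thresholds, and the zero-variance guard are all hypothetical choices made for illustration.

```python
def sample_var(values):
    """Sample variance (ddof=1); 0.0 for fewer than two values."""
    n = len(values)
    if n < 2:
        return 0.0
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def variance_ratio(parent, branches, min_samples_split=5):
    """Variance reduction ratio of one candidate split (hypothetical helper)."""
    if any(len(b) < min_samples_split for b in branches):
        return 0.0  # some branch is too small to trust
    parent_var = sample_var(parent)
    if parent_var == 0.0:
        return 0.0  # assumption: a constant target cannot be improved
    n = len(parent)
    return 1.0 - sum(len(b) / n * sample_var(b) / parent_var for b in branches)


def best_threshold(xs, ys, min_samples_split=5):
    """Score every midpoint between consecutive distinct x values; keep the best."""
    pairs = sorted(zip(xs, ys))
    best_merit, best_t = 0.0, None
    for (x_a, _), (x_b, _) in zip(pairs, pairs[1:]):
        if x_a == x_b:
            continue
        t = (x_a + x_b) / 2
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        merit = variance_ratio(ys, [left, right], min_samples_split)
        if merit > best_merit:
            best_merit, best_t = merit, t
    return best_merit, best_t


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [10.0, 11.0, 9.0, 10.5, 20.0, 21.0, 19.5, 20.5]
# The threshold 4.5 separates the low cluster from the high one.
print(best_threshold(xs, ys, min_samples_split=2))
```

Because every branch variance is divided by the parent variance, rescaling ys (for example, from metres to millimetres) leaves every merit unchanged. A merit of about 0.98 therefore means the same thing regardless of the target's units. With min_samples_split=5 on these eight points, no threshold leaves both branches large enough, so the search returns no split.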

Code Reference

Source Location

  • Repository: Online_ml_River
  • File: river/tree/split_criterion/variance_ratio_split_criterion.py

Signature

class VarianceRatioSplitCriterion(SplitCriterion):
    def __init__(self, min_samples_split: int = 5):
        ...

    def merit_of_split(self, pre_split_dist, post_split_dist):
        ...

    def current_merit(self, dist):
        ...

    @staticmethod
    def compute_var(dist):
        return dist.get()

    @staticmethod
    def range_of_merit(pre_split_dist):
        return 1.0

    @staticmethod
    def select_best_branch(children_stats):
        ...
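The method bodies in the signature above are elided. The following is a minimal sketch of what they compute, based only on the behaviour described on this page. It is not River's source. RunningVar is a hypothetical stand-in for river.stats.Var that uses Welford's online update and sample variance (ddof = 1). Its n attribute and the class name VarianceRatioSketch are our own and are not part of River's API.

```python
class RunningVar:
    """Hypothetical stand-in for river.stats.Var: Welford's online variance (ddof=1)."""

    def __init__(self):
        self.n = 0
        self._mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self._mean
        self._mean += delta / self.n
        self._m2 += delta * (x - self._mean)
        return self

    def get(self):
        # Sample variance; 0.0 until there are at least two observations.
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


class VarianceRatioSketch:
    """Hypothetical re-creation of the criterion's behaviour, not River's code."""

    def __init__(self, min_samples_split=5):
        self.min_samples_split = min_samples_split

    def merit_of_split(self, pre_split_dist, post_split_dist):
        # No credit for a split that leaves any branch under-populated.
        if any(d.n < self.min_samples_split for d in post_split_dist):
            return 0.0
        pre_var = pre_split_dist.get()
        if pre_var == 0.0:
            return 0.0  # assumption: guard against a constant target
        n = pre_split_dist.n
        return 1.0 - sum((d.n / n) * (d.get() / pre_var) for d in post_split_dist)

    def current_merit(self, dist):
        # The pre-split variance itself.
        return dist.get()

    @staticmethod
    def range_of_merit(pre_split_dist):
        return 1.0

    @staticmethod
    def select_best_branch(children_stats):
        # Index of the child with the smallest variance.
        return min(range(len(children_stats)), key=lambda i: children_stats[i].get())


def from_values(values):
    """Build a RunningVar from a list of numbers."""
    v = RunningVar()
    for x in values:
        v.update(x)
    return v


pre = from_values([10, 12, 11, 13, 14, 9, 15])
left = from_values([10, 11, 9])
right = from_values([12, 13, 14, 15])
crit = VarianceRatioSketch(min_samples_split=3)
print(crit.merit_of_split(pre, [left, right]))  # about 0.704 (69/98)
print(crit.select_best_branch([left, right]))   # 0, the left branch
```

Keeping only counts and running sums per branch lets the merit be recomputed after every new sample without storing the targets. This is the property an online tree needs.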

Import

from river.tree.split_criterion import VarianceRatioSplitCriterion

I/O Contract

Inputs
Name               Type        Description
pre_split_dist     Var         Pre-split variance statistics
post_split_dist    list[Var]   Post-split variance statistics, one per branch
min_samples_split  int         Minimum samples required in every branch (constructor argument, default 5)

Outputs
Name               Type        Description
merit              float       Variance reduction ratio returned by merit_of_split (at most 1; 0 if any branch is below min_samples_split)
best_branch        int         Index of the branch with the smallest variance, returned by select_best_branch

Usage Examples

from river.tree.split_criterion import VarianceRatioSplitCriterion
from river.stats import Var

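# Create criterion with a minimum sample requirement. Each branch below has at
# least 3 samples, so the split qualifies (with min_samples_split=10, both
# branches would fall short and merit_of_split would return 0.0)
criterion = VarianceRatioSplitCriterion(min_samples_split=3)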

# Pre-split statistics
pre_split = Var()
for val in [10, 12, 11, 13, 14, 9, 15]:
    pre_split.update(val)

# Post-split statistics (two branches)
left = Var()
for val in [10, 11, 9]:
    left.update(val)

right = Var()
for val in [12, 13, 14, 15]:
    right.update(val)

post_split = [left, right]

# Calculate merit (variance reduction ratio)
merit = criterion.merit_of_split(pre_split, post_split)
print(f"Variance reduction ratio: {merit}")

# Current merit (pre-split variance)
current = criterion.current_merit(pre_split)
print(f"Pre-split variance: {current}")

# Select best branch (minimum variance)
best_branch = criterion.select_best_branch(post_split)
print(f"Best branch: {best_branch}")

# Merit range is always 1.0
print(f"Merit range: {criterion.range_of_merit(pre_split)}")
