Implementation:Online ml River Tree Splitter Histogram

Knowledge Sources	Online_ml_River
Domains	Online_Learning, Decision_Trees, Classification
Last Updated	2026-02-08 16:00 GMT

Overview

Histogram-based attribute observer for classification that discretizes numerical features using adaptive histograms with bounded memory.

Description

HistogramSplitter uses River's Histogram sketch to discretize numerical features, keeping one histogram per class. Memory stays bounded because each histogram holds at most n_bins bins. To evaluate splits, the splitter places candidate thresholds (their number is controlled by n_splits) across the observed feature range and uses each class histogram's CDF to estimate the class distribution on either side of a threshold. The trade-off is approximate split points in exchange for a fixed memory footprint.
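The CDF-based split evaluation described above can be sketched in plain Python. This is a simplified illustration, not River's implementation: it assumes an exact empirical CDF over sorted values as a stand-in for a histogram CDF, and the helper names (empirical_cdf, split_distributions, best_split) are hypothetical.

```python
from bisect import bisect_right


def empirical_cdf(sorted_values, x):
    """Fraction of values <= x (stand-in for a per-class histogram CDF)."""
    return bisect_right(sorted_values, x) / len(sorted_values)


def split_distributions(values_by_class, class_counts, threshold):
    """Estimate the class weights falling on each side of `threshold`."""
    left, right = {}, {}
    for label, count in class_counts.items():
        p_left = empirical_cdf(values_by_class[label], threshold)  # P(x <= t | y)
        left[label] = count * p_left
        right[label] = count * (1.0 - p_left)
    return left, right


def gini(dist):
    """Gini impurity of a {label: weight} distribution."""
    total = sum(dist.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((w / total) ** 2 for w in dist.values())


def best_split(values_by_class, n_splits=4):
    """Try n_splits evenly spaced thresholds and keep the largest Gini gain."""
    class_counts = {y: len(v) for y, v in values_by_class.items()}
    total = sum(class_counts.values())
    low = min(v[0] for v in values_by_class.values())
    high = max(v[-1] for v in values_by_class.values())
    step = (high - low) / (n_splits + 1)
    best = None
    for i in range(n_splits):
        t = low + step * (i + 1)
        left, right = split_distributions(values_by_class, class_counts, t)
        weighted = sum(
            sum(side.values()) / total * gini(side) for side in (left, right)
        )
        merit = gini(class_counts) - weighted
        if best is None or merit > best[1]:
            best = (t, merit)
    return best


# Per-class values must be sorted for bisect to work.
values_by_class = {"A": [5.5, 5.8], "B": [6.2, 7.1]}
threshold, merit = best_split(values_by_class, n_splits=3)
```

In this toy data the two classes are separable, so the chosen threshold falls between the largest "A" value and the smallest "B" value. A real histogram CDF would only approximate these fractions, which is where the accuracy cost of the sketch comes from.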

Usage

Use HistogramSplitter for classification tasks when memory is limited and approximate split evaluation is acceptable. The histogram approach adapts to the data distribution while maintaining bounded memory.
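To make the bounded-memory idea concrete, here is a minimal sketch of a streaming histogram that caps its bin count by merging the two closest adjacent bins. It is an assumption-laden toy, not River's Histogram: the class name BoundedHistogram, the merge rule, and the coarse step-wise cdf are all illustrative choices.

```python
class BoundedHistogram:
    """Toy streaming histogram holding at most `max_bins` bins."""

    def __init__(self, max_bins):
        self.max_bins = max_bins
        self.bins = []  # disjoint bins as [left, right, count], sorted by left

    def update(self, x):
        # If an existing bin already covers x, just bump its count.
        for b in self.bins:
            if b[0] <= x <= b[1]:
                b[2] += 1
                return
        # Otherwise open a single-point bin and restore ordering.
        self.bins.append([x, x, 1])
        self.bins.sort()
        if len(self.bins) > self.max_bins:
            self._merge_closest()

    def _merge_closest(self):
        # Merge the adjacent pair separated by the smallest gap.
        i = min(
            range(len(self.bins) - 1),
            key=lambda k: self.bins[k + 1][0] - self.bins[k][1],
        )
        a, b = self.bins[i], self.bins[i + 1]
        self.bins[i:i + 2] = [[a[0], b[1], a[2] + b[2]]]

    def cdf(self, x):
        # Coarse estimate: count only bins lying entirely at or below x.
        total = sum(b[2] for b in self.bins)
        below = sum(b[2] for b in self.bins if b[1] <= x)
        return below / total if total else 0.0


hist = BoundedHistogram(max_bins=3)
for value in [1.0, 2.0, 10.0, 11.0, 20.0]:
    hist.update(value)
```

Because merges happen where values are closest together, bin boundaries follow the data rather than a fixed grid, while the number of stored bins never exceeds max_bins.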

Code Reference

Source Location

Repository: Online_ml_River
File: river/tree/splitter/histogram_splitter.py

Signature

class HistogramSplitter(Splitter):
    def __init__(self, n_bins: int = 256, n_splits: int = 32):
        ...

    def update(self, att_val, target_val, w):
        ...

    def cond_proba(self, att_val, target_val):
        ...

    def best_evaluated_split_suggestion(self, criterion, pre_split_dist, att_idx, binary_only):
        ...

Import

from river.tree.splitter import HistogramSplitter

I/O Contract

Input	Type	Description
att_val	float	Numerical feature value
target_val	int/str	Class label
w	float	Sample weight
n_bins	int	Maximum histogram bins (default 256)
n_splits	int	Split candidates to evaluate (default 32)

Output	Type	Description
cond_proba	float	Estimate of P(att_val | class) from histogram
split_suggestion	BranchFactory	Best split using histogram CDF estimates

Usage Examples

from river.tree.splitter import HistogramSplitter
from river.tree.split_criterion import GiniSplitCriterion

# Create splitter with custom bins and splits
splitter = HistogramSplitter(n_bins=128, n_splits=16)

# Update with observations
for val, label in [(5.5, 'A'), (6.2, 'B'), (5.8, 'A'), (7.1, 'B')]:
    splitter.update(val, label, w=1.0)

# Get conditional probability
prob = splitter.cond_proba(att_val=5.5, target_val='A')

# Get best split
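# Recent River releases require min_branch_fraction; 0.01 is an illustrative value
criterion = GiniSplitCriterion(min_branch_fraction=0.01)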
# Class counts at the leaf; these should match the observations fed to the splitter
pre_split = {'A': 2, 'B': 2}

suggestion = splitter.best_evaluated_split_suggestion(
    criterion=criterion,
    pre_split_dist=pre_split,
    att_idx='feature1',
    binary_only=True
)

print(f"Threshold: {suggestion.split_info}, Merit: {suggestion.merit}")

Related Pages

Environment:Online_ml_River_Python_Runtime_Environment
