Implementation:Online ml River Tree Splitter Histogram

From Leeroopedia


Knowledge Sources
Domains Online_Learning, Decision_Trees, Classification
Last Updated 2026-02-08 16:00 GMT

Overview

Histogram-based attribute observer for classification that discretizes numerical features using adaptive histograms with bounded memory.

Description

HistogramSplitter uses River's Histogram sketch to maintain one histogram per class over a numerical feature. Memory stays bounded because each histogram holds at most n_bins bins. To evaluate splits, the splitter places up to n_splits candidate thresholds across the observed feature range and uses each class histogram's CDF to estimate how the class weights divide on either side of a threshold. Split evaluation is therefore approximate, but its memory cost is fixed by n_bins per class rather than growing with the number of observations.
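The CDF-based estimate described above can be illustrated with a minimal, self-contained sketch. It is not River's implementation: the helper names (empirical_cdf, split_distributions, weighted_gini) are hypothetical, an exact empirical CDF over stored values stands in for the histogram CDF, and a plain weighted Gini impurity stands in for the split criterion.

```python
from bisect import bisect_right


def empirical_cdf(sorted_values, x):
    """Fraction of values <= x; a stand-in for a histogram CDF (assumption)."""
    if not sorted_values:
        return 0.0
    return bisect_right(sorted_values, x) / len(sorted_values)


def split_distributions(values_by_class, pre_split_dist, threshold):
    """Estimate the class weights falling on each side of `threshold`."""
    left, right = {}, {}
    for label, weight in pre_split_dist.items():
        p_left = empirical_cdf(values_by_class.get(label, []), threshold)
        left[label] = weight * p_left            # weight with value <= threshold
        right[label] = weight * (1.0 - p_left)   # weight with value > threshold
    return left, right


def gini(dist):
    """Gini impurity of a class-weight dictionary."""
    total = sum(dist.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((w / total) ** 2 for w in dist.values())


def weighted_gini(left, right):
    """Impurity of a binary split, weighted by branch size (lower is better)."""
    n_left, n_right = sum(left.values()), sum(right.values())
    return (n_left * gini(left) + n_right * gini(right)) / (n_left + n_right)


# Sorted feature values observed per class (illustrative data).
values_by_class = {"A": [5.5, 5.8], "B": [6.2, 7.1]}
pre_split_dist = {"A": 2.0, "B": 2.0}
candidates = [5.65, 6.0, 6.65]

best_threshold = min(
    candidates,
    key=lambda t: weighted_gini(*split_distributions(values_by_class, pre_split_dist, t)),
)
print(best_threshold)
```

The point of the sketch is that only per-class CDF values at each candidate threshold are needed to score a split, so the raw observations never have to be revisited.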

Usage

Use HistogramSplitter for classification tasks when memory is limited and approximate split evaluation is acceptable. The histogram approach adapts to the data distribution while maintaining bounded memory.
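To make the bounded-memory idea concrete, here is a simplified streaming histogram that merges its two closest bins whenever the bin limit is exceeded. This is a sketch under stated assumptions, not River's Histogram class: BoundedHistogram and its max_bins parameter are hypothetical names, and bins are stored as simple [center, count] pairs.

```python
from bisect import insort


class BoundedHistogram:
    """Hypothetical streaming histogram holding at most `max_bins` bins."""

    def __init__(self, max_bins=4):
        self.max_bins = max_bins
        self.bins = []  # sorted list of [center, count]
        self.n = 0      # total number of observations

    def update(self, x):
        self.n += 1
        for b in self.bins:
            if b[0] == x:  # value already has its own bin
                b[1] += 1
                return
        insort(self.bins, [x, 1])
        if len(self.bins) > self.max_bins:
            self._merge_closest()

    def _merge_closest(self):
        # Find the adjacent pair of bins with the smallest gap between centers.
        gaps = [self.bins[i + 1][0] - self.bins[i][0] for i in range(len(self.bins) - 1)]
        i = gaps.index(min(gaps))
        (c1, n1), (c2, n2) = self.bins[i], self.bins[i + 1]
        # Replace the pair with one bin at the count-weighted mean center.
        self.bins[i:i + 2] = [[(c1 * n1 + c2 * n2) / (n1 + n2), n1 + n2]]

    def cdf(self, x):
        """Approximate fraction of observations <= x."""
        if self.n == 0:
            return 0.0
        return sum(count for center, count in self.bins if center <= x) / self.n


hist = BoundedHistogram(max_bins=3)
for x in [1.0, 2.0, 10.0, 2.5, 11.0]:
    hist.update(x)
print(hist.bins, hist.cdf(3.0))
```

Because nearby values are merged first, dense regions keep finer resolution than sparse ones, which is one way a histogram can adapt to the data distribution without growing past a fixed size.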

Code Reference

Source Location

  • Repository: Online_ml_River
  • File: river/tree/splitter/histogram_splitter.py

Signature

class HistogramSplitter(Splitter):
    def __init__(self, n_bins: int = 256, n_splits: int = 32):
        ...

    def update(self, att_val, target_val, w):
        ...

    def cond_proba(self, att_val, target_val):
        ...

    def best_evaluated_split_suggestion(self, criterion, pre_split_dist, att_idx, binary_only):
        ...

Import

from river.tree.splitter import HistogramSplitter

I/O Contract

Input Type Description
att_val float Numerical feature value
target_val int/str Class label
w float Sample weight
n_bins int Maximum histogram bins (default 256)
n_splits int Split candidates to evaluate (default 32)
Output Type Description
cond_proba float P(att_val | class) estimated from the class histogram
split_suggestion BranchFactory Best split using histogram CDF estimates

Usage Examples

from river.tree.splitter import HistogramSplitter
from river.tree.split_criterion import GiniSplitCriterion

# Create splitter with custom bins and splits
splitter = HistogramSplitter(n_bins=128, n_splits=16)

# Update with observations
for val, label in [(5.5, 'A'), (6.2, 'B'), (5.8, 'A'), (7.1, 'B')]:
    splitter.update(val, label, w=1.0)

# Get conditional probability
prob = splitter.cond_proba(att_val=5.5, target_val='A')

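# Get best split
# GiniSplitCriterion takes a min_branch_fraction argument; 0.01 is an illustrative value
criterion = GiniSplitCriterion(min_branch_fraction=0.01)
# Class weights observed at the node before splitting (illustrative values)
pre_split = {'A': 100, 'B': 80}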

suggestion = splitter.best_evaluated_split_suggestion(
    criterion=criterion,
    pre_split_dist=pre_split,
    att_idx='feature1',
    binary_only=True
)

print(f"Threshold: {suggestion.split_info}, Merit: {suggestion.merit}")
