Implementation:Online ml River Tree Splitter Histogram
| Knowledge Sources | |
|---|---|
| Domains | Online_Learning, Decision_Trees, Classification |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
Histogram-based attribute observer for classification that discretizes numerical features using adaptive histograms with bounded memory.
Description
HistogramSplitter uses River's Histogram sketch to discretize a numerical feature, keeping one histogram per class. Memory stays bounded because each histogram holds at most n_bins bins. To evaluate splits, the splitter partitions the observed feature range into up to n_splits candidate thresholds and uses each class histogram's CDF to estimate the class distributions on either side of a threshold. Split quality is therefore approximate: n_bins trades memory for finer CDF estimates, and n_splits trades evaluation time for more candidate thresholds.
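The CDF-based evaluation described above can be sketched in plain Python. This is a minimal illustration, not River's implementation: the helper names (`empirical_cdf`, `split_at`, `gini_gain`) are hypothetical, and a sorted list of raw values stands in for each class histogram.

```python
from bisect import bisect_right


def empirical_cdf(sorted_values, x):
    """Fraction of values <= x; stands in for a histogram's CDF."""
    if not sorted_values:
        return 0.0
    return bisect_right(sorted_values, x) / len(sorted_values)


def gini(dist):
    """Gini impurity of a {class: weight} distribution."""
    total = sum(dist.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((w / total) ** 2 for w in dist.values())


def split_at(threshold, values_by_class, pre_split_dist):
    """Estimate left/right class weights at a threshold from per-class CDFs."""
    left, right = {}, {}
    for label, weight in pre_split_dist.items():
        p_left = empirical_cdf(values_by_class.get(label, []), threshold)
        left[label] = weight * p_left
        right[label] = weight * (1.0 - p_left)
    return left, right


def gini_gain(pre_split_dist, left, right):
    """Impurity reduction achieved by the split (higher is better)."""
    total = sum(pre_split_dist.values())
    n_left, n_right = sum(left.values()), sum(right.values())
    weighted = (n_left * gini(left) + n_right * gini(right)) / total
    return gini(pre_split_dist) - weighted


# Hypothetical data: two classes that separate cleanly around 6.0
values_by_class = {"A": [5.5, 5.8], "B": [6.2, 7.1]}
pre_split_dist = {"A": 2.0, "B": 2.0}

left, right = split_at(6.0, values_by_class, pre_split_dist)
gain = gini_gain(pre_split_dist, left, right)
```

Because only the per-class CDF at the threshold is needed, the same logic works when the sorted lists are replaced by bounded histograms, at the cost of approximate counts.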
Usage
Use HistogramSplitter for classification tasks when memory is limited and approximate split evaluation is acceptable. The histogram approach adapts to the data distribution while maintaining bounded memory.
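To make the bounded-memory idea concrete, here is a toy histogram that merges its two closest bins whenever the bin budget is exceeded. It is a sketch under stated assumptions: the class `BoundedHistogram` and its merge rule are illustrative and are not claimed to match River's Histogram sketch in detail.

```python
class BoundedHistogram:
    """Toy histogram that never keeps more than max_bins bins."""

    def __init__(self, max_bins=4):
        self.max_bins = max_bins
        self.bins = []  # sorted, non-overlapping [left, right, count] entries

    def update(self, x):
        # If x falls inside an existing bin, just increment that bin.
        for b in self.bins:
            if b[0] <= x <= b[1]:
                b[2] += 1
                return
        # Otherwise add a single-point bin and keep the bins sorted.
        self.bins.append([x, x, 1])
        self.bins.sort(key=lambda b: b[0])
        # Over budget: merge the pair of neighbours with the smallest gap.
        if len(self.bins) > self.max_bins:
            gaps = [
                self.bins[i + 1][0] - self.bins[i][1]
                for i in range(len(self.bins) - 1)
            ]
            i = gaps.index(min(gaps))
            lo, hi = self.bins[i], self.bins[i + 1]
            self.bins[i:i + 2] = [[lo[0], hi[1], lo[2] + hi[2]]]

    def cdf(self, x):
        """Approximate P(value <= x), interpolating linearly inside a bin."""
        total = sum(count for _, _, count in self.bins)
        if total == 0:
            return 0.0
        below = 0.0
        for left, right, count in self.bins:
            if x >= right:
                below += count
            elif x > left:
                below += count * (x - left) / (right - left)
        return below / total


hist = BoundedHistogram(max_bins=4)
for value in [5.5, 6.2, 5.8, 7.1, 5.6, 6.9, 6.0, 5.7]:
    hist.update(value)
```

Bins end up narrow where values are dense and wide where they are sparse, which is how the histogram adapts to the data while storing a fixed number of entries.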
Code Reference
Source Location
- Repository: Online_ml_River
- File: river/tree/splitter/histogram_splitter.py
Signature
class HistogramSplitter(Splitter):
    def __init__(self, n_bins: int = 256, n_splits: int = 32):
        ...
    def update(self, att_val, target_val, w):
        ...
    def cond_proba(self, att_val, target_val):
        ...
    def best_evaluated_split_suggestion(self, criterion, pre_split_dist, att_idx, binary_only):
        ...
Import
from river.tree.splitter import HistogramSplitter
I/O Contract
| Input | Type | Description |
|---|---|---|
| att_val | float | Numerical feature value |
| target_val | int/str | Class label |
| w | float | Sample weight |
| n_bins | int | Maximum histogram bins (default 256) |
| n_splits | int | Split candidates to evaluate (default 32) |
| Output | Type | Description |
|---|---|---|
| cond_proba | float | P(att_val \| class) estimated from the histogram |
| split_suggestion | BranchFactory | Best split using histogram CDF estimates |
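The role of n_splits can be illustrated with a small helper that partitions a feature range into candidate thresholds. This is a sketch only: the function `candidate_thresholds` is hypothetical, and even spacing strictly inside the range is an assumption, so River's actual spacing and endpoints may differ.

```python
def candidate_thresholds(low, high, n_splits):
    """Return n_splits evenly spaced thresholds strictly inside (low, high)."""
    # No threshold can be proposed when the range is empty or degenerate.
    if n_splits < 1 or low >= high:
        return []
    step = (high - low) / (n_splits + 1)
    return [low + step * (i + 1) for i in range(n_splits)]


# Hypothetical observed range for one feature
thresholds = candidate_thresholds(5.5, 7.1, n_splits=3)
```

Each threshold would then be scored with the split criterion, and the one with the highest merit returned as the suggestion.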
Usage Examples
from river.tree.splitter import HistogramSplitter
from river.tree.split_criterion import GiniSplitCriterion
# Create a splitter with custom bin and split-candidate limits
splitter = HistogramSplitter(n_bins=128, n_splits=16)
# Update with (value, label) observations, each with unit weight
for val, label in [(5.5, 'A'), (6.2, 'B'), (5.8, 'A'), (7.1, 'B')]:
    splitter.update(val, label, w=1.0)
# Get the conditional probability estimate for a value given a class
prob = splitter.cond_proba(att_val=5.5, target_val='A')
# Get the best split; min_branch_fraction=0.01 is an illustrative value
criterion = GiniSplitCriterion(min_branch_fraction=0.01)
# Class weights seen so far; these match the four observations above
pre_split = {'A': 2.0, 'B': 2.0}
suggestion = splitter.best_evaluated_split_suggestion(
    criterion=criterion,
    pre_split_dist=pre_split,
    att_idx='feature1',
    binary_only=True,
)
print(f"Threshold: {suggestion.split_info}, Merit: {suggestion.merit}")