Implementation:Online ml River Tree Splitter Gaussian
| Knowledge Sources | |
|---|---|
| Domains | Online_Learning, Decision_Trees, Classification |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
Attribute observer for classification that models the per-class distribution of a numerical feature with Gaussian estimators, used both for probability density calculation and for split evaluation.
Description
GaussianSplitter approximates the distribution of each class for a numerical feature with a Gaussian (normal) distribution that is updated incrementally as observations arrive. This makes the per-class probability density cheap to evaluate, which Naive Bayes predictions rely on. The splitter also tracks the minimum and maximum values seen per class. Split candidates are generated by dividing the overall observed feature range into n_splits + 1 equal-width intervals and using the interior boundaries as thresholds. Each candidate is evaluated by using the Gaussian CDF to estimate how much of each class's weight falls on either side of the threshold.
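The candidate generation and CDF-based evaluation described above can be sketched in plain Python. This is a minimal illustration rather than River's code: the `stats` dictionary, its values, and the helper names `split_candidates` and `post_split_dists` are assumptions made for the example, and `statistics.NormalDist` stands in for the library's Gaussian estimator.

```python
from statistics import NormalDist

# Hypothetical per-class summaries: (mean, std, total weight, min seen, max seen).
stats = {
    "cat": (5.65, 0.2, 2.0, 5.5, 5.8),
    "dog": (6.65, 0.6, 2.0, 6.2, 7.1),
}


def split_candidates(stats, n_splits=10):
    """Interior boundaries of n_splits + 1 equal-width intervals over the observed range."""
    lo = min(s[3] for s in stats.values())
    hi = max(s[4] for s in stats.values())
    step = (hi - lo) / (n_splits + 1)
    points = [lo + step * (i + 1) for i in range(n_splits)]
    # Keep only thresholds strictly inside the observed range.
    return [p for p in points if lo < p < hi]


def post_split_dists(stats, threshold):
    """Estimate the class weight on each side of `threshold` with the Gaussian CDF."""
    left, right = {}, {}
    for label, (mean, std, n, lo, hi) in stats.items():
        if threshold < lo:
            right[label] = n  # every observation of this class lies to the right
        elif threshold >= hi:
            left[label] = n  # every observation of this class lies to the left
        else:
            left[label] = NormalDist(mean, std).cdf(threshold) * n
            right[label] = n - left[label]
    return left, right


candidates = split_candidates(stats, n_splits=3)
left, right = post_split_dists(stats, candidates[1])
print(candidates, left, right)
```

Because only summary statistics are kept, the post-split class weights are estimates. They are only as good as the Gaussian fit for each class.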
Usage
Use GaussianSplitter for classification tasks when features approximately follow normal distributions within each class and when Naive Bayes leaf models are used.
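To show why per-class densities matter for Naive Bayes leaves, the sketch below combines class priors with Gaussian likelihoods for a single feature. It is an illustration under stated assumptions, not River's leaf implementation: the `densities` and `class_counts` values and the function `naive_bayes_posterior` are hypothetical. With several features, the likelihood term would be a product over features.

```python
from statistics import NormalDist

# Hypothetical per-class Gaussian fits for one feature, plus class weights.
densities = {"cat": NormalDist(5.65, 0.2), "dog": NormalDist(6.65, 0.6)}
class_counts = {"cat": 2.0, "dog": 2.0}


def naive_bayes_posterior(x, densities, class_counts):
    """Posterior over classes: prior times Gaussian likelihood, normalised."""
    total = sum(class_counts.values())
    scores = {
        label: (class_counts[label] / total) * densities[label].pdf(x)
        for label in densities
    }
    norm = sum(scores.values())
    if norm == 0.0:
        # All likelihoods underflowed to zero: fall back to the priors.
        return {label: class_counts[label] / total for label in densities}
    return {label: score / norm for label, score in scores.items()}


posterior = naive_bayes_posterior(5.7, densities, class_counts)
predicted = max(posterior, key=posterior.get)
print(predicted, posterior)
```

If a feature is strongly non-Gaussian within a class (for example, bimodal), these likelihoods become misleading, which is why the guidance above ties this splitter to approximately normal features.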
Code Reference
Source Location
- Repository: Online_ml_River
- File: river/tree/splitter/gaussian_splitter.py
Signature
class GaussianSplitter(Splitter):
    def __init__(self, n_splits: int = 10):
        ...

    def update(self, att_val, target_val, w):
        ...

    def cond_proba(self, att_val, target_val):
        ...

    def best_evaluated_split_suggestion(self, criterion, pre_split_dist, att_idx, binary_only):
        ...
Import
from river.tree.splitter import GaussianSplitter
I/O Contract
| Input | Type | Description |
|---|---|---|
| att_val | float | Numerical feature value |
| target_val | int/str | Class label |
| w | float | Sample weight |
| n_splits | int | Number of split candidates to evaluate (default 10) |
| Output | Type | Description |
|---|---|---|
| cond_proba | float | P(att_val \| class) from Gaussian PDF |
| split_suggestion | BranchFactory | Best split with estimated post-split distributions |
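The `update` and `cond_proba` contract above can be illustrated with a small weighted running Gaussian. This is a sketch under assumptions, not the library's estimator: the `RunningGaussian` class, the module-level `per_class` dictionary, and the choice to return 0.0 for unseen classes or zero variance are all made for this example.

```python
import math


class RunningGaussian:
    """Weighted running mean and variance, updated one observation at a time."""

    def __init__(self):
        self.n = 0.0  # total weight seen so far
        self.mean = 0.0
        self._m2 = 0.0  # weighted sum of squared deviations from the mean

    def update(self, x, w=1.0):
        self.n += w
        delta = x - self.mean
        self.mean += (w / self.n) * delta
        self._m2 += w * delta * (x - self.mean)

    @property
    def variance(self):
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0

    def pdf(self, x):
        var = self.variance
        if var <= 0.0:
            return 0.0  # no spread observed yet, so no usable density
        return math.exp(-((x - self.mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


per_class = {}  # class label -> RunningGaussian


def update(att_val, target_val, w):
    if att_val is None:
        return  # skip missing feature values
    per_class.setdefault(target_val, RunningGaussian()).update(att_val, w)


def cond_proba(att_val, target_val):
    estimator = per_class.get(target_val)
    return estimator.pdf(att_val) if estimator is not None else 0.0


for value, label in [(5.5, "cat"), (6.2, "dog"), (5.8, "cat"), (7.1, "dog")]:
    update(value, label, 1.0)

print(cond_proba(5.5, "cat"), cond_proba(5.5, "dog"), cond_proba(5.5, "bird"))
```

Note that the returned value is a probability density rather than a probability, so it can exceed 1 when a class has a small variance.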
Usage Examples
from river.tree.splitter import GaussianSplitter
from river.tree.split_criterion import GiniSplitCriterion
# Create splitter with 15 split candidates
splitter = GaussianSplitter(n_splits=15)
# Update with observations
splitter.update(5.5, 'cat', 1.0)
splitter.update(6.2, 'dog', 1.0)
splitter.update(5.8, 'cat', 1.0)
splitter.update(7.1, 'dog', 1.0)
# Get conditional probability for Naive Bayes
prob = splitter.cond_proba(att_val=5.5, target_val='cat')
print(f"P(5.5 | cat) = {prob}")
# Get best split
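# Assumption: recent River releases expect a min_branch_fraction argument;
# omit it if your version's constructor takes none.
criterion = GiniSplitCriterion(min_branch_fraction=0.01)
# Pre-split class weights, matching the four observations above
pre_split = {'cat': 2.0, 'dog': 2.0}
suggestion = splitter.best_evaluated_split_suggestion(
    criterion=criterion,
    pre_split_dist=pre_split,
    att_idx='height',
    binary_only=True,
)
print(f"Best split merit: {suggestion.merit}")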