Implementation:Online ml River Tree MondrianTreeNodes
| Knowledge Sources | |
|---|---|
| Domains | Online_Learning, Decision_Trees, Mondrian_Process |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
Node implementations for Mondrian Trees including base nodes, classifier nodes, and regressor nodes. Provides the data structures and operations needed for Mondrian process-based incremental decision trees.
Description
The mondrian_tree_nodes module defines the node hierarchy for Mondrian Trees. Unlike traditional tree nodes that primarily track statistics, Mondrian nodes maintain:
- Time information (creation time in the Mondrian process)
- Feature range information (min/max for each feature)
- Weight information (for aggregation-based predictions)
- Parent-child relationships
The module provides both abstract base classes and concrete implementations for classification and regression tasks.
Code Reference
Source Location:
/tmp/kapso_repo_178qi9vb/river/tree/mondrian/mondrian_tree_nodes.py
Import:
from river.tree.mondrian.mondrian_tree_nodes import (
MondrianLeafClassifier,
MondrianBranchClassifier,
MondrianLeafRegressor,
MondrianBranchRegressor,
)
Node Hierarchy
MondrianLeaf (from base.Leaf)
MondrianBranch (from base.Branch)
└─ MondrianNode (tracking ranges, weights)
├─ MondrianNodeClassifier
│ ├─ MondrianLeafClassifier
│ └─ MondrianBranchClassifier
└─ MondrianNodeRegressor
├─ MondrianLeafRegressor
└─ MondrianBranchRegressor
Base Node Classes
MondrianLeaf
Purpose: Base class for all Mondrian leaf nodes.
Attributes:
- parent (Branch | None): Parent node
- time (float): Creation time in Mondrian process
- depth (int): Depth level in tree
Signature:
class MondrianLeaf(Leaf):
def __init__(self, parent, time, depth):
super().__init__()
self.parent = parent
self.time = time
self.depth = depth
MondrianBranch
Purpose: Base class for all Mondrian branch nodes.
Attributes:
- parent (Branch | None): Parent node
- time (float): Creation time
- depth (int): Depth level
- feature (str): Split feature name
- threshold (float): Split threshold
- children (tuple): Left and right child nodes
Key Methods:
- branch_no(x): Return 0 (left) if x[feature] ≤ threshold, else 1 (right)
- next(x): Return child node for routing
- most_common_path(): Return (index, child) for most weighted path
- repr_split: String representation of split
MondrianNode
Purpose: Adds range tracking and weight management.
Attributes:
- memory_range_min (dict): Minimum observed value per feature
- memory_range_max (dict): Maximum observed value per feature
- weight (float): Node weight (for loss-based updates)
- log_weight_tree (float): Combined weight with subtree
Key Methods:
- range(feature): Return (min, max) tuple for feature
- range_extension(x): Compute range extensions and their sum
- update_depth(depth): Recursively update depth for subtree
- update_weight_tree(): Compute log_weight_tree from children
Classification Nodes
MondrianNodeClassifier
Additional Attributes:
- n_samples (int): Total samples observed
- counts (dict): Class distribution {class_label: count}
Key Methods:
replant(leaf, copy_all):
- Transfer information from old leaf to new branch
- Copies weights, ranges, counts depending on copy_all flag
score(y, dirichlet, n_classes):
- Compute P(class=y | node) using Jeffreys prior
- Formula: (count(y) + α) / (n_samples + α * n_classes)
predict(dirichlet, classes, n_classes):
- Return probability dictionary for all classes
- Uses score() for each class
loss(y, dirichlet, n_classes):
- Compute -log P(class=y | node)
- Used for weight updates
update_weight(y, dirichlet, use_aggregation, step, n_classes):
- Update node weight: weight -= step * loss
- Only updates if use_aggregation=True
update_count(y):
- Increment count for class y
is_dirac(y):
- Check if node is pure (all samples have class y)
update_downwards(x, y, dirichlet, use_aggregation, step, do_update_weight, n_classes):
- Update ranges with new sample
- Increment sample count
- Update weight if do_update_weight=True
- Update class count
Signature:
class MondrianNodeClassifier(MondrianNode):
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
self.n_samples = 0
self.counts = collections.defaultdict(int)
MondrianLeafClassifier
Purpose: Leaf node for classification tasks.
Signature:
class MondrianLeafClassifier(MondrianNodeClassifier, MondrianLeaf):
def __init__(self, parent, time, depth):
super().__init__(parent, time, depth)
MondrianBranchClassifier
Purpose: Branch node for classification tasks.
Signature:
class MondrianBranchClassifier(MondrianNodeClassifier, MondrianBranch):
def __init__(self, parent, time, depth, feature, threshold, *children):
super().__init__(parent, time, depth, feature, threshold, *children)
Regression Nodes
MondrianNodeRegressor
Additional Attributes:
- n_samples (int): Total samples observed
- mean (stats.Mean): Running mean of target values
Key Methods:
replant(leaf, copy_all):
- Transfer weights, mean, and optionally ranges and counts
predict():
- Return current mean as prediction
loss(sample_value):
- Compute squared error: (predict() - sample_value)² / 2
update_weight(sample_value, use_aggregation, step):
- Update weight: weight -= step * loss
- Only if use_aggregation=True
update_downwards(x, sample_value, use_aggregation, step, do_update_weight):
- Update ranges with new sample
- Increment sample count
- Update weight if do_update_weight=True
- Update running mean
Signature:
class MondrianNodeRegressor(MondrianNode):
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
self.n_samples = 0
self.mean = stats.Mean()
MondrianLeafRegressor
Purpose: Leaf node for regression tasks.
Signature:
class MondrianLeafRegressor(MondrianNodeRegressor, MondrianLeaf):
def __init__(self, parent, time, depth):
super().__init__(parent, time, depth)
MondrianBranchRegressor
Purpose: Branch node for regression tasks.
Signature:
class MondrianBranchRegressor(MondrianNodeRegressor, MondrianBranch):
def __init__(self, parent, time, depth, feature, threshold, *children):
super().__init__(parent, time, depth, feature, threshold, *children)
Range Management
Range Tracking:
Each node maintains dictionaries:
- memory_range_min: {feature_name: min_value}
- memory_range_max: {feature_name: max_value}
Initially empty; updated as samples arrive.
Range Extension Computation:
def range_extension(self, x):
extensions = {}
extensions_sum = 0.0
for feature in x:
x_f = x[feature]
feature_min, feature_max = self.range(feature)
if x_f < feature_min:
diff = feature_min - x_f
elif x_f > feature_max:
diff = x_f - feature_max
else:
diff = 0
extensions[feature] = diff
extensions_sum += diff
return extensions_sum, extensions
Used to determine:
- Whether to split (if extensions_sum > 0)
- Which feature to split on (sample proportional to extensions)
- Intensity parameter for exponential sampling
Weight Management
Weight Update:
At each node, weight is updated based on loss: weight -= step * loss(y_true)
Weight Tree Computation:
Recursively computed from leaves to root:
- Leaf: log_weight_tree = weight
- Branch: log_weight_tree = log_sum_exp(weight, left.log_weight_tree + right.log_weight_tree)
Uses log-sum-exp for numerical stability.
Aggregation Prediction:
When predicting with aggregation, combine predictions: 1. Start at leaf with initial prediction 2. Move up to root 3. At each node:
* w = exp(weight - log_weight_tree) * pred_new = node.predict() * prediction = 0.5 * w * pred_new + (1 - 0.5 * w) * prediction
Node Lifecycle
Creation: 1. Tree starts with single leaf at root (time=0, depth=0) 2. When split triggered, leaf promoted to branch or new branch inserted
Splitting:
Two scenarios:
Leaf → Branch:
# Old leaf becomes branch
branch = MondrianBranchClassifier(
leaf.parent, leaf.time, leaf.depth, feature, threshold
)
left = MondrianLeafClassifier(branch, split_time, depth+1)
right = MondrianLeafClassifier(branch, split_time, depth+1)
branch.children = (left, right)
branch.replant(leaf, copy_all=True)
# Route sample to appropriate child
Branch → Modified Branch:
# Insert new split above existing branch
if is_right_extension:
left = MondrianBranchClassifier(...) # Old branch
right = MondrianLeafClassifier(...) # New leaf
left.replant(old_branch)
else:
left = MondrianLeafClassifier(...) # New leaf
right = MondrianBranchClassifier(...) # Old branch
right.replant(old_branch)
# Update old branch's children to point to new parent
old_branch.children[0].parent = left # or right
old_branch.children[1].parent = left # or right
Example Usage
Creating Nodes:
# Root leaf
root = MondrianLeafClassifier(parent=None, time=0.0, depth=0)
# Update with sample
root.update_downwards(
x={'age': 25, 'income': 30000},
y=0,
dirichlet=0.5,
use_aggregation=True,
step=0.1,
do_update_weight=False,
n_classes=2
)
# Check range
age_min, age_max = root.range('age') # (25, 25)
# Compute extensions for new sample
x_new = {'age': 35, 'income': 50000}
ext_sum, extensions = root.range_extension(x_new)
# extensions = {'age': 10, 'income': 20000}
# ext_sum = 20010
Related Pages
- Online_ml_River_Tree_MondrianTreeClassifier
- Online_ml_River_Tree_MondrianTreeRegressor
- Online_ml_River_Tree_MondrianTree
- Online_ml_River_Tree_Base_Nodes
- Online_ml_River_Stats_Mean
Design Considerations
1. Multiple Inheritance: Combines functionality from base classes and Mondrian-specific classes 2. Range Tracking: Uses defaultdict for automatic initialization 3. Weight Management: Log-space computation prevents underflow 4. Flexibility: Same node structure supports classification and regression 5. Parent Tracking: Enables upward tree traversal for aggregation