Implementation:Online ml River Tree MondrianTreeNodes

Knowledge Sources	Online_ml_River
Domains	Online_Learning, Decision_Trees, Mondrian_Process
Last Updated	2026-02-08 16:00 GMT

Overview

Node implementations for Mondrian Trees including base nodes, classifier nodes, and regressor nodes. Provides the data structures and operations needed for Mondrian process-based incremental decision trees.

Description

The mondrian_tree_nodes module defines the node hierarchy for Mondrian Trees. Unlike traditional tree nodes that primarily track statistics, Mondrian nodes maintain:

Time information (creation time in the Mondrian process)
Feature range information (min/max for each feature)
Weight information (for aggregation-based predictions)
Parent-child relationships

The module provides both abstract base classes and concrete implementations for classification and regression tasks.

Code Reference

Source Location: /tmp/kapso_repo_178qi9vb/river/tree/mondrian/mondrian_tree_nodes.py

Import:

from river.tree.mondrian.mondrian_tree_nodes import (
    MondrianLeafClassifier,
    MondrianBranchClassifier,
    MondrianLeafRegressor,
    MondrianBranchRegressor,
)

Node Hierarchy

MondrianLeaf (from base.Leaf)
MondrianBranch (from base.Branch)
    └─ MondrianNode (tracking ranges, weights)
        ├─ MondrianNodeClassifier
        │   ├─ MondrianLeafClassifier
        │   └─ MondrianBranchClassifier
        └─ MondrianNodeRegressor
            ├─ MondrianLeafRegressor
            └─ MondrianBranchRegressor

Base Node Classes

MondrianLeaf

Purpose: Base class for all Mondrian leaf nodes.

Attributes:

parent (Branch | None): Parent node
time (float): Creation time in Mondrian process
depth (int): Depth level in tree

Signature:

class MondrianLeaf(Leaf):
    def __init__(self, parent, time, depth):
        super().__init__()
        self.parent = parent
        self.time = time
        self.depth = depth

MondrianBranch

Purpose: Base class for all Mondrian branch nodes.

Attributes:

parent (Branch | None): Parent node
time (float): Creation time
depth (int): Depth level
feature (str): Split feature name
threshold (float): Split threshold
children (tuple): Left and right child nodes

Key Methods:

branch_no(x): Return 0 (left) if x[feature] ≤ threshold, else 1 (right)
next(x): Return child node for routing
most_common_path(): Return (index, child) for most weighted path
repr_split: String representation of split

MondrianNode

Purpose: Adds range tracking and weight management.

Attributes:

memory_range_min (dict): Minimum observed value per feature
memory_range_max (dict): Maximum observed value per feature
weight (float): Node weight (for loss-based updates)
log_weight_tree (float): Combined weight with subtree

Key Methods:

range(feature): Return (min, max) tuple for feature
range_extension(x): Compute range extensions and their sum
update_depth(depth): Recursively update depth for subtree
update_weight_tree(): Compute log_weight_tree from children

Classification Nodes

MondrianNodeClassifier

Additional Attributes:

n_samples (int): Total samples observed
counts (dict): Class distribution {class_label: count}

Key Methods:

replant(leaf, copy_all):

Transfer information from old leaf to new branch
Copies weights, ranges, counts depending on copy_all flag

score(y, dirichlet, n_classes):

Compute P(class=y | node) using Jeffreys prior
Formula: (count(y) + α) / (n_samples + α * n_classes)

predict(dirichlet, classes, n_classes):

Return probability dictionary for all classes
Uses score() for each class

loss(y, dirichlet, n_classes):

Compute -log P(class=y | node)
Used for weight updates

update_weight(y, dirichlet, use_aggregation, step, n_classes):

Update node weight: weight -= step * loss
Only updates if use_aggregation=True

update_count(y):

Increment count for class y

is_dirac(y):

Check if node is pure (all samples have class y)

update_downwards(x, y, dirichlet, use_aggregation, step, do_update_weight, n_classes):

Update ranges with new sample
Increment sample count
Update weight if do_update_weight=True
Update class count

Signature:

class MondrianNodeClassifier(MondrianNode):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.n_samples = 0
        self.counts = collections.defaultdict(int)

MondrianLeafClassifier

Purpose: Leaf node for classification tasks.

Signature:

class MondrianLeafClassifier(MondrianNodeClassifier, MondrianLeaf):
    def __init__(self, parent, time, depth):
        super().__init__(parent, time, depth)

MondrianBranchClassifier

Purpose: Branch node for classification tasks.

Signature:

class MondrianBranchClassifier(MondrianNodeClassifier, MondrianBranch):
    def __init__(self, parent, time, depth, feature, threshold, *children):
        super().__init__(parent, time, depth, feature, threshold, *children)

Regression Nodes

MondrianNodeRegressor

Additional Attributes:

n_samples (int): Total samples observed
mean (stats.Mean): Running mean of target values

Key Methods:

replant(leaf, copy_all):

Transfer weights, mean, and optionally ranges and counts

predict():

Return current mean as prediction

loss(sample_value):

Compute squared error: (predict() - sample_value)² / 2

update_weight(sample_value, use_aggregation, step):

Update weight: weight -= step * loss
Only if use_aggregation=True

update_downwards(x, sample_value, use_aggregation, step, do_update_weight):

Update ranges with new sample
Increment sample count
Update weight if do_update_weight=True
Update running mean

Signature:

class MondrianNodeRegressor(MondrianNode):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.n_samples = 0
        self.mean = stats.Mean()

MondrianLeafRegressor

Purpose: Leaf node for regression tasks.

Signature:

class MondrianLeafRegressor(MondrianNodeRegressor, MondrianLeaf):
    def __init__(self, parent, time, depth):
        super().__init__(parent, time, depth)

MondrianBranchRegressor

Purpose: Branch node for regression tasks.

Signature:

class MondrianBranchRegressor(MondrianNodeRegressor, MondrianBranch):
    def __init__(self, parent, time, depth, feature, threshold, *children):
        super().__init__(parent, time, depth, feature, threshold, *children)

Range Management

Range Tracking:

Each node maintains dictionaries:

memory_range_min: {feature_name: min_value}
memory_range_max: {feature_name: max_value}

Initially empty; updated as samples arrive.

Range Extension Computation:

def range_extension(self, x):
    extensions = {}
    extensions_sum = 0.0
    for feature in x:
        x_f = x[feature]
        feature_min, feature_max = self.range(feature)
        if x_f < feature_min:
            diff = feature_min - x_f
        elif x_f > feature_max:
            diff = x_f - feature_max
        else:
            diff = 0
        extensions[feature] = diff
        extensions_sum += diff
    return extensions_sum, extensions

Used to determine:

Whether to split (if extensions_sum > 0)
Which feature to split on (sample proportional to extensions)
Intensity parameter for exponential sampling

Weight Management

Weight Update:

At each node, weight is updated based on loss: weight -= step * loss(y_true)

Weight Tree Computation:

Recursively computed from leaves to root:

Leaf: log_weight_tree = weight
Branch: log_weight_tree = log_sum_exp(weight, left.log_weight_tree + right.log_weight_tree)

Uses log-sum-exp for numerical stability.

Aggregation Prediction:

When predicting with aggregation, combine predictions: 1. Start at leaf with initial prediction 2. Move up to root 3. At each node:

  * w = exp(weight - log_weight_tree)
  * pred_new = node.predict()
  * prediction = 0.5 * w * pred_new + (1 - 0.5 * w) * prediction

Node Lifecycle

Creation: 1. Tree starts with single leaf at root (time=0, depth=0) 2. When split triggered, leaf promoted to branch or new branch inserted

Splitting:

Two scenarios:

Leaf → Branch:

# Old leaf becomes branch
branch = MondrianBranchClassifier(
    leaf.parent, leaf.time, leaf.depth, feature, threshold
)
left = MondrianLeafClassifier(branch, split_time, depth+1)
right = MondrianLeafClassifier(branch, split_time, depth+1)
branch.children = (left, right)
branch.replant(leaf, copy_all=True)
# Route sample to appropriate child

Branch → Modified Branch:

# Insert new split above existing branch
if is_right_extension:
    left = MondrianBranchClassifier(...)  # Old branch
    right = MondrianLeafClassifier(...)    # New leaf
    left.replant(old_branch)
else:
    left = MondrianLeafClassifier(...)     # New leaf
    right = MondrianBranchClassifier(...)  # Old branch
    right.replant(old_branch)

# Update old branch's children to point to new parent
old_branch.children[0].parent = left  # or right
old_branch.children[1].parent = left  # or right

Example Usage

Creating Nodes:

# Root leaf
root = MondrianLeafClassifier(parent=None, time=0.0, depth=0)

# Update with sample
root.update_downwards(
    x={'age': 25, 'income': 30000},
    y=0,
    dirichlet=0.5,
    use_aggregation=True,
    step=0.1,
    do_update_weight=False,
    n_classes=2
)

# Check range
age_min, age_max = root.range('age')  # (25, 25)

# Compute extensions for new sample
x_new = {'age': 35, 'income': 50000}
ext_sum, extensions = root.range_extension(x_new)
# extensions = {'age': 10, 'income': 20000}
# ext_sum = 20010

Related Pages

Design Considerations

1. Multiple Inheritance: Combines functionality from base classes and Mondrian-specific classes 2. Range Tracking: Uses defaultdict for automatic initialization 3. Weight Management: Log-space computation prevents underflow 4. Flexibility: Same node structure supports classification and regression 5. Parent Tracking: Enables upward tree traversal for aggregation

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment