Implementation:Online ml River Tree iSOUPTreeRegressor

From Leeroopedia


Knowledge Sources
Domains: Online_Learning, Decision_Trees, Multi_Target_Regression
Last Updated: 2026-02-08 16:00 GMT

Overview

Incremental Structured Output Prediction Tree (iSOUP-Tree) is a multi-target regression tree that predicts several continuous outputs simultaneously. It extends the Hoeffding Tree Regressor to handle multiple, possibly correlated, target variables by using intra-cluster variance reduction as its split criterion.

Description

iSOUP-Tree addresses multi-target regression by treating the target space as a cluster and minimizing the intra-cluster variance when selecting splits. Instead of treating each target independently, it considers the joint variance across all targets, allowing the tree to capture correlations between outputs.

Key features:

  • Simultaneous prediction of multiple continuous targets
  • Intra-cluster variance reduction split criterion
  • Three leaf prediction strategies adapted for multi-target scenarios
  • Can use different regression models for each target
  • Efficient incremental learning for structured outputs

The split criterion minimizes the sum of variances across all targets after splitting, weighted by the number of samples in each branch. This encourages splits that create homogeneous regions in the multi-dimensional target space.

Usage

import numbers
from river import compose
from river import datasets
from river import evaluate
from river import linear_model
from river import metrics
from river import preprocessing
from river import tree

dataset = datasets.SolarFlare()

num = compose.SelectType(numbers.Number) | preprocessing.MinMaxScaler()
cat = compose.SelectType(str) | preprocessing.OneHotEncoder()

model = tree.iSOUPTreeRegressor(
    grace_period=100,
    leaf_prediction='model',
    leaf_model={
        'c-class-flares': linear_model.LinearRegression(l2=0.02),
        'm-class-flares': linear_model.PARegressor(),
        'x-class-flares': linear_model.LinearRegression(l2=0.1)
    }
)

pipeline = (num + cat) | model
metric = metrics.multioutput.MicroAverage(metrics.MAE())

evaluate.progressive_val_score(dataset, pipeline, metric)
# MicroAverage(MAE): 0.426177

Code Reference

Source Location: river/tree/isoup_tree_regressor.py

Signature:

class iSOUPTreeRegressor(tree.HoeffdingTreeRegressor, base.MultiTargetRegressor):
    def __init__(
        self,
        grace_period: int = 200,
        max_depth: int | None = None,
        delta: float = 1e-7,
        tau: float = 0.05,
        leaf_prediction: str = "adaptive",
        leaf_model: base.Regressor | dict | None = None,
        model_selector_decay: float = 0.95,
        nominal_attributes: list | None = None,
        splitter: Splitter | None = None,
        min_samples_split: int = 5,
        binary_split: bool = False,
        max_size: float = 500.0,
        memory_estimate_period: int = 1000000,
        stop_mem_management: bool = False,
        remove_poor_attrs: bool = False,
        merit_preprune: bool = True,
    )

Import:

from river.tree import iSOUPTreeRegressor

I/O Contract

Input:

  • x (dict): Feature dictionary with attribute names as keys
  • y (dict): Dictionary mapping target names to values (e.g., {'target1': 2.5, 'target2': 1.3})
  • w (float, optional): Sample weight (default: 1.0)

Output:

  • predict_one(x): Dictionary mapping target names to predicted values

Key Parameters

  • grace_period (int): Number of instances between split attempts
  • leaf_prediction (str): Prediction strategy ('mean', 'model', 'adaptive')
  • leaf_model (Regressor | dict): Models for targets. Can be:
      ◦ Single regressor (replicated to all targets)
      ◦ Dictionary mapping target names to regressors
      ◦ None (uses LinearRegression for all targets)
  • model_selector_decay (float): Exponential decay for adaptive strategy
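  • delta (float): Allowed error probability in a split decision under the Hoeffding bound; values closer to 0 take longer to decide
  • tau (float): Tie-breaking threshold; a split is forced once the Hoeffding bound falls below this value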
  • splitter (Splitter): Attribute observer (default: TEBSTSplitter)
  • min_samples_split (int): Minimum number of samples each branch of a candidate split must receive for the split to be considered valid

Implementation Details

Key Methods:

  • learn_one(x, y, w=1.0): Train on one multi-target instance
  • predict_one(x): Predict all targets
  • _new_leaf(initial_stats, parent): Create multi-target leaf
  • _new_split_criterion(): Create IntraClusterVarianceReductionSplitCriterion

Node Types:

  • LeafMeanMultiTarget: Predicts mean for each target
  • LeafModelMultiTarget: Uses separate model for each target
  • LeafAdaptiveMultiTarget: Adaptively chooses between mean and model per target

Split Criterion:

IntraClusterVarianceReductionSplitCriterion computes:

  • Pre-split: Sum of variances across all targets
  • Post-split: Weighted sum of variances in each branch
  • Merit: (pre-split variance) - (post-split variance)

For k targets, where nⱼ of the node's n samples fall into branch j: merit = Σᵢ₌₁ᵏ var(yᵢ) - Σⱼ [(nⱼ / n) * Σᵢ₌₁ᵏ var(yᵢ | branch j)]
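The merit formula can be checked by hand on a tiny example. The sketch below is illustrative only and does not use River's internal classes: the helper names `total_variance` and `icvr_merit` are hypothetical, and population variance is assumed (the library may estimate or normalize variance differently).

```python
from statistics import pvariance


def total_variance(samples, targets):
    """Sum of per-target population variances over a list of target dicts."""
    return sum(pvariance([y[t] for y in samples]) for t in targets)


def icvr_merit(parent, branches, targets):
    """Pre-split variance minus the size-weighted post-split variance."""
    n = len(parent)
    post = sum(len(b) / n * total_variance(b, targets) for b in branches)
    return total_variance(parent, targets) - post


targets = ["a", "b"]
# A candidate split that separates the two groups perfectly.
left = [{"a": 1.0, "b": 10.0}, {"a": 1.0, "b": 10.0}]
right = [{"a": 3.0, "b": 20.0}, {"a": 3.0, "b": 20.0}]

merit = icvr_merit(left + right, [left, right], targets)
print(merit)  # 26.0: var(a) = 1 and var(b) = 25 before, 0 in both branches after
```

Because both branches are perfectly homogeneous here, the whole pre-split variance is recovered as merit; a split that mixes the two groups would score lower.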

Multi-Target Leaf Models

Dictionary Specification:

When leaf_model is a dictionary:

  • Keys: target variable names
  • Values: Regressor instances for each target
  • If a target is missing, a copy of the first model is used

Example:

leaf_model = {
    'temperature': linear_model.LinearRegression(l2=0.1),
    'humidity': linear_model.PARegressor(),
    'pressure': linear_model.LinearRegression(l2=0.05)
}
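The fallback rule for missing targets can be sketched in plain Python. This is not River's code: `StubRegressor` and `resolve_leaf_models` are hypothetical names, and "first model" is assumed to mean the first entry in the dictionary's insertion order.

```python
import copy


class StubRegressor:
    """Hypothetical stand-in for a regressor; not part of River."""

    def __init__(self, name):
        self.name = name


def resolve_leaf_models(leaf_model, targets):
    """Return one model per target, copying the first dict entry for gaps."""
    first = next(iter(leaf_model.values()))
    return {
        t: leaf_model[t] if t in leaf_model else copy.deepcopy(first)
        for t in targets
    }


spec = {"temperature": StubRegressor("A"), "humidity": StubRegressor("B")}
models = resolve_leaf_models(spec, ["temperature", "humidity", "pressure"])

# 'pressure' was not specified, so it receives an independent copy of model "A".
print(models["pressure"].name)
```

Using a deep copy keeps the fallback model's state separate from the model it was cloned from, so the two targets do not share learned weights.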

Model Inheritance:

When a node splits, child leaves inherit:

  • Deep copies of parent's models
  • Adaptive leaf statistics (fmse_mean, fmse_model)

Inheriting both keeps model adaptation continuous across the split, so new leaves do not start from scratch.

Adaptive Strategy

For each target independently:

  1. Maintain exponentially smoothed squared errors for mean and model predictions
  2. Before prediction, compare smoothed errors
  3. Use the predictor with lower error for that target
  4. Different targets may use different strategies at the same leaf
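The per-target selection described above can be sketched as follows. This is a simplified illustration rather than the library's leaf implementation: the class `AdaptiveSelector` is hypothetical, the error update (old error times decay plus the new squared error) is one common form of exponential fading, and ties are assumed to fall back to the mean.

```python
class AdaptiveSelector:
    """Per-target choice between mean and model predictions (illustrative)."""

    def __init__(self, decay=0.95):
        self.decay = decay
        self.fmse_mean = {}   # faded squared error of the mean predictor
        self.fmse_model = {}  # faded squared error of the leaf model

    def update(self, y, pred_mean, pred_model):
        """Fade the old errors and add the newest squared error per target."""
        for t, true in y.items():
            err_mean = (true - pred_mean[t]) ** 2
            err_model = (true - pred_model[t]) ** 2
            self.fmse_mean[t] = self.decay * self.fmse_mean.get(t, 0.0) + err_mean
            self.fmse_model[t] = self.decay * self.fmse_model.get(t, 0.0) + err_model

    def predict(self, pred_mean, pred_model):
        """For each target, return the prediction whose faded error is lower."""
        out = {}
        for t in pred_mean:
            use_model = self.fmse_model.get(t, 0.0) < self.fmse_mean.get(t, 0.0)
            out[t] = pred_model[t] if use_model else pred_mean[t]
        return out


sel = AdaptiveSelector()
# The mean was exact for 'a'; the model was exact for 'b'.
sel.update(
    {"a": 1.0, "b": 5.0},
    pred_mean={"a": 1.0, "b": 0.0},
    pred_model={"a": 3.0, "b": 5.0},
)
chosen = sel.predict({"a": 10.0, "b": 20.0}, {"a": 11.0, "b": 21.0})
print(chosen)  # {'a': 10.0, 'b': 21.0}
```

In this run target 'a' is served by the mean and target 'b' by the model, which is the mixed behavior step 4 refers to.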

Target Discovery

The tree dynamically discovers targets:

  • Maintains a set of observed target names
  • Updates the set each time a new target appears
  • Handles scenarios where:
      ◦ Not all samples contain all targets
      ◦ New targets emerge over time
      ◦ Target names are strings or other hashable types
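A minimal sketch of this bookkeeping, assuming the set is simply extended with the keys of each incoming target dictionary. `TargetTracker` is a hypothetical helper written for illustration, not a River class.

```python
class TargetTracker:
    """Keeps the set of target names seen so far (illustrative)."""

    def __init__(self):
        self.targets = set()

    def observe(self, y):
        """Add any target names in y that have not been seen before."""
        self.targets.update(y)  # iterating a dict yields its keys
        return self.targets


tracker = TargetTracker()
tracker.observe({"a": 1.0})            # first sample carries one target
tracker.observe({"b": 2.0, 3: 0.5})    # new targets appear, one with a non-string key
tracker.observe({"a": 0.7})            # a partial sample leaves the set unchanged
print(tracker.targets)
```

A set works here because target names only need to be hashable, which is why integer or tuple keys are accepted alongside strings.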

Comparison with Standard HTR

| Feature            | iSOUP-Tree                       | Hoeffding Tree Regressor |
|--------------------|----------------------------------|--------------------------|
| Output type        | Multiple targets                 | Single target            |
| Split criterion    | Intra-cluster variance reduction | Variance reduction       |
| Leaf models        | One per target                   | Single model             |
| Target correlation | Captured                         | Ignored                  |
| Prediction         | Dict of values                   | Single value             |

References

Aljaž Osojnik, Panče Panov, and Sašo Džeroski. "Tree-based methods for online multi-target regression." Journal of Intelligent Information Systems 50.2 (2018): 315-339.
