Implementation:Online ml River Tree SGT Losses


Knowledge Sources
Domains: Online_Learning, Decision_Trees, Loss_Functions
Last Updated: 2026-02-08 16:00 GMT

Overview

Loss functions used by Stochastic Gradient Trees (SGT) to compute gradients and hessians for tree growth and prediction updates. Includes binary cross-entropy for classification and squared error for regression.

Description

SGT loss functions serve as optimization objectives that guide tree construction. Unlike traditional decision trees that use impurity measures (Gini, entropy), SGT directly minimizes differentiable loss functions by computing first and second derivatives.

Each loss function must:

1. Compute the gradient and hessian for a single prediction
2. Optionally provide a transfer function to transform raw predictions

The gradient and hessian are used to:

  • Update predictions at nodes: delta_pred = -G/(H + lambda)
  • Evaluate split merit: expected loss reduction
  • Compute statistics for F-tests

Code Reference

Source Location: /tmp/kapso_repo_178qi9vb/river/tree/losses.py

Signatures:

import abc
import math

from river.tree.utils import GradHess


class Loss(abc.ABC):
    @abc.abstractmethod
    def compute_derivatives(self, y_true: float, y_pred: float) -> GradHess:
        """Return gradient and hessian for one instance."""
        raise NotImplementedError

    def transfer(self, y: float) -> float:
        """Transform tree prediction before returning."""
        return y

class BinaryCrossEntropyLoss(Loss):
    def compute_derivatives(self, y_true, y_pred):
        # y_pred is the raw score; map it to a probability first
        y_trs = self.transfer(y_pred)
        return GradHess(y_trs - y_true, y_trs * (1.0 - y_trs))

    def transfer(self, y):
        # Sigmoid of the raw score
        return 1.0 / (1.0 + math.exp(-y))

class SquaredErrorLoss(Loss):
    def compute_derivatives(self, y_true, y_pred):
        # Residual as gradient, constant hessian
        return GradHess(y_pred - y_true, 1.0)

Import:

from river.tree.losses import BinaryCrossEntropyLoss, SquaredErrorLoss, Loss

Loss Class (Abstract)

Abstract Method:

  • compute_derivatives(y_true, y_pred): Compute gradient and hessian
      • Parameters:
          • y_true (float): True target value
          • y_pred (float): Predicted value (raw, before transfer)
      • Returns: GradHess object with gradient and hessian attributes

Concrete Method:

  • transfer(y): Transform prediction (default: identity)
      • Parameters:
          • y (float): Raw prediction from tree
      • Returns: Transformed prediction (e.g., probability)

BinaryCrossEntropyLoss

Purpose: Binary classification with logistic regression style predictions.

Loss Function:

L(y, p) = -[y log(p) + (1-y) log(1-p)]

where p = σ(f) = 1/(1 + exp(-f)) is the sigmoid of the raw prediction f.

Derivatives:

  • Gradient: ∂L/∂f = p - y = σ(f) - y
  • Hessian: ∂²L/∂f² = p(1-p) = σ(f)(1 - σ(f))
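The two derivative formulas above can be sanity-checked numerically. The following is a minimal standalone sketch that does not import River: the helper names `sigmoid`, `bce`, `analytic_derivatives`, and `numeric_derivatives` are illustrative assumptions, and the check uses central finite differences on the loss as written in this section.

```python
import math


def sigmoid(f: float) -> float:
    return 1.0 / (1.0 + math.exp(-f))


def bce(y: float, f: float) -> float:
    """Binary cross-entropy as a function of the raw prediction f."""
    p = sigmoid(f)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def analytic_derivatives(y: float, f: float):
    """Closed-form gradient and hessian with respect to f."""
    p = sigmoid(f)
    return p - y, p * (1.0 - p)


def numeric_derivatives(y: float, f: float, eps: float = 1e-4):
    """Central finite-difference approximations of the same quantities."""
    grad = (bce(y, f + eps) - bce(y, f - eps)) / (2 * eps)
    hess = (bce(y, f + eps) - 2 * bce(y, f) + bce(y, f - eps)) / eps**2
    return grad, hess


y, f = 1.0, 1.5
g_a, h_a = analytic_derivatives(y, f)
g_n, h_n = numeric_derivatives(y, f)
print(g_a, g_n)  # both close to -0.1824
print(h_a, h_n)  # both close to 0.1491
```

Agreement between the analytic and numeric values is a quick way to validate any hand-derived loss before plugging it into a tree.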

Transfer Function:

Applies sigmoid: p = 1/(1 + exp(-f))

Usage Example:

from river.tree.losses import BinaryCrossEntropyLoss

loss = BinaryCrossEntropyLoss()

# Raw prediction from tree
f = 1.5

# True label (0 or 1)
y_true = 1.0

# Compute derivatives
gh = loss.compute_derivatives(y_true, f)
print(gh.gradient)  # σ(1.5) - 1.0 ≈ 0.8176 - 1.0 = -0.1824
print(gh.hessian)   # σ(1.5)(1 - σ(1.5)) ≈ 0.1491

# Get probability prediction
prob = loss.transfer(f)
print(prob)  # ≈ 0.8176

Properties:

  • Convex in the raw prediction f
  • Hessian strictly positive for finite f, with 0 < p(1-p) ≤ 0.25
  • Gradient bounded for labels in {0, 1}: -1 < gradient < 1
  • Used in SGTClassifier
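The properties above hold mathematically, but in floating point the sigmoid written as 1/(1 + exp(-f)) can overflow for strongly negative f, and the hessian can round to exactly zero once p saturates. The sketch below illustrates this with a hypothetical `stable_sigmoid` helper; it is an assumption for illustration, not River's implementation, and whether the library guards against these cases is not stated on this page.

```python
import math


def naive_sigmoid(f: float) -> float:
    # Same formula as the transfer function shown in the signatures
    return 1.0 / (1.0 + math.exp(-f))


def stable_sigmoid(f: float) -> float:
    # Hypothetical variant: only ever exponentiates a non-positive number
    if f >= 0:
        return 1.0 / (1.0 + math.exp(-f))
    e = math.exp(f)
    return e / (1.0 + e)


try:
    naive_sigmoid(-1000.0)  # math.exp(1000.0) is out of range
    overflowed = False
except OverflowError:
    overflowed = True

p_low = stable_sigmoid(-1000.0)   # underflows to 0.0 instead of raising
p_high = stable_sigmoid(40.0)     # rounds to 1.0 in double precision
hess_high = p_high * (1.0 - p_high)

print(overflowed)  # True
print(p_low, p_high, hess_high)
```

When the hessian rounds to zero, the update G/(H + lambda) depends entirely on the other samples at the node and on lambda.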

SquaredErrorLoss

Purpose: Regression with mean squared error optimization.

Loss Function:

L(y, f) = (f - y)²/2

Derivatives:

  • Gradient: ∂L/∂f = f - y
  • Hessian: ∂²L/∂f² = 1

Transfer Function:

Identity: f_out = f_in (no transformation)

Usage Example:

from river.tree.losses import SquaredErrorLoss

loss = SquaredErrorLoss()

# Raw prediction from tree
f = 3.5

# True target value
y_true = 2.8

# Compute derivatives
gh = loss.compute_derivatives(y_true, f)
print(gh.gradient)  # 3.5 - 2.8 ≈ 0.7 (up to floating-point rounding)
print(gh.hessian)   # 1.0

# Get prediction (no transformation)
pred = loss.transfer(f)
print(pred)  # 3.5

Properties:

  • Convex loss function
  • Constant hessian simplifies computations
  • No transfer transformation needed
  • Used in SGTRegressor
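To see why the constant hessian simplifies things, consider one update of the form -G/(H + lambda) over a handful of samples. With squared error and lambda = 0, a single step lands exactly on the mean of the targets; a positive lambda shrinks the step. The targets and starting prediction below are made-up values for illustration only.

```python
targets = [2.0, 3.0, 7.0]  # hypothetical targets seen at one node
f = 0.0                    # current raw prediction

# Squared error: gradient is (f - y), hessian is 1 for every sample
G = sum(f - y for y in targets)   # -12.0
H = float(len(targets))           # 3.0

f_new = f - G / (H + 0.0)         # lambda = 0: jumps to the target mean
f_reg = f - G / (H + 1.0)         # lambda = 1: smaller, regularized step

print(f_new)  # 4.0
print(f_reg)  # 3.0
```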

GradHess Data Structure

Definition:

class GradHess:
    __slots__ = ["gradient", "hessian"]

    def __init__(self, gradient: float = 0.0, hessian: float = 0.0):
        self.gradient = gradient
        self.hessian = hessian

Operations:

  • Addition: gh1 + gh2 (element-wise)
  • Subtraction: gh1 - gh2 (element-wise)
  • In-place: gh1 += gh2, gh1 -= gh2

Used to aggregate gradient/hessian statistics across samples at nodes.
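The arithmetic listed above can be pictured with a small standalone class. `GradHessSketch` below is a hypothetical re-implementation written for this example, not River's `GradHess`; it only assumes that the four operations combine the gradient and hessian fields independently.

```python
class GradHessSketch:
    """Illustrative stand-in for a gradient/hessian pair."""

    __slots__ = ["gradient", "hessian"]

    def __init__(self, gradient: float = 0.0, hessian: float = 0.0):
        self.gradient = gradient
        self.hessian = hessian

    def __add__(self, other):
        return GradHessSketch(self.gradient + other.gradient,
                              self.hessian + other.hessian)

    def __sub__(self, other):
        return GradHessSketch(self.gradient - other.gradient,
                              self.hessian - other.hessian)

    def __iadd__(self, other):
        self.gradient += other.gradient
        self.hessian += other.hessian
        return self

    def __isub__(self, other):
        self.gradient -= other.gradient
        self.hessian -= other.hessian
        return self


# Accumulate per-sample statistics at a node
total = GradHessSketch()
for g, h in [(0.5, 1.0), (-0.25, 1.0), (0.75, 1.0)]:
    total += GradHessSketch(g, h)

# Statistics of the remaining samples after removing the first one
rest = total - GradHessSketch(0.5, 1.0)

print(total.gradient, total.hessian)  # 1.0 3.0
print(rest.gradient, rest.hessian)    # 0.5 2.0
```

Subtraction is handy when one branch's statistics can be obtained by removing the other branch's sums from the node totals.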

Integration with SGT

Tree Learning Process:

1. Prediction: Node stores raw value f
2. Transfer: Apply loss.transfer(f) to get output
3. Error: Compute loss.compute_derivatives(y_true, f)
4. Update: Aggregate gradients/hessians at node
5. Split: Evaluate candidates using aggregated statistics
6. Refinement: Update node predictions using delta = -G/(H + lambda)

Node Update Formula:

new_pred = old_pred - G/(H + lambda)

where:

  • G = sum of gradients
  • H = sum of hessians
  • lambda = regularization parameter
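As a rough illustration of the update formula, the sketch below repeats the aggregate-then-update cycle for a single leaf with binary cross-entropy. It is a deliberately simplified assumption: it ignores splitting, statistical tests, and the streaming nature of SGT, and the labels and helper names are invented for the example.

```python
import math


def sigmoid(f: float) -> float:
    return 1.0 / (1.0 + math.exp(-f))


def bce_derivatives(y_true: float, f: float):
    p = sigmoid(f)
    return p - y_true, p * (1.0 - p)


labels = [1.0, 1.0, 1.0, 0.0]  # hypothetical samples routed to one leaf
f = 0.0                        # raw prediction stored at the leaf
lam = 0.0                      # regularization parameter (lambda)

for _ in range(10):
    # Aggregate gradients and hessians at the current prediction
    G = H = 0.0
    for y in labels:
        g, h = bce_derivatives(y, f)
        G += g
        H += h
    # new_pred = old_pred - G/(H + lambda)
    f -= G / (H + lam)

print(f)           # approaches log(3) ≈ 1.0986
print(sigmoid(f))  # approaches 0.75, the fraction of positive labels
```

The leaf's transferred output converges to the empirical positive rate, which is the minimizer of the cross-entropy for a constant prediction.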

Extending with Custom Losses

To implement a new loss function:

from river.tree.losses import Loss
from river.tree.utils import GradHess


class HuberLoss(Loss):
    """Huber loss for robust regression."""

    def __init__(self, delta=1.0):
        # Threshold between the quadratic and linear regions
        self.delta = delta

    def compute_derivatives(self, y_true, y_pred):
        error = y_pred - y_true

        if abs(error) <= self.delta:
            # Quadratic region: same derivatives as squared error
            gradient = error
            hessian = 1.0
        else:
            # Linear region: gradient is clipped to ±delta and the
            # hessian is zero, so these samples add nothing to H
            gradient = self.delta * (1.0 if error > 0 else -1.0)
            hessian = 0.0

        return GradHess(gradient, hessian)

# Usage (sketch): StochasticGradientTree stands for the SGT estimator that
# accepts a loss_func argument; its import is not shown on this page.
model = StochasticGradientTree(
    loss_func=HuberLoss(delta=1.5),
    # ... other parameters
)
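One consequence of the zero hessian in the linear region is worth spelling out: if every sample at a node is an outlier, H is zero and the step -G/(H + lambda) is controlled by lambda alone. The standalone sketch below mirrors the Huber derivatives as plain functions; `huber_derivatives` and `leaf_step` are illustrative names, and the update is the simplified formula from this page rather than River's internal code.

```python
def huber_derivatives(y_true: float, y_pred: float, delta: float = 1.0):
    """Gradient and hessian of the Huber loss for one sample."""
    error = y_pred - y_true
    if abs(error) <= delta:
        return error, 1.0
    return delta * (1.0 if error > 0 else -1.0), 0.0


def leaf_step(targets, f, delta, lam):
    """Aggregate G and H over the targets and return -G/(H + lambda)."""
    G = H = 0.0
    for y in targets:
        g, h = huber_derivatives(y, f, delta)
        G += g
        H += h
    return -G / (H + lam)


targets = [10.0, 12.0, 11.0]  # all far from the current prediction 0.0

# Every error exceeds delta, so G = -4.5 and H = 0.0
step = leaf_step(targets, f=0.0, delta=1.5, lam=0.5)
print(step)  # 9.0

try:
    leaf_step(targets, f=0.0, delta=1.5, lam=0.0)
    failed = False
except ZeroDivisionError:
    failed = True
print(failed)  # True: with H = 0 the step needs lambda > 0
```

A custom loss with regions of zero curvature is therefore best paired with a positive regularization parameter.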

Mathematical Background

Newton-Raphson Update:

SGT uses second-order optimization similar to Newton's method:

f_new = f_old - [∂²L/∂f²]⁻¹ [∂L/∂f]
      = f_old - (1/H) * G
      = f_old - G/H

With regularization: f_new = f_old - G/(H + λ)

Expected Loss Reduction:

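For a split with child predictions f_left and f_right:

ΔL = G_left * f_left + 0.5 * H_left * f_left²
   + G_right * f_right + 0.5 * H_right * f_right²

where f_left = -G_left/(H_left + λ) and f_right = -G_right/(H_right + λ). Each child term is the second-order Taylor estimate of the change in loss from applying that child's update, so a more negative ΔL indicates a larger expected reduction.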
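The formula can be evaluated directly from aggregated statistics. The sketch below uses made-up G and H values and hypothetical helper names (`child_term`, `split_delta_loss`) to compare a candidate split against leaving the node unsplit; with λ = 0 each term reduces to -0.5 * G²/H.

```python
def child_term(G: float, H: float, lam: float = 0.0) -> float:
    """G*f + 0.5*H*f^2 evaluated at f = -G/(H + lambda)."""
    f = -G / (H + lam)
    return G * f + 0.5 * H * f * f


def split_delta_loss(G_left, H_left, G_right, H_right, lam=0.0):
    """Estimated loss change for a split, summed over both children."""
    return child_term(G_left, H_left, lam) + child_term(G_right, H_right, lam)


# Hypothetical aggregated statistics for the two children
G_left, H_left = -4.0, 2.0
G_right, H_right = 3.0, 6.0

with_split = split_delta_loss(G_left, H_left, G_right, H_right)
no_split = child_term(G_left + G_right, H_left + H_right)

print(with_split)  # -4.75
print(no_split)    # -0.0625
```

Here the children's gradients point in opposite directions, so a single shared update barely helps while the split yields a much larger estimated reduction.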

References

Gouk, H., Pfahringer, B., & Frank, E. (2019). "Stochastic Gradient Trees." Asian Conference on Machine Learning (pp. 1094-1109).
