Implementation:Online ml River Tree SGT Losses
| Knowledge Sources | |
|---|---|
| Domains | Online_Learning, Decision_Trees, Loss_Functions |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
This module provides the loss functions that Stochastic Gradient Trees (SGT) use to compute gradients and hessians for tree growth and prediction updates. It includes binary cross-entropy for classification and squared error for regression.
Description
SGT loss functions serve as optimization objectives that guide tree construction. Unlike traditional decision trees that use impurity measures (Gini, entropy), SGT directly minimizes differentiable loss functions by computing first and second derivatives.
Each loss function must:
1. Compute gradient and hessian for a single prediction
2. Optionally provide a transfer function to transform raw predictions
The gradient and hessian are used to:
- Update predictions at nodes: delta_pred = -G/(H + lambda)
- Evaluate split merit: expected loss reduction
- Compute statistics for F-tests
Code Reference
Source Location:
/tmp/kapso_repo_178qi9vb/river/tree/losses.py
Signatures:
class Loss(abc.ABC):
    @abc.abstractmethod
    def compute_derivatives(self, y_true: float, y_pred: float) -> GradHess:
        """Return gradient and hessian for one instance."""
        raise NotImplementedError

    def transfer(self, y: float) -> float:
        """Transform tree prediction before returning."""
        return y

class BinaryCrossEntropyLoss(Loss):
    def compute_derivatives(self, y_true, y_pred):
        y_trs = self.transfer(y_pred)
        return GradHess(y_trs - y_true, y_trs * (1.0 - y_trs))

    def transfer(self, y):
        return 1.0 / (1.0 + math.exp(-y))

class SquaredErrorLoss(Loss):
    def compute_derivatives(self, y_true, y_pred):
        return GradHess(y_pred - y_true, 1.0)
Import:
from river.tree.losses import BinaryCrossEntropyLoss, SquaredErrorLoss, Loss
Loss Class (Abstract)
Abstract Method:
- compute_derivatives(y_true, y_pred): Compute gradient and hessian
  * Parameters:
    * y_true (float): True target value
    * y_pred (float): Predicted value (raw, before transfer)
  * Returns: GradHess object with gradient and hessian attributes
Concrete Method:
- transfer(y): Transform prediction (default: identity)
  * Parameters:
    * y (float): Raw prediction from tree
  * Returns: Transformed prediction (e.g., probability)
BinaryCrossEntropyLoss
Purpose: Binary classification with logistic regression style predictions.
Loss Function:
L(y, p) = -[y log(p) + (1-y) log(1-p)]
where p = σ(f) = 1/(1 + exp(-f)) is the sigmoid of the raw prediction f.
Derivatives:
- Gradient: ∂L/∂f = p - y = σ(f) - y
- Hessian: ∂²L/∂f² = p(1-p) = σ(f)(1 - σ(f))
Transfer Function:
Applies sigmoid: p = 1/(1 + exp(-f))
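The derivatives above can be sanity-checked against finite differences of the loss. The following is a minimal sketch in plain Python that does not import River; the helper names `sigmoid`, `bce`, and `numeric_derivatives` are hypothetical and exist only for this illustration.

```python
import math


def sigmoid(f: float) -> float:
    # Transfer function: maps a raw prediction to a probability.
    return 1.0 / (1.0 + math.exp(-f))


def bce(y: float, f: float) -> float:
    # Binary cross-entropy as a function of the raw prediction f.
    p = sigmoid(f)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def numeric_derivatives(y: float, f: float, eps: float = 1e-4):
    # Central finite differences for the first and second derivative in f.
    grad = (bce(y, f + eps) - bce(y, f - eps)) / (2.0 * eps)
    hess = (bce(y, f + eps) - 2.0 * bce(y, f) + bce(y, f - eps)) / eps**2
    return grad, hess


y, f = 1.0, 1.5
p = sigmoid(f)
analytic = (p - y, p * (1.0 - p))      # closed forms from this section
numeric = numeric_derivatives(y, f)    # should agree up to small numerical error

print(analytic)
print(numeric)
```

The two tuples should agree to several decimal places, which supports the closed-form gradient p - y and hessian p(1 - p).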
Usage Example:
from river.tree.losses import BinaryCrossEntropyLoss

loss = BinaryCrossEntropyLoss()
# Raw prediction from tree
f = 1.5
# True label (0 or 1)
y_true = 1.0
# Compute derivatives
gh = loss.compute_derivatives(y_true, f)
print(gh.gradient)  # σ(1.5) - 1.0 ≈ 0.8176 - 1.0 = -0.1824
print(gh.hessian)   # σ(1.5)(1 - σ(1.5)) ≈ 0.1491
# Get probability prediction
prob = loss.transfer(f)
print(prob)  # ≈ 0.8176
Properties:
- Convex loss function
- Hessian is positive and at most 0.25 in exact arithmetic; in floating point it can round to 0 once the sigmoid saturates
- Gradient bounded: -1 < gradient < 1
- Used in SGTClassifier
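The bounds listed above can be checked numerically over a range of raw predictions. This is an illustrative sketch in plain Python rather than River code; `bce_derivatives` is a hypothetical helper that mirrors the formulas in this section, and the range of -30 to 30 is an arbitrary choice.

```python
import math


def bce_derivatives(y_true: float, f: float):
    # Gradient and hessian of binary cross-entropy w.r.t. the raw prediction f.
    p = 1.0 / (1.0 + math.exp(-f))
    return p - y_true, p * (1.0 - p)


# Raw predictions from -30.0 to 30.0 in steps of 0.5, for both labels.
raw_values = [x / 2.0 for x in range(-60, 61)]
results = [bce_derivatives(y, f) for y in (0.0, 1.0) for f in raw_values]

all_gradients_bounded = all(-1.0 < g < 1.0 for g, _ in results)
all_hessians_in_range = all(0.0 < h <= 0.25 for _, h in results)
print(all_gradients_bounded, all_hessians_in_range)

# Far from zero the sigmoid saturates in double precision, so the
# computed hessian rounds to exactly 0.0 even though it is positive in theory.
_, saturated_hessian = bce_derivatives(1.0, 40.0)
print(saturated_hessian)
```

The saturation case is one reason a positive regularization term in the denominator of the node update can be useful.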
SquaredErrorLoss
Purpose: Regression with mean squared error optimization.
Loss Function:
L(y, f) = (f - y)²/2
Derivatives:
- Gradient: ∂L/∂f = f - y
- Hessian: ∂²L/∂f² = 1
Transfer Function:
Identity: f_out = f_in (no transformation)
Usage Example:
from river.tree.losses import SquaredErrorLoss

loss = SquaredErrorLoss()
# Raw prediction from tree
f = 3.5
# True target value
y_true = 2.8
# Compute derivatives
gh = loss.compute_derivatives(y_true, f)
print(gh.gradient)  # 3.5 - 2.8 ≈ 0.7 (up to floating-point rounding)
print(gh.hessian)   # 1.0
# Get prediction (no transformation)
pred = loss.transfer(f)
print(pred)  # 3.5
Properties:
- Convex loss function
- Constant hessian simplifies computations
- No transfer transformation needed
- Used in SGTRegressor
GradHess Data Structure
Definition:
class GradHess:
    __slots__ = ["gradient", "hessian"]

    def __init__(self, gradient: float = 0.0, hessian: float = 0.0):
        self.gradient = gradient
        self.hessian = hessian
Operations:
- Addition: gh1 + gh2 (element-wise)
- Subtraction: gh1 - gh2 (element-wise)
- In-place: gh1 += gh2, gh1 -= gh2
Used to aggregate gradient/hessian statistics across samples at nodes.
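As a rough illustration of how these operations support aggregation, the sketch below defines a stand-in class, `GradHessSketch`, with the four operators listed above. It is a hypothetical re-implementation for demonstration and should not be read as River's actual code.

```python
class GradHessSketch:
    """Stand-in for the GradHess container (illustrative only)."""

    __slots__ = ["gradient", "hessian"]

    def __init__(self, gradient: float = 0.0, hessian: float = 0.0):
        self.gradient = gradient
        self.hessian = hessian

    def __add__(self, other):
        return GradHessSketch(self.gradient + other.gradient,
                              self.hessian + other.hessian)

    def __sub__(self, other):
        return GradHessSketch(self.gradient - other.gradient,
                              self.hessian - other.hessian)

    def __iadd__(self, other):
        self.gradient += other.gradient
        self.hessian += other.hessian
        return self

    def __isub__(self, other):
        self.gradient -= other.gradient
        self.hessian -= other.hessian
        return self


# Aggregate squared-error derivatives (gradient = f - y, hessian = 1)
# for three samples that reach the same node with raw prediction f.
f = 2.0
targets = [1.0, 2.5, 4.0]
total = GradHessSketch()
for y in targets:
    total += GradHessSketch(f - y, 1.0)

# Subtraction recovers the statistics of the remaining samples, e.g. the
# right branch once the left branch (here: the first sample) is known.
left = GradHessSketch(f - targets[0], 1.0)
right = total - left

print(total.gradient, total.hessian)  # -1.5 3.0
print(right.gradient, right.hessian)  # -2.5 2.0
```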
Integration with SGT
Tree Learning Process:
1. Prediction: Node stores raw value f
2. Transfer: Apply loss.transfer(f) to get output
3. Error: Compute loss.compute_derivatives(y_true, f)
4. Update: Aggregate gradients/hessians at node
5. Split: Evaluate candidates using aggregated statistics
6. Refinement: Update node predictions using delta = -G/(H + lambda)
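The steps above can be sketched for a single leaf that never splits. This toy loop is a simplification under stated assumptions: it uses squared error, skips step 5 and any statistical test, and refines the prediction after a fixed number of samples. The function `fit_single_leaf` and its parameters `lambda_value` and `grace_period` are hypothetical names chosen for this example.

```python
def squared_error_derivatives(y_true: float, y_pred: float):
    # Gradient f - y and constant hessian 1 for squared error.
    return y_pred - y_true, 1.0


def identity_transfer(f: float) -> float:
    # Squared error needs no transformation of the raw value.
    return f


def fit_single_leaf(stream, lambda_value: float = 0.1, grace_period: int = 4):
    f = 0.0                    # 1. raw value stored at the node
    G, H, seen = 0.0, 0.0, 0   # aggregated statistics
    outputs = []
    for y_true in stream:
        outputs.append(identity_transfer(f))          # 2. transfer
        g, h = squared_error_derivatives(y_true, f)   # 3. derivatives
        G += g                                        # 4. aggregate
        H += h
        seen += 1
        # 5. split evaluation is omitted in this sketch
        if seen == grace_period:
            f -= G / (H + lambda_value)               # 6. refinement
            G, H, seen = 0.0, 0.0, 0
    return f, outputs


stream = [2.0, 4.0, 3.0, 3.0] * 3
final_f, outputs = fit_single_leaf(stream)
print(final_f)  # approaches the stream mean of 3.0
```

With squared error the leaf value moves toward the mean of the targets, and the regularization term only slows that convergence slightly.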
Node Update Formula:
new_pred = old_pred - G/(H + lambda)
where:
- G = sum of gradients
- H = sum of hessians
- lambda = regularization parameter
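A small worked example of the formula, assuming squared error (gradient f - y, hessian 1) and a hypothetical helper `node_update` written for this illustration:

```python
def node_update(old_pred, gradients, hessians, lambda_value):
    # new_pred = old_pred - G / (H + lambda)
    G = sum(gradients)
    H = sum(hessians)
    return old_pred - G / (H + lambda_value)


old_pred = 0.0
targets = [1.0, 2.0, 6.0]
gradients = [old_pred - y for y in targets]  # squared error: f - y
hessians = [1.0] * len(targets)              # squared error: constant 1

# G = -9, H = 3
unregularized = node_update(old_pred, gradients, hessians, lambda_value=0.0)
regularized = node_update(old_pred, gradients, hessians, lambda_value=1.0)

print(unregularized)  # 3.0, the mean of the targets
print(regularized)    # 2.25, shrunk toward the old prediction
```

With lambda = 0 a single step lands on the target mean; a positive lambda shortens the step.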
Extending with Custom Losses
To implement a new loss function:
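from river.tree.losses import Loss
from river.tree.utils import GradHess

class HuberLoss(Loss):
    """Huber loss for robust regression."""

    def __init__(self, delta=1.0):
        self.delta = delta

    def compute_derivatives(self, y_true, y_pred):
        error = y_pred - y_true
        abs_error = abs(error)
        if abs_error <= self.delta:
            # Quadratic region
            gradient = error
            hessian = 1.0
        else:
            # Linear region: constant-magnitude gradient, zero curvature.
            # A node whose samples all fall here has H = 0, so the update
            # -G/(H + lambda) needs lambda > 0 to stay defined.
            gradient = self.delta * (1.0 if error > 0 else -1.0)
            hessian = 0.0
        return GradHess(gradient, hessian)

# Usage (sketch): StochasticGradientTree is not imported in this snippet,
# and this page does not show where it is defined or which other
# parameters it requires.
model = StochasticGradientTree(
    loss_func=HuberLoss(delta=1.5),
    # ... other parameters
)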
Mathematical Background
Newton-Raphson Update:
SGT uses second-order optimization similar to Newton's method:
f_new = f_old - [∂²L/∂f²]⁻¹ [∂L/∂f]
      = f_old - (1/H) * G
      = f_old - G/H
With regularization: f_new = f_old - G/(H + λ)
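One way to read the regularized step is as the minimizer of a second-order Taylor approximation of the loss plus a quadratic penalty on the update. The penalty form below is an assumption made for illustration, with δ denoting the change f_new - f_old:

```latex
\begin{aligned}
\tilde{L}(\delta) &= G\,\delta + \tfrac{1}{2} H\,\delta^{2} + \tfrac{1}{2}\lambda\,\delta^{2} \\
\frac{d\tilde{L}}{d\delta} &= G + (H + \lambda)\,\delta = 0 \\
\delta^{*} &= -\frac{G}{H + \lambda}
\qquad\Longrightarrow\qquad
f_{\text{new}} = f_{\text{old}} - \frac{G}{H + \lambda}
\end{aligned}
```

Setting λ = 0 recovers the plain Newton step f_new = f_old - G/H.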
Expected Loss Reduction:
For a split with child predictions f_left and f_right:
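ΔL = G_left * f_left + 0.5 * H_left * f_left²
   + G_right * f_right + 0.5 * H_right * f_right²
where f_left = -G_left/(H_left + λ) and f_right = -G_right/(H_right + λ). With these values and λ ≥ 0, ΔL is never positive: it estimates the change in loss, so a more negative value means a larger expected reduction.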
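The expression can be evaluated directly from aggregated statistics. The sketch below uses hypothetical helper names (`leaf_value`, `loss_change`, `split_loss_change`) and made-up statistics; comparing against the unsplit node is an illustrative choice and not necessarily how the library ranks candidates.

```python
def leaf_value(G: float, H: float, lambda_value: float) -> float:
    # Child prediction f = -G / (H + lambda)
    return -G / (H + lambda_value)


def loss_change(G: float, H: float, f: float) -> float:
    # Second-order estimate of the loss change: G*f + 0.5*H*f^2
    return G * f + 0.5 * H * f * f


def split_loss_change(G_left, H_left, G_right, H_right, lambda_value=0.0):
    f_left = leaf_value(G_left, H_left, lambda_value)
    f_right = leaf_value(G_right, H_right, lambda_value)
    return (loss_change(G_left, H_left, f_left)
            + loss_change(G_right, H_right, f_right))


# Made-up statistics: the children pull in opposite directions.
# f_left = 2.0, f_right = -2.0, so ΔL = (-8 + 4) + (-12 + 6) = -10
delta_split = split_loss_change(-4.0, 2.0, 6.0, 3.0)

# Same samples kept in one node: G = 2, H = 5, f = -0.4, ΔL ≈ -0.4
delta_no_split = loss_change(2.0, 5.0, leaf_value(2.0, 5.0, 0.0))

print(delta_split, delta_no_split)
```

Here the split yields a much more negative ΔL than a single update on the merged node, which is the situation in which splitting pays off.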
References
Gouk, H., Pfahringer, B., & Frank, E. (2019). "Stochastic Gradient Trees." Asian Conference on Machine Learning (pp. 1094-1109).