
Principle:Online ml River Online Logistic Regression

From Leeroopedia


Knowledge Sources: River, River Docs
Domains: Online_Learning, Classification, Optimization
Last Updated: 2026-02-08 16:00 GMT

Overview

Online logistic regression is a binary classification algorithm that learns a linear decision boundary by performing stochastic gradient descent on the log-loss function, updating weights one observation at a time.

Description

Logistic regression is one of the most widely used algorithms for binary classification. It models the probability of the positive class as the sigmoid (logistic) function of a linear combination of features. In the online setting, the model processes one observation at a time, computing the gradient of the loss function for that single sample and updating the weight vector accordingly. This makes it a stochastic gradient descent (SGD) approach to logistic regression.

River's implementation builds on a Generalized Linear Model (GLM) base class that handles the core SGD mechanics: computing the raw dot product, evaluating loss gradients, clipping gradients, and applying weight updates via a pluggable optimizer. The LogisticRegression class specializes this by using the log-loss (binary cross-entropy) as the loss function and the sigmoid function as the mean function to map raw scores to probabilities.

The algorithm supports:

  • L1 regularization: Encourages sparse weight vectors by penalizing the absolute value of weights (uses a cumulative penalty approach for online L1).
  • L2 regularization: Encourages small weight vectors by penalizing the squared magnitude of weights.
  • Pluggable optimizers: Any optimizer from River's optim module (SGD, Adam, AdaGrad, etc.) can be used.
  • Gradient clipping: Prevents exploding gradients by clamping gradient values to a maximum absolute value.
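The mechanics listed above can be sketched in plain Python. This is an illustrative toy, not River's actual implementation; the class and parameter names are stand-ins, though the method names mirror River's `learn_one` / `predict_proba_one` convention.

```python
import math

# Toy sketch of one-sample SGD on the log-loss, with L2 regularization
# and gradient clipping (illustrative, not River's actual code).
class OnlineLogReg:
    def __init__(self, lr=0.1, l2=0.0, clip=1e12):
        self.w = {}       # sparse weights: feature name -> value
        self.b = 0.0      # intercept
        self.lr = lr      # learning rate
        self.l2 = l2      # L2 penalty strength
        self.clip = clip  # maximum absolute gradient value

    def _raw(self, x):
        # raw score: w . x + b
        return sum(self.w.get(f, 0.0) * v for f, v in x.items()) + self.b

    def predict_proba_one(self, x):
        p = 1.0 / (1.0 + math.exp(-self._raw(x)))
        return {False: 1.0 - p, True: p}

    def learn_one(self, x, y):
        # gradient of the log-loss w.r.t. the raw score: sigmoid(w . x + b) - y
        err = self.predict_proba_one(x)[True] - float(y)
        for f, v in x.items():
            g = err * v + self.l2 * self.w.get(f, 0.0)  # add L2 term
            g = max(-self.clip, min(self.clip, g))      # clip gradient
            self.w[f] = self.w.get(f, 0.0) - self.lr * g
        self.b -= self.lr * err                         # intercept update
```

Streaming labelled observations through `learn_one` one at a time, then querying `predict_proba_one`, reproduces the learn/predict cycle the sections below formalize.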

Usage

Use online logistic regression when:

  • You need a binary classifier that can learn incrementally from streaming data.
  • You want an interpretable model with a linear decision boundary.
  • You need probabilistic predictions (class probabilities, not just labels).
  • You want to combine it with feature scaling in a pipeline for best results.
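The last point, combining the classifier with feature scaling, matters because SGD converges poorly when features are on very different scales. A streaming scaler can be maintained online with running (Welford) mean/variance estimates. The sketch below is illustrative; the class name is hypothetical, not River's API (River provides this via `preprocessing.StandardScaler` in a pipeline).

```python
import math

# Illustrative running standardizer for the "scale, then classify"
# streaming pattern (hypothetical class, not River's actual API).
class RunningScaler:
    def __init__(self):
        self.n, self.mean, self.m2 = {}, {}, {}

    def learn_one(self, x):
        # Welford's online update of per-feature mean and sum of squares
        for f, v in x.items():
            self.n[f] = self.n.get(f, 0) + 1
            delta = v - self.mean.get(f, 0.0)
            self.mean[f] = self.mean.get(f, 0.0) + delta / self.n[f]
            self.m2[f] = self.m2.get(f, 0.0) + delta * (v - self.mean[f])

    def transform_one(self, x):
        # Standardize each feature with the statistics seen so far
        out = {}
        for f, v in x.items():
            var = self.m2.get(f, 0.0) / self.n.get(f, 1)
            out[f] = (v - self.mean.get(f, 0.0)) / math.sqrt(var) if var > 0 else 0.0
        return out
```

In a pipeline, each incoming observation is first passed to the scaler's `learn_one` and `transform_one`, and the standardized features are then fed to the classifier.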

Theoretical Basis

Model: The probability of the positive class given features x is:

p(y=1 | x) = sigmoid(w . x + b)

where sigmoid(z) = 1/(1+exp(-z)), w is the weight vector, and b is the intercept (bias).

Loss function (log-loss / binary cross-entropy):

L(y, p) = -y * log(p) - (1 - y) * log(1 - p)

Gradient computation: For a single observation (x,y):

gradient_w = (sigmoid(w . x + b) - y) * x
gradient_b = sigmoid(w . x + b) - y

Weight update (SGD):

w_new = w_old - learning_rate * gradient_w
b_new = b_old - intercept_lr * gradient_b
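A single update step can be checked numerically. The weights, features, and learning rate below are arbitrary illustrative values; the code follows the gradient and update formulas above (with intercept_lr equal to the learning rate for simplicity).

```python
import math

# One worked SGD step on a single observation (x, y), illustrative numbers.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

w, b = {"x1": 0.5, "x2": -0.25}, 0.0
x, y = {"x1": 1.0, "x2": 2.0}, 1
lr = 0.1

raw = sum(w[f] * x[f] for f in x) + b  # w . x + b = 0.5 - 0.5 = 0.0
err = sigmoid(raw) - y                 # sigmoid(0) - 1 = -0.5

# gradient_w = err * x ; gradient_b = err
for f in x:
    w[f] -= lr * err * x[f]
b -= lr * err
# w is now approximately {"x1": 0.55, "x2": -0.15}, b approximately 0.05
```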

With L2 regularization:

gradient_w = (sigmoid(w . x + b) - y) * x + l2 * w

With L1 regularization (cumulative penalty):

The online L1 penalty uses a cumulative approach: the model tracks max_cum_l1, the maximum total L1 penalty any weight could have received so far (accumulated as learning_rate * l1 at each update), and cum_l1_j, the penalty actually applied to weight j. After each plain SGD update, each weight is clipped toward zero by the outstanding penalty, without letting it cross zero:

if w_j > 0:
    w_j = max(0, w_j - (max_cum_l1 + cum_l1_j))
elif w_j < 0:
    w_j = min(0, w_j + (max_cum_l1 - cum_l1_j))
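The clipping rule above can be expressed as a small helper. This is an illustrative sketch of the cumulative-penalty step, not River's actual code; the function and variable names are hypothetical.

```python
# Sketch of the cumulative L1 penalty step (illustrative, not River's code).
def apply_l1_penalty(w, cum_l1, max_cum_l1):
    """Clip each weight toward zero by the unpaid cumulative L1 penalty
    and record how much penalty was actually applied to it."""
    for f in list(w):
        before = w[f]
        if w[f] > 0:
            w[f] = max(0.0, w[f] - (max_cum_l1 + cum_l1.get(f, 0.0)))
        elif w[f] < 0:
            w[f] = min(0.0, w[f] + (max_cum_l1 - cum_l1.get(f, 0.0)))
        # cum_l1 tracks the total penalty applied to this weight so far
        cum_l1[f] = cum_l1.get(f, 0.0) + (w[f] - before)
```

Because the penalty is capped at zero, small weights are driven exactly to zero rather than oscillating around it, which is what produces sparse weight vectors in the online setting.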

Prediction: The predict_proba_one method returns a dictionary mapping each class label to its predicted probability: {False: 1-p, True: p}. The predict_one method returns the class with the highest probability.
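The prediction behaviour can be sketched in two small functions. These are hedged stand-ins for the methods described above, written as free functions over an explicit weight vector rather than River's actual method signatures.

```python
import math

# Illustrative stand-ins for predict_proba_one / predict_one.
def predict_proba_one(w, b, x):
    # p = sigmoid(w . x + b); return {False: 1 - p, True: p}
    raw = sum(w.get(f, 0.0) * v for f, v in x.items()) + b
    p = 1.0 / (1.0 + math.exp(-raw))
    return {False: 1.0 - p, True: p}

def predict_one(w, b, x):
    # the predicted label is the class with the highest probability
    probas = predict_proba_one(w, b, x)
    return max(probas, key=probas.get)
```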
