

Principle:Online ml River Online Linear Regression

From Leeroopedia


Knowledge Sources: River, River Docs
Domains: Online Machine Learning, Linear Models, Stochastic Gradient Descent
Last Updated: 2026-02-08 16:00 GMT

Overview

Online regression algorithm that incrementally learns linear weights using stochastic gradient descent, serving as the default internal regressor for SNARIMAX forecasting.

Description

Online linear regression maintains a set of weights w and an intercept, updating them incrementally with each new observation via stochastic gradient descent (SGD). For each training example (x, y), the model computes a prediction hat{y} = w^T * x + b, calculates the gradient of the loss with respect to the weights, and takes a gradient step.
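The update rule above can be sketched in a few lines of pure Python. This is an illustrative, self-contained re-implementation of the described scheme, not River's actual code; River's linear_model.LinearRegression follows the same logic with dict-valued feature vectors and a pluggable optimizer.

```python
class OnlineLinearRegression:
    """Minimal SGD linear regression over dict-valued features (sketch)."""

    def __init__(self, lr=0.05):
        self.lr = lr
        self.weights = {}   # feature name -> weight, lazily initialized to 0
        self.intercept = 0.0

    def predict_one(self, x):
        # hat{y} = w^T * x + b
        return sum(self.weights.get(k, 0.0) * v for k, v in x.items()) + self.intercept

    def learn_one(self, x, y):
        error = y - self.predict_one(x)
        # dL/dw = -2 * error * x  =>  w_new = w_old + 2 * lr * error * x
        for k, v in x.items():
            self.weights[k] = self.weights.get(k, 0.0) + 2 * self.lr * error * v
        # intercept treated as a weight on a constant feature of 1
        self.intercept += 2 * self.lr * error


model = OnlineLinearRegression(lr=0.05)
for _ in range(200):
    for v in (1.0, 2.0, 3.0):
        model.learn_one({"x": v}, 2.0 * v + 1.0)  # noiseless target y = 2x + 1
```

On this noiseless stream the weight converges to 2 and the intercept to 1, since an exact linear fit exists.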

In the context of SNARIMAX time series forecasting, LinearRegression serves as the default internal regressor. The SNARIMAX model constructs feature vectors from lagged values (AR terms), past forecast errors (MA terms), and optional exogenous inputs, then delegates the learning and prediction to the linear regression model. The target is the differenced time series value (after applying the SNARIMAX differencing operator).

The default SNARIMAX regressor is actually a pipeline of StandardScaler | LinearRegression, which normalizes features before feeding them to the linear model. This is important because the constructed lag and error features may have very different scales.
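The scaling half of that pipeline can be done incrementally with running statistics. Below is a pure-Python sketch of an online standard scaler using Welford's algorithm; it illustrates the idea behind River's StandardScaler but is not its implementation (for instance, the divisor convention for the variance may differ).

```python
import math

class OnlineStandardScaler:
    """Per-feature running mean/variance (Welford) for online standardization."""

    def __init__(self):
        self.n = {}
        self.mean = {}
        self.m2 = {}  # running sum of squared deviations

    def learn_one(self, x):
        for k, v in x.items():
            n = self.n.get(k, 0) + 1
            mean = self.mean.get(k, 0.0)
            delta = v - mean
            mean += delta / n
            self.n[k] = n
            self.mean[k] = mean
            self.m2[k] = self.m2.get(k, 0.0) + delta * (v - mean)

    def transform_one(self, x):
        out = {}
        for k, v in x.items():
            n = self.n.get(k, 0)
            var = self.m2.get(k, 0.0) / n if n > 0 else 0.0  # population variance
            std = math.sqrt(var)
            out[k] = (v - self.mean.get(k, 0.0)) / std if std > 0 else 0.0
        return out


scaler = OnlineStandardScaler()
for v in (10.0, 20.0, 30.0):
    scaler.learn_one({"lag_1": v})
z = scaler.transform_one({"lag_1": 30.0})["lag_1"]
```

After seeing 10, 20, 30, the running mean is 20 and the population standard deviation is about 8.165, so 30 standardizes to roughly 1.225.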

Usage

Understanding online linear regression is useful when:

  • You are using SNARIMAX with the default regressor and want to understand the internal learning mechanism
  • You want to tune the learning rate or regularization of the SNARIMAX regressor
  • You are considering replacing the default regressor with a custom one
  • You need to understand the SGD convergence properties of the forecasting model

Theoretical Basis

Stochastic Gradient Descent for Linear Regression

The model minimizes the squared error loss via SGD:

Loss(y, hat{y}) = (y - hat{y})^2

Gradient with respect to weights:
dL/dw = -2 * (y - hat{y}) * x = -2 * error * x

Weight update:
w_new = w_old - lr * dL/dw = w_old + 2 * lr * error * x

Intercept update (separate learning rate):
b_new = b_old + intercept_lr * error
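A single update step with concrete numbers makes the equations above tangible. Here lr = intercept_lr = 0.01, weights start at zero, and the factors follow the document's convention exactly:

```python
lr = 0.01
intercept_lr = 0.01
w = {"x1": 0.0}
b = 0.0

x = {"x1": 3.0}
y = 1.0

y_hat = sum(w[k] * v for k, v in x.items()) + b   # hat{y} = 0.0
error = y - y_hat                                  # error = 1.0

# w_new = w_old + 2 * lr * error * x  ->  0 + 2 * 0.01 * 1 * 3 = 0.06
w = {k: w[k] + 2 * lr * error * v for k, v in x.items()}
# b_new = b_old + intercept_lr * error  ->  0 + 0.01 * 1 = 0.01
b = b + intercept_lr * error
```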

Regularization

The model supports L1 and L2 regularization:

  • L2 regularization (Ridge): Adds l2 * ||w||^2 to the loss, pushing weights toward zero while maintaining smooth weight vectors
  • L1 regularization (Lasso): Adds l1 * ||w||_1 to the loss, encouraging sparsity in the weight vector
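With an L2 penalty, the gradient gains a term proportional to the weight itself, so each SGD step also shrinks the weights toward zero. The sketch below follows the document's convention (exact constant placement varies between implementations, so treat it as illustrative):

```python
def sgd_step_l2(w, x, y, lr=0.01, l2=0.5):
    """One SGD step on squared loss plus l2 * ||w||^2 penalty (sketch)."""
    y_hat = sum(w.get(k, 0.0) * v for k, v in x.items())
    error = y - y_hat
    new_w = {}
    for k in set(w) | set(x):
        # dL/dw = -2 * error * x + 2 * l2 * w
        grad = -2 * error * x.get(k, 0.0) + 2 * l2 * w.get(k, 0.0)
        new_w[k] = w.get(k, 0.0) - lr * grad
    return new_w


w = {"x": 0.0}
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # noiseless y = 2x
for _ in range(300):
    for xv, yv in data:
        w = sgd_step_l2(w, {"x": xv}, yv, lr=0.01, l2=0.5)
```

The penalty makes the fitted weight settle noticeably below the true slope of 2, which is exactly the Ridge shrinkage effect described above.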

Role in SNARIMAX

When SNARIMAX is configured with default settings, the internal learning step is:

At each time step t:
  1. features = {y_{t-1}: val, y_{t-2}: val, ..., e_{t-1}: val, ...}
     (constructed from AR lags, seasonal lags, MA errors, seasonal MA errors)
  2. target = y'_t  (differenced observation)
  3. StandardScaler normalizes features
  4. LinearRegression:
     a. hat{y'}_t = w^T * scaled_features + b
     b. w -= lr * gradient(Squared_loss, hat{y'}_t, y'_t)

This means the linear regression weights directly correspond to the autoregressive coefficients of the time series model (after accounting for the standard scaling).
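The numbered steps above can be sketched as a self-contained loop. This simplified version uses p=2 AR lags, q=1 MA error term, and skips differencing and scaling for brevity; it mirrors the structure of SNARIMAX's learning step but is not River's implementation.

```python
import math
from collections import deque

lr = 0.01
w = {}                      # weights over the constructed lag/error features
b = 0.0
y_hist = deque(maxlen=2)    # holds y_{t-2}, y_{t-1}
e_hist = deque(maxlen=1)    # holds e_{t-1}

series = [math.sin(0.3 * t) for t in range(200)]  # toy time series

for y_t in series:
    # 1. construct features from AR lags and past forecast errors
    feats = {f"y-{i+1}": v for i, v in enumerate(reversed(y_hist))}
    feats.update({f"e-{i+1}": v for i, v in enumerate(reversed(e_hist))})
    # 4a. predict: hat{y}_t = w^T * features + b
    y_hat = sum(w.get(k, 0.0) * v for k, v in feats.items()) + b
    error = y_t - y_hat
    # 4b. SGD step on squared loss
    for k, v in feats.items():
        w[k] = w.get(k, 0.0) + 2 * lr * error * v
    b += 2 * lr * error
    # roll the lag and error histories forward
    y_hist.append(y_t)
    e_hist.append(error)
```

After the loop, the weight on each lag and error feature plays the role of an AR or MA coefficient, as the paragraph above describes.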

Convergence and Adaptation

  • Default optimizer: SGD with learning rate 0.01
  • Default loss: Squared loss
  • Gradient clipping: Default clip at 1e12 to prevent exploding gradients
  • Weight initialization: Zeros by default (via optim.initializers.Zeros())

The constant learning rate of SGD provides continuous adaptation: the model never fully "converges" but instead tracks changing patterns. This is a feature, not a bug, in the online forecasting context where the data distribution may shift over time.
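This tracking behavior is easy to demonstrate: in the sketch below the target slope switches from 2 to -1 halfway through the stream, and the constant-learning-rate SGD update re-adapts to the new regime (illustrative pure Python, not River code).

```python
lr = 0.05
w = 0.0  # single weight, no intercept

# First regime: y = 2x; after the drift: y = -x
stream = [(x, 2.0 * x) for x in (1.0, 2.0, 3.0)] * 100
stream += [(x, -1.0 * x) for x in (1.0, 2.0, 3.0)] * 100

for x, y in stream:
    error = y - w * x
    w += 2 * lr * error * x  # same SGD step throughout; never "converged"
```

Because the step size never decays, the weight forgets the first regime and ends up at the new slope of -1.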
