Principle: Online ML / River / Online Linear Regression
| Knowledge Sources | Domains | Last Updated |
|---|---|---|
| River Docs | Online Machine Learning, Linear Models, Stochastic Gradient Descent | 2026-02-08 16:00 GMT |
Overview
Online regression algorithm that incrementally learns linear weights using stochastic gradient descent, serving as the default internal regressor for SNARIMAX forecasting.
Description
Online linear regression maintains a set of weights w and an intercept, updating them incrementally with each new observation via stochastic gradient descent (SGD). For each training example (x, y), the model computes a prediction hat{y} = w^T * x + b, calculates the gradient of the loss with respect to the weights, and takes a gradient step.
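The per-example update described above can be sketched in a few lines of plain Python. This is a minimal illustration of the math, not River's implementation; the helper names `predict` and `learn_one` merely echo River's naming convention.

```python
# Minimal sketch of one online SGD step for linear regression.
# Features are a dict of name -> value, as in River.

def predict(w, b, x):
    """y_hat = w^T x + b."""
    return sum(w.get(name, 0.0) * value for name, value in x.items()) + b

def learn_one(w, b, x, y, lr=0.01, intercept_lr=0.01):
    """Take one gradient step on the squared loss for a single example."""
    error = y - predict(w, b, x)
    for name, value in x.items():
        # dL/dw_i = -2 * error * x_i, so w_i moves by +2 * lr * error * x_i
        w[name] = w.get(name, 0.0) + 2 * lr * error * value
    b = b + intercept_lr * error  # intercept uses its own learning rate
    return w, b
```

Calling `learn_one` repeatedly on a stream of `(x, y)` pairs drives the weights toward the least-squares solution, one example at a time.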
In the context of SNARIMAX time series forecasting, LinearRegression serves as the default internal regressor. The SNARIMAX model constructs feature vectors from lagged values (AR terms), past forecast errors (MA terms), and optional exogenous inputs, then delegates the learning and prediction to the linear regression model. The target is the differenced time series value (after applying the SNARIMAX differencing operator).
The default SNARIMAX regressor is actually a pipeline of StandardScaler | LinearRegression, which normalizes features before feeding them to the linear model. This is important because the constructed lag and error features may have very different scales.
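The scaling half of that pipeline can be sketched as an online standardizer that tracks a running mean and variance per feature (Welford-style) and normalizes each incoming example. `RunningScaler` is a hypothetical name for illustration, not River's `StandardScaler` implementation.

```python
# Sketch of online standardization: track running mean and variance per
# feature, then scale each example before it reaches the linear model.

class RunningScaler:
    def __init__(self):
        self.n = {}      # observation count per feature
        self.mean = {}   # running mean per feature
        self.m2 = {}     # running sum of squared deviations (Welford)

    def update(self, x):
        for name, value in x.items():
            n = self.n.get(name, 0) + 1
            mean = self.mean.get(name, 0.0)
            delta = value - mean
            mean += delta / n
            self.n[name] = n
            self.mean[name] = mean
            self.m2[name] = self.m2.get(name, 0.0) + delta * (value - mean)

    def transform(self, x):
        out = {}
        for name, value in x.items():
            n = self.n.get(name, 0)
            var = self.m2.get(name, 0.0) / n if n else 1.0
            std = var ** 0.5 or 1.0  # avoid dividing by zero
            out[name] = (value - self.mean.get(name, 0.0)) / std
        return out
```

Feeding scaled features to the SGD learner keeps the effective step size comparable across lag and error features, which otherwise may differ in scale by orders of magnitude.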
Usage
Understand online linear regression when:
- You are using SNARIMAX with the default regressor and want to understand the internal learning mechanism
- You want to tune the learning rate or regularization of the SNARIMAX regressor
- You are considering replacing the default regressor with a custom one
- You need to understand the SGD convergence properties of the forecasting model
Theoretical Basis
Stochastic Gradient Descent for Linear Regression
The model minimizes the squared error loss via SGD:
Loss(y, hat{y}) = (y - hat{y})^2
Gradient with respect to weights:
dL/dw = -2 * (y - hat{y}) * x = -2 * error * x
Weight update:
w_new = w_old - lr * dL/dw = w_old + 2 * lr * error * x
Intercept update (separate learning rate):
b_new = b_old + intercept_lr * error
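Plugging concrete numbers into the three equations above makes the step explicit. The values of `lr`, `intercept_lr`, `w`, `x`, and `y` here are arbitrary illustrative choices.

```python
# One concrete update step, following the equations above.
x = {"x1": 2.0, "x2": -1.0}
w = {"x1": 0.5, "x2": 0.5}
b = 0.0
lr, intercept_lr = 0.1, 0.01
y = 3.0

y_hat = sum(w[k] * x[k] for k in x) + b           # 0.5*2 + 0.5*(-1) + 0 = 0.5
error = y - y_hat                                 # 3.0 - 0.5 = 2.5
w = {k: w[k] + 2 * lr * error * x[k] for k in w}  # w_new = w_old + 2*lr*error*x
b = b + intercept_lr * error                      # b_new = b_old + intercept_lr*error
```

After this single step, `w` becomes `{"x1": 1.5, "x2": 0.0}` and `b` becomes `0.025`: each weight moves in the direction of its feature, scaled by the error.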
Regularization
The model supports L1 and L2 regularization:
- L2 regularization (Ridge): adds l2 * ||w||^2 to the loss, shrinking all weights smoothly toward zero without forcing any of them to exactly zero
- L1 regularization (Lasso): adds l1 * ||w||_1 to the loss, encouraging sparsity in the weight vector
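With an L2 penalty, the penalty's gradient 2 * l2 * w is simply added to the loss gradient before the step. The sketch below mirrors that math; it is illustrative and not River's exact penalty handling.

```python
# SGD step on squared loss plus an L2 penalty l2 * ||w||^2.

def learn_one_l2(w, x, y, lr=0.01, l2=0.1):
    error = y - sum(w.get(k, 0.0) * v for k, v in x.items())
    for k, v in x.items():
        # loss gradient -2*error*x_i plus penalty gradient 2*l2*w_i
        grad = -2 * error * v + 2 * l2 * w.get(k, 0.0)
        w[k] = w.get(k, 0.0) - lr * grad
    return w
```

The penalty term pulls each weight back toward zero on every step, so the regularized model settles on smaller weights than the unpenalized one.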
Role in SNARIMAX
When SNARIMAX is configured with default settings, the internal learning step is:
At each time step t:
1. features = {y_{t-1}: val, y_{t-2}: val, ..., e_{t-1}: val, ...}
(constructed from AR lags, seasonal lags, MA errors, seasonal MA errors)
2. target = y'_t (differenced observation)
3. StandardScaler normalizes features
4. LinearRegression:
a. hat{y'}_t = w^T * scaled_features + b
b. w -= lr * gradient(Squared_loss, hat{y'}_t, y'_t)
This means the linear regression weights directly correspond to the autoregressive coefficients of the time series model (after accounting for the standard scaling).
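The per-step loop above can be sketched end to end: build a feature dict from AR lags and past forecast errors, predict, then take one SGD step. This is a simplified illustration, not River's SNARIMAX code: feature names like `"y-1"` and `"e-1"` are made up, differencing and scaling are omitted, and the intercept reuses the main learning rate.

```python
# Simplified SNARIMAX-style step: lag/error features -> predict -> SGD update.

def snarimax_like_step(w, b, y_history, e_history, y_t, p=2, q=1, lr=0.01):
    # 1. features from the last p observations and last q forecast errors
    x = {f"y-{i}": y_history[-i] for i in range(1, p + 1)}
    x.update({f"e-{i}": e_history[-i] for i in range(1, q + 1)})
    # 2. predict y_hat = w^T x + b
    y_hat = sum(w.get(k, 0.0) * v for k, v in x.items()) + b
    # 3. one SGD step on the squared loss
    error = y_t - y_hat
    for k, v in x.items():
        w[k] = w.get(k, 0.0) + 2 * lr * error * v
    b += lr * error
    # 4. bookkeeping for the next step
    y_history.append(y_t)
    e_history.append(error)
    return w, b, y_hat
```

Run on a stream of observations, the learned weights on the `"y-i"` features play the role of the (scaled) autoregressive coefficients described above.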
Convergence and Adaptation
- Default optimizer: SGD with learning rate 0.01
- Default loss: Squared loss
- Gradient clipping: Default clip at 1e12 to prevent exploding gradients
- Weight initialization: Zeros by default (via optim.initializers.Zeros())
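Gradient clipping as listed above amounts to capping each gradient component's magnitude at a threshold before the weight update. A one-line sketch (the 1e12 default matches the value stated; the helper name is illustrative):

```python
# Cap a gradient component's magnitude so a single extreme example
# cannot blow up the weights.

def clip(g, limit=1e12):
    return max(-limit, min(limit, g))
```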
The constant learning rate of SGD provides continuous adaptation: the model never fully "converges" but instead tracks changing patterns. This is a feature, not a bug, in the online forecasting context where the data distribution may shift over time.