Principle: Online ML / River / Online Linear Regression
| Knowledge Sources | Domains | Last Updated |
|---|---|---|
| River Docs | Online Machine Learning, Linear Models, Stochastic Gradient Descent | 2026-02-08 16:00 GMT |
Overview
Online regression algorithm that incrementally learns linear weights using stochastic gradient descent, serving as the default internal regressor for SNARIMAX forecasting.
Description
Online linear regression maintains a set of weights w and an intercept, updating them incrementally with each new observation via stochastic gradient descent (SGD). For each training example (x, y), the model computes a prediction hat{y} = w^T * x + b, calculates the gradient of the loss with respect to the weights, and takes a gradient step.
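The per-example update described above can be sketched in a few lines of plain Python. This is a minimal illustration of the math, not River's implementation; the helper names `predict` and `learn_one` merely echo River's naming convention.

```python
# Minimal sketch of one online SGD step for linear regression.
# Features are a dict of name -> value, as in River.

def predict(w, b, x):
    """y_hat = w^T x + b."""
    return sum(w.get(name, 0.0) * value for name, value in x.items()) + b

def learn_one(w, b, x, y, lr=0.01, intercept_lr=0.01):
    """Take one gradient step on the squared loss for a single example."""
    error = y - predict(w, b, x)
    for name, value in x.items():
        # dL/dw_i = -2 * error * x_i, so w_i moves by +2 * lr * error * x_i
        w[name] = w.get(name, 0.0) + 2 * lr * error * value
    b = b + intercept_lr * error  # intercept uses its own learning rate
    return w, b
```

Calling `learn_one` repeatedly on a stream of `(x, y)` pairs drives the weights toward the least-squares solution, one example at a time.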
In the context of SNARIMAX time series forecasting, LinearRegression serves as the default internal regressor. The SNARIMAX model constructs feature vectors from lagged values (AR terms), past forecast errors (MA terms), and optional exogenous inputs, then delegates the learning and prediction to the linear regression model. The target is the differenced time series value (after applying the SNARIMAX differencing operator).
The default SNARIMAX regressor is actually a pipeline of StandardScaler | LinearRegression, which normalizes features before feeding them to the linear model. This is important because the constructed lag and error features may have very different scales.
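The scaling half of that pipeline can be sketched as an online standardizer that tracks a running mean and variance per feature (Welford-style) and normalizes each incoming example. `RunningScaler` is a hypothetical name for illustration, not River's `StandardScaler` implementation.

```python
# Sketch of online standardization: track running mean and variance per
# feature, then scale each example before it reaches the linear model.

class RunningScaler:
    def __init__(self):
        self.n = {}      # observation count per feature
        self.mean = {}   # running mean per feature
        self.m2 = {}     # running sum of squared deviations (Welford)

    def update(self, x):
        for name, value in x.items():
            n = self.n.get(name, 0) + 1
            mean = self.mean.get(name, 0.0)
            delta = value - mean
            mean += delta / n
            self.n[name] = n
            self.mean[name] = mean
            self.m2[name] = self.m2.get(name, 0.0) + delta * (value - mean)

    def transform(self, x):
        out = {}
        for name, value in x.items():
            n = self.n.get(name, 0)
            var = self.m2.get(name, 0.0) / n if n else 1.0
            std = var ** 0.5 or 1.0  # avoid dividing by zero
            out[name] = (value - self.mean.get(name, 0.0)) / std
        return out
```

Feeding scaled features to the SGD learner keeps the effective step size comparable across lag and error features, which otherwise may differ in scale by orders of magnitude.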
Usage
Understand online linear regression when:
- You are using SNARIMAX with the default regressor and want to understand the internal learning mechanism
- You want to tune the learning rate or regularization of the SNARIMAX regressor
- You are considering replacing the default regressor with a custom one
- You need to understand the SGD convergence properties of the forecasting model
Theoretical Basis
Stochastic Gradient Descent for Linear Regression
The model minimizes the squared error loss via SGD:
Loss(y, hat{y}) = (y - hat{y})^2
Gradient with respect to weights:
dL/dw = -2 * (y - hat{y}) * x = -2 * error * x
Weight update:
w_new = w_old - lr * dL/dw = w_old + 2 * lr * error * x
Intercept update (separate learning rate):
b_new = b_old + intercept_lr * error
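Plugging concrete numbers into the three equations above makes the step explicit. The values of `lr`, `intercept_lr`, `w`, `x`, and `y` here are arbitrary illustrative choices.

```python
# One concrete update step, following the equations above.
x = {"x1": 2.0, "x2": -1.0}
w = {"x1": 0.5, "x2": 0.5}
b = 0.0
lr, intercept_lr = 0.1, 0.01
y = 3.0

y_hat = sum(w[k] * x[k] for k in x) + b           # 0.5*2 + 0.5*(-1) + 0 = 0.5
error = y - y_hat                                 # 3.0 - 0.5 = 2.5
w = {k: w[k] + 2 * lr * error * x[k] for k in w}  # w_new = w_old + 2*lr*error*x
b = b + intercept_lr * error                      # b_new = b_old + intercept_lr*error
```

After this single step, `w` becomes `{"x1": 1.5, "x2": 0.0}` and `b` becomes `0.025`: each weight moves in the direction of its feature, scaled by the error.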
Regularization
The model supports L1 and L2 regularization:
- L2 regularization (Ridge): adds l2 * ||w||^2 to the loss, shrinking all weights smoothly toward zero without forcing any of them to exactly zero
- L1 regularization (Lasso): adds l1 * ||w||_1 to the loss, encouraging sparsity in the weight vector
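With an L2 penalty, the penalty's gradient 2 * l2 * w is simply added to the loss gradient before the step. The sketch below mirrors that math; it is illustrative and not River's exact penalty handling.

```python
# SGD step on squared loss plus an L2 penalty l2 * ||w||^2.

def learn_one_l2(w, x, y, lr=0.01, l2=0.1):
    error = y - sum(w.get(k, 0.0) * v for k, v in x.items())
    for k, v in x.items():
        # loss gradient -2*error*x_i plus penalty gradient 2*l2*w_i
        grad = -2 * error * v + 2 * l2 * w.get(k, 0.0)
        w[k] = w.get(k, 0.0) - lr * grad
    return w
```

The penalty term pulls each weight back toward zero on every step, so the regularized model settles on smaller weights than the unpenalized one.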
Role in SNARIMAX
When SNARIMAX is configured with default settings, the internal learning step is:
At each time step t:
1. features = {y_{t-1}: val, y_{t-2}: val, ..., e_{t-1}: val, ...}
(constructed from AR lags, seasonal lags, MA errors, seasonal MA errors)
2. target = y'_t (differenced observation)
3. StandardScaler normalizes features
4. LinearRegression:
a. hat{y'}_t = w^T * scaled_features + b
b. w -= lr * gradient(Squared_loss, hat{y'}_t, y'_t)
This means the linear regression weights directly correspond to the autoregressive coefficients of the time series model (after accounting for the standard scaling).
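The per-step loop above can be sketched end to end: build a feature dict from AR lags and past forecast errors, predict, then take one SGD step. This is a simplified illustration, not River's SNARIMAX code: feature names like `"y-1"` and `"e-1"` are made up, differencing and scaling are omitted, and the intercept reuses the main learning rate.

```python
# Simplified SNARIMAX-style step: lag/error features -> predict -> SGD update.

def snarimax_like_step(w, b, y_history, e_history, y_t, p=2, q=1, lr=0.01):
    # 1. features from the last p observations and last q forecast errors
    x = {f"y-{i}": y_history[-i] for i in range(1, p + 1)}
    x.update({f"e-{i}": e_history[-i] for i in range(1, q + 1)})
    # 2. predict y_hat = w^T x + b
    y_hat = sum(w.get(k, 0.0) * v for k, v in x.items()) + b
    # 3. one SGD step on the squared loss
    error = y_t - y_hat
    for k, v in x.items():
        w[k] = w.get(k, 0.0) + 2 * lr * error * v
    b += lr * error
    # 4. bookkeeping for the next step
    y_history.append(y_t)
    e_history.append(error)
    return w, b, y_hat
```

Run on a stream of observations, the learned weights on the `"y-i"` features play the role of the (scaled) autoregressive coefficients described above.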
Convergence and Adaptation
- Default optimizer: SGD with learning rate 0.01
- Default loss: Squared loss
- Gradient clipping: Default clip at 1e12 to prevent exploding gradients
- Weight initialization: Zeros by default (via optim.initializers.Zeros())
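Gradient clipping as listed above amounts to capping each gradient component's magnitude at a threshold before the weight update. A one-line sketch (the 1e12 default matches the value stated; the helper name is illustrative):

```python
# Cap a gradient component's magnitude so a single extreme example
# cannot blow up the weights.

def clip(g, limit=1e12):
    return max(-limit, min(limit, g))
```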
The constant learning rate of SGD provides continuous adaptation: the model never fully "converges" but instead tracks changing patterns. This is a feature, not a bug, in the online forecasting context where the data distribution may shift over time.