Principle:Online ml River Online Linear Models

Knowledge Sources	Domains	Last Updated
Machine Learning Optimization	Online_Learning, Linear_Models, Classification	2026-02-08 18:00 GMT

Overview

Online linear models are a family of linear classifiers and regressors that update their weight vectors incrementally with each new observation. This family includes classical algorithms such as the Perceptron, Passive-Aggressive methods, ALMA, Bayesian linear regression, and softmax regression, each with distinct update rules and theoretical motivations.

Description

Linear models predict by computing a weighted sum of input features: $\hat{y} = w^{T} x + b$ . In the online setting, the weight vector $w$ is updated after each observation, making these models naturally suited to streaming data. Different update rules yield different algorithms with distinct properties:

Perceptron: The simplest online linear classifier. It updates weights only when a misclassification occurs, adding or subtracting the input vector scaled by the learning rate. Despite its simplicity, the Perceptron convergence theorem guarantees that it will find a separating hyperplane for linearly separable data in finite steps.

Passive-Aggressive (PA): A margin-based online learning algorithm that makes the smallest weight update necessary to correctly classify the current instance with a specified margin. When the prediction is correct with sufficient margin, the model is "passive" (no update). When the margin is violated, the model is "aggressive" (updates as much as needed).

ALMA (Approximate Large Margin Algorithm): An online algorithm that approximates the maximum margin classifier. It maintains a normalized weight vector and updates it based on margin violations with a diminishing learning rate, providing theoretical guarantees about the margin achieved.

Bayesian Linear Regression: Maintains a full posterior distribution over the weights rather than a point estimate. Each new observation updates the posterior via Bayes' rule, providing not just predictions but also uncertainty estimates. The prior and likelihood are both Gaussian, yielding closed-form posterior updates.

Softmax Regression: Extends logistic regression to the multi-class setting. Each class has its own weight vector, and predictions are made via the softmax function. Weights are updated incrementally using the cross-entropy loss gradient.

Usage

Use online linear models when:

You need a lightweight, interpretable model for streaming data.
You want well-understood convergence guarantees.
Memory and computation per instance must be bounded.
You need uncertainty quantification (Bayesian variant).

Theoretical Basis

Perceptron Update

For each instance (x, y) where y in {-1, +1}:
    hat{y} = sign(w^T x)
    if hat{y} != y:
        w = w + eta * y * x

Convergence: If data is linearly separable with margin $γ$ and $| | x | | \leq R$ , the Perceptron makes at most $(R / γ)^{2}$ mistakes.

Passive-Aggressive Update

For each instance (x, y):
    loss = max(0, 1 - y * w^T x)      # hinge loss
    tau = loss / (||x||^2 + 1/(2*C))   # PA-II variant
    w = w + tau * y * x

The PA family minimizes a trade-off between staying close to the current weight vector and satisfying the margin constraint on the current instance.

Bayesian Linear Regression

Prior: w ~ N(mu_0, Sigma_0)
For each instance (x, y):
    Posterior update (conjugate Gaussian):
    Sigma_n = (Sigma_{n-1}^{-1} + (1/sigma^2) * x * x^T)^{-1}
    mu_n = Sigma_n * (Sigma_{n-1}^{-1} * mu_{n-1} + (1/sigma^2) * y * x)
Prediction: hat{y} = mu_n^T x, with variance x^T Sigma_n x

Softmax Regression

For each instance (x, y) with K classes:
    score_k = w_k^T x  for each class k
    p_k = exp(score_k) / sum_j exp(score_j)    # softmax
    For each class k:
        w_k = w_k - eta * (p_k - I(y==k)) * x  # cross-entropy gradient

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment