Principle:Scikit learn Scikit learn Online Learning
| Knowledge Sources | |
|---|---|
| Domains | Supervised Learning, Optimization |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
Online learning algorithms update model parameters incrementally as data arrives, rather than requiring access to the entire dataset at once.
Description
Online learning methods process training examples one at a time (or in small batches), updating model parameters after each observation. This approach solves the scalability problem inherent in batch learning when datasets are too large to fit in memory or when data arrives as a stream. Online algorithms are also well-suited for non-stationary environments where the data distribution changes over time. These methods form a key component of large-scale machine learning and real-time adaptive systems.
Usage
Use online learning algorithms when working with very large datasets that cannot be loaded entirely into memory, when data arrives in a streaming fashion, or when the underlying data distribution evolves over time. Stochastic Gradient Descent (SGD) is the most versatile choice, supporting many loss functions and penalty terms for both classification and regression. Passive-Aggressive algorithms are useful when you want margin-based updates with an aggressiveness parameter controlling the trade-off between fitting new examples and staying close to the current model. The Perceptron is suitable for linearly separable problems and serves as a simple, efficient baseline for online classification.
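As an illustration of the streaming scenario described above, the following minimal sketch feeds synthetic mini-batches to scikit-learn's `SGDClassifier` through `partial_fit`. The data generator `make_batch`, the weight vector `true_w`, and the batch sizes are hypothetical choices made for this example only; they do not come from any particular dataset.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
true_w = np.array([1.5, -2.0, 0.5])  # hypothetical ground-truth direction


def make_batch(n):
    """Simulate one mini-batch arriving from a stream (synthetic data)."""
    X = rng.normal(size=(n, 3))
    y = (X @ true_w > 0).astype(int)
    return X, y


# Hinge loss with an L2 penalty gives a linear SVM trained by SGD.
clf = SGDClassifier(loss="hinge", penalty="l2", alpha=1e-4, random_state=0)

# partial_fit needs the full set of class labels on the first call,
# because a single batch may not contain every class.
classes = np.array([0, 1])
for _ in range(50):
    X_batch, y_batch = make_batch(100)
    clf.partial_fit(X_batch, y_batch, classes=classes)

# Evaluate on a fresh batch that the model has never seen.
X_test, y_test = make_batch(1000)
accuracy = clf.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.3f}")
```

Only one batch is held in memory at a time, which is the property that makes this pattern usable for out-of-core data. `Perceptron` exposes the same `partial_fit` method, so it can be swapped in for `SGDClassifier` in this loop.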
Theoretical Basis
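Stochastic Gradient Descent (SGD) updates parameters using a single sample (or mini-batch) gradient:
$$w_{t+1} = w_t - \eta_t \nabla_w L(w_t; x_t, y_t)$$
where $\eta_t$ is the learning rate at step $t$ and $L$ is the loss function. Common loss functions include:
- Hinge loss (for classification, $y \in \{-1, +1\}$): $L(w; x, y) = \max(0,\, 1 - y\, w^\top x)$
- Log loss (logistic regression, $y \in \{-1, +1\}$): $L(w; x, y) = \log(1 + \exp(-y\, w^\top x))$
- Squared loss (regression): $L(w; x, y) = \tfrac{1}{2}(y - w^\top x)^2$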
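The three losses listed above can be written as plain functions of the label `y` and the linear score `score = w·x`. This is a sketch using the common textbook forms, which are assumptions here: labels in {-1, +1} for the two classification losses, and a factor of 1/2 on the squared loss. The function names are illustrative and are not scikit-learn API.

```python
import math


def hinge_loss(y, score):
    """Hinge loss: zero once the example is classified with margin >= 1."""
    return max(0.0, 1.0 - y * score)


def log_loss(y, score):
    """Logistic loss log(1 + exp(-y * score)), computed without overflow."""
    z = -y * score
    if z > 0:
        return z + math.log1p(math.exp(-z))
    return math.log1p(math.exp(z))


def squared_loss(y, score):
    """Squared loss with the conventional 1/2 factor (assumed here)."""
    return 0.5 * (y - score) ** 2


# A confidently correct prediction incurs no hinge loss but a small log loss.
print(hinge_loss(1, 2.0), log_loss(1, 2.0), squared_loss(3.0, 1.0))
```

The hinge loss is exactly zero beyond the margin, which is why hinge-based online updates leave the model untouched on easy examples, whereas the log loss always yields a nonzero gradient.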
For convex losses, SGD converges to the optimal solution under standard conditions on the learning rate schedule $\eta_t$ (e.g., $\sum_t \eta_t = \infty$ and $\sum_t \eta_t^2 < \infty$).
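To make the learning rate conditions concrete, here is a minimal pure-Python sketch of SGD on a one-parameter squared-loss problem. The schedule `eta0 / (1 + t) ** power` with `power = 0.75` is one assumed choice that satisfies both conditions (the step sizes sum to infinity, their squares do not); the noiseless synthetic stream and the name `sgd_squared_loss` are hypothetical.

```python
import random


def sgd_squared_loss(samples, eta0=1.0, power=0.75):
    """Fit a single weight w in y ≈ w * x with per-sample SGD.

    Uses the decaying step size eta_t = eta0 / (1 + t) ** power.
    For 0.5 < power <= 1 the steps sum to infinity while their
    squares have a finite sum.
    """
    w = 0.0
    for t, (x, y) in enumerate(samples):
        eta_t = eta0 / (1 + t) ** power
        grad = (w * x - y) * x  # gradient of 0.5 * (y - w * x) ** 2 w.r.t. w
        w -= eta_t * grad
    return w


# Hypothetical noiseless stream generated from a known weight.
random.seed(0)
true_w = 2.0
stream = []
for _ in range(5000):
    x = random.uniform(-1.0, 1.0)
    stream.append((x, true_w * x))

w_hat = sgd_squared_loss(stream)
print(f"estimated weight: {w_hat:.4f}")
```

Each sample is visited exactly once, so the loop would work unchanged if `stream` were a generator reading from disk or a network source.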
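Passive-Aggressive (PA) algorithms solve a constrained optimization at each step:
$$w_{t+1} = \arg\min_{w} \tfrac{1}{2}\|w - w_t\|^2 \quad \text{subject to} \quad \ell(w; (x_t, y_t)) = 0$$
where $\ell(w; (x, y)) = \max(0,\, 1 - y\, w^\top x)$ is the hinge loss.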
The update is passive when the current model correctly classifies the example with sufficient margin, and aggressive when it does not, making the minimal change necessary to satisfy the constraint. The PA-I and PA-II variants relax the constraint with a slack variable, penalized linearly (PA-I) or quadratically (PA-II), and introduce a regularization parameter $C$ to control aggressiveness.
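The passive and aggressive cases can be sketched in a few lines of pure Python. The step sizes below are the standard closed-form solutions for binary classification with labels in {-1, +1} (hinge loss divided by the squared norm of the example, capped or damped by `C` in the two variants); the function name `pa_update` and its `variant` argument are hypothetical and are not scikit-learn API.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pa_update(w, x, y, C=1.0, variant="PA"):
    """Return the weights after one Passive-Aggressive step (y in {-1, +1})."""
    loss = max(0.0, 1.0 - y * dot(w, x))
    sq_norm = dot(x, x)
    if loss == 0.0 or sq_norm == 0.0:
        return list(w)  # passive: margin already satisfied (or x is all zeros)
    if variant == "PA":
        tau = loss / sq_norm  # smallest step that reaches margin 1
    elif variant == "PA-I":
        tau = min(C, loss / sq_norm)  # step size capped at C
    elif variant == "PA-II":
        tau = loss / (sq_norm + 1.0 / (2.0 * C))  # step size damped by C
    else:
        raise ValueError(f"unknown variant: {variant}")
    return [wi + tau * y * xi for wi, xi in zip(w, x)]


# Aggressive step: starting from zero weights the example violates the margin.
w_new = pa_update([0.0, 0.0], [1.0, 2.0], 1)
print(w_new, dot(w_new, [1.0, 2.0]))

# Passive step: the margin is already at least 1, so nothing changes.
w_same = pa_update([1.0, 1.0], [1.0, 2.0], 1)
print(w_same)
```

After the unconstrained `"PA"` step the example sits on the margin, while smaller values of `C` in the two variants make the model move less in response to a single, possibly noisy, example.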
The Perceptron is the simplest online linear classifier. For a misclassified example $(x_t, y_t)$ with $y_t \in \{-1, +1\}$:
$$w_{t+1} = w_t + y_t x_t$$
The Perceptron convergence theorem guarantees convergence to a separating hyperplane in a finite number of steps if the data are linearly separable with margin $\gamma > 0$.
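A minimal pure-Python sketch of the mistake-driven Perceptron rule follows. It assumes labels in {-1, +1}, a unit step size, and no bias term; the toy dataset is hypothetical and was chosen to be separable through the origin so that training stops after a finite number of mistakes.

```python
def train_perceptron(X, y, max_epochs=100):
    """Run the Perceptron rule until an epoch passes with no mistakes."""
    w = [0.0] * len(X[0])
    mistakes = 0
    for _ in range(max_epochs):
        errors = 0
        for x_i, y_i in zip(X, y):
            margin = y_i * sum(wj * xj for wj, xj in zip(w, x_i))
            if margin <= 0:  # misclassified (or exactly on the boundary)
                w = [wj + y_i * xj for wj, xj in zip(w, x_i)]
                errors += 1
        mistakes += errors
        if errors == 0:  # every example is now on the correct side
            break
    return w, mistakes


# Hypothetical toy data, linearly separable through the origin.
X = [(2.0, 1.0), (1.0, 2.0), (-1.0, -1.5), (-2.0, -0.5)]
y = [1, 1, -1, -1]

w, mistakes = train_perceptron(X, y)
print(w, mistakes)
```

The weights change only on mistakes, so on separable data the total number of updates is finite; on non-separable data the loop simply stops at `max_epochs`.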