Implementation:Online ml River Optim Averager
| Knowledge Sources | |
|---|---|
| Domains | Online_Learning, Optimization |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
Averager is a wrapper optimizer that returns averaged weights from any base stochastic gradient descent optimizer.
Description
The Averager optimizer wraps another optimizer and maintains a running average of the weights that optimizer produces during training. Conventional weight averaging is usually applied only once, at the end of training; this implementation instead returns the current averaged weights at every step. Averaging can be delayed by a given number of iterations with the start parameter, so that the base optimizer runs unmodified for an initial period before averaging begins. Because the average smooths out the oscillations that stochastic gradient descent can produce, the averaged weights have lower variance and form a more stable solution, which often improves generalization.
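The delayed running average described above can be sketched on a single scalar weight. This is a minimal illustration under stated assumptions, not River's code: the function name `delayed_running_mean` is hypothetical, and the incremental-mean update is one common way to keep a running average without storing past values.

```python
def delayed_running_mean(values, start=0):
    """Hypothetical helper: return the raw value for the first `start`
    steps, then the running mean of all values seen from step `start` on."""
    outputs = []
    mean = 0.0
    for t, value in enumerate(values):
        if t < start:
            # Warm-up period: pass the raw value through unchanged.
            outputs.append(value)
            continue
        n = t - start + 1            # number of values averaged so far
        mean += (value - mean) / n   # incremental mean update
        outputs.append(mean)
    return outputs


# A weight that oscillates between +1 and -1, as noisy SGD updates might.
oscillating = [1.0, -1.0, 1.0, -1.0]
print(delayed_running_mean(oscillating, start=2))  # [1.0, -1.0, 1.0, 0.0]
```

The first two outputs are the raw values because averaging has not started yet; from the third step on, the output is the mean of the values seen since step 2, which damps the oscillation.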
Usage
Import optim from river and wrap any base optimizer with optim.Averager. The wrapper is intended to improve the stability and generalization of any SGD-based optimizer.
Code Reference
Source Location
- Repository: Online_ml_River
- File: river/optim/average.py
Signature
class Averager(optim.base.Optimizer):
    def __init__(self, optimizer: optim.base.Optimizer, start: int = 0):
        ...
Import
from river import optim
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| optimizer | optim.base.Optimizer | Yes | Base optimizer whose weights will be averaged |
| start | int | No (default=0) | Number of iterations to wait before starting the average |
Outputs
| Name | Type | Description |
|---|---|---|
| optimizer | Averager | Wrapper optimizer that returns averaged weights |
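The contract above, a base optimizer going in and an averaging wrapper coming out, can be pictured with a small self-contained sketch. Everything here is an assumption made for illustration: `PlainSGD` and `AveragingWrapper` are hypothetical classes, weights and gradients are plain dicts, and the `step(w, g)` method name is chosen for the sketch rather than taken from the page. It is not the library's implementation.

```python
class PlainSGD:
    """Hypothetical base optimizer: w <- w - lr * g on dict weights."""

    def __init__(self, lr=0.1):
        self.lr = lr

    def step(self, w, g):
        return {key: w[key] - self.lr * g[key] for key in w}


class AveragingWrapper:
    """Hypothetical wrapper: delegates the update to a base optimizer and
    returns a running average of its outputs once `start` steps have passed."""

    def __init__(self, optimizer, start=0):
        self.optimizer = optimizer
        self.start = start
        self.n_iterations = 0
        self.avg_w = {}

    def step(self, w, g):
        # Let the base optimizer perform its usual update first.
        w = self.optimizer.step(w, g)
        if self.n_iterations < self.start:
            # Warm-up: return the base optimizer's weights unchanged.
            out = w
        else:
            n = self.n_iterations - self.start + 1
            for key, value in w.items():
                previous = self.avg_w.get(key, 0.0)
                self.avg_w[key] = previous + (value - previous) / n
            out = dict(self.avg_w)
        self.n_iterations += 1
        return out


opt = AveragingWrapper(PlainSGD(lr=0.5), start=1)
w = opt.step({"x": 1.0}, {"x": 1.0})   # warm-up step, raw weights returned
w = opt.step(w, {"x": 1.0})            # first averaged step
w = opt.step(w, {"x": -1.0})           # mean of the two post-warm-up weights
print(w)
```

Because the wrapper only needs the base object to expose `step`, the same pattern applies to any base optimizer, which mirrors the "wrap any base optimizer" usage described on this page.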
Usage Examples
from river import datasets
from river import evaluate
from river import linear_model
from river import metrics
from river import optim
from river import preprocessing
# Wrap SGD with averaging, start after 100 iterations
optimizer = optim.Averager(optim.SGD(0.01), start=100)
# Use with a linear model
dataset = datasets.Phishing()
model = (
    preprocessing.StandardScaler() |
    linear_model.LogisticRegression(optimizer)
)
metric = metrics.F1()
# Evaluate
score = evaluate.progressive_val_score(dataset, model, metric)
print(score) # F1: 87.97%
# Can wrap any optimizer
optimizer = optim.Averager(optim.Adam(), start=50)
model = linear_model.LogisticRegression(optimizer)
# Start averaging immediately
optimizer = optim.Averager(optim.Momentum(lr=0.01))
model = linear_model.LogisticRegression(optimizer)
# Useful for stabilizing training
base_optim = optim.SGD(lr=0.1) # Higher learning rate
averaged_optim = optim.Averager(base_optim, start=200)
model = linear_model.LogisticRegression(averaged_optim)