Implementation:Online ml River Optim NesterovMomentum
| Knowledge Sources | |
|---|---|
| Domains | Online_Learning, Optimization |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
Nesterov Momentum is a variant of momentum-based gradient descent that evaluates the gradient at a look-ahead position instead of at the current weights, which often improves convergence.
Description
Nesterov Momentum, also known as Nesterov Accelerated Gradient (NAG), improves on standard momentum by computing the gradient at the approximate future position that the momentum term alone would reach, rather than at the current position. Because the gradient reflects where the weights are heading, it can correct the accumulated velocity before the weights overshoot. The implementation uses the look_ahead method to temporarily move the weights forward by the momentum term before the gradient is computed; the update step then moves them back and applies the full update. This small change often yields faster convergence than standard momentum, especially near a minimum, where the look-ahead gradient lets the optimizer slow down more gracefully.
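The look-ahead and step-back sequence can be sketched in plain Python. This is an illustrative stand-in rather than River's source: the names NesterovSketch, step, and minimize_quadratic are hypothetical, and the per-weight velocity dictionary s and the toy objective f(w) = (w - 3)^2 are assumptions chosen for the example.

```python
from collections import defaultdict


class NesterovSketch:
    """Illustrative Nesterov momentum on dict-based weights (hypothetical class)."""

    def __init__(self, lr=0.1, rho=0.9):
        self.lr = lr
        self.rho = rho
        self.s = defaultdict(float)  # assumed per-weight velocity store

    def look_ahead(self, w):
        # Temporarily move the weights to where momentum alone would take them.
        for i in w:
            w[i] -= self.rho * self.s[i]
        return w

    def step(self, w, g):
        # Undo the look-ahead shift so w is back at its original position.
        for i in w:
            w[i] += self.rho * self.s[i]
        # Update the velocity with the look-ahead gradient, then apply it.
        for i, gi in g.items():
            self.s[i] = self.rho * self.s[i] + self.lr * gi
            w[i] -= self.s[i]
        return w


def minimize_quadratic(n_steps=200):
    # Toy objective f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
    opt = NesterovSketch(lr=0.1, rho=0.9)
    w = {"x": 0.0}
    for _ in range(n_steps):
        w = opt.look_ahead(w)                # shift to the look-ahead point
        g = {"x": 2.0 * (w["x"] - 3.0)}      # gradient at the look-ahead point
        w = opt.step(w, g)                   # shift back and apply the update
    return w["x"]


if __name__ == "__main__":
    print(minimize_quadratic())
```

The key design point is that the caller computes the gradient between look_ahead and step, so the optimizer itself never needs access to the loss function.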
Usage
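Import from river.optim and pass an instance as the optimizer of a River model that accepts one, such as linear_model.LogisticRegression or linear_model.LinearRegression. It takes the same lr and rho parameters as standard Momentum, so prefer it when plain momentum overshoots or oscillates and you want the look-ahead correction.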
Code Reference
Source Location
- Repository: Online_ml_River
- File: river/optim/nesterov.py
Signature
class NesterovMomentum(optim.base.Optimizer):
    def __init__(self, lr=0.1, rho=0.9):
        ...
    def look_ahead(self, w):
        ...
Import
from river import optim
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| lr | float | No (default=0.1) | Learning rate |
| rho | float | No (default=0.9) | Momentum parameter (fraction of previous update to retain) |
Outputs
| Name | Type | Description |
|---|---|---|
| optimizer | NesterovMomentum | Configured optimizer instance ready for model training |
Usage Examples
from river import datasets
from river import evaluate
from river import linear_model
from river import metrics
from river import optim
from river import preprocessing
# Create Nesterov Momentum optimizer
optimizer = optim.NesterovMomentum()
# Use with a linear model
dataset = datasets.Phishing()
model = (
    preprocessing.StandardScaler() |
    linear_model.LogisticRegression(optimizer)
)
metric = metrics.F1()
# Evaluate
score = evaluate.progressive_val_score(dataset, model, metric)
print(score) # F1: 84.22%
# Custom parameters
optimizer = optim.NesterovMomentum(lr=0.05, rho=0.95)
model = linear_model.LogisticRegression(optimizer)
# Compare with standard Momentum
momentum = optim.Momentum(lr=0.1, rho=0.9)
nesterov = optim.NesterovMomentum(lr=0.1, rho=0.9)
model1 = linear_model.LogisticRegression(momentum)
model2 = linear_model.LogisticRegression(nesterov)
# Regression with a smaller learning rate; often converges faster than standard momentum
optimizer = optim.NesterovMomentum(lr=0.01, rho=0.9)
model = linear_model.LinearRegression(optimizer)
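The comparison with standard momentum can be illustrated without River on a one-dimensional toy problem. This is a minimal sketch under stated assumptions: the helper run and its kind argument are hypothetical, the objective f(w) = (w - 3)^2 is chosen only for illustration, and the result on this quadratic is not a general guarantee.

```python
def run(kind, n_steps=100, target=3.0, lr=0.1, rho=0.9):
    """Minimise f(w) = (w - target)^2 from w = 0; return |w - target| per step."""
    w, s, errors = 0.0, 0.0, []
    for _ in range(n_steps):
        if kind == "nesterov":
            # Gradient evaluated at the look-ahead point w - rho * s.
            g = 2.0 * ((w - rho * s) - target)
        else:
            # Standard momentum: gradient evaluated at the current weight.
            g = 2.0 * (w - target)
        s = rho * s + lr * g   # velocity update shared by both variants
        w -= s
        errors.append(abs(w - target))
    return errors


momentum_errors = run("momentum")
nesterov_errors = run("nesterov")

# Use the worst error over the last 20 steps so that a lucky zero crossing
# in an oscillating trajectory does not distort the comparison.
momentum_tail = max(momentum_errors[-20:])
nesterov_tail = max(nesterov_errors[-20:])

print(f"momentum tail error: {momentum_tail:.2e}")
print(f"nesterov tail error: {nesterov_tail:.2e}")
```

With identical lr and rho, the only difference between the two branches is where the gradient is evaluated, which isolates the effect of the look-ahead.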