Implementation:Pyro ppl Pyro AdagradRMSProp
| Property | Value |
|---|---|
| Module | pyro.optim.adagrad_rmsprop |
| Source | pyro/optim/adagrad_rmsprop.py |
| Lines | 88 |
| Classes | AdagradRMSProp |
| Parent Class | torch.optim.optimizer.Optimizer |
| Dependencies | torch |
Overview
This module implements the AdagradRMSProp optimizer, a hybrid of Adagrad and RMSProp designed specifically for automatic differentiation variational inference (ADVI). The algorithm follows Equations 10 and 11 in Kucukelbir et al. (2017), "Automatic Differentiation Variational Inference."
The optimizer combines:
- An RMSProp-style exponential moving average of squared gradients (controlled by the momentum parameter t).
- An Adagrad-style step size decay proportional to step^(-0.5 + delta).
The update rule is: param = param - lr * grad / (1 + sqrt(ema_sq_grad)) where lr = eta * step^(-0.5 + delta).
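To make the step size decay concrete, here is a minimal pure-Python sketch of the schedule lr = eta * step^(-0.5 + delta). The helper name step_size is hypothetical and is not part of Pyro; it only mirrors the formula stated above and assumes a 1-based step counter.

```python
def step_size(eta: float, step: int, delta: float = 1e-16) -> float:
    """Return lr = eta * step^(-0.5 + delta) for a 1-based step counter.

    Hypothetical helper for illustration; not a Pyro function.
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    return eta * step ** (-0.5 + delta)


# With the default delta the schedule is effectively eta / sqrt(step).
schedule = [step_size(1.0, k) for k in (1, 4, 100)]
print(schedule)  # roughly [1.0, 0.5, 0.1]

# A larger delta slows the decay: the exponent becomes -0.25 here.
slower = step_size(1.0, 100, delta=0.25)
print(slower)  # roughly 0.316
```

With the default delta of 1e-16 the exponent is numerically indistinguishable from -0.5, so delta only changes the schedule noticeably when it is set to a much larger value.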
Code Reference
Class: AdagradRMSProp
Constructor:
- params: Iterable of parameters or parameter groups.
- eta (float, default 1.0): Step size scale.
- delta (float, default 1e-16): Exponent modulator for step size scaling.
- t (float, default 0.1): Momentum parameter for exponential moving average of squared gradients.
State initialization:
Each parameter starts with step=0 and sum=zeros_like(param).
Methods:
- share_memory(): Moves optimizer state to shared memory (for multiprocessing).
- step(closure=None): Performs a single optimization step:
  - On step 1, initializes sum = grad^2.
  - On subsequent steps: sum = (1 - t) * sum + t * grad^2.
  - Learning rate: lr = eta * step^(-0.5 + delta).
  - Update: param -= lr * grad / (1 + sqrt(sum)).
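The per-step logic described above can be sketched for a single scalar parameter in plain Python. This is an illustrative reimplementation under stated assumptions, not the Pyro source: the function name adagrad_rmsprop_step and the dictionary-based state are hypothetical, and the real optimizer operates on tensors per parameter group.

```python
import math


def adagrad_rmsprop_step(param, grad, state, eta=1.0, delta=1e-16, t=0.1):
    """Apply one scalar update following the rules listed for step().

    Hypothetical helper: `state` is a dict with keys "step" and "sum",
    mirroring the per-parameter state described in this page.
    """
    state["step"] += 1
    if state["step"] == 1:
        # First step: initialize the running average to grad^2.
        state["sum"] = grad * grad
    else:
        # Later steps: exponential moving average of squared gradients.
        state["sum"] = (1.0 - t) * state["sum"] + t * grad * grad
    lr = eta * state["step"] ** (-0.5 + delta)
    return param - lr * grad / (1.0 + math.sqrt(state["sum"]))


# State starts at step=0 and sum=0, as in the state initialization above.
state = {"step": 0, "sum": 0.0}
param = 1.0
param = adagrad_rmsprop_step(param, 2.0, state)  # sum = 4, lr = 1
first = param                                    # 1 - 2 / (1 + 2)
param = adagrad_rmsprop_step(param, 2.0, state)  # sum stays near 4, lr near 2^-0.5
```

Note that the denominator is 1 + sqrt(sum) rather than sqrt(sum) + eps, so the update magnitude is bounded by lr * |grad| even when the running average is close to zero.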
I/O Contract
| Method | Input | Output |
|---|---|---|
| `__init__` | Parameters, eta, delta, t | AdagradRMSProp instance |
| `step(closure)` | Optional closure | Optional loss value |
Usage Examples
import torch
from pyro.optim.adagrad_rmsprop import AdagradRMSProp
# Direct use as a PyTorch optimizer
model = torch.nn.Linear(10, 1)
optimizer = AdagradRMSProp(model.parameters(), eta=0.1, t=0.1)
for epoch in range(100):
    x = torch.randn(32, 10)
    loss = model(x).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
# As a Pyro optimizer for SVI
import pyro
from pyro.optim import PyroOptim
optim = PyroOptim(AdagradRMSProp, {"eta": 0.1, "t": 0.1})
Related Pages
- Pyro_ppl_Pyro_DCTAdam -- DCT-augmented Adam optimizer
- Pyro_ppl_Pyro_MultiOptimizer -- Higher-order optimizer framework
- Pyro_ppl_Pyro_PyroLRScheduler -- Learning rate scheduling wrapper