Implementation: Microsoft Onnxruntime CPU SGD Adam
| Knowledge Sources | |
|---|---|
| Domains | Training, CPU_Kernels |
| Last Updated | 2026-02-10 04:00 GMT |
Overview
Concrete implementation of the SGD and Adam (legacy single-tensor) optimizer kernels on CPU in the ONNX Runtime training framework.
Description
This file implements two optimizer kernels:
SGDOptimizer: A simple stochastic gradient descent optimizer that computes NW = W - eta * G (new weight = old weight - learning_rate * gradient). It also optionally outputs the negative delta NG = -eta * G. Weights and gradients are updated in-place through aliasing.
AdamOptimizer: An Adam optimizer with decoupled weight decay, supporting two modes:
- Mode 0 (PyTorch): Bias correction applied to each moment individually; weight decay before the update.
  update = (m1 / alpha_correction) / (sqrt(m2 / beta_correction) + epsilon) + lambda * W.
- Mode 1 (HuggingFace): Bias correction applied to the learning rate; weight decay after the update.
  step_size = lr * sqrt(beta_correction) / alpha_correction, then delta = -step_size * m1 / denom - lr * lambda * (W - step_size * m1 / denom).
Both optimizers operate on single tensors (unlike AdamW which works on TensorSeq). The step counter is incremented after each update.
Usage
These are the legacy single-tensor optimizer kernels used when the training graph uses individual weight/gradient tensor pairs rather than the grouped TensorSeq pattern.
Code Reference
Source Location
- Repository: Microsoft_Onnxruntime
- File: orttraining/orttraining/training_ops/cpu/optimizer/optimizers.cc
- Lines: 1-139
Signature
template <typename T>
Status SGDOptimizer<T>::Compute(OpKernelContext* ctx) const;
template <typename T>
Status AdamOptimizer<T>::Compute(OpKernelContext* ctx) const;
Import
#include "orttraining/orttraining/training_ops/cpu/optimizer/optimizers.h"
I/O Contract
Inputs (SGDOptimizer)
| Name | Type | Required | Description |
|---|---|---|---|
| ETA | Tensor(float) | Yes | Learning rate (scalar) |
| W | Tensor(float) | Yes | Current weights |
| G | Tensor(float) | Yes | Gradients |
Outputs (SGDOptimizer)
| Name | Type | Description |
|---|---|---|
| NW | Tensor(float) | Updated weights (in-place alias) |
| NG | Tensor(float) | Negative delta (in-place alias) |
Inputs (AdamOptimizer)
| Name | Type | Required | Description |
|---|---|---|---|
| ETA | Tensor(float) | Yes | Learning rate (scalar) |
| S | Tensor(int64) | Yes | Step counter |
| W | Tensor(float) | Yes | Current weights |
| G | Tensor(float) | Yes | Gradients |
| M1 | Tensor(float) | Yes | First moment estimates |
| M2 | Tensor(float) | Yes | Second moment estimates |
Outputs (AdamOptimizer)
| Name | Type | Description |
|---|---|---|
| NS | Tensor(int64) | Updated step counter |
| NM1 | Tensor(float) | Updated first moments |
| NM2 | Tensor(float) | Updated second moments |
| NW | Tensor(float) | Updated weights (optional) |
| NG | Tensor(float) | Update delta (optional) |
Usage Examples
ONNX_OPERATOR_KERNEL_EX(
SGDOptimizer, kMSDomain, 1, kCpuExecutionProvider,
KernelDefBuilder()
.Alias(1, 0) // Update weights in-place
.Alias(2, 1) // Update gradients in-place
.TypeConstraint("T", DataTypeImpl::GetTensorType<float>()),
SGDOptimizer<float>);
ONNX_OPERATOR_KERNEL_EX(
AdamOptimizer, kMSDomain, 1, kCpuExecutionProvider,
KernelDefBuilder()
.Alias(1, 0) // Update step count in-place
.Alias(2, 3) // Update weights in-place
.Alias(4, 1) // Update moment-1 in-place
.Alias(5, 2) // Update moment-2 in-place
.TypeConstraint("T1", DataTypeImpl::GetTensorType<float>())
.TypeConstraint("T2", DataTypeImpl::GetTensorType<int64_t>()),
AdamOptimizer<float>);