
Implementation:Microsoft Onnxruntime CPU SGD Adam

From Leeroopedia


Knowledge Sources
Domains Training, CPU_Kernels
Last Updated 2026-02-10 04:00 GMT

Overview

Concrete implementation of the SGD and Adam (legacy single-tensor) optimizer kernels for CPU in the ONNX Runtime training framework.

Description

This file implements two optimizer kernels:

SGDOptimizer: A simple stochastic gradient descent optimizer that computes NW = W - eta * G (new weight = old weight - learning_rate * gradient). It also optionally outputs the negative delta NG = -eta * G. Weights and gradients are updated in-place through aliasing.
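The SGD update above can be sketched as a standalone routine. This is an illustrative re-implementation of the math, not the actual kernel; the function name and in-place convention mirror the aliasing described above.

```cpp
#include <cmath>
#include <vector>

// Hypothetical sketch of the SGD update: NG = -eta * G, NW = W + NG.
// The kernel aliases G to NG and W to NW, so both are updated in place.
void sgd_step(float eta, std::vector<float>& w, std::vector<float>& g) {
  for (size_t i = 0; i < w.size(); ++i) {
    g[i] = -eta * g[i];  // NG: negative delta, written over G
    w[i] += g[i];        // NW: new weight, written over W
  }
}
```

For example, with eta = 0.1, W = {1, 2}, and G = {0.5, -0.5}, the routine leaves W = {0.95, 2.05} and G = {-0.05, 0.05}.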

AdamOptimizer: An Adam optimizer with decoupled weight decay, supporting two modes:

  • Mode 0 (PyTorch): Bias correction on moments individually; weight decay before update. update = (m1/alpha_correction) / (sqrt(m2/beta_correction) + epsilon) + lambda * W.
  • Mode 1 (HuggingFace): Bias correction applied to learning rate; weight decay after update. step_size = lr * sqrt(beta_correction) / alpha_correction, then delta = -step_size * m1 / denom - lr * lambda * (W - step_size * m1 / denom).

Both optimizers operate on single tensors (unlike AdamW, which operates on TensorSeq inputs). The step counter is incremented after each update.
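The two modes can be compared with a scalar sketch. This is an illustrative re-derivation of the formulas above, not the kernel itself; the parameter names and defaults (beta1, beta2, lambda, eps) are assumptions.

```cpp
#include <cmath>
#include <cstdint>

struct AdamState { float m1 = 0, m2 = 0; int64_t step = 0; };

// Scalar sketch of the two weight-decay modes described above.
// Parameter defaults are illustrative, not the kernel's attributes.
float adam_step(float& w, float g, AdamState& s, int mode, float lr,
                float beta1 = 0.9f, float beta2 = 0.999f,
                float lambda = 0.0f, float eps = 1e-8f) {
  s.step += 1;  // step counter is incremented after each update
  s.m1 = beta1 * s.m1 + (1 - beta1) * g;      // first-moment EMA
  s.m2 = beta2 * s.m2 + (1 - beta2) * g * g;  // second-moment EMA
  float alpha_corr = 1 - std::pow(beta1, static_cast<float>(s.step));
  float beta_corr  = 1 - std::pow(beta2, static_cast<float>(s.step));
  if (mode == 0) {
    // Mode 0: bias-correct the moments; decay enters the update term.
    float update = (s.m1 / alpha_corr) / (std::sqrt(s.m2 / beta_corr) + eps)
                   + lambda * w;
    w -= lr * update;
  } else {
    // Mode 1: fold bias correction into the step size; decay afterwards.
    float step_size = lr * std::sqrt(beta_corr) / alpha_corr;
    w -= step_size * s.m1 / (std::sqrt(s.m2) + eps);
    w -= lr * lambda * w;
  }
  return w;
}
```

With lambda = 0, both modes take the same first step, w ≈ w0 - lr * sign(g), since bias correction cancels the moment scaling at t = 1; they diverge once weight decay or later steps enter.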

Usage

These are the legacy single-tensor optimizer kernels used when the training graph uses individual weight/gradient tensor pairs rather than the grouped TensorSeq pattern.

Code Reference

Source Location

Signature

template <typename T>
Status SGDOptimizer<T>::Compute(OpKernelContext* ctx) const;

template <typename T>
Status AdamOptimizer<T>::Compute(OpKernelContext* ctx) const;

Import

#include "orttraining/orttraining/training_ops/cpu/optimizer/optimizers.h"

I/O Contract

Inputs (SGDOptimizer)

Name Type Required Description
ETA Tensor(float) Yes Learning rate (scalar)
W Tensor(float) Yes Current weights
G Tensor(float) Yes Gradients

Outputs (SGDOptimizer)

Name Type Description
NW Tensor(float) Updated weights (in-place alias of W)
NG Tensor(float) Negative delta -eta * G (optional; in-place alias of G)

Inputs (AdamOptimizer)

Name Type Required Description
ETA Tensor(float) Yes Learning rate (scalar)
S Tensor(int64) Yes Step counter
W Tensor(float) Yes Current weights
G Tensor(float) Yes Gradients
M1 Tensor(float) Yes First moment estimates
M2 Tensor(float) Yes Second moment estimates

Outputs (AdamOptimizer)

Name Type Description
NS Tensor(int64) Updated step counter
NM1 Tensor(float) Updated first moments
NM2 Tensor(float) Updated second moments
NW Tensor(float) Updated weights (optional)
NG Tensor(float) Update delta (optional)
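To make the contract concrete, here is a hypothetical mode-0 sketch whose inputs and outputs follow the names in the tables above. It is not the kernel implementation; in particular, treating NG as the signed delta added to W is an assumption about the output convention.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Outputs named after the contract above: NS, NM1, NM2, NW, NG.
struct AdamOut {
  int64_t NS;
  std::vector<float> NM1, NM2, NW, NG;
};

// Hypothetical mode-0 step over the contract's inputs (ETA, S, W, G, M1, M2).
AdamOut adam_apply(float ETA, int64_t S,
                   const std::vector<float>& W, const std::vector<float>& G,
                   const std::vector<float>& M1, const std::vector<float>& M2,
                   float beta1 = 0.9f, float beta2 = 0.999f,
                   float lambda = 0.0f, float eps = 1e-8f) {
  AdamOut out{S + 1, M1, M2, W, G};  // NS = S + 1; buffers sized from inputs
  float t = static_cast<float>(out.NS);
  float alpha_corr = 1.0f - std::pow(beta1, t);
  float beta_corr  = 1.0f - std::pow(beta2, t);
  for (size_t i = 0; i < W.size(); ++i) {
    out.NM1[i] = beta1 * M1[i] + (1 - beta1) * G[i];
    out.NM2[i] = beta2 * M2[i] + (1 - beta2) * G[i] * G[i];
    float update = (out.NM1[i] / alpha_corr) /
                   (std::sqrt(out.NM2[i] / beta_corr) + eps) + lambda * W[i];
    out.NG[i] = -ETA * update;   // signed delta (assumed convention)
    out.NW[i] = W[i] + out.NG[i];
  }
  return out;
}
```

In the real kernel the aliases shown in the registration below mean these outputs reuse the input buffers rather than being freshly allocated.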

Usage Examples

ONNX_OPERATOR_KERNEL_EX(
    SGDOptimizer, kMSDomain, 1, kCpuExecutionProvider,
    KernelDefBuilder()
        .Alias(1, 0)  // Update weights in-place
        .Alias(2, 1)  // Update gradients in-place
        .TypeConstraint("T", DataTypeImpl::GetTensorType<float>()),
    SGDOptimizer<float>);

ONNX_OPERATOR_KERNEL_EX(
    AdamOptimizer, kMSDomain, 1, kCpuExecutionProvider,
    KernelDefBuilder()
        .Alias(1, 0)  // Update step count in-place
        .Alias(2, 3)  // Update weights in-place
        .Alias(4, 1)  // Update moment-1 in-place
        .Alias(5, 2)  // Update moment-2 in-place
        .TypeConstraint("T1", DataTypeImpl::GetTensorType<float>())
        .TypeConstraint("T2", DataTypeImpl::GetTensorType<int64_t>()),
    AdamOptimizer<float>);
