
Implementation:Microsoft Onnxruntime CPU SGD Adam

From Leeroopedia


Knowledge Sources
Domains Training, CPU_Kernels
Last Updated 2026-02-10 04:00 GMT

Overview

Concrete implementation of the SGD and Adam (legacy single-tensor) optimizer kernels for CPU in the ONNX Runtime training framework.

Description

This file implements two optimizer kernels:

SGDOptimizer: A simple stochastic gradient descent optimizer that computes NW = W - eta * G (new weight = old weight - learning_rate * gradient). It also optionally outputs the negative delta NG = -eta * G. Weights and gradients are updated in-place through aliasing.
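The SGD update above can be sketched as a standalone routine. This is an illustrative re-implementation of the math, not the actual kernel; the function name and in-place convention mirror the aliasing described above.

```cpp
#include <cmath>
#include <vector>

// Hypothetical sketch of the SGD update: NG = -eta * G, NW = W + NG.
// The kernel aliases G to NG and W to NW, so both are updated in place.
void sgd_step(float eta, std::vector<float>& w, std::vector<float>& g) {
  for (size_t i = 0; i < w.size(); ++i) {
    g[i] = -eta * g[i];  // NG: negative delta, written over G
    w[i] += g[i];        // NW: new weight, written over W
  }
}
```

For example, with eta = 0.1, W = {1, 2}, and G = {0.5, -0.5}, the routine leaves W = {0.95, 2.05} and G = {-0.05, 0.05}.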

AdamOptimizer: An Adam optimizer with decoupled weight decay, supporting two modes:

  • Mode 0 (PyTorch): Bias correction on moments individually; weight decay before update. update = (m1/alpha_correction) / (sqrt(m2/beta_correction) + epsilon) + lambda * W.
  • Mode 1 (HuggingFace): Bias correction applied to learning rate; weight decay after update. step_size = lr * sqrt(beta_correction) / alpha_correction, then delta = -step_size * m1 / denom - lr * lambda * (W - step_size * m1 / denom).

Both optimizers operate on single tensors (unlike AdamW, which operates on TensorSeq inputs). The step counter is incremented after each update.
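The two modes can be compared with a scalar sketch. This is an illustrative re-derivation of the formulas above, not the kernel itself; the parameter names and defaults (beta1, beta2, lambda, eps) are assumptions.

```cpp
#include <cmath>
#include <cstdint>

struct AdamState { float m1 = 0, m2 = 0; int64_t step = 0; };

// Scalar sketch of the two weight-decay modes described above.
// Parameter defaults are illustrative, not the kernel's attributes.
float adam_step(float& w, float g, AdamState& s, int mode, float lr,
                float beta1 = 0.9f, float beta2 = 0.999f,
                float lambda = 0.0f, float eps = 1e-8f) {
  s.step += 1;  // step counter is incremented after each update
  s.m1 = beta1 * s.m1 + (1 - beta1) * g;      // first-moment EMA
  s.m2 = beta2 * s.m2 + (1 - beta2) * g * g;  // second-moment EMA
  float alpha_corr = 1 - std::pow(beta1, static_cast<float>(s.step));
  float beta_corr  = 1 - std::pow(beta2, static_cast<float>(s.step));
  if (mode == 0) {
    // Mode 0: bias-correct the moments; decay enters the update term.
    float update = (s.m1 / alpha_corr) / (std::sqrt(s.m2 / beta_corr) + eps)
                   + lambda * w;
    w -= lr * update;
  } else {
    // Mode 1: fold bias correction into the step size; decay afterwards.
    float step_size = lr * std::sqrt(beta_corr) / alpha_corr;
    w -= step_size * s.m1 / (std::sqrt(s.m2) + eps);
    w -= lr * lambda * w;
  }
  return w;
}
```

With lambda = 0, both modes take the same first step, w ≈ w0 - lr * sign(g), since bias correction cancels the moment scaling at t = 1; they diverge once weight decay or later steps enter.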

Usage

These are the legacy single-tensor optimizer kernels used when the training graph uses individual weight/gradient tensor pairs rather than the grouped TensorSeq pattern.

Code Reference

Source Location

Signature

template <typename T>
Status SGDOptimizer<T>::Compute(OpKernelContext* ctx) const;

template <typename T>
Status AdamOptimizer<T>::Compute(OpKernelContext* ctx) const;

Import

#include "orttraining/orttraining/training_ops/cpu/optimizer/optimizers.h"

I/O Contract

Inputs (SGDOptimizer)

Name Type Required Description
ETA Tensor(float) Yes Learning rate (scalar)
W Tensor(float) Yes Current weights
G Tensor(float) Yes Gradients

Outputs (SGDOptimizer)

Name Type Description
NW Tensor(float) Updated weights (in-place alias of W)
NG Tensor(float) Negative delta -eta * G (optional; in-place alias of G)

Inputs (AdamOptimizer)

Name Type Required Description
ETA Tensor(float) Yes Learning rate (scalar)
S Tensor(int64) Yes Step counter
W Tensor(float) Yes Current weights
G Tensor(float) Yes Gradients
M1 Tensor(float) Yes First moment estimates
M2 Tensor(float) Yes Second moment estimates

Outputs (AdamOptimizer)

Name Type Description
NS Tensor(int64) Updated step counter
NM1 Tensor(float) Updated first moments
NM2 Tensor(float) Updated second moments
NW Tensor(float) Updated weights (optional)
NG Tensor(float) Update delta (optional)
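To make the contract concrete, here is a hypothetical mode-0 sketch whose inputs and outputs follow the names in the tables above. It is not the kernel implementation; in particular, treating NG as the signed delta added to W is an assumption about the output convention.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Outputs named after the contract above: NS, NM1, NM2, NW, NG.
struct AdamOut {
  int64_t NS;
  std::vector<float> NM1, NM2, NW, NG;
};

// Hypothetical mode-0 step over the contract's inputs (ETA, S, W, G, M1, M2).
AdamOut adam_apply(float ETA, int64_t S,
                   const std::vector<float>& W, const std::vector<float>& G,
                   const std::vector<float>& M1, const std::vector<float>& M2,
                   float beta1 = 0.9f, float beta2 = 0.999f,
                   float lambda = 0.0f, float eps = 1e-8f) {
  AdamOut out{S + 1, M1, M2, W, G};  // NS = S + 1; buffers sized from inputs
  float t = static_cast<float>(out.NS);
  float alpha_corr = 1.0f - std::pow(beta1, t);
  float beta_corr  = 1.0f - std::pow(beta2, t);
  for (size_t i = 0; i < W.size(); ++i) {
    out.NM1[i] = beta1 * M1[i] + (1 - beta1) * G[i];
    out.NM2[i] = beta2 * M2[i] + (1 - beta2) * G[i] * G[i];
    float update = (out.NM1[i] / alpha_corr) /
                   (std::sqrt(out.NM2[i] / beta_corr) + eps) + lambda * W[i];
    out.NG[i] = -ETA * update;   // signed delta (assumed convention)
    out.NW[i] = W[i] + out.NG[i];
  }
  return out;
}
```

In the real kernel the aliases shown in the registration below mean these outputs reuse the input buffers rather than being freshly allocated.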

Usage Examples

ONNX_OPERATOR_KERNEL_EX(
    SGDOptimizer, kMSDomain, 1, kCpuExecutionProvider,
    KernelDefBuilder()
        .Alias(1, 0)  // Update weights in-place
        .Alias(2, 1)  // Update gradients in-place
        .TypeConstraint("T", DataTypeImpl::GetTensorType<float>()),
    SGDOptimizer<float>);

ONNX_OPERATOR_KERNEL_EX(
    AdamOptimizer, kMSDomain, 1, kCpuExecutionProvider,
    KernelDefBuilder()
        .Alias(1, 0)  // Update step count in-place
        .Alias(2, 3)  // Update weights in-place
        .Alias(4, 1)  // Update moment-1 in-place
        .Alias(5, 2)  // Update moment-2 in-place
        .TypeConstraint("T1", DataTypeImpl::GetTensorType<float>())
        .TypeConstraint("T2", DataTypeImpl::GetTensorType<int64_t>()),
    AdamOptimizer<float>);
