Implementation: Microsoft Onnxruntime CPU OpGradients
| Knowledge Sources | |
|---|---|
| Domains | Training, CPU_Kernels |
| Last Updated | 2026-02-10 04:00 GMT |
Overview
Concrete implementation of basic activation and operation gradient kernels (Relu, Softmax, LogSoftmax, Sigmoid, Tanh, QuickGelu, LeakyRelu) on CPU in the ONNX Runtime training framework.
Description
This file implements gradient kernels for fundamental neural network operations:
ReluGrad: Passes the upstream gradient where X > 0, zeros elsewhere: dX = (X > 0) ? dY : 0.
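A minimal standalone sketch of this rule (illustrative only; the actual kernel operates on ORT tensors rather than raw pointers):

#include <cstddef>

// ReluGrad reference: pass dY through where the forward input was positive.
void relu_grad(const float* dY, const float* X, float* dX, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    dX[i] = X[i] > 0.0f ? dY[i] : 0.0f;
  }
}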
SoftmaxGrad / SoftmaxGrad_13: Computes the Jacobian-vector product for softmax, dX = Y * (dY - sum(Y * dY)), with the sum taken along the softmax axis; SoftmaxGrad_13 implements the opset-13 axis semantics, transposing the reduction axis into place when needed. The LogSoftmax variant uses: dX = dY - sum(dY) * exp(Y).
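A row-wise reference for a 2-D [rows, cols] view with the softmax axis last (a sketch of the math only, not the kernel's axis handling):

#include <cstddef>

// SoftmaxGrad reference: dX = Y * (dY - sum(Y * dY)), summed per row.
void softmax_grad(const float* dY, const float* Y, float* dX,
                  std::size_t rows, std::size_t cols) {
  for (std::size_t r = 0; r < rows; ++r) {
    const float* dy = dY + r * cols;
    const float* y = Y + r * cols;
    float* dx = dX + r * cols;
    float s = 0.0f;
    for (std::size_t c = 0; c < cols; ++c) s += y[c] * dy[c];
    for (std::size_t c = 0; c < cols; ++c) dx[c] = y[c] * (dy[c] - s);
  }
}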
SigmoidGrad: dX = dY * Y * (1 - Y).
TanhGrad: dX = dY * (1 - Y^2).
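Both rules are elementwise in the forward output Y; a reference sketch:

#include <cstddef>

// SigmoidGrad: dX = dY * Y * (1 - Y).
void sigmoid_grad(const float* dY, const float* Y, float* dX, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) dX[i] = dY[i] * Y[i] * (1.0f - Y[i]);
}

// TanhGrad: dX = dY * (1 - Y^2).
void tanh_grad(const float* dY, const float* Y, float* dX, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) dX[i] = dY[i] * (1.0f - Y[i] * Y[i]);
}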
QuickGeluGrad: Differentiates QuickGelu(X) = X * sigmoid(alpha * X), giving: dX = dY * sigmoid(alpha*X) * (1 + alpha*X*(1 - sigmoid(alpha*X))). Uses MlasComputeLogistic for efficient sigmoid computation and parallelizes across the thread pool.
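A scalar reference for the formula (the kernel computes the sigmoid in bulk via MlasComputeLogistic rather than per-element std::exp):

#include <cmath>

// QuickGeluGrad reference: d/dX [X * sigmoid(alpha * X)].
float quick_gelu_grad(float dY, float x, float alpha) {
  const float s = 1.0f / (1.0f + std::exp(-alpha * x));  // sigmoid(alpha * x)
  return dY * s * (1.0f + alpha * x * (1.0f - s));
}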
LeakyReluGrad: dX = (Y > 0) ? dY : alpha * dY.
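And the LeakyRelu rule, keyed off the sign of the forward output Y:

#include <cstddef>

// LeakyReluGrad reference: scale dY by alpha on the negative side.
void leaky_relu_grad(const float* dY, const float* Y, float* dX,
                     float alpha, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    dX[i] = Y[i] > 0.0f ? dY[i] : alpha * dY[i];
  }
}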
All kernels are registered under kMSDomain opset 1 for float type.
Usage
These kernels are invoked during the backward pass whenever their corresponding activation or operation nodes are present in the training graph; together they cover the most commonly used activation gradients in deep learning.
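A quick way to validate any of the closed-form rules above is a finite-difference check; for example, for TanhGrad (a standalone sketch, not part of the source file):

#include <cmath>
#include <cstdio>

int main() {
  const float x = 0.7f, dY = 1.0f, eps = 1e-3f;
  const float y = std::tanh(x);
  const float analytic = dY * (1.0f - y * y);  // TanhGrad: dX = dY * (1 - Y^2)
  const float numeric = (std::tanh(x + eps) - std::tanh(x - eps)) / (2.0f * eps);
  std::printf("analytic=%.6f numeric=%.6f\n", analytic, numeric);
  return 0;
}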
Code Reference
Source Location
- Repository: Microsoft_Onnxruntime
- File: orttraining/orttraining/training_ops/cpu/op_gradients.cc
- Lines: 1-283
Signature
template <typename T>
Status ReluGrad<T>::Compute(OpKernelContext* context) const;
template <typename T>
Status SoftmaxGrad<T>::Compute(OpKernelContext* context) const;
template <typename T>
Status SigmoidGrad<T>::Compute(OpKernelContext* context) const;
template <typename T>
Status TanhGrad<T>::Compute(OpKernelContext* context) const;
template <typename T>
Status QuickGeluGrad<T>::Compute(OpKernelContext* context) const;
template <typename T>
Status LeakyReluGrad<T>::Compute(OpKernelContext* context) const;
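Each Compute follows the same pattern: fetch dY and the forward input/output, allocate dX with the same shape, and evaluate the elementwise rule. A simplified sketch for SigmoidGrad, assuming the EigenVectorArrayMap / ConstEigenVectorArrayMap helpers from core/util/math_cpuonly.h (illustrative; the source may structure this differently):

template <typename T>
Status SigmoidGrad<T>::Compute(OpKernelContext* context) const {
  const Tensor& dY = *context->Input<Tensor>(0);
  const Tensor& Y = *context->Input<Tensor>(1);
  Tensor& dX = *context->Output(0, Y.Shape());
  const auto n = Y.Shape().Size();
  // dX = dY * Y * (1 - Y), elementwise over flat array views.
  EigenVectorArrayMap<T>(dX.MutableData<T>(), n) =
      ConstEigenVectorArrayMap<T>(dY.Data<T>(), n) *
      ConstEigenVectorArrayMap<T>(Y.Data<T>(), n) *
      (T(1) - ConstEigenVectorArrayMap<T>(Y.Data<T>(), n));
  return Status::OK();
}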
Import
#include "orttraining/orttraining/training_ops/cpu/op_gradients.h"
I/O Contract
Inputs (Common Pattern)
| Name | Type | Required | Description |
|---|---|---|---|
| dY | Tensor(float) | Yes | Upstream gradient |
| X_or_Y | Tensor(float) | Yes | Forward input (ReluGrad, QuickGeluGrad) or forward output (others) |
Outputs
| Name | Type | Description |
|---|---|---|
| dX | Tensor(float) | Gradient w.r.t. input |
Usage Examples
ONNX_OPERATOR_KERNEL_EX(
    ReluGrad, kMSDomain, 1, kCpuExecutionProvider,
    KernelDefBuilder().TypeConstraint("T", DataTypeImpl::GetTensorType<float>()),
    ReluGrad<float>);
ONNX_OPERATOR_KERNEL_EX(
    SoftmaxGrad, kMSDomain, 1, kCpuExecutionProvider,
    KernelDefBuilder().TypeConstraint("T", DataTypeImpl::GetTensorType<float>()),
    SoftmaxGrad<float>);
ONNX_OPERATOR_KERNEL_EX(
    SigmoidGrad, kMSDomain, 1, kCpuExecutionProvider,
    KernelDefBuilder().TypeConstraint("T", DataTypeImpl::GetTensorType<float>()),
    SigmoidGrad<float>);
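The remaining kernels are registered with the same pattern (all under kMSDomain opset 1 for float, as noted above); for example:

ONNX_OPERATOR_KERNEL_EX(
    TanhGrad, kMSDomain, 1, kCpuExecutionProvider,
    KernelDefBuilder().TypeConstraint("T", DataTypeImpl::GetTensorType<float>()),
    TanhGrad<float>);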