Implementation: Microsoft Onnxruntime CPU OpGradients
| Knowledge Sources | |
|---|---|
| Domains | Training, CPU_Kernels |
| Last Updated | 2026-02-10 04:00 GMT |
Overview
Concrete implementation of basic activation and operation gradient kernels (Relu, Softmax, LogSoftmax, Sigmoid, Tanh, QuickGelu, LeakyRelu) on CPU in the ONNX Runtime training framework.
Description
This file implements gradient kernels for fundamental neural network operations:
ReluGrad: Passes the upstream gradient where X > 0, zeros elsewhere: dX = (X > 0) ? dY : 0.
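A minimal standalone sketch of this rule (illustrative only; the actual kernel operates on ORT tensors rather than raw pointers):

#include <cstddef>

// ReluGrad reference: pass dY through where the forward input was positive.
void relu_grad(const float* dY, const float* X, float* dX, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    dX[i] = X[i] > 0.0f ? dY[i] : 0.0f;
  }
}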
SoftmaxGrad / SoftmaxGrad_13: Computes the Jacobian-vector product for softmax, dX = Y * (dY - sum(Y * dY)), with the sum taken along the softmax axis; SoftmaxGrad_13 implements the opset-13 axis semantics, transposing the reduction axis into place when needed. The LogSoftmax variant uses: dX = dY - sum(dY) * exp(Y).
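A row-wise reference for a 2-D [rows, cols] view with the softmax axis last (a sketch of the math only, not the kernel's axis handling):

#include <cstddef>

// SoftmaxGrad reference: dX = Y * (dY - sum(Y * dY)), summed per row.
void softmax_grad(const float* dY, const float* Y, float* dX,
                  std::size_t rows, std::size_t cols) {
  for (std::size_t r = 0; r < rows; ++r) {
    const float* dy = dY + r * cols;
    const float* y = Y + r * cols;
    float* dx = dX + r * cols;
    float s = 0.0f;
    for (std::size_t c = 0; c < cols; ++c) s += y[c] * dy[c];
    for (std::size_t c = 0; c < cols; ++c) dx[c] = y[c] * (dy[c] - s);
  }
}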
SigmoidGrad: dX = dY * Y * (1 - Y).
TanhGrad: dX = dY * (1 - Y^2).
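Both rules are elementwise in the forward output Y; a reference sketch:

#include <cstddef>

// SigmoidGrad: dX = dY * Y * (1 - Y).
void sigmoid_grad(const float* dY, const float* Y, float* dX, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) dX[i] = dY[i] * Y[i] * (1.0f - Y[i]);
}

// TanhGrad: dX = dY * (1 - Y^2).
void tanh_grad(const float* dY, const float* Y, float* dX, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) dX[i] = dY[i] * (1.0f - Y[i] * Y[i]);
}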
QuickGeluGrad: Differentiates QuickGelu(X) = X * sigmoid(alpha * X), giving: dX = dY * sigmoid(alpha*X) * (1 + alpha*X*(1 - sigmoid(alpha*X))). Uses MlasComputeLogistic for efficient sigmoid computation and parallelizes across the thread pool.
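A scalar reference for the formula (the kernel computes the sigmoid in bulk via MlasComputeLogistic rather than per-element std::exp):

#include <cmath>

// QuickGeluGrad reference: d/dX [X * sigmoid(alpha * X)].
float quick_gelu_grad(float dY, float x, float alpha) {
  const float s = 1.0f / (1.0f + std::exp(-alpha * x));  // sigmoid(alpha * x)
  return dY * s * (1.0f + alpha * x * (1.0f - s));
}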
LeakyReluGrad: dX = (Y > 0) ? dY : alpha * dY.
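And the LeakyRelu rule, keyed off the sign of the forward output Y:

#include <cstddef>

// LeakyReluGrad reference: scale dY by alpha on the negative side.
void leaky_relu_grad(const float* dY, const float* Y, float* dX,
                     float alpha, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    dX[i] = Y[i] > 0.0f ? dY[i] : alpha * dY[i];
  }
}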
All kernels are registered under kMSDomain opset 1 for float type.
Usage
These kernels are invoked during the backward pass whenever their corresponding activation or operation nodes are present in the training graph; together they cover the most commonly used activation gradients in deep learning.
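A quick way to validate any of the closed-form rules above is a finite-difference check; for example, for TanhGrad (a standalone sketch, not part of the source file):

#include <cmath>
#include <cstdio>

int main() {
  const float x = 0.7f, dY = 1.0f, eps = 1e-3f;
  const float y = std::tanh(x);
  const float analytic = dY * (1.0f - y * y);  // TanhGrad: dX = dY * (1 - Y^2)
  const float numeric = (std::tanh(x + eps) - std::tanh(x - eps)) / (2.0f * eps);
  std::printf("analytic=%.6f numeric=%.6f\n", analytic, numeric);
  return 0;
}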
Code Reference
Source Location
- Repository: Microsoft_Onnxruntime
- File: orttraining/orttraining/training_ops/cpu/op_gradients.cc
- Lines: 1-283
Signature
template <typename T>
Status ReluGrad<T>::Compute(OpKernelContext* context) const;
template <typename T>
Status SoftmaxGrad<T>::Compute(OpKernelContext* context) const;
template <typename T>
Status SigmoidGrad<T>::Compute(OpKernelContext* context) const;
template <typename T>
Status TanhGrad<T>::Compute(OpKernelContext* context) const;
template <typename T>
Status QuickGeluGrad<T>::Compute(OpKernelContext* context) const;
template <typename T>
Status LeakyReluGrad<T>::Compute(OpKernelContext* context) const;
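Each Compute follows the same pattern: fetch dY and the forward input/output, allocate dX with the same shape, and evaluate the elementwise rule. A simplified sketch for SigmoidGrad, assuming the EigenVectorArrayMap / ConstEigenVectorArrayMap helpers from core/util/math_cpuonly.h (illustrative; the source may structure this differently):

template <typename T>
Status SigmoidGrad<T>::Compute(OpKernelContext* context) const {
  const Tensor& dY = *context->Input<Tensor>(0);
  const Tensor& Y = *context->Input<Tensor>(1);
  Tensor& dX = *context->Output(0, Y.Shape());
  const auto n = Y.Shape().Size();
  // dX = dY * Y * (1 - Y), elementwise over flat array views.
  EigenVectorArrayMap<T>(dX.MutableData<T>(), n) =
      ConstEigenVectorArrayMap<T>(dY.Data<T>(), n) *
      ConstEigenVectorArrayMap<T>(Y.Data<T>(), n) *
      (T(1) - ConstEigenVectorArrayMap<T>(Y.Data<T>(), n));
  return Status::OK();
}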
Import
#include "orttraining/orttraining/training_ops/cpu/op_gradients.h"
I/O Contract
Inputs (Common Pattern)
| Name | Type | Required | Description |
|---|---|---|---|
| dY | Tensor(float) | Yes | Upstream gradient |
| X_or_Y | Tensor(float) | Yes | Forward input (ReluGrad, QuickGeluGrad) or forward output (others) |
Outputs
| Name | Type | Description |
|---|---|---|
| dX | Tensor(float) | Gradient w.r.t. input |
Usage Examples
ONNX_OPERATOR_KERNEL_EX(
    ReluGrad, kMSDomain, 1, kCpuExecutionProvider,
    KernelDefBuilder().TypeConstraint("T", DataTypeImpl::GetTensorType<float>()),
    ReluGrad<float>);
ONNX_OPERATOR_KERNEL_EX(
    SoftmaxGrad, kMSDomain, 1, kCpuExecutionProvider,
    KernelDefBuilder().TypeConstraint("T", DataTypeImpl::GetTensorType<float>()),
    SoftmaxGrad<float>);
ONNX_OPERATOR_KERNEL_EX(
    SigmoidGrad, kMSDomain, 1, kCpuExecutionProvider,
    KernelDefBuilder().TypeConstraint("T", DataTypeImpl::GetTensorType<float>()),
    SigmoidGrad<float>);
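The remaining kernels are registered with the same pattern (all under kMSDomain opset 1 for float, as noted above); for example:

ONNX_OPERATOR_KERNEL_EX(
    TanhGrad, kMSDomain, 1, kCpuExecutionProvider,
    KernelDefBuilder().TypeConstraint("T", DataTypeImpl::GetTensorType<float>()),
    TanhGrad<float>);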