Implementation:Microsoft Onnxruntime CUDA SoftmaxGrad

Knowledge Sources	Microsoft_Onnxruntime
Domains	Training, CUDA_Kernels
Last Updated	2026-02-10 04:00 GMT

Overview

Concrete tool for computing softmax and log-softmax gradients in the ONNX Runtime CUDA training framework.
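For reference, these are the standard softmax and log-softmax gradients (vector-Jacobian products) taken along the reduction axis. Here y is the forward output (input Y), g = ∂L/∂y is the upstream gradient (input dY), and all sums run over the softmax axis; the result is the gradient with respect to the forward input x (output dX). They are textbook identities included for orientation only. This page does not show how the kernel evaluates them numerically.

```latex
\begin{aligned}
\text{Softmax:}\quad    & y_i = \frac{e^{x_i}}{\sum_j e^{x_j}}, &
\frac{\partial L}{\partial x_i} &= y_i \Big( g_i - \sum_j g_j \, y_j \Big) \\
\text{LogSoftmax:}\quad & y_i = x_i - \log \sum_j e^{x_j}, &
\frac{\partial L}{\partial x_i} &= g_i - e^{y_i} \sum_j g_j
\end{aligned}
```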

Description

Implements the SoftmaxGrad operator for CUDA, which computes gradients for both Softmax and LogSoftmax. Four kernel variants are registered: SoftmaxGrad, SoftmaxGrad_13 (opset 13+), LogSoftmaxGrad, and LogSoftmaxGrad_13.

The opset-13 variants support axis selection. Opset-13 Softmax and LogSoftmax normalize along a single axis, so the kernel moves that axis to the innermost dimension when it is not already there. The inputs dY and Y are transposed before the computation, and the result dX is transposed back afterwards. The axis permutation swaps the target axis with the last dimension.

The gradient itself is computed through DispatchSoftmaxGradImpl, which dispatches to SoftmaxGradImpl using cuDNN's softmax backward.

Supported element types are float, double, MLFloat16, and BFloat16.
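The following CPU sketch shows the transpose-based axis handling described above. It is a minimal, hypothetical reference in plain C++ and is not ONNX Runtime code. It assumes float data and contiguous row-major buffers. The helper names SoftmaxGradInnermost, SwapWithLast, Transpose, and SoftmaxGradAlongAxis are illustrative. The innermost-axis step applies the standard gradient formulas directly, whereas the real kernel uses cuDNN's softmax backward.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical CPU reference, not ONNX Runtime code.
// Gradient along the innermost axis of a contiguous row-major [N, D] buffer:
//   Softmax:    dX = Y * (dY - sum(dY * Y))
//   LogSoftmax: dX = dY - exp(Y) * sum(dY)
std::vector<float> SoftmaxGradInnermost(const std::vector<float>& dY,
                                        const std::vector<float>& Y,
                                        std::size_t N, std::size_t D,
                                        bool is_log_softmax) {
  std::vector<float> dX(N * D);
  for (std::size_t n = 0; n < N; ++n) {
    const std::size_t row = n * D;
    float s = 0.0f;  // per-row reduction over the softmax axis
    for (std::size_t d = 0; d < D; ++d)
      s += is_log_softmax ? dY[row + d] : dY[row + d] * Y[row + d];
    for (std::size_t d = 0; d < D; ++d)
      dX[row + d] = is_log_softmax ? dY[row + d] - std::exp(Y[row + d]) * s
                                   : Y[row + d] * (dY[row + d] - s);
  }
  return dX;
}

// Permutation that swaps `axis` with the last dimension; it is its own inverse.
std::vector<std::size_t> SwapWithLast(std::size_t rank, std::size_t axis) {
  std::vector<std::size_t> perm(rank);
  for (std::size_t i = 0; i < rank; ++i) perm[i] = i;
  std::swap(perm[axis], perm[rank - 1]);
  return perm;
}

// Generic transpose (rank >= 1): output dim i is input dim perm[i].
std::vector<float> Transpose(const std::vector<float>& in,
                             const std::vector<std::size_t>& shape,
                             const std::vector<std::size_t>& perm) {
  const std::size_t rank = shape.size();
  std::vector<std::size_t> stride(rank, 1);  // row-major input strides
  for (std::size_t i = rank - 1; i > 0; --i) stride[i - 1] = stride[i] * shape[i];
  std::vector<std::size_t> out_shape(rank), idx(rank, 0);
  for (std::size_t i = 0; i < rank; ++i) out_shape[i] = shape[perm[i]];
  std::vector<float> out(in.size());
  for (std::size_t o = 0; o < out.size(); ++o) {
    std::size_t src = 0;
    for (std::size_t i = 0; i < rank; ++i) src += idx[i] * stride[perm[i]];
    out[o] = in[src];
    for (std::size_t i = rank; i-- > 0;) {  // advance the output multi-index
      if (++idx[i] < out_shape[i]) break;
      idx[i] = 0;
    }
  }
  return out;
}

// Opset-13-style flow: move `axis` innermost, compute, then move it back.
std::vector<float> SoftmaxGradAlongAxis(const std::vector<float>& dY,
                                        const std::vector<float>& Y,
                                        const std::vector<std::size_t>& shape,
                                        std::size_t axis, bool is_log_softmax) {
  const std::size_t rank = shape.size();
  assert(rank > 0 && axis < rank && dY.size() == Y.size());
  const std::size_t D = shape[axis];
  assert(D > 0);
  const std::size_t N = dY.size() / D;
  if (axis == rank - 1)  // already innermost: no transpose needed
    return SoftmaxGradInnermost(dY, Y, N, D, is_log_softmax);
  const std::vector<std::size_t> perm = SwapWithLast(rank, axis);
  std::vector<std::size_t> t_shape(rank);
  for (std::size_t i = 0; i < rank; ++i) t_shape[i] = shape[perm[i]];
  const std::vector<float> dX_t = SoftmaxGradInnermost(
      Transpose(dY, shape, perm), Transpose(Y, shape, perm), N, D, is_log_softmax);
  return Transpose(dX_t, t_shape, perm);  // the same swap undoes itself
}
```

Swapping two axes is its own inverse, so the same permutation restores dX to the original layout. Only the shape passed to the second transpose changes.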

Usage

Invoked during the backward pass when the model uses Softmax or LogSoftmax layers.

Code Reference

Source Location

Repository: Microsoft_Onnxruntime
File: orttraining/orttraining/training_ops/cuda/math/softmax_grad.cc
Lines: 1-118

Signature

class SoftmaxGrad : public CudaKernel {
 public:
  Status ComputeInternal(OpKernelContext* ctx) const override;
};

Import

#include "orttraining/training_ops/cuda/math/softmax_grad.h"

I/O Contract

Inputs

Name	Type	Required	Description
dY	Tensor(T)	Yes	Upstream gradient (gradient of the loss with respect to Y)
Y	Tensor(T)	Yes	Output of the forward Softmax (or LogSoftmax)

Outputs

Name	Type	Description
dX	Tensor(T)	Gradient with respect to the input of the forward Softmax (or LogSoftmax)
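A central finite-difference test on a single softmax row is one way to sanity-check the contract that dX is the gradient with respect to the forward input. The test holds dY fixed, so the loss is L(x) = sum_i dY[i] * softmax(x)[i]. The sketch below is hypothetical: it uses double precision, its function names are chosen for illustration, and it checks only the Softmax case. The LogSoftmax case can be checked the same way by using log-softmax in the loss.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Forward softmax of one row, stabilized by subtracting the row maximum.
std::vector<double> Softmax(const std::vector<double>& x) {
  assert(!x.empty());
  double m = x[0];
  for (double v : x) m = std::fmax(m, v);
  std::vector<double> y(x.size());
  double sum = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    y[i] = std::exp(x[i] - m);
    sum += y[i];
  }
  for (double& v : y) v /= sum;
  return y;
}

// Analytic gradient per the I/O contract: dX = Y * (dY - sum(dY * Y)).
std::vector<double> AnalyticGrad(const std::vector<double>& dY,
                                 const std::vector<double>& Y) {
  assert(dY.size() == Y.size());
  double s = 0.0;
  for (std::size_t i = 0; i < Y.size(); ++i) s += dY[i] * Y[i];
  std::vector<double> dX(Y.size());
  for (std::size_t i = 0; i < Y.size(); ++i) dX[i] = Y[i] * (dY[i] - s);
  return dX;
}

// Central finite difference of L(x) = sum_i dY[i] * softmax(x)[i].
std::vector<double> NumericGrad(const std::vector<double>& dY,
                                std::vector<double> x, double h = 1e-6) {
  auto loss = [&](const std::vector<double>& v) {
    const std::vector<double> y = Softmax(v);
    double l = 0.0;
    for (std::size_t j = 0; j < y.size(); ++j) l += dY[j] * y[j];
    return l;
  };
  std::vector<double> g(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    const double xi = x[i];
    x[i] = xi + h;
    const double lp = loss(x);
    x[i] = xi - h;
    const double lm = loss(x);
    x[i] = xi;  // restore the perturbed coordinate
    g[i] = (lp - lm) / (2.0 * h);
  }
  return g;
}
```

The analytic and numeric gradients should agree to within the finite-difference error, which is far below 1e-6 for well-scaled inputs.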

Usage Examples

// All four variants registered with same kernel class
REGISTER_SOFTMAX_GRAD_KERNEL(SoftmaxGrad)
REGISTER_SOFTMAX_GRAD_KERNEL(SoftmaxGrad_13)
REGISTER_SOFTMAX_GRAD_KERNEL(LogSoftmaxGrad)
REGISTER_SOFTMAX_GRAD_KERNEL(LogSoftmaxGrad_13)
// Supports float, double, MLFloat16, BFloat16

Related Pages

Environment:Microsoft_Onnxruntime_CUDA_GPU_Environment
