Implementation:Microsoft Onnxruntime CUDA SoftmaxCrossEntropyLoss

Knowledge Sources	Microsoft_Onnxruntime
Domains	Training, CUDA_Kernels
Last Updated	2026-02-10 04:00 GMT

Overview

CUDA implementation that computes softmax cross-entropy loss with sparse labels (integer class indices), together with its gradient, in the ONNX Runtime training framework.

Description

Implements the ONNX SoftmaxCrossEntropyLoss operator and its gradient, SoftmaxCrossEntropyLossGrad, for CUDA. The forward pass computes the log-softmax of the logits over the class dimension, then calculates the weighted negative log-likelihood loss from the integer class labels. It supports per-class weights, an ignore index that masks out specific labels, and three reduction modes (None, Mean, Sum). For multi-dimensional inputs, the implementation transposes the logits from [N, C, D1..Dk] to [N, D1..Dk, C] before computing softmax, so that the class dimension is innermost. The gradient pass back-propagates through the loss and the softmax, and can optionally add a bias tensor to the result. The internal variants (SoftmaxCrossEntropyLossInternal) allow the output type TOut to differ from the input type T for mixed-precision training. Template instantiations cover float, MLFloat16, and BFloat16, each with int64_t labels.
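The forward pass described above can be illustrated with a small CPU reference in plain C++. This is a minimal sketch, not the ONNX Runtime kernel: the names SoftmaxCrossEntropyRef, SceResult, and Reduction are hypothetical, the input is assumed to be already flattened to [n, c], and Mean is assumed to divide by the summed weights of the non-ignored labels, as in the ONNX operator definition.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference for illustration; not the ONNX Runtime CUDA kernel.
enum class Reduction { None, Mean, Sum };

struct SceResult {
  std::vector<float> loss;      // 1 element for Mean/Sum, n elements for None
  std::vector<float> log_prob;  // [n, c] row-major log-softmax
};

SceResult SoftmaxCrossEntropyRef(const std::vector<float>& logits,    // [n, c] row-major
                                 const std::vector<int64_t>& labels,  // [n]
                                 const std::vector<float>& weights,   // [c], or empty for all ones
                                 int64_t ignore_index, Reduction reduction,
                                 int n, int c) {
  assert(c > 0);
  assert(static_cast<int>(logits.size()) == n * c);
  assert(static_cast<int>(labels.size()) == n);
  assert(weights.empty() || static_cast<int>(weights.size()) == c);

  SceResult out;
  out.log_prob.resize(logits.size());
  std::vector<float> per_sample(n, 0.0f);
  float weight_sum = 0.0f;

  for (int i = 0; i < n; ++i) {
    const float* row = &logits[i * c];
    // Subtract the row maximum before exponentiating for numerical stability.
    float max_v = row[0];
    for (int j = 1; j < c; ++j) max_v = std::max(max_v, row[j]);
    float sum_exp = 0.0f;
    for (int j = 0; j < c; ++j) sum_exp += std::exp(row[j] - max_v);
    const float log_sum = std::log(sum_exp);
    for (int j = 0; j < c; ++j) out.log_prob[i * c + j] = row[j] - max_v - log_sum;

    const int64_t label = labels[i];
    if (label == ignore_index) continue;  // masked sample contributes nothing
    assert(label >= 0 && label < c);
    const float w = weights.empty() ? 1.0f : weights[label];
    per_sample[i] = -w * out.log_prob[i * c + label];
    weight_sum += w;
  }

  if (reduction == Reduction::None) {
    out.loss = per_sample;
    return out;
  }
  float total = 0.0f;
  for (float v : per_sample) total += v;
  if (reduction == Reduction::Mean) total = weight_sum > 0.0f ? total / weight_sum : 0.0f;
  out.loss = {total};
  return out;
}
```

With two classes and all-zero logits, every log-probability is -ln 2, so any non-ignored sample contributes ln 2 to the loss before weighting.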

Usage

Invoked during the forward and backward passes of training whenever the model uses softmax cross-entropy loss with sparse (integer) labels, a loss function widely used for classification.

Code Reference

Source Location

Repository: Microsoft_Onnxruntime
File: orttraining/orttraining/training_ops/cuda/loss/softmax_cross_entropy_loss_impl.cc
Lines: 1-346

Signature

template <typename T, typename TLabel, typename TOut>
Status SoftmaxCrossEntropyLoss<T, TLabel, TOut>::ComputeInternal(OpKernelContext* ctx) const;

template <typename T, typename TLabel, typename TOut>
Status SoftmaxCrossEntropyLossGrad<T, TLabel, TOut>::ComputeInternal(OpKernelContext* ctx) const;

OrtValue AllocateTensorInMLValue(const MLDataType data_type, const TensorShape& shape, AllocatorPtr& allocator);

Import

#include "orttraining/training_ops/cuda/loss/softmax_cross_entropy_loss_impl.h"

I/O Contract

Inputs

Name	Type	Required	Description
logit	Tensor(T)	Yes	Input logits with shape [N, C] or [N, C, D1..Dk]
label	Tensor(TLabel)	Yes	Integer class labels with shape [N] or [N, D1..Dk]
weight	Tensor(T)	No	Per-class weights with shape [C]
ignore_index	Tensor(int64_t)	No	Scalar label value to exclude from the loss computation (an input for the Internal variant; the ONNX operator specifies it as an attribute)
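The shapes in the table above explain why multi-dimensional logits are transposed: each entry of label [N, D1..Dk] must line up with C contiguous class scores. The following sketch shows that permutation on the CPU, with D1..Dk collapsed into a single extent d. The helper name TransposeNCDToNDC is hypothetical, and the actual CUDA implementation may perform the permutation differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration: permutes a row-major [n, c, d] tensor
// to [n, d, c], where d is the product of the trailing dimensions D1..Dk.
std::vector<float> TransposeNCDToNDC(const std::vector<float>& src,
                                     std::size_t n, std::size_t c, std::size_t d) {
  assert(src.size() == n * c * d);
  std::vector<float> dst(src.size());
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t k = 0; k < d; ++k) {
      for (std::size_t j = 0; j < c; ++j) {
        // Element (i, j, k) of the source becomes element (i, k, j) of the result.
        dst[(i * d + k) * c + j] = src[(i * c + j) * d + k];
      }
    }
  }
  return dst;
}
```

After the permutation the tensor can be treated as a [N * d, C] matrix, so the same row-wise softmax used for the two-dimensional case applies to every label position.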

Outputs

Name	Type	Description
loss	Tensor(TOut)	Scalar loss (for Mean/Sum reduction) or per-sample loss with the same shape as label, [N] or [N, D1..Dk] (for None)
log_prob	Tensor(TOut)	Log-softmax probabilities with same shape as logit (optional)
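The log_prob output is useful because the gradient with respect to the logits can be written directly in terms of it: for a non-ignored sample i with label y and class weight w, the gradient for class j is dLoss * w * (exp(log_prob[i][j]) - [j == y]) / normalizer. The sketch below is a CPU illustration under stated assumptions, not the ONNX Runtime kernel: SoftmaxCrossEntropyGradRef is a hypothetical name, only the Mean and Sum reductions are covered (so the upstream gradient is a scalar), the Mean normalizer is assumed to be the summed weights of non-ignored labels, and the optional bias is assumed to be added elementwise to the result.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference for illustration; not the ONNX Runtime CUDA kernel.
std::vector<float> SoftmaxCrossEntropyGradRef(float d_loss,                        // upstream scalar gradient
                                              const std::vector<float>& log_prob,  // [n, c] row-major
                                              const std::vector<int64_t>& labels,  // [n]
                                              const std::vector<float>& weights,   // [c], or empty for all ones
                                              int64_t ignore_index,
                                              bool mean_reduction,                 // false means Sum
                                              const std::vector<float>& bias,      // [n, c], or empty for none
                                              int n, int c) {
  assert(static_cast<int>(log_prob.size()) == n * c);
  assert(static_cast<int>(labels.size()) == n);
  assert(weights.empty() || static_cast<int>(weights.size()) == c);
  assert(bias.empty() || bias.size() == log_prob.size());

  // Mean divides by the summed weights of the samples that were not ignored.
  float normalizer = 1.0f;
  if (mean_reduction) {
    normalizer = 0.0f;
    for (int i = 0; i < n; ++i) {
      if (labels[i] == ignore_index) continue;
      normalizer += weights.empty() ? 1.0f : weights[labels[i]];
    }
  }

  std::vector<float> d_logit(log_prob.size(), 0.0f);
  if (normalizer > 0.0f) {
    for (int i = 0; i < n; ++i) {
      const int64_t label = labels[i];
      if (label == ignore_index) continue;  // ignored rows keep a zero gradient
      assert(label >= 0 && label < c);
      const float scale = d_loss * (weights.empty() ? 1.0f : weights[label]) / normalizer;
      for (int j = 0; j < c; ++j) {
        const float prob = std::exp(log_prob[i * c + j]);
        d_logit[i * c + j] = scale * (prob - (j == label ? 1.0f : 0.0f));
      }
    }
  }
  // Assumed bias semantics: elementwise addition to the computed gradient.
  for (std::size_t idx = 0; idx < bias.size(); ++idx) d_logit[idx] += bias[idx];
  return d_logit;
}
```

Each non-ignored row of the result sums to zero before any bias is added, because the softmax probabilities in a row sum to one.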

Usage Examples

// Registered for ONNX opset versions 12 and 13 with the CUDA execution provider
// Forward: SoftmaxCrossEntropyLoss<float, int64_t, float>
// Gradient: SoftmaxCrossEntropyLossGrad<float, int64_t, float>
// Internal mixed precision variant:
// SoftmaxCrossEntropyLossInternal<MLFloat16, int64_t, float>

Related Pages

Environment:Microsoft_Onnxruntime_CUDA_GPU_Environment
