Implementation:Microsoft Onnxruntime CUDA SoftmaxCrossEntropyLoss
| Knowledge Sources | |
|---|---|
| Domains | Training, CUDA_Kernels |
| Last Updated | 2026-02-10 04:00 GMT |
Overview
Concrete tool for computing softmax cross-entropy loss with sparse labels (integer indices) and its gradient in the ONNX Runtime CUDA training framework.
Description
Implements the ONNX SoftmaxCrossEntropyLoss operator and its gradient, SoftmaxCrossEntropyLossGrad, for the CUDA execution provider. The forward pass computes the log-softmax of the logits, then the weighted negative log-likelihood loss against integer class labels. It supports per-class weights, an ignore index that masks out samples carrying a given label, and three reduction modes (none, mean, sum). For multi-dimensional inputs, the implementation transposes the logits from [N, C, D1..Dk] to [N, D1..Dk, C] before computing softmax, so that the class axis is innermost. The gradient pass back-propagates through the loss and the softmax, with optional bias addition. The internal variants (SoftmaxCrossEntropyLossInternal) support mixed precision: the output type TOut may differ from the input type T. Template instantiations cover float, MLFloat16, and BFloat16 with int64_t labels.
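The forward computation described above can be sketched as a CPU reference in plain C++. This is an illustrative sketch, not the CUDA kernel: the names `SoftmaxCrossEntropyReference`, `SceResult`, and `Reduction` are hypothetical, the logits are assumed to be already laid out as row-major [rows, C], and returning 0 for a mean over zero total weight is an assumption made here to avoid a division by zero.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical names for illustration; they do not mirror the ONNX Runtime sources.
enum class Reduction { kNone, kMean, kSum };

struct SceResult {
  std::vector<float> loss;      // one value per row for kNone, otherwise one value
  std::vector<float> log_prob;  // [rows, C] log-softmax of the logits
};

// logits: row-major [rows, C]; labels: [rows]; weights: empty or [C].
SceResult SoftmaxCrossEntropyReference(const std::vector<float>& logits,
                                       const std::vector<int64_t>& labels,
                                       const std::vector<float>& weights,
                                       std::size_t num_classes,
                                       int64_t ignore_index,
                                       Reduction reduction) {
  const std::size_t rows = labels.size();
  assert(num_classes > 0 && logits.size() == rows * num_classes);
  assert(weights.empty() || weights.size() == num_classes);

  SceResult out;
  out.log_prob.resize(logits.size());
  std::vector<float> per_row(rows, 0.0f);
  float loss_sum = 0.0f;
  float weight_sum = 0.0f;

  for (std::size_t r = 0; r < rows; ++r) {
    const std::size_t base = r * num_classes;

    // Numerically stable log-softmax: subtract the row maximum first.
    float max_x = logits[base];
    for (std::size_t c = 1; c < num_classes; ++c) {
      if (logits[base + c] > max_x) max_x = logits[base + c];
    }
    float sum_exp = 0.0f;
    for (std::size_t c = 0; c < num_classes; ++c) {
      sum_exp += std::exp(logits[base + c] - max_x);
    }
    const float log_z = max_x + std::log(sum_exp);
    for (std::size_t c = 0; c < num_classes; ++c) {
      out.log_prob[base + c] = logits[base + c] - log_z;
    }

    // Weighted negative log-likelihood; ignored rows contribute nothing.
    const int64_t y = labels[r];
    if (y == ignore_index) continue;
    assert(y >= 0 && static_cast<std::size_t>(y) < num_classes);
    const std::size_t yi = static_cast<std::size_t>(y);
    const float w = weights.empty() ? 1.0f : weights[yi];
    per_row[r] = -w * out.log_prob[base + yi];
    loss_sum += per_row[r];
    weight_sum += w;
  }

  if (reduction == Reduction::kNone) {
    out.loss = per_row;
  } else if (reduction == Reduction::kSum) {
    out.loss = {loss_sum};
  } else {
    // Mean divides by the summed weights of the non-ignored rows.
    // Assumption: return 0 when every row is ignored.
    out.loss = {weight_sum > 0.0f ? loss_sum / weight_sum : 0.0f};
  }
  return out;
}
```

With two classes and all-zero logits, every log-probability is -ln 2, so a single non-ignored row yields a mean loss of ln 2 regardless of how many ignored rows accompany it.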
Usage
Invoked during training forward and backward passes whenever the model uses softmax cross-entropy loss with sparse (integer) labels, the standard classification loss function in deep learning.
Code Reference
Source Location
- Repository: Microsoft_Onnxruntime
- File: orttraining/orttraining/training_ops/cuda/loss/softmax_cross_entropy_loss_impl.cc
- Lines: 1-346
Signature
template <typename T, typename TLabel, typename TOut>
Status SoftmaxCrossEntropyLoss<T, TLabel, TOut>::ComputeInternal(OpKernelContext* ctx) const;
template <typename T, typename TLabel, typename TOut>
Status SoftmaxCrossEntropyLossGrad<T, TLabel, TOut>::ComputeInternal(OpKernelContext* ctx) const;
OrtValue AllocateTensorInMLValue(const MLDataType data_type, const TensorShape& shape, AllocatorPtr& allocator);
Import
#include "orttraining/training_ops/cuda/loss/softmax_cross_entropy_loss_impl.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| logit | Tensor(T) | Yes | Input logits with shape [N, C] or [N, C, D1..Dk] |
| label | Tensor(TLabel) | Yes | Integer class labels with shape [N] or [N, D1..Dk] |
| weight | Tensor(T) | No | Per-class weights with shape [C] |
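| ignore_index | Tensor(int64_t) | No | Scalar label value to exclude from the loss; samples with this label contribute neither loss nor gradient. It is a tensor input for the Internal variants, whereas the ONNX operator defines ignore_index as an attribute |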
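For logits of shape [N, C, D1..Dk], the class axis is moved to the innermost position before softmax is applied. A minimal sketch of that index mapping follows, with D1..Dk flattened into a single extent `d`; the function name `TransposeNcdToNdc` is hypothetical, and the actual kernel performs the transpose on the GPU rather than with a loop like this.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reorders a row-major [n, c, d] buffer into row-major [n, d, c],
// so that each run of c consecutive elements is one softmax row.
std::vector<float> TransposeNcdToNdc(const std::vector<float>& src,
                                     std::size_t n, std::size_t c, std::size_t d) {
  assert(src.size() == n * c * d);
  std::vector<float> dst(src.size());
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = 0; j < c; ++j) {
      for (std::size_t k = 0; k < d; ++k) {
        // Source offset uses [i][j][k]; destination offset uses [i][k][j].
        dst[(i * d + k) * c + j] = src[(i * c + j) * d + k];
      }
    }
  }
  return dst;
}
```

After the transpose, the logits can be treated as a 2-D [N*D1*..*Dk, C] matrix, and the labels of shape [N, D1..Dk] line up one-to-one with its rows.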
Outputs
| Name | Type | Description |
|---|---|---|
| loss | Tensor(TOut) | Scalar loss (for mean/sum reduction) or per-sample loss with the shape of label, [N] or [N, D1..Dk] (for none) |
| log_prob | Tensor(TOut) | Log-softmax probabilities with same shape as logit (optional) |
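The log_prob output is what the gradient pass consumes: for a non-ignored row with label y and weight w, the gradient with respect to logit c is dL * w * (exp(log_prob[c]) - [c == y]), divided by the summed weights when the reduction is mean. The sketch below is a CPU reference under stated assumptions: the function name is hypothetical, `d_loss` holds one value for mean/sum or one per row for none, and the optional bias is assumed to be added element-wise to every entry, including ignored rows.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// log_prob: row-major [rows, C]; labels: [rows]; weights: empty or [C];
// bias: empty or [rows, C]; d_loss: size 1 (mean/sum) or rows (none).
std::vector<float> SoftmaxCrossEntropyGradReference(
    const std::vector<float>& d_loss, const std::vector<float>& log_prob,
    const std::vector<int64_t>& labels, const std::vector<float>& weights,
    const std::vector<float>& bias, std::size_t num_classes,
    int64_t ignore_index, bool mean_reduction) {
  const std::size_t rows = labels.size();
  assert(log_prob.size() == rows * num_classes);
  assert(d_loss.size() == 1 || d_loss.size() == rows);
  assert(bias.empty() || bias.size() == log_prob.size());

  // For mean reduction, normalize by the summed weights of non-ignored rows.
  float normalizer = 1.0f;
  if (mean_reduction) {
    float weight_sum = 0.0f;
    for (std::size_t r = 0; r < rows; ++r) {
      if (labels[r] == ignore_index) continue;
      weight_sum += weights.empty() ? 1.0f : weights[static_cast<std::size_t>(labels[r])];
    }
    normalizer = weight_sum > 0.0f ? weight_sum : 1.0f;  // assumption: avoid 0/0
  }

  std::vector<float> d_logit(log_prob.size(), 0.0f);
  for (std::size_t r = 0; r < rows; ++r) {
    const int64_t y = labels[r];
    if (y == ignore_index) continue;  // ignored rows keep a zero gradient
    const std::size_t yi = static_cast<std::size_t>(y);
    const float w = weights.empty() ? 1.0f : weights[yi];
    const float upstream = d_loss.size() == 1 ? d_loss[0] : d_loss[r];
    for (std::size_t c = 0; c < num_classes; ++c) {
      const float prob = std::exp(log_prob[r * num_classes + c]);
      const float one_hot = (c == yi) ? 1.0f : 0.0f;
      d_logit[r * num_classes + c] = upstream * w * (prob - one_hot) / normalizer;
    }
  }

  // Optional bias addition (assumed here to apply to every element).
  for (std::size_t i = 0; i < bias.size(); ++i) d_logit[i] += bias[i];
  return d_logit;
}
```

For two classes with uniform probabilities and label 0, the per-row gradient is (-0.5, 0.5) under mean reduction with a single non-ignored row.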
Usage Examples
// Registered in the ONNX domain for opset versions 12 and 13 with the CUDA execution provider
// Forward: SoftmaxCrossEntropyLoss<float, int64_t, float>
// Gradient: SoftmaxCrossEntropyLossGrad<float, int64_t, float>
// Internal mixed precision variant:
// SoftmaxCrossEntropyLossInternal<MLFloat16, int64_t, float>