Implementation:Microsoft Onnxruntime CUDA SoftmaxCrossEntropyLoss

From Leeroopedia
Revision as of 15:45, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Microsoft_Onnxruntime_CUDA_SoftmaxCrossEntropyLoss.md)


Knowledge Sources
Domains Training, CUDA_Kernels
Last Updated 2026-02-10 04:00 GMT

Overview

Concrete tool for computing softmax cross-entropy loss with sparse labels (integer indices) and its gradient in the ONNX Runtime CUDA training framework.

Description

Implements the ONNX SoftmaxCrossEntropyLoss operator and its gradient, SoftmaxCrossEntropyLossGrad, for CUDA. The forward pass computes the log-softmax of the logits along the class dimension, then calculates the weighted negative log-likelihood loss from the integer class labels. It supports optional per-class weights, an ignore index that excludes matching labels from the loss, and three reduction modes (None, Mean, Sum). For multi-dimensional inputs, the implementation transposes logits from [N, C, D1..Dk] to [N, D1..Dk, C] before computing softmax, so that the class dimension is innermost. The gradient pass back-propagates through the loss and the softmax, optionally adding a bias to the result. Internal variants (SoftmaxCrossEntropyLossInternal) allow the output type to differ from the input type for mixed-precision training. Template instantiations cover float, MLFloat16, and BFloat16 logits with int64_t labels.
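The forward computation described above can be sketched as a small CPU reference for the 2-D case ([N, C] logits, [N] labels). This is an illustration only, not the CUDA kernel: the function name `SoftmaxCrossEntropyRef` and the `Reduction` enum are hypothetical, an empty weight vector is assumed to mean "all weights are 1", and Mean is assumed to divide by the summed weights of the non-ignored samples, following the ONNX operator definition.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical reduction enum for this sketch; not the ONNX Runtime type.
enum class Reduction { None, Mean, Sum };

// CPU reference for logits laid out as [N, C] and labels as [N].
// Returns per-sample losses for Reduction::None, otherwise one element.
// An empty `weight` vector is treated as "every class has weight 1".
std::vector<float> SoftmaxCrossEntropyRef(const std::vector<float>& logits,
                                          const std::vector<int64_t>& labels,
                                          const std::vector<float>& weight,
                                          int64_t num_classes,
                                          int64_t ignore_index,
                                          Reduction reduction) {
  const int64_t n = static_cast<int64_t>(labels.size());
  std::vector<float> per_sample(n, 0.0f);
  float weight_sum = 0.0f;
  for (int64_t i = 0; i < n; ++i) {
    if (labels[i] == ignore_index) continue;  // masked sample contributes 0
    const float* row = logits.data() + i * num_classes;
    // Numerically stable log-softmax: subtract the row maximum first.
    float max_v = row[0];
    for (int64_t c = 1; c < num_classes; ++c) max_v = std::max(max_v, row[c]);
    float sum_exp = 0.0f;
    for (int64_t c = 0; c < num_classes; ++c) sum_exp += std::exp(row[c] - max_v);
    const float log_prob = row[labels[i]] - max_v - std::log(sum_exp);
    const float w = weight.empty() ? 1.0f : weight[labels[i]];
    per_sample[i] = -w * log_prob;  // weighted negative log-likelihood
    weight_sum += w;
  }
  if (reduction == Reduction::None) return per_sample;
  float total = 0.0f;
  for (float v : per_sample) total += v;
  // Assumption: Mean normalizes by the summed weights of non-ignored samples.
  if (reduction == Reduction::Mean && weight_sum > 0.0f) total /= weight_sum;
  return {total};
}
```

With two classes and all-zero logits, each non-ignored sample has log-probability log(1/2), so an unweighted Mean over any number of such samples yields log 2.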

Usage

Invoked during the forward and backward passes of training whenever the model uses softmax cross-entropy loss with sparse (integer) labels, the usual loss for single-label classification.

Code Reference

Source Location

Signature

template <typename T, typename TLabel, typename TOut>
Status SoftmaxCrossEntropyLoss<T, TLabel, TOut>::ComputeInternal(OpKernelContext* ctx) const;

template <typename T, typename TLabel, typename TOut>
Status SoftmaxCrossEntropyLossGrad<T, TLabel, TOut>::ComputeInternal(OpKernelContext* ctx) const;

OrtValue AllocateTensorInMLValue(const MLDataType data_type, const TensorShape& shape, AllocatorPtr& allocator);

Import

#include "orttraining/training_ops/cuda/loss/softmax_cross_entropy_loss_impl.h"

I/O Contract

Inputs

Name | Type | Required | Description
logit | Tensor(T) | Yes | Input logits with shape [N, C] or [N, C, D1..Dk]
label | Tensor(TLabel) | Yes | Integer class labels with shape [N] or [N, D1..Dk]
weight | Tensor(T) | No | Per-class weights with shape [C]
ignore_index | Tensor(int64_t) | No | Scalar label value to exclude from the loss computation. The standard ONNX operator takes ignore_index as an attribute; the tensor-input form applies to the internal variant.
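For logits of shape [N, C, D1..Dk], the Description notes that the data is transposed to [N, D1..Dk, C] before softmax. A minimal sketch of that index mapping on a contiguous host buffer follows; the helper name `TransposeNCDToNDC` is hypothetical, and `d` stands for the flattened product D1 * ... * Dk. The actual CUDA implementation may perform this permutation differently.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: permute a contiguous [N, C, D] buffer into [N, D, C],
// where D is the flattened product of the trailing dimensions D1..Dk.
// After the permutation, the C class scores of each position are adjacent,
// so softmax can run over contiguous rows of length C.
std::vector<float> TransposeNCDToNDC(const std::vector<float>& src,
                                     int64_t n, int64_t c, int64_t d) {
  std::vector<float> dst(src.size());
  for (int64_t i = 0; i < n; ++i) {
    for (int64_t j = 0; j < c; ++j) {
      for (int64_t k = 0; k < d; ++k) {
        // Source offset in [N, C, D]; destination offset in [N, D, C].
        dst[(i * d + k) * c + j] = src[(i * c + j) * d + k];
      }
    }
  }
  return dst;
}
```

After this step the problem reduces to the 2-D case with N * D rows of C classes, and the labels of shape [N, D1..Dk] can be read as a flat array of N * D entries.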

Outputs

Name | Type | Description
loss | Tensor(TOut) | Scalar loss (for Mean/Sum reduction), or per-sample loss with shape [N] or [N, D1..Dk] (for None)
log_prob | Tensor(TOut) | Log-softmax probabilities with the same shape as logit (optional)
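The backward pass can be derived from the log_prob output: for a non-ignored sample, the gradient with respect to each logit is the softmax probability minus the one-hot label, scaled by the class weight and the upstream gradient. The sketch below is a CPU illustration for a scalar upstream gradient (Mean or Sum reduction) on [N, C] data. The name `SoftmaxCrossEntropyGradRef` and its parameters are hypothetical, the Mean normalizer is assumed to be the summed weights of non-ignored samples, and the optional bias addition mentioned in the Description is not modeled.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference for the backward pass.
// log_prob is [N, C], labels is [N]; returns d_logit with shape [N, C].
// An empty `weight` vector is treated as "every class has weight 1".
std::vector<float> SoftmaxCrossEntropyGradRef(const std::vector<float>& log_prob,
                                              const std::vector<int64_t>& labels,
                                              const std::vector<float>& weight,
                                              int64_t num_classes,
                                              int64_t ignore_index,
                                              bool mean_reduction,
                                              float d_loss) {
  const int64_t n = static_cast<int64_t>(labels.size());
  // Only called for non-ignored samples, so labels[i] is a valid class index.
  auto weight_of = [&](int64_t i) {
    return weight.empty() ? 1.0f : weight[labels[i]];
  };
  float normalizer = 1.0f;
  if (mean_reduction) {
    float sum = 0.0f;
    for (int64_t i = 0; i < n; ++i) {
      if (labels[i] != ignore_index) sum += weight_of(i);
    }
    if (sum > 0.0f) normalizer = sum;
  }
  std::vector<float> d_logit(log_prob.size(), 0.0f);  // ignored rows stay 0
  for (int64_t i = 0; i < n; ++i) {
    if (labels[i] == ignore_index) continue;
    const float scale = d_loss * weight_of(i) / normalizer;
    for (int64_t c = 0; c < num_classes; ++c) {
      const float softmax = std::exp(log_prob[i * num_classes + c]);
      const float one_hot = (c == labels[i]) ? 1.0f : 0.0f;
      d_logit[i * num_classes + c] = scale * (softmax - one_hot);
    }
  }
  return d_logit;
}
```

Because each softmax row sums to 1, the gradient entries of every non-ignored row sum to zero, which is a convenient sanity check when comparing against a kernel's output.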

Usage Examples

// Registered in the ONNX domain for opset versions 12 and 13 with the CUDA execution provider
// Forward: SoftmaxCrossEntropyLoss<float, int64_t, float>
// Gradient: SoftmaxCrossEntropyLossGrad<float, int64_t, float>
// Internal mixed precision variant:
// SoftmaxCrossEntropyLossInternal<MLFloat16, int64_t, float>
