Overview
CPU kernel implementations of the ONNX-standard SoftmaxCrossEntropyLoss operator and its gradient for the ONNX Runtime training framework.
Description
This file implements the SoftmaxCrossEntropyLoss forward kernel (ONNX opset 12 and 13) and the SoftmaxCrossEntropyLossGrad backward kernel. Unlike the simpler cross entropy kernels, this implementation handles multi-dimensional inputs by transposing logits from [N, C, D1, D2...Dk] to [N, D1, D2...Dk, C] before computing softmax. It supports an ignore_index parameter to exclude certain labels from the loss, optional per-class weights, and three reduction modes: NONE, MEAN, and SUM. The gradient kernel uses parallel-for execution via the thread pool for efficient backpropagation across all elements. Internal variants (SoftmaxCrossEntropyLossInternal and SoftmaxCrossEntropyLossInternalGrad) are also registered. The gradient kernel supports an optional bias input that is added element-wise to the output gradient.
Usage
This kernel is invoked during training when an ONNX SoftmaxCrossEntropyLoss node (opset 12 or 13) is present in the computation graph. The forward pass computes the classification loss, while the gradient kernel propagates loss gradients back through the softmax layer during backpropagation.
Code Reference
Source Location
Signature
```cpp
void GetNDCFromLogitAndLabelShape(const TensorShape& logit_shape,
                                  const TensorShape& label_shape,
                                  int64_t& N_D, int64_t& C);

void VerifyLogitWeightAndLabelShape(const TensorShape& logit_shape,
                                    const TensorShape& label_shape,
                                    const TensorShape* weight_shape);

void GetPermutationAndShape(bool ncd_to_ndc, const TensorShape& tensor_shape,
                            TensorShapeVector& new_shape,
                            std::vector<size_t>& permutations);

template <typename T1, typename T2>
Status SoftmaxCrossEntropyLoss<T1, T2>::Compute(OpKernelContext* context) const;

template <typename T1, typename T2>
Status SoftmaxCrossEntropyLossGrad<T1, T2>::Compute(OpKernelContext* context) const;
```
Import
```cpp
#include "orttraining/orttraining/training_ops/cpu/loss/softmax_cross_entropy_loss.h"
```
I/O Contract
Inputs (SoftmaxCrossEntropyLoss)
| Name | Type | Required | Description |
|------|------|----------|-------------|
| logit | Tensor(float) | Yes | Raw logits [N, C, D1, D2...Dk] |
| label | Tensor(int32/int64) | Yes | Sparse integer labels [N, D1, D2...Dk] |
| weight | Tensor(float) | No | Optional per-class weights [C] |
| ignore_index | Tensor(int64) | No | Optional scalar index to ignore in loss computation |
Outputs (SoftmaxCrossEntropyLoss)
| Name | Type | Description |
|------|------|-------------|
| loss | Tensor(float) | Loss value (scalar if reduced, or [N, D1...Dk] if NONE) |
| log_prob | Tensor(float) | Log probabilities [N, C, D1, D2...Dk] (optional) |
Inputs (SoftmaxCrossEntropyLossGrad)
| Name | Type | Required | Description |
|------|------|----------|-------------|
| dY | Tensor(float) | Yes | Upstream gradient |
| log_prob | Tensor(float) | Yes | Log probabilities from forward pass |
| label | Tensor(int32/int64) | Yes | Sparse integer labels |
| weight | Tensor(float) | No | Optional per-class weights |
| ignore_index | Tensor(int64) | No | Optional scalar index to ignore |
| bias | Tensor(float) | No | Optional bias added element-wise to the output gradient |
Outputs (SoftmaxCrossEntropyLossGrad)
| Name | Type | Description |
|------|------|-------------|
| d_logit | Tensor(float) | Gradient w.r.t. input logits |
Usage Examples
```cpp
// Kernel registration for SoftmaxCrossEntropyLoss opset 13
ONNX_OPERATOR_TWO_TYPED_KERNEL_EX(
    SoftmaxCrossEntropyLoss,
    kOnnxDomain,
    13,
    float,
    int64_t,
    kCpuExecutionProvider,
    KernelDefBuilder()
        .TypeConstraint("T", DataTypeImpl::GetTensorType<float>())
        .TypeConstraint("Tind", DataTypeImpl::GetTensorType<int64_t>()),
    SoftmaxCrossEntropyLoss<float, int64_t>);
```
Related Pages