Overview
CPU kernel implementations of the ONNX-standard SoftmaxCrossEntropyLoss operator and its gradient for the ONNX Runtime training framework.
Description
This file implements the SoftmaxCrossEntropyLoss forward kernel (ONNX opset 12 and 13) and the SoftmaxCrossEntropyLossGrad backward kernel. Unlike the simpler cross entropy kernels, this implementation handles multi-dimensional inputs by transposing logits from [N, C, D1, D2...Dk] to [N, D1, D2...Dk, C] before computing softmax. It supports an ignore_index parameter to exclude certain labels from the loss, optional per-class weights, and three reduction modes: NONE, MEAN, and SUM. The gradient kernel uses parallel-for execution via the thread pool for efficient backpropagation across all elements. Internal variants (SoftmaxCrossEntropyLossInternal and SoftmaxCrossEntropyLossInternalGrad) are also registered. The gradient kernel supports an optional bias input that is added element-wise to the output gradient.
Usage
This kernel is invoked during training when an ONNX SoftmaxCrossEntropyLoss node (opset 12 or 13) is present in the computation graph. The forward pass computes the classification loss, while the gradient kernel propagates loss gradients back through the softmax layer during backpropagation.
Code Reference
Source Location
Signature
```cpp
void GetNDCFromLogitAndLabelShape(const TensorShape& logit_shape,
                                  const TensorShape& label_shape,
                                  int64_t& N_D, int64_t& C);

void VerifyLogitWeightAndLabelShape(const TensorShape& logit_shape,
                                    const TensorShape& label_shape,
                                    const TensorShape* weight_shape);

void GetPermutationAndShape(bool ncd_to_ndc, const TensorShape& tensor_shape,
                            TensorShapeVector& new_shape,
                            std::vector<size_t>& permutations);

template <typename T1, typename T2>
Status SoftmaxCrossEntropyLoss<T1, T2>::Compute(OpKernelContext* context) const;

template <typename T1, typename T2>
Status SoftmaxCrossEntropyLossGrad<T1, T2>::Compute(OpKernelContext* context) const;
```
Import
```cpp
#include "orttraining/orttraining/training_ops/cpu/loss/softmax_cross_entropy_loss.h"
```
I/O Contract
Inputs (SoftmaxCrossEntropyLoss)
| Name | Type | Required | Description |
|------|------|----------|-------------|
| logit | Tensor(float) | Yes | Raw logits [N, C, D1, D2...Dk] |
| label | Tensor(int32/int64) | Yes | Sparse integer labels [N, D1, D2...Dk] |
| weight | Tensor(float) | No | Optional per-class weights [C] |
| ignore_index | Tensor(int64) | No | Optional scalar index to ignore in loss computation |
Outputs (SoftmaxCrossEntropyLoss)
| Name | Type | Description |
|------|------|-------------|
| loss | Tensor(float) | Loss value (scalar if reduced, or [N, D1...Dk] if NONE) |
| log_prob | Tensor(float) | Log probabilities [N, C, D1, D2...Dk] (optional) |
Inputs (SoftmaxCrossEntropyLossGrad)
| Name | Type | Required | Description |
|------|------|----------|-------------|
| dY | Tensor(float) | Yes | Upstream gradient |
| log_prob | Tensor(float) | Yes | Log probabilities from forward pass |
| label | Tensor(int32/int64) | Yes | Sparse integer labels |
| weight | Tensor(float) | No | Optional per-class weights |
| ignore_index | Tensor(int64) | No | Optional scalar index to ignore |
| bias | Tensor(float) | No | Optional bias added element-wise to the output gradient |
Outputs (SoftmaxCrossEntropyLossGrad)
| Name | Type | Description |
|------|------|-------------|
| d_logit | Tensor(float) | Gradient w.r.t. input logits |
Usage Examples
```cpp
// Kernel registration for SoftmaxCrossEntropyLoss opset 13
ONNX_OPERATOR_TWO_TYPED_KERNEL_EX(
    SoftmaxCrossEntropyLoss,
    kOnnxDomain,
    13,
    float,
    int64_t,
    kCpuExecutionProvider,
    KernelDefBuilder()
        .TypeConstraint("T", DataTypeImpl::GetTensorType<float>())
        .TypeConstraint("Tind", DataTypeImpl::GetTensorType<int64_t>()),
    SoftmaxCrossEntropyLoss<float, int64_t>);
```
Related Pages