Implementation:Microsoft Onnxruntime CrossEntropy Declarations
| Knowledge Sources | |
|---|---|
| Domains | Training, Operators, Loss |
| Last Updated | 2026-02-10 04:00 GMT |
Overview
Declares CPU kernel classes for cross-entropy loss functions and their gradients used in ORT Training, including SoftmaxCrossEntropy and SparseSoftmaxCrossEntropy variants.
Description
The `cross_entropy.h` header declares the CPU kernels for the cross-entropy loss operators in ONNX Runtime training (in the `onnxruntime::contrib` namespace). These kernels compute the loss and backpropagate its gradients during training.
- LossBase: Abstract base class extending `OpKernel`. Extracts the `reduction` attribute (mean, sum, or none) from the operator info and stores it as a `ReductionType` enum. All loss kernels inherit from this.
- ComputeShareSoftmaxCrossEntropyCPU<T>: A free function template that performs the softmax and log-probability computation shared by the loss kernels. It takes raw logit data, computes shifted logits (for numerical stability), and produces log probabilities. Parameters: `n` (batch size), `d` (class count), `nd` (total elements), `logit_data` (input logits), and pre-allocated buffers for `shifted_logit` and `log_prob_data`.
- SoftmaxCrossEntropy<T>: Computes softmax cross-entropy loss where both the predictions (logits) and targets are dense tensors. The `Compute` method applies softmax to logits and computes cross-entropy with the target distribution. Non-copyable, non-movable.
- SoftmaxCrossEntropyGrad<T>: Computes the gradient of the softmax cross-entropy loss with respect to the logits. Non-copyable, non-movable.
- SparseSoftmaxCrossEntropy<T>: Computes softmax cross-entropy loss where the targets are sparse (class indices rather than one-hot vectors). More memory-efficient for classification tasks with many classes. Non-copyable, non-movable.
- SparseSoftmaxCrossEntropyGrad<T>: Computes the gradient of the sparse softmax cross-entropy loss. Non-copyable, non-movable.
The four concrete kernel classes are templated on the data type `T` (typically `float`) and override the `Compute(OpKernelContext*)` method; `LossBase` itself is not a template.
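The shared log-probability step described for `ComputeShareSoftmaxCrossEntropyCPU<T>` can be illustrated with a minimal sketch. This is not the ORT implementation: `LogSoftmaxRowsSketch` is a hypothetical name, it uses plain loops rather than Eigen, it omits the `nd` parameter, and it assumes a row-major `n x d` layout and that "shifted" means "logit minus the row maximum".

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not the ORT function): row-wise log-softmax over a
// row-major n x d logits buffer, using the max-shift trick for stability.
template <typename T>
void LogSoftmaxRowsSketch(int n, int d, const T* logit_data,
                          T* shifted_logit, T* log_prob_data) {
  assert(n >= 0 && d > 0);
  for (int i = 0; i < n; ++i) {
    const std::size_t offset = static_cast<std::size_t>(i) * d;
    const T* row = logit_data + offset;
    T* shifted = shifted_logit + offset;
    T* log_prob = log_prob_data + offset;

    // Subtract the row maximum so every exponent is <= 0 (no overflow).
    const T row_max = *std::max_element(row, row + d);
    T sum_exp = T(0);
    for (int j = 0; j < d; ++j) {
      shifted[j] = row[j] - row_max;
      sum_exp += std::exp(shifted[j]);
    }

    // log softmax(x)_j = (x_j - max) - log(sum_k exp(x_k - max))
    const T log_sum = std::log(sum_exp);
    for (int j = 0; j < d; ++j) {
      log_prob[j] = shifted[j] - log_sum;
    }
  }
}
```

Shifting by the row maximum leaves the softmax unchanged but keeps `exp` from overflowing, which is why logits as large as 1000 still yield finite log probabilities in this sketch.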
Usage
These kernels are registered as ORT contrib operators. The runtime invokes them automatically when an executed training graph contains SoftmaxCrossEntropy or SparseSoftmaxCrossEntropy nodes (or their Grad counterparts).
Code Reference
Source Location
- Repository: Microsoft_Onnxruntime
- File: orttraining/orttraining/training_ops/cpu/loss/cross_entropy.h
- Lines: 1-81
Signature
namespace onnxruntime::contrib {

class LossBase : public OpKernel {
 public:
  explicit LossBase(const OpKernelInfo& info);

 protected:
  ReductionType reduction_;
};

template <typename T>
void ComputeShareSoftmaxCrossEntropyCPU(const int n, const int d,
                                        const Eigen::Index nd, const T* logit_data,
                                        T* shifted_logit, T* log_prob_data);

template <typename T>
class SoftmaxCrossEntropy final : public LossBase {
 public:
  explicit SoftmaxCrossEntropy(const OpKernelInfo& info);
  Status Compute(OpKernelContext* context) const override;
};

template <typename T>
class SoftmaxCrossEntropyGrad final : public LossBase {
 public:
  explicit SoftmaxCrossEntropyGrad(const OpKernelInfo& info);
  Status Compute(OpKernelContext* context) const override;
};

template <typename T>
class SparseSoftmaxCrossEntropy final : public LossBase {
 public:
  explicit SparseSoftmaxCrossEntropy(const OpKernelInfo& info);
  Status Compute(OpKernelContext* context) const override;
};

template <typename T>
class SparseSoftmaxCrossEntropyGrad final : public LossBase {
 public:
  explicit SparseSoftmaxCrossEntropyGrad(const OpKernelInfo& info);
  Status Compute(OpKernelContext* context) const override;
};

}  // namespace onnxruntime::contrib
Import
#include "orttraining/training_ops/cpu/loss/cross_entropy.h"
I/O Contract
| Kernel | Inputs | Outputs | Description |
|---|---|---|---|
| SoftmaxCrossEntropy | logits (N,D), targets (N,D) | loss (scalar or N), log_prob (N,D) | Computes softmax CE loss with dense targets |
| SoftmaxCrossEntropyGrad | grad_output, log_prob, targets | grad_logits (N,D) | Gradient of softmax CE w.r.t. logits |
| SparseSoftmaxCrossEntropy | logits (N,D), labels (N) | loss (scalar or N), log_prob (N,D) | Computes softmax CE loss with sparse (index) targets |
| SparseSoftmaxCrossEntropyGrad | grad_output, log_prob, labels | grad_logits (N,D) | Gradient of sparse softmax CE w.r.t. logits |
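The two Grad rows above take `log_prob` rather than the raw logits, because the gradient of softmax cross-entropy with respect to the logits can be recovered from the log probabilities alone. A minimal sketch of that relationship follows. The function names and the `scale` parameter are hypothetical: `scale` is assumed to fold in the upstream `grad_output` and any reduction factor (for example 1/N under a mean reduction), and the dense variant assumes each target row sums to 1. The ORT kernels may organize this differently.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: dense targets. For target rows that sum to 1,
// d(loss_i)/d(logit_ij) = softmax_ij - target_ij = exp(log_prob_ij) - target_ij.
std::vector<float> DenseGradSketch(const std::vector<float>& log_prob,
                                   const std::vector<float>& targets,
                                   float scale) {
  assert(log_prob.size() == targets.size());
  std::vector<float> grad(log_prob.size());
  for (std::size_t k = 0; k < grad.size(); ++k) {
    grad[k] = (std::exp(log_prob[k]) - targets[k]) * scale;
  }
  return grad;
}

// Hypothetical sketch: sparse labels. Same formula, with the target row
// replaced by an implicit one-hot vector at index labels[i].
std::vector<float> SparseGradSketch(const std::vector<float>& log_prob,
                                    const std::vector<std::int64_t>& labels,
                                    int d, float scale) {
  assert(d > 0);
  assert(log_prob.size() == labels.size() * static_cast<std::size_t>(d));
  std::vector<float> grad(log_prob.size());
  for (std::size_t i = 0; i < labels.size(); ++i) {
    assert(labels[i] >= 0 && labels[i] < d);
    for (int j = 0; j < d; ++j) {
      const std::size_t k = i * static_cast<std::size_t>(d) + j;
      const float one_hot = (j == labels[i]) ? 1.0f : 0.0f;
      grad[k] = (std::exp(log_prob[k]) - one_hot) * scale;
    }
  }
  return grad;
}
```

With a one-hot dense target, both functions produce the same gradient; the sparse form simply avoids materializing the (N,D) target tensor.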
| Attribute | Type | Values | Description |
|---|---|---|---|
| reduction | string | "mean", "sum", "none" | How to reduce the loss over the batch dimension |
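To make the effect of the `reduction` attribute concrete, here is a small sketch of how a string attribute might be mapped to an enum and then applied to per-sample losses. The names `ReductionSketch`, `ParseReductionSketch`, and `ReduceLossSketch` are hypothetical stand-ins, not the ORT `ReductionType` API, and throwing on an unknown value is an assumption made for this example only.

```cpp
#include <cassert>
#include <numeric>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the reduction enum; names are illustrative only.
enum class ReductionSketch { kNone, kMean, kSum };

// Map the attribute string to the enum; unknown values are rejected.
ReductionSketch ParseReductionSketch(const std::string& s) {
  if (s == "none") return ReductionSketch::kNone;
  if (s == "mean") return ReductionSketch::kMean;
  if (s == "sum") return ReductionSketch::kSum;
  throw std::invalid_argument("unknown reduction: " + s);
}

// Apply the reduction over the batch dimension.
// "none" keeps one loss per sample; "mean" and "sum" return a single value.
std::vector<float> ReduceLossSketch(const std::vector<float>& per_sample,
                                    ReductionSketch reduction) {
  if (reduction == ReductionSketch::kNone) return per_sample;
  assert(!per_sample.empty());
  float total = std::accumulate(per_sample.begin(), per_sample.end(), 0.0f);
  if (reduction == ReductionSketch::kMean) {
    total /= static_cast<float>(per_sample.size());
  }
  return {total};
}
```

Parsing the string once and storing the enum mirrors what the description says `LossBase` does with the attribute, so the per-call work is a cheap enum comparison.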
Usage Examples
// These kernels are registered as contrib operators and invoked automatically.
// The expected inputs and outputs of each operator are:
// SoftmaxCrossEntropy is used for dense label targets:
// Input 0: logits [batch_size, num_classes]
// Input 1: targets [batch_size, num_classes] (probability distribution)
// Output 0: loss [1] (if reduction="mean" or "sum") or [batch_size]
// Output 1: log_prob [batch_size, num_classes]
// SparseSoftmaxCrossEntropy is used for integer class labels:
// Input 0: logits [batch_size, num_classes]
// Input 1: labels [batch_size] (integer class indices)
// Output 0: loss [1] (if reduction="mean" or "sum") or [batch_size]
// Output 1: log_prob [batch_size, num_classes]
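The difference between the dense and sparse layouts listed above can be sketched as two per-sample loss functions that both consume a precomputed `log_prob` buffer. The function names are hypothetical, the buffers are assumed to be row-major `[batch_size, num_classes]`, and no reduction is applied; this illustrates the math rather than reproducing the ORT kernels.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: dense targets.
// loss_i = -sum_j targets[i][j] * log_prob[i][j]
std::vector<float> DenseLossPerSampleSketch(const std::vector<float>& log_prob,
                                            const std::vector<float>& targets,
                                            int num_classes) {
  assert(num_classes > 0);
  assert(log_prob.size() == targets.size());
  const std::size_t d = static_cast<std::size_t>(num_classes);
  assert(log_prob.size() % d == 0);
  std::vector<float> loss(log_prob.size() / d, 0.0f);
  for (std::size_t i = 0; i < loss.size(); ++i) {
    for (std::size_t j = 0; j < d; ++j) {
      loss[i] -= targets[i * d + j] * log_prob[i * d + j];
    }
  }
  return loss;
}

// Hypothetical sketch: sparse labels.
// loss_i = -log_prob[i][labels[i]], so only one entry per row is read.
std::vector<float> SparseLossPerSampleSketch(const std::vector<float>& log_prob,
                                             const std::vector<std::int64_t>& labels,
                                             int num_classes) {
  assert(num_classes > 0);
  const std::size_t d = static_cast<std::size_t>(num_classes);
  assert(log_prob.size() == labels.size() * d);
  std::vector<float> loss(labels.size());
  for (std::size_t i = 0; i < labels.size(); ++i) {
    assert(labels[i] >= 0 && labels[i] < num_classes);
    loss[i] = -log_prob[i * d + static_cast<std::size_t>(labels[i])];
  }
  return loss;
}
```

When the dense target is a one-hot row, both functions return the same loss; the sparse form stores one integer per sample instead of `num_classes` floats.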