Implementation:Microsoft Onnxruntime CrossEntropy Declarations

From Leeroopedia


Knowledge Sources
Domains: Training, Operators, Loss
Last Updated: 2026-02-10 04:00 GMT

Overview

Declares CPU kernel classes for cross-entropy loss functions and their gradients used in ORT Training, including SoftmaxCrossEntropy and SparseSoftmaxCrossEntropy variants.

Description

The `cross_entropy.h` header declares the cross-entropy loss operator kernels for the ONNX Runtime training operators (in the `onnxruntime::contrib` namespace). These are CPU implementations used during training for computing loss and backpropagating gradients.

  • LossBase: Abstract base class extending `OpKernel`. Extracts the `reduction` attribute (mean, sum, or none) from the operator info and stores it as a `ReductionType` enum. All loss kernels inherit from this.
  • ComputeShareSoftmaxCrossEntropyCPU<T>: A free function template that performs the shared softmax and log-probability step. It takes raw logit data, shifts the logits for numerical stability, and produces log probabilities. Parameters: `n` (batch size), `d` (class count), `nd` (total element count), `logit_data` (input logits), and pre-allocated output buffers `shifted_logit` and `log_prob_data`.
  • SoftmaxCrossEntropy<T>: Computes softmax cross-entropy loss where both the predictions (logits) and targets are dense tensors. The `Compute` method applies softmax to logits and computes cross-entropy with the target distribution. Non-copyable, non-movable.
  • SoftmaxCrossEntropyGrad<T>: Computes the gradient of the softmax cross-entropy loss with respect to the logits. Non-copyable, non-movable.
  • SparseSoftmaxCrossEntropy<T>: Computes softmax cross-entropy loss where the targets are sparse (class indices rather than one-hot vectors). More memory-efficient for classification tasks with many classes. Non-copyable, non-movable.
  • SparseSoftmaxCrossEntropyGrad<T>: Computes the gradient of the sparse softmax cross-entropy loss. Non-copyable, non-movable.

All kernel classes are templated on the data type `T` (typically `float`) and implement the `Compute(OpKernelContext*)` method.
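
The shifted-logit step described for `ComputeShareSoftmaxCrossEntropyCPU` can be illustrated with a minimal standalone sketch. This is not the ORT implementation: `LogSoftmaxRows` is a hypothetical helper written for this page, it assumes a row-major `n` x `d` logit matrix, and it uses `std::vector` instead of raw buffers or Eigen.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of ORT): row-wise log-softmax over a
// row-major n x d logit matrix, using the max-shift trick for stability.
std::vector<float> LogSoftmaxRows(const std::vector<float>& logits, int n, int d) {
  assert(n > 0 && d > 0);
  assert(logits.size() == static_cast<std::size_t>(n) * d);
  std::vector<float> log_prob(logits.size());
  for (int i = 0; i < n; ++i) {
    const float* row = logits.data() + static_cast<std::size_t>(i) * d;
    float* out = log_prob.data() + static_cast<std::size_t>(i) * d;
    // Subtract the row maximum so the largest exponent is exp(0) = 1.
    const float row_max = *std::max_element(row, row + d);
    float sum_exp = 0.0f;
    for (int j = 0; j < d; ++j) {
      out[j] = row[j] - row_max;  // shifted logit
      sum_exp += std::exp(out[j]);
    }
    // log_prob = shifted_logit - log(sum(exp(shifted_logit)))
    const float log_sum = std::log(sum_exp);
    for (int j = 0; j < d; ++j) out[j] -= log_sum;
  }
  return log_prob;
}
```

Without the shift, logits around 1000 would overflow `std::exp` in single precision; with it, two equal logits of any magnitude still yield a log probability of log(0.5) each.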

Usage

These kernels are registered as ORT contrib operators. They are invoked automatically when an executing training graph contains SoftmaxCrossEntropy or SparseSoftmaxCrossEntropy nodes (or their gradient nodes), so user code does not call them directly.

Code Reference

Source Location

Signature

namespace onnxruntime::contrib {

class LossBase : public OpKernel {
 public:
  explicit LossBase(const OpKernelInfo& info);
 protected:
  ReductionType reduction_;
};

template <typename T>
void ComputeShareSoftmaxCrossEntropyCPU(const int n, const int d,
    const Eigen::Index nd, const T* logit_data,
    T* shifted_logit, T* log_prob_data);

template <typename T>
class SoftmaxCrossEntropy final : public LossBase {
 public:
  explicit SoftmaxCrossEntropy(const OpKernelInfo& info);
  Status Compute(OpKernelContext* context) const override;
};

template <typename T>
class SoftmaxCrossEntropyGrad final : public LossBase {
 public:
  explicit SoftmaxCrossEntropyGrad(const OpKernelInfo& info);
  Status Compute(OpKernelContext* context) const override;
};

template <typename T>
class SparseSoftmaxCrossEntropy final : public LossBase {
 public:
  explicit SparseSoftmaxCrossEntropy(const OpKernelInfo& info);
  Status Compute(OpKernelContext* context) const override;
};

template <typename T>
class SparseSoftmaxCrossEntropyGrad final : public LossBase {
 public:
  explicit SparseSoftmaxCrossEntropyGrad(const OpKernelInfo& info);
  Status Compute(OpKernelContext* context) const override;
};

}  // namespace onnxruntime::contrib

Import

#include "orttraining/training_ops/cpu/loss/cross_entropy.h"

I/O Contract

| Kernel | Inputs | Outputs | Description |
|---|---|---|---|
| SoftmaxCrossEntropy | logits (N,D), targets (N,D) | loss (scalar or N), log_prob (N,D) | Computes softmax CE loss with dense targets |
| SoftmaxCrossEntropyGrad | grad_output, log_prob, targets | grad_logits (N,D) | Gradient of softmax CE w.r.t. logits |
| SparseSoftmaxCrossEntropy | logits (N,D), labels (N) | loss (scalar or N), log_prob (N,D) | Computes softmax CE loss with sparse (index) targets |
| SparseSoftmaxCrossEntropyGrad | grad_output, log_prob, labels | grad_logits (N,D) | Gradient of sparse softmax CE w.r.t. logits |

| Attribute | Type | Values | Description |
|---|---|---|---|
| reduction | string | "mean", "sum", "none" | How to reduce the loss over the batch dimension |
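
To make the gradient rows of the contract concrete, here is a minimal sketch of the standard softmax cross-entropy gradient, exp(log_prob) minus targets, scaled by the upstream gradient. It is an illustration under stated assumptions rather than the ORT kernel: `Reduction` and `DenseCrossEntropyGrad` are hypothetical names, each target row is assumed to sum to 1, `grad_output` is assumed to be a scalar, and only the "mean" and "sum" reductions are covered.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical enum for this sketch only; it is not ORT's ReductionType.
enum class Reduction { kMean, kSum };

// Hypothetical helper: gradient of dense softmax cross-entropy w.r.t. logits.
// Assumes every target row sums to 1 and the upstream gradient is a scalar.
std::vector<float> DenseCrossEntropyGrad(const std::vector<float>& log_prob,
                                         const std::vector<float>& targets,
                                         float grad_output, int n, int d,
                                         Reduction reduction) {
  assert(n > 0 && d > 0);
  assert(log_prob.size() == static_cast<std::size_t>(n) * d);
  assert(targets.size() == log_prob.size());
  // A mean reduction spreads the upstream gradient evenly over the batch.
  const float scale =
      (reduction == Reduction::kMean) ? grad_output / n : grad_output;
  std::vector<float> grad(log_prob.size());
  for (std::size_t k = 0; k < grad.size(); ++k) {
    // d(loss)/d(logit) = softmax probability - target
    grad[k] = (std::exp(log_prob[k]) - targets[k]) * scale;
  }
  return grad;
}
```

Because both the probabilities and the targets in a row sum to 1, the gradient entries of each row sum to zero, which is a quick sanity check for any implementation.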

Usage Examples

// These kernels are registered as contrib operators and invoked automatically,
// so there is no direct call site. The node-level contract is summarized below.

// SoftmaxCrossEntropy is used for dense label targets:
//   Input 0: logits [batch_size, num_classes]
//   Input 1: targets [batch_size, num_classes]  (probability distribution)
//   Output 0: loss [1] (if reduction="mean" or "sum") or [batch_size]
//   Output 1: log_prob [batch_size, num_classes]

// SparseSoftmaxCrossEntropy is used for integer class labels:
//   Input 0: logits [batch_size, num_classes]
//   Input 1: labels [batch_size] (integer class indices)
//   Output 0: loss [1] (if reduction="mean" or "sum") or [batch_size]
//   Output 1: log_prob [batch_size, num_classes]
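
The relationship between the dense and sparse variants can be sketched in a few lines: with one-hot targets, the dense loss reduces to picking the log probability at the label index. The helpers below (`DenseLossPerSample`, `SparseLossPerSample`, `MeanLoss`) are hypothetical names for this illustration and assume a row-major `log_prob` buffer; they do not reproduce the ORT kernels.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: per-sample loss with dense targets,
// loss[i] = -sum_j targets[i][j] * log_prob[i][j].
std::vector<float> DenseLossPerSample(const std::vector<float>& log_prob,
                                      const std::vector<float>& targets,
                                      int n, int d) {
  assert(n > 0 && d > 0);
  assert(log_prob.size() == static_cast<std::size_t>(n) * d);
  assert(targets.size() == log_prob.size());
  std::vector<float> loss(n, 0.0f);
  for (int i = 0; i < n; ++i) {
    float acc = 0.0f;
    for (int j = 0; j < d; ++j) {
      const std::size_t k = static_cast<std::size_t>(i) * d + j;
      acc += targets[k] * log_prob[k];
    }
    loss[i] = -acc;
  }
  return loss;
}

// Hypothetical helper: per-sample loss with integer class labels,
// loss[i] = -log_prob[i][labels[i]].
std::vector<float> SparseLossPerSample(const std::vector<float>& log_prob,
                                       const std::vector<std::int64_t>& labels,
                                       int n, int d) {
  assert(n > 0 && d > 0);
  assert(log_prob.size() == static_cast<std::size_t>(n) * d);
  assert(labels.size() == static_cast<std::size_t>(n));
  std::vector<float> loss(n, 0.0f);
  for (int i = 0; i < n; ++i) {
    assert(labels[i] >= 0 && labels[i] < d);
    const std::size_t k =
        static_cast<std::size_t>(i) * d + static_cast<std::size_t>(labels[i]);
    loss[i] = -log_prob[k];
  }
  return loss;
}

// reduction="mean": average the per-sample losses into one scalar.
float MeanLoss(const std::vector<float>& per_sample) {
  assert(!per_sample.empty());
  float sum = 0.0f;
  for (float v : per_sample) sum += v;
  return sum / static_cast<float>(per_sample.size());
}
```

The sparse form reads one value per row instead of `num_classes` values and needs no one-hot target tensor, which is where its memory advantage for large class counts comes from.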
