
Implementation:Microsoft Onnxruntime CPU LayerNormGrad

From Leeroopedia


Knowledge Sources
Domains Training, CPU_Kernels
Last Updated 2026-02-10 04:00 GMT

Overview

A concrete implementation of layer normalization gradient computation on CPU in the ONNX Runtime training framework.

Description

This file implements three layer normalization gradient kernels: LayerNormGrad (standard), SimplifiedLayerNormalizationGrad (simplified, without bias), and InvertibleLayerNormGrad, which reconstructs the normalized input from Y, scale, and bias instead of requiring the original input X to be stored. All three are registered under kMSDomain opset 1 for float and double types.

The standard LayerNormGrad computes gradients through three intermediate arrays: A = dY * (X - mean) * inv_std_var, B = dY * scale * inv_std_var, and C = B * (X - mean) * inv_std_var. The input gradient is dX = B - mean(B) - (X - mean) * inv_std_var * mean(C), where mean(·) averages over the normalized (feature) axis. The scale gradient is d_scale = sum(A) and the bias gradient is d_bias = sum(dY), both summed over the batch axis. The simplified variant omits the mean subtraction and the bias gradient. The invertible variant recovers the normalized activation from the forward output as (Y - bias) / scale.

Usage

These kernels are invoked during the backward pass of layer normalization operations. They are commonly used in transformer architectures for training.

Code Reference

Source Location

Signature

template <typename T, bool simplified>
LayerNormGrad<T, simplified>::LayerNormGrad(const OpKernelInfo& op_kernel_info);

template <typename T, bool simplified>
Status LayerNormGrad<T, simplified>::Compute(OpKernelContext* op_kernel_context) const;

template <typename T>
InvertibleLayerNormGrad<T>::InvertibleLayerNormGrad(const OpKernelInfo& op_kernel_info);

template <typename T>
Status InvertibleLayerNormGrad<T>::Compute(OpKernelContext* op_kernel_context) const;

Import

#include "orttraining/orttraining/training_ops/cpu/nn/layer_norm.h"

I/O Contract

Inputs (LayerNormGrad)

Name Type Required Description
Y_grad Tensor(T) Yes Upstream gradient [N, M]
X Tensor(T) Yes Input tensor from forward [N, M]
scale Tensor(T) Yes Scale parameter [M]
mean Tensor(float) Yes (standard) / No (simplified) Saved mean [N]
inv_std_var Tensor(float) Yes Saved inverse standard deviation [N]

Outputs (LayerNormGrad)

Name Type Description
X_grad Tensor(T) Gradient w.r.t. input X
scale_grad Tensor(T) Gradient w.r.t. scale
bias_grad Tensor(T) Gradient w.r.t. bias (not produced in simplified mode)

Inputs (InvertibleLayerNormGrad)

Name Type Required Description
Y_grad Tensor(T) Yes Upstream gradient
Y Tensor(T) Yes Output from forward pass
scale Tensor(T) Yes Scale parameter
bias Tensor(T) Yes Bias parameter
inv_std_var Tensor(float) Yes Saved inverse standard deviation

Outputs (InvertibleLayerNormGrad)

Name Type Description
X_grad Tensor(T) Gradient w.r.t. input X
scale_grad Tensor(T) Gradient w.r.t. scale
bias_grad Tensor(T) Gradient w.r.t. bias

Usage Examples

The float specialization of the standard kernel is registered for the CPU execution provider as follows:

ONNX_OPERATOR_TYPED_KERNEL_EX(
    LayerNormalizationGrad, kMSDomain, 1, float, kCpuExecutionProvider,
    KernelDefBuilder()
        .TypeConstraint("T", DataTypeImpl::GetTensorType<float>())
        .TypeConstraint("U", DataTypeImpl::GetTensorType<float>())
        .TypeConstraint("V", DataTypeImpl::GetTensorType<float>()),
    LayerNormGrad<float, false>);
