Implementation: Microsoft Onnxruntime CPU ClipGradNorm
| Knowledge Sources | |
|---|---|
| Domains | Training, CPU_Kernels |
| Last Updated | 2026-02-10 04:00 GMT |
Overview
Concrete tool for gradient norm clipping on CPU in the ONNX Runtime training framework.
Description
This file implements the InplaceClipGradNorm kernel, which clips gradients by their global L2 norm. It first computes the total L2 norm across all gradient tensors in a TensorSeq using GetL2Norm (sum of squared elements, then square root). Then it computes a clip coefficient: min(max_norm / (total_norm + epsilon), 1.0). If the total norm exceeds max_norm, all gradients are scaled down by this coefficient; otherwise they remain unchanged. The operation is performed in-place on the gradient sequence. If the output sequence differs from the input, the clipped gradients are copied to the output.
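The algorithm described above can be sketched in standalone C++. This is a hedged illustration, not the actual kernel code: `std::vector<std::vector<float>>` stands in for the `TensorSeq`, and the function names `GlobalL2Norm` / `ClipByGlobalNorm` are illustrative, chosen to mirror the roles of `GetL2Norm` and `ClipGradNorm`.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Global L2 norm: square root of the sum of squares over every element
// of every tensor in the sequence (mirrors the role of GetL2Norm).
float GlobalL2Norm(const std::vector<std::vector<float>>& gradients) {
  float sum_of_squares = 0.0f;
  for (const auto& grad : gradients)
    for (float g : grad) sum_of_squares += g * g;
  return std::sqrt(sum_of_squares);
}

// Scale every gradient in place by min(max_norm / (total_norm + epsilon), 1.0)
// (mirrors the role of ClipGradNorm). When total_norm <= max_norm the
// coefficient clamps to 1.0 and the gradients are left unchanged.
void ClipByGlobalNorm(std::vector<std::vector<float>>& gradients,
                      float max_norm, float epsilon = 1e-6f) {
  const float total_norm = GlobalL2Norm(gradients);
  const float clip_coef = std::min(max_norm / (total_norm + epsilon), 1.0f);
  if (clip_coef < 1.0f)
    for (auto& grad : gradients)
      for (float& g : grad) g *= clip_coef;
}
```

For example, a sequence holding the single gradient `{3, 4}` has global norm 5; with `max_norm = 1` every element is scaled by roughly 1/5, so the clipped sequence has norm approximately 1.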
Usage
This kernel is invoked during training to prevent gradient explosion by clipping the global gradient norm before the optimizer step. It is commonly used in conjunction with AdamW or SGD optimizers.
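To show where clipping sits relative to the optimizer step, here is a minimal self-contained sketch of "clip, then update" with plain SGD. All names (`Seq`, `ClipThenSgdStep`) are hypothetical and do not correspond to ONNX Runtime APIs; the point is only the ordering: the global norm is clipped before the parameter update is applied.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

using Seq = std::vector<std::vector<float>>;  // stand-in for a TensorSeq

// Clip the global gradient norm, then take one plain SGD step:
//   params <- params - lr * clipped_grads
void ClipThenSgdStep(Seq& params, const Seq& grads, float lr,
                     float max_norm, float epsilon = 1e-6f) {
  // Global L2 norm over the whole gradient sequence.
  float sum_sq = 0.0f;
  for (const auto& g : grads)
    for (float v : g) sum_sq += v * v;
  const float clip_coef =
      std::min(max_norm / (std::sqrt(sum_sq) + epsilon), 1.0f);
  // Update with the (possibly) scaled gradients.
  for (std::size_t i = 0; i < params.size(); ++i)
    for (std::size_t j = 0; j < params[i].size(); ++j)
      params[i][j] -= lr * (clip_coef * grads[i][j]);
}
```

In ONNX Runtime training graphs the same ordering holds: InplaceClipGradNorm consumes the raw gradient sequence and its output feeds the optimizer node, so the optimizer only ever sees gradients whose global norm is at most `max_norm` (up to the epsilon term).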
Code Reference
Source Location
- Repository: Microsoft_Onnxruntime
- File: orttraining/orttraining/training_ops/cpu/optimizer/clip_grad_norm/clip_grad_norm.cc
- Lines: 1-91
Signature
template <typename T>
T GetL2Norm(const TensorSeq& gradients);
template <typename T>
void ClipGradNorm(T total_norm, T max_norm, TensorSeq& gradients);
Status PopulateOutput(OpKernelContext* ctx, const TensorSeq* gradients,
TensorSeq* clipped_gradients);
template <typename T>
Status InplaceClipGradNorm<T>::Compute(OpKernelContext* ctx) const;
Import
#include "orttraining/orttraining/training_ops/cpu/optimizer/clip_grad_norm/clip_grad_norm.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| gradients | TensorSeq(float) | Yes | Sequence of gradient tensors to clip |
Outputs
| Name | Type | Description |
|---|---|---|
| clipped_gradients | TensorSeq(float) | Clipped gradient tensors (in-place alias) |
Usage Examples
ONNX_OPERATOR_KERNEL_EX(
InplaceClipGradNorm, kMSDomain, 1, kCpuExecutionProvider,
(*KernelDefBuilder::Create())
.Alias(0, 0) /* Return updated gradients in-place */
.TypeConstraint("S_GRAD", DataTypeImpl::AllFixedSizeSequenceTensorTypes()),
InplaceClipGradNorm<float>);