Implementation:Microsoft Onnxruntime CUDA ClipGradNorm
| Knowledge Sources | |
|---|---|
| Domains | Training, CUDA_Kernels |
| Last Updated | 2026-02-10 04:00 GMT |
Overview
Concrete tool for in-place gradient norm clipping in the ONNX Runtime CUDA training framework.
Description
Implements the InplaceClipGradNorm operator for CUDA, which clips gradients by their global L2 norm. The implementation first computes the Frobenius (L2) norm over all gradient tensors, using MultiTensorReduceL2 through the multi-tensor functor framework followed by a scalar square root. It then applies ClipGradNormFunctor, which scales the gradients by max_norm / max(total_norm, epsilon), where a small epsilon (1e-6) prevents division by zero. All gradient tensors are passed as one sequence and processed in batched fashion with a chunk size of 2048*32 elements. The output sequence reuses the input buffer when the allocation planner allows in-place operation (via Alias(0, 0)); otherwise the kernel performs a deep copy.
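The two passes described above can be illustrated with a minimal CPU reference sketch. This is not the ONNX Runtime kernel: the function names GlobalL2Norm and ClipGradNormInPlace are hypothetical, gradients are modeled as plain float vectors, and the sketch assumes (as is conventional for norm clipping) that the coefficient is capped at 1 so gradients already within max_norm are left unchanged.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for illustration only; not the CUDA kernel.
constexpr std::size_t kChunkSize = 2048 * 32;  // chunk size quoted in the description
constexpr float kEpsilon = 1e-6f;              // epsilon quoted in the description

// Pass 1: accumulate squared values chunk by chunk across all tensors,
// then take a single scalar square root.
float GlobalL2Norm(const std::vector<std::vector<float>>& grads) {
  double sum_sq = 0.0;
  for (const auto& g : grads) {
    for (std::size_t start = 0; start < g.size(); start += kChunkSize) {
      const std::size_t end = std::min(g.size(), start + kChunkSize);
      double partial = 0.0;
      for (std::size_t i = start; i < end; ++i) {
        partial += static_cast<double>(g[i]) * g[i];
      }
      sum_sq += partial;
    }
  }
  return static_cast<float>(std::sqrt(sum_sq));
}

// Pass 2: scale every tensor in place by the clip coefficient.
// Returns the norm measured before clipping.
float ClipGradNormInPlace(std::vector<std::vector<float>>& grads, float max_norm) {
  const float total_norm = GlobalL2Norm(grads);
  const float coef = max_norm / std::max(total_norm, kEpsilon);
  // Assumption: cap the coefficient at 1 so small gradients are never scaled up.
  const float scale = std::min(coef, 1.0f);
  for (auto& g : grads) {
    for (float& v : g) v *= scale;
  }
  return total_norm;
}
```

With gradients {3, 0} and {0, 4} the global norm is 5, so clipping to max_norm = 1 scales every element by 0.2. The norm is taken over all tensors together, which preserves the direction of the overall gradient vector.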
Usage
Invoke this operator before the optimizer step when exploding gradients are a concern. Gradient norm clipping is commonly used in RNN and transformer training.
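The ordering within a training step can be sketched as follows. This is a simplified CPU illustration, not ONNX Runtime code: TensorList, ClipByGlobalNorm, SgdStep, and TrainStep are hypothetical names, plain SGD stands in for whatever optimizer is actually used, and the coefficient is assumed to be capped at 1.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

using TensorList = std::vector<std::vector<float>>;

// Hypothetical helper: rescale all gradients in place so that their
// global L2 norm is at most max_norm.
void ClipByGlobalNorm(TensorList& grads, float max_norm, float epsilon = 1e-6f) {
  double sum_sq = 0.0;
  for (const auto& g : grads) {
    for (float v : g) sum_sq += static_cast<double>(v) * v;
  }
  const float total_norm = static_cast<float>(std::sqrt(sum_sq));
  // Assumption: never scale gradients up.
  const float scale = std::min(max_norm / std::max(total_norm, epsilon), 1.0f);
  for (auto& g : grads) {
    for (float& v : g) v *= scale;
  }
}

// Hypothetical plain SGD update: w <- w - lr * g.
void SgdStep(TensorList& weights, const TensorList& grads, float lr) {
  for (std::size_t t = 0; t < weights.size(); ++t) {
    for (std::size_t i = 0; i < weights[t].size(); ++i) {
      weights[t][i] -= lr * grads[t][i];
    }
  }
}

// One training step: clip first, then let the optimizer consume the
// clipped gradients.
void TrainStep(TensorList& weights, TensorList& grads, float lr, float max_norm) {
  ClipByGlobalNorm(grads, max_norm);
  SgdStep(weights, grads, lr);
}
```

Clipping before the update bounds the size of a single step by lr * max_norm for plain SGD, which is what limits the damage from an occasional very large gradient.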
Code Reference
Source Location
- Repository: Microsoft_Onnxruntime
- File: orttraining/orttraining/training_ops/cuda/optimizer/clip_grad_norm/clip_grad_norm.cc
- Lines: 1-107
Signature
class InplaceClipGradNorm : public CudaKernel {
 public:
  Status ComputeInternal(OpKernelContext* ctx) const override;
};
Import
#include "orttraining/training_ops/cuda/optimizer/clip_grad_norm/clip_grad_norm.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| gradients | TensorSeq(S_GRAD) | Yes | Sequence of gradient tensors to clip |
Outputs
| Name | Type | Description |
|---|---|---|
| clipped_gradients | TensorSeq(S_GRAD) | Gradient tensors with clipped norms (may be in-place) |
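The "may be in-place" contract can be pictured with a small CPU sketch. It assumes a simplified model in which a tensor sequence is a vector of float vectors; PrepareOutput and ScaleAll are hypothetical helpers that illustrate the aliasing decision described above, not ONNX Runtime APIs.

```cpp
#include <vector>

using TensorList = std::vector<std::vector<float>>;

// Hypothetical helper: returns the sequence that will hold the results.
// If the output aliases the input, nothing is copied; otherwise the input
// is deep-copied into the output before any scaling takes place.
TensorList* PrepareOutput(TensorList* input, TensorList* output) {
  if (output == input) return input;  // aliased: update in place
  *output = *input;                   // not aliased: deep copy
  return output;
}

// Hypothetical helper: apply the clip coefficient to every element.
void ScaleAll(TensorList& seq, float scale) {
  for (auto& t : seq) {
    for (float& v : t) v *= scale;
  }
}
```

In the aliased case the caller's gradients are overwritten. In the copied case the input sequence keeps its original values and only the output holds the clipped gradients.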
Usage Examples
ONNX_OPERATOR_KERNEL_EX(
    InplaceClipGradNorm, kMSDomain, 1, kCudaExecutionProvider,
    (*KernelDefBuilder::Create())
        .Alias(0, 0)  // Return updated gradients in-place
        .TypeConstraint("S_GRAD", DataTypeImpl::AllFixedSizeSequenceTensorTypes()),
    InplaceClipGradNorm);