Implementation:Microsoft Onnxruntime CUDA ClipGradNorm

From Leeroopedia
Revision as of 15:45, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Microsoft_Onnxruntime_CUDA_ClipGradNorm.md)


Knowledge Sources
Domains: Training, CUDA_Kernels
Last Updated: 2026-02-10 04:00 GMT

Overview

Concrete tool for in-place gradient norm clipping in the ONNX Runtime CUDA training framework.

Description

Implements the InplaceClipGradNorm operator for CUDA, which clips gradients by their global L2 norm. The kernel works in two passes. First, it computes the L2 (Frobenius) norm over all gradient tensors using MultiTensorReduceL2 via the multi-tensor functor framework, followed by a scalar square root that yields the total norm. Second, ClipGradNormFunctor scales the gradients by max_norm / max(total_norm, epsilon), where a small epsilon (1e-6) prevents division by zero. All gradient tensors are processed as one sequence in batched fashion, with a chunk size of 2048*32 elements. The output sequence reuses the input buffer when the allocation planner allows in-place operation (via Alias(0, 0)); otherwise the kernel performs a deep copy.
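
The two passes described above can be sketched on the CPU. This is a minimal reference sketch, not the CUDA kernel: the names Tensor, GlobalL2Norm, ClipGradNormInPlace, kChunkSize, and kEpsilon are hypothetical, the chunk loop only mimics how work is partitioned, and capping the scale factor at 1 (so that gradients already within max_norm are left untouched) is an assumption not stated in the description.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical names throughout; a CPU reference sketch, not the CUDA kernel.
constexpr std::size_t kChunkSize = 2048 * 32;  // chunk size quoted in the description
constexpr float kEpsilon = 1e-6f;              // epsilon quoted in the description

using Tensor = std::vector<float>;

// Pass 1: sum of squares over every tensor, accumulated chunk by chunk,
// followed by a single scalar square root.
float GlobalL2Norm(const std::vector<Tensor>& grads) {
  double sum_sq = 0.0;
  for (const Tensor& g : grads) {
    for (std::size_t begin = 0; begin < g.size(); begin += kChunkSize) {
      const std::size_t end = std::min(begin + kChunkSize, g.size());
      double partial = 0.0;  // stands in for one chunk's partial reduction
      for (std::size_t i = begin; i < end; ++i) {
        partial += static_cast<double>(g[i]) * g[i];
      }
      sum_sq += partial;
    }
  }
  return static_cast<float>(std::sqrt(sum_sq));
}

// Pass 2: scale every gradient in place by max_norm / max(total_norm, epsilon).
// Assumption: the factor is capped at 1 so small gradients are not scaled up.
// Returns the norm measured before clipping.
float ClipGradNormInPlace(std::vector<Tensor>& grads, float max_norm) {
  assert(max_norm > 0.0f);
  const float total_norm = GlobalL2Norm(grads);
  const float coef = std::min(1.0f, max_norm / std::max(total_norm, kEpsilon));
  if (coef < 1.0f) {
    for (Tensor& g : grads) {
      for (float& v : g) v *= coef;
    }
  }
  return total_norm;
}
```

Because the norm is taken over all tensors together, every tensor is scaled by the same factor, which preserves the direction of the overall gradient.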

Usage

Invoke this operator before the optimizer step when exploding gradients are a concern. This is common in RNN and transformer training.
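
To illustrate where clipping sits in a training step, here is a minimal CPU sketch using plain SGD. ClipByGlobalNorm and TrainStep are hypothetical helpers written for this example and are not part of ONNX Runtime; the cap of the scale factor at 1 is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU stand-in for the operator: clip a flat gradient list
// by its global L2 norm (scale factor capped at 1 by assumption).
void ClipByGlobalNorm(std::vector<float>& grads, float max_norm) {
  assert(max_norm > 0.0f);
  float sum_sq = 0.0f;
  for (float g : grads) sum_sq += g * g;
  const float total_norm = std::sqrt(sum_sq);
  const float coef = std::min(1.0f, max_norm / std::max(total_norm, 1e-6f));
  for (float& g : grads) g *= coef;
}

// One plain SGD step: clipping runs after the backward pass has produced
// `grads` and before the weights are updated.
void TrainStep(std::vector<float>& weights, std::vector<float> grads,
               float learning_rate, float max_norm) {
  assert(weights.size() == grads.size());
  ClipByGlobalNorm(grads, max_norm);           // 1. bound the update size
  for (std::size_t i = 0; i < weights.size(); ++i) {
    weights[i] -= learning_rate * grads[i];    // 2. optimizer step
  }
}
```

With gradients {300, 400}, max_norm 1, and learning rate 0.1, the unclipped update would move the weights by 30 and 40; after clipping, the move is about 0.06 and 0.08.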

Code Reference

Source Location

Signature

class InplaceClipGradNorm : public CudaKernel {
 public:
  // Kernel body; overrides the virtual method declared by CudaKernel.
  Status ComputeInternal(OpKernelContext* ctx) const override;
};

Import

#include "orttraining/training_ops/cuda/optimizer/clip_grad_norm/clip_grad_norm.h"

I/O Contract

Inputs

Name | Type | Required | Description
gradients | TensorSeq(S_GRAD) | Yes | Sequence of gradient tensors to clip

Outputs

Name | Type | Description
clipped_gradients | TensorSeq(S_GRAD) | Gradient tensors with clipped norms (may be in-place)
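
The "may be in-place" part of the contract can be pictured with a small CPU sketch. TensorSeq, PrepareOutput, and ScaleSequence are hypothetical names for illustration only; in the real kernel the decision belongs to the allocation planner, which is approximated here by checking whether the output is the same object as the input.

```cpp
#include <cassert>
#include <vector>

using TensorSeq = std::vector<std::vector<float>>;  // hypothetical stand-in type

// If the output is the same object as the input, nothing is copied;
// otherwise the input is deep-copied into the output first.
// Returns true when the operation will run in place.
bool PrepareOutput(const TensorSeq& input, TensorSeq& output) {
  if (&input == &output) return true;  // aliased: reuse the input buffer
  output = input;                      // not aliased: deep copy
  return false;
}

// Applies the clip coefficient to every element of the output sequence.
void ScaleSequence(TensorSeq& seq, float coef) {
  assert(coef >= 0.0f);
  for (auto& tensor : seq) {
    for (float& v : tensor) v *= coef;
  }
}
```

Either way the caller reads results from the output sequence; only the aliased path modifies the input gradients.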

Usage Examples

ONNX_OPERATOR_KERNEL_EX(
    InplaceClipGradNorm, kMSDomain, 1, kCudaExecutionProvider,
    (*KernelDefBuilder::Create())
        .Alias(0, 0)  // Return updated gradients in-place
        .TypeConstraint("S_GRAD", DataTypeImpl::AllFixedSizeSequenceTensorTypes()),
    InplaceClipGradNorm);
