Implementation:Microsoft Onnxruntime CUDA TrainingKernels

Knowledge Sources	Microsoft_Onnxruntime
Domains	Training, CUDA_Kernels
Last Updated	2026-02-10 04:00 GMT

Overview

Registration entry point that makes all CUDA training operator kernels available to the ONNX Runtime CUDA execution provider.

Description

This file is the central kernel registry for CUDA-based training operators. It forward-declares more than 200 operator kernel classes spanning optimizers (Adam, AdamW, LAMB, SGD), loss functions (SoftmaxCrossEntropy, SoftmaxCrossEntropyLoss), gradient operators (ConvGrad, BatchNormalizationGrad, DivGrad, GatherGrad, SoftmaxGrad), collective communication (NcclAllReduce, NcclAllGather, NcclReduceScatter), GIST compression encoders and decoders, gradient control (InPlaceAccumulator, ZeroGradient), mixed-precision scaling, and tensor operations (SplitTraining, ConcatTraining, View). The function RegisterCudaTrainingKernels does not create a registry of its own: it populates the KernelRegistry passed to it with the kernel creation info of every declared kernel.
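The registration step can be pictured as a static table of builder functions, each returning one kernel's creation info, which the registration function walks in order. The sketch below is self-contained and uses hypothetical stand-ins (Status, KernelDef, KernelCreateInfo, MiniKernelRegistry, RegisterSketchTrainingKernels) rather than the real onnxruntime types. The placeholder entry with a null kernel definition, the stop-at-first-error behavior, and the duplicate-key check are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical stand-ins; these are NOT the real onnxruntime classes.
struct Status {
  bool ok = true;
  std::string message;
  static Status OK() { return {}; }
  static Status Error(std::string msg) { return {false, std::move(msg)}; }
  bool IsOK() const { return ok; }
};

// Identifies which operator, domain, and version a kernel implements.
struct KernelDef {
  std::string domain;
  std::string op_type;
  int since_version;
};

struct KernelCreateInfo {
  std::unique_ptr<KernelDef> kernel_def;  // nullptr marks a disabled placeholder entry
};

using BuildKernelCreateInfoFn = KernelCreateInfo (*)();

// Assumption: a registry that rejects a second kernel with the same key.
class MiniKernelRegistry {
 public:
  Status Register(KernelCreateInfo info) {
    auto key = std::make_tuple(info.kernel_def->domain, info.kernel_def->op_type,
                               info.kernel_def->since_version);
    if (kernels_.count(key) != 0) {
      return Status::Error("duplicate kernel: " + info.kernel_def->op_type);
    }
    kernels_.emplace(key, std::move(info));
    return Status::OK();
  }
  std::size_t size() const { return kernels_.size(); }

 private:
  std::map<std::tuple<std::string, std::string, int>, KernelCreateInfo> kernels_;
};

// Hypothetical builder functions, one per kernel.
KernelCreateInfo BuildPlaceholder() { return {}; }
KernelCreateInfo BuildView() {
  return {std::make_unique<KernelDef>(KernelDef{"com.microsoft", "View", 1})};
}
KernelCreateInfo BuildGroup() {
  return {std::make_unique<KernelDef>(KernelDef{"com.microsoft", "Group", 1})};
}

// Walks the table, skips placeholders, and returns the first failure unchanged.
Status RegisterSketchTrainingKernels(MiniKernelRegistry& registry) {
  static const BuildKernelCreateInfoFn function_table[] = {
      BuildPlaceholder, BuildView, BuildGroup,
  };
  for (BuildKernelCreateInfoFn build : function_table) {
    KernelCreateInfo info = build();
    if (info.kernel_def == nullptr) continue;  // disabled entry
    Status s = registry.Register(std::move(info));
    if (!s.IsOK()) return s;
  }
  return Status::OK();
}
```

In this sketch, calling RegisterSketchTrainingKernels a second time on the same registry fails at the first duplicate entry, which is one way a non-OK status can arise.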

Usage

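RegisterCudaTrainingKernels is called while the CUDA execution provider populates its kernel registry, in ONNX Runtime builds with training operators enabled. It is the entry point that makes the CUDA training operators available to the execution engine, which can only run a training node on the CUDA execution provider if a matching kernel has been registered.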
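Once kernels are registered, the execution engine resolves each node by looking up its domain, operator type, and opset version. The simplified sketch below assumes a lookup rule that picks the newest kernel whose since_version does not exceed the node's opset version, and shows why an operator that was never registered cannot be resolved. SketchRegistry, KernelEntry, and their methods are hypothetical; the real lookup also matches the execution provider and type constraints.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified registry; NOT the onnxruntime KernelRegistry API.
struct KernelEntry {
  int since_version;
  std::string kernel_name;  // stands in for a kernel factory
};

class SketchRegistry {
 public:
  void Add(const std::string& domain, const std::string& op_type, KernelEntry entry) {
    table_[{domain, op_type}].push_back(std::move(entry));
  }

  // Returns the newest kernel whose since_version does not exceed the node's opset version.
  std::optional<std::string> Resolve(const std::string& domain, const std::string& op_type,
                                     int opset) const {
    auto it = table_.find({domain, op_type});
    if (it == table_.end()) return std::nullopt;  // operator never registered
    const KernelEntry* best = nullptr;
    for (const KernelEntry& e : it->second) {
      if (e.since_version <= opset &&
          (best == nullptr || e.since_version > best->since_version)) {
        best = &e;
      }
    }
    if (best == nullptr) return std::nullopt;  // no kernel old enough for this opset
    return best->kernel_name;
  }

 private:
  std::map<std::pair<std::string, std::string>, std::vector<KernelEntry>> table_;
};
```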

Code Reference

Source Location

Repository: Microsoft_Onnxruntime
File: orttraining/orttraining/training_ops/cuda/cuda_training_kernels.cc
Lines: 1-541

Signature

namespace onnxruntime {
namespace cuda {

// Forward declarations of all training kernel classes (200+ entries)
class ONNX_OPERATOR_KERNEL_CLASS_NAME(kCudaExecutionProvider, kMSDomain, 1, View);
class ONNX_OPERATOR_KERNEL_CLASS_NAME(kCudaExecutionProvider, kMSDomain, 1, Group);
class ONNX_OPERATOR_KERNEL_CLASS_NAME(kCudaExecutionProvider, kMSDomain, 1, PassThrough);
// ... (many more kernel class declarations)

Status RegisterCudaTrainingKernels(KernelRegistry& kernel_registry);

}  // namespace cuda
}  // namespace onnxruntime
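The forward declarations rely on a macro that derives a distinct class name from the provider, domain, version, and operator name, so the registry file can refer to every kernel without seeing its definition. The sketch below uses a hypothetical SKETCH_KERNEL_CLASS_NAME macro and BuildSketchCreateInfo template; the token-pasting scheme is an assumption and is not ORT's actual macro definition. It shows that forward-declared (incomplete) classes are enough to name builder functions whose specializations are defined elsewhere.

```cpp
#include <cassert>
#include <string>

// Hypothetical macro: pastes its arguments into one identifier,
// e.g. (Cuda, MS, 1, View) -> Cuda_MS_ver1_View. Not ORT's definition.
#define SKETCH_KERNEL_CLASS_NAME(provider, domain, ver, name) \
  provider##_##domain##_ver##ver##_##name

// Simplified stand-in for a per-kernel registration record.
struct SketchCreateInfo {
  std::string op_type;
  int since_version;
};

// Primary template is only declared; each kernel supplies an explicit specialization.
template <typename T>
SketchCreateInfo BuildSketchCreateInfo();

// Registry side: forward declarations are enough to name the kernels.
class SKETCH_KERNEL_CLASS_NAME(Cuda, MS, 1, View);
class SKETCH_KERNEL_CLASS_NAME(Cuda, MS, 1, Group);

// Kernel side (inlined here so the sketch compiles on its own).
template <>
SketchCreateInfo BuildSketchCreateInfo<SKETCH_KERNEL_CLASS_NAME(Cuda, MS, 1, View)>() {
  return {"View", 1};
}
template <>
SketchCreateInfo BuildSketchCreateInfo<SKETCH_KERNEL_CLASS_NAME(Cuda, MS, 1, Group)>() {
  return {"Group", 1};
}

using BuildFn = SketchCreateInfo (*)();

// The table holds function pointers; the kernel classes themselves stay incomplete.
static const BuildFn kSketchTable[] = {
    BuildSketchCreateInfo<SKETCH_KERNEL_CLASS_NAME(Cuda, MS, 1, View)>,
    BuildSketchCreateInfo<SKETCH_KERNEL_CLASS_NAME(Cuda, MS, 1, Group)>,
};
```

In this pattern, each kernel's implementation file provides its own specialization, so the table only needs the declarations.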

Import

#include "core/providers/shared_library/provider_api.h"
#include "core/providers/cuda/cuda_fwd.h"
#include "core/providers/cuda/cuda_pch.h"

I/O Contract

Inputs

Name	Type	Required	Description
kernel_registry	KernelRegistry&	Yes	Reference to the kernel registry to populate with training kernel definitions

Outputs

Name	Type	Description
status	Status	Status::OK() if every kernel was registered; otherwise the error status from the first registration that failed

Usage Examples

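// During CUDA execution provider initialization in a training-enabled build.
// Provider code uses the shared-library API, where KernelRegistry is created through a factory.
std::shared_ptr<KernelRegistry> kernel_registry = KernelRegistry::Create();
Status status = onnxruntime::cuda::RegisterCudaTrainingKernels(*kernel_registry);
ORT_ENFORCE(status.IsOK(), "Failed to register CUDA training kernels: ", status.ErrorMessage());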

Related Pages

Environment:Microsoft_Onnxruntime_CUDA_GPU_Environment
