Implementation:Microsoft Onnxruntime CUDA ConvShared

Knowledge Sources	Microsoft_Onnxruntime
Domains	Training, CUDA_Kernels
Last Updated	2026-02-10 04:00 GMT

Overview

A concrete implementation of the convolution utility functions shared by the gradient operators in the ONNX Runtime CUDA training framework.

Description

Implements helper functions shared by ConvGrad and ConvTransposeGrad. The GetWorkspaceSize overloads query cuDNN for the workspace required by a backward-data, backward-filter, or forward algorithm. GetMaxWorkspaceSize determines the largest feasible workspace given the available GPU memory, assuming 10% fragmentation. FindAlgorithm benchmarks multiple cuDNN algorithms within the given workspace limit and selects the best performer. AlgoIterator::TryAll orchestrates algorithm selection and execution with fallback. PrepareConvForwardArgs and PrepareConvBackwardFilterArgs set up the cuDNN tensor descriptors, convolution descriptors, and algorithm search for the conv transpose gradient computation. Together, these utilities manage cuDNN descriptor creation with proper handling of groups, math type, and padding.
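
The workspace-sizing idea behind GetMaxWorkspaceSize can be illustrated with a minimal, GPU-free sketch. It is not the ONNX Runtime implementation: MaxFeasibleWorkspace and WorkspaceQuery are hypothetical names, the cuDNN workspace query is replaced by a callback, and the free-memory figure is passed in as a parameter (in a real CUDA program it might come from a call such as cudaMemGetInfo). The sketch also assumes that "10% fragmentation" means only 90% of free memory is treated as usable.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for a cuDNN workspace query: returns true on success
// and writes the number of bytes the given algorithm id would need.
using WorkspaceQuery = std::function<bool(int algo, std::size_t* bytes)>;

// Sketch (assumption, not the library's code): return the largest workspace
// that any candidate algorithm asks for while staying within 90% of the free
// memory, i.e. reserving 10% for fragmentation.
std::size_t MaxFeasibleWorkspace(std::size_t free_bytes,
                                 const std::vector<int>& algos,
                                 const WorkspaceQuery& query) {
  // Integer arithmetic keeps the budget exact: free minus one tenth.
  const std::size_t budget = free_bytes - free_bytes / 10;
  std::size_t best = 0;
  for (int algo : algos) {
    std::size_t needed = 0;
    if (!query(algo, &needed)) continue;            // query failed: skip
    if (needed == 0 || needed > budget) continue;   // nothing needed or too big
    if (needed > best) best = needed;               // keep the largest that fits
  }
  return best;
}
```

A return value of 0 in this sketch means that no candidate reported a non-zero workspace that fits the budget; a caller would then have to fall back to algorithms that need no workspace.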

Usage

Called internally by ConvGrad and ConvTransposeGrad operators to prepare cuDNN arguments and select optimal algorithms for convolution gradient computation.
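
The select-then-fall-back pattern described for FindAlgorithm and AlgoIterator::TryAll can be sketched without cuDNN as follows. This is an illustrative assumption about the general shape of such logic, not the actual ONNX Runtime code: AlgoPerf, RankCandidates, and TryAllSketch are hypothetical names, and the benchmark results are supplied by the caller rather than measured.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical record loosely mirroring a cuDNN benchmark result.
struct AlgoPerf {
  int algo;                     // algorithm id
  bool ok;                      // whether the benchmark run succeeded
  float time_ms;                // measured run time
  std::size_t workspace_bytes;  // workspace the algorithm needs
};

// Drop failed candidates and those exceeding the workspace limit,
// then order the remainder fastest-first.
std::vector<AlgoPerf> RankCandidates(std::vector<AlgoPerf> perfs,
                                     std::size_t max_workspace) {
  perfs.erase(std::remove_if(perfs.begin(), perfs.end(),
                             [max_workspace](const AlgoPerf& p) {
                               return !p.ok || p.workspace_bytes > max_workspace;
                             }),
              perfs.end());
  std::stable_sort(perfs.begin(), perfs.end(),
                   [](const AlgoPerf& a, const AlgoPerf& b) {
                     return a.time_ms < b.time_ms;
                   });
  return perfs;
}

// Try each ranked candidate in turn; return the id of the first one for
// which `run` succeeds, or -1 if every candidate fails.
int TryAllSketch(const std::vector<AlgoPerf>& ranked,
                 const std::function<bool(const AlgoPerf&)>& run) {
  for (const AlgoPerf& p : ranked) {
    if (run(p)) return p.algo;
  }
  return -1;
}
```

Separating ranking from execution keeps the fallback loop trivial: if the fastest candidate fails at run time, the next-fastest is tried without repeating the benchmark.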

Code Reference

Source Location

Repository: Microsoft_Onnxruntime
File: orttraining/orttraining/training_ops/cuda/nn/conv_shared.cc
Lines: 1-282

Signature

cudnnStatus_t GetWorkspaceSize(const ConvArgs& args, T_BwdDataAlgo algo, size_t* workspace_size);
cudnnStatus_t GetWorkspaceSize(const ConvArgs& args, T_BwdFilterAlgo algo, size_t* workspace_size);
cudnnStatus_t GetWorkspaceSize(const ConvArgs& args, T_FwdAlgo algo, size_t* workspace_size);

template <typename T_Algo>
size_t GetMaxWorkspaceSize(const ConvArgs& args, const T_Algo* algo, int n_algo);

template <typename T_Perf>
Status FindAlgorithm(size_t max_workspace_size, const ConvArgs& args, T_Perf& algo_perf);

template <typename T_Perf>
Status AlgoIterator<T_Perf>::TryAll(...);

Status PrepareConvForwardArgs(const Tensor& dY, const Tensor& W, Tensor& dX,
                               cudnnHandle_t cudnn_handle, ConvArgs& args);
Status PrepareConvBackwardFilterArgs(const Tensor& dY, const Tensor& W, const Tensor& X,
                                      Tensor* dW, Tensor* dB,
                                      cudnnHandle_t cudnn_handle, ConvArgs& args);

Import

#include "orttraining/training_ops/cuda/nn/conv_shared.h"

I/O Contract

Inputs

Name	Type	Required	Description
args	ConvArgs&	Yes	Convolution arguments struct containing cuDNN handles, descriptors, and cached state
tensors	Tensor&	Yes	Input, output, and weight tensors (dY, W, X, dX, dW, dB in the signatures above) used for descriptor setup

Outputs

Name	Type	Description
args	ConvArgs&	Populated with cuDNN descriptors and selected algorithms
status	Status	OK on success; an error status otherwise

Usage Examples

// Used internally by ConvTransposeGrad
ConvArgs args_dx_;
ORT_RETURN_IF_ERROR(PrepareConvForwardArgs(*dY, *W, *dX, GetCudnnHandle(context), args_dx_));
ORT_RETURN_IF_ERROR(ComputeInputGradient(context->GetComputeStream(), args_dx_));

Related Pages

Environment:Microsoft_Onnxruntime_CUDA_GPU_Environment
