Implementation:Microsoft Onnxruntime CUDA ConvShared

From Leeroopedia


Knowledge Sources
Domains: Training, CUDA_Kernels
Last Updated: 2026-02-10 04:00 GMT

Overview

A concrete tool that provides shared convolution utility functions for the gradient operators in the ONNX Runtime CUDA training framework.

Description

Implements shared helper functions used by both ConvGrad and ConvTransposeGrad. The key utilities are:

GetWorkspaceSize: overloads that query cuDNN for the workspace required by a backward-data, backward-filter, or forward algorithm.
GetMaxWorkspaceSize: determines the largest feasible workspace given the available GPU memory, assuming 10% fragmentation.
FindAlgorithm: benchmarks multiple cuDNN algorithms within the given workspace limit and selects the best performer.
AlgoIterator::TryAll: orchestrates algorithm selection and execution, with fallback when a candidate fails.
PrepareConvForwardArgs and PrepareConvBackwardFilterArgs: set up the cuDNN tensor descriptors, convolution descriptors, and algorithm search for the ConvTranspose gradient computation.

Throughout, these utilities create cuDNN descriptors with proper handling of groups, math type, and padding.
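The workspace cap described above can be pictured with a small host-only sketch. It is not the ONNX Runtime implementation: `WorkspaceQuery` and `MaxFeasibleWorkspace` are hypothetical names, the free-memory figure is passed in rather than queried from the GPU, and the only detail taken from this page is the 10% fragmentation assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one per-algorithm workspace query.
// `ok == false` models a cuDNN call that returned a non-success status.
struct WorkspaceQuery {
  bool ok;
  std::size_t bytes;
};

// Returns the largest workspace size requested by any candidate algorithm
// that still fits in the usable share of free GPU memory, or 0 if none fits.
// `fragmentation` is the fraction of free memory assumed to be unusable.
std::size_t MaxFeasibleWorkspace(const std::vector<WorkspaceQuery>& queries,
                                 std::size_t free_bytes,
                                 double fragmentation = 0.10) {
  assert(fragmentation >= 0.0 && fragmentation < 1.0);
  const std::size_t usable = static_cast<std::size_t>(
      static_cast<double>(free_bytes) * (1.0 - fragmentation));
  std::size_t best = 0;
  for (const WorkspaceQuery& q : queries) {
    if (!q.ok) continue;             // the size query failed for this algorithm
    if (q.bytes > usable) continue;  // would not fit in the usable memory
    if (q.bytes > best) best = q.bytes;
  }
  return best;
}
```

With 1000 bytes free, the usable budget is 900 bytes, so a candidate that asks for 950 bytes is skipped and one that asks for 800 bytes sets the cap. Taking the free-memory figure as a parameter keeps the sketch testable without a GPU.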

Usage

Called internally by ConvGrad and ConvTransposeGrad operators to prepare cuDNN arguments and select optimal algorithms for convolution gradient computation.
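As a rough illustration of what "selecting an optimal algorithm" can mean, the sketch below picks the fastest successful candidate whose workspace fits a given limit. The `AlgoPerf` struct and `SelectFastest` function are hypothetical and only loosely modeled on the kind of per-algorithm results a benchmark produces; the actual selection logic in conv_shared may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical benchmark record for one candidate algorithm.
struct AlgoPerf {
  int algo_id;                  // opaque identifier of the algorithm
  bool succeeded;               // whether the benchmark run completed
  float time_ms;                // measured execution time
  std::size_t workspace_bytes;  // workspace the algorithm needs
};

// Returns the index of the fastest successful candidate whose workspace
// fits within `max_workspace_bytes`, or -1 if no candidate qualifies.
int SelectFastest(const std::vector<AlgoPerf>& results,
                  std::size_t max_workspace_bytes) {
  int best = -1;
  for (std::size_t i = 0; i < results.size(); ++i) {
    const AlgoPerf& r = results[i];
    if (!r.succeeded || r.workspace_bytes > max_workspace_bytes) continue;
    assert(r.time_ms >= 0.0f);  // a measured time should never be negative
    if (best < 0 || r.time_ms < results[static_cast<std::size_t>(best)].time_ms) {
      best = static_cast<int>(i);
    }
  }
  return best;
}
```

Returning an index rather than a copy leaves it to the caller to decide what to do when nothing qualifies, for example reporting an error status.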

Code Reference

Source Location

Signature

cudnnStatus_t GetWorkspaceSize(const ConvArgs& args, T_BwdDataAlgo algo, size_t* workspace_size);
cudnnStatus_t GetWorkspaceSize(const ConvArgs& args, T_BwdFilterAlgo algo, size_t* workspace_size);
cudnnStatus_t GetWorkspaceSize(const ConvArgs& args, T_FwdAlgo algo, size_t* workspace_size);

template <typename T_Algo>
size_t GetMaxWorkspaceSize(const ConvArgs& args, const T_Algo* algo, int n_algo);

template <typename T_Perf>
Status FindAlgorithm(size_t max_workspace_size, const ConvArgs& args, T_Perf& algo_perf);

template <typename T_Perf>
Status AlgoIterator<T_Perf>::TryAll(...);

Status PrepareConvForwardArgs(const Tensor& dY, const Tensor& W, Tensor& dX,
                               cudnnHandle_t cudnn_handle, ConvArgs& args);
Status PrepareConvBackwardFilterArgs(const Tensor& dY, const Tensor& W, const Tensor& X,
                                      Tensor* dW, Tensor* dB,
                                      cudnnHandle_t cudnn_handle, ConvArgs& args);
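The TryAll signature above elides its parameters, so the following is only a sketch of the general "try each candidate until one runs" pattern that the description attributes to it. `TryResult` and `TryCandidates` are hypothetical names, candidates are plain integers, and the callback stands in for whatever actually executes the convolution with a given algorithm.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical result type: which candidate ran, or why none did.
struct TryResult {
  bool ok;
  int chosen;           // candidate that ran successfully, or -1
  std::string message;  // empty on success
};

// Runs `run` on each candidate in order and stops at the first success.
// If every candidate fails, the failure is reported rather than thrown.
TryResult TryCandidates(const std::vector<int>& candidates,
                        const std::function<bool(int)>& run) {
  assert(run && "run must be a callable target");
  for (int c : candidates) {
    if (run(c)) {
      return TryResult{true, c, ""};
    }
  }
  return TryResult{false, -1, "no candidate algorithm ran successfully"};
}
```

Because candidates are tried in order, a caller would typically sort them from most to least preferred, so that the fallback only costs time when the preferred choice fails.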

Import

#include "orttraining/training_ops/cuda/nn/conv_shared.h"

I/O Contract

Inputs

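Name | Type | Required | Description
args | ConvArgs& | Yes | Convolution arguments struct containing cuDNN handles, descriptors, and cached state
tensors (dY, W, X, dX, dW, dB) | const Tensor&, Tensor&, or Tensor*, depending on the function | Yes | Input, output, and weight tensors used for descriptor setup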

Outputs

Name | Type | Description
args | ConvArgs& | Populated with cuDNN descriptors and selected algorithms
status | Status | OK on success

Usage Examples

// Used internally by ConvTransposeGrad
ConvArgs args_dx_;
ORT_RETURN_IF_ERROR(PrepareConvForwardArgs(*dY, *W, *dX, GetCudnnHandle(context), args_dx_));
ORT_RETURN_IF_ERROR(ComputeInputGradient(context->GetComputeStream(), args_dx_));
