Implementation:Microsoft Onnxruntime CUDA SliceGrad

Knowledge Sources	Microsoft_Onnxruntime
Domains	Training, CUDA_Kernels
Last Updated	2026-02-10 04:00 GMT

Overview

CUDA kernel that computes the gradient of the Slice operator in the ONNX Runtime training framework.

Description

Implements the SliceGrad operator for CUDA, which maps the upstream gradient back onto the full input shape of the original Slice operation. The GetSlicedOrUnslicedTensor method creates the output tensor with the original data shape. That output is first zero-initialized, then SliceImplGrad scatters the upstream gradient (which has the shape of the sliced region) into the corresponding positions. The slice parameters (starts, ends, axes, steps) are read from inputs held in CPU memory. Compared with the forward Slice, the direction of assignment is reversed: instead of copying from the selected input positions to the output, the kernel copies from the upstream gradient into the selected positions of the zero-initialized output, so every position outside the slice keeps a gradient of zero.
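The zero-then-scatter behavior described above can be illustrated with a small host-side reference. This is only a sketch under stated assumptions, not the ONNX Runtime kernel: SliceGradReference is a hypothetical name, the data type is fixed to float, and starts and steps are assumed to be already normalized to one in-bounds entry per dimension (the real operator resolves axes, negative indices and clamping before launching SliceImplGrad).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference for the SliceGrad semantics (not ORT code).
// Assumes: row-major layout, one start/step per dimension, non-zero steps,
// and every addressed index lying inside x_shape.
std::vector<float> SliceGradReference(const std::vector<float>& dY,
                                      const std::vector<int64_t>& dy_shape,
                                      const std::vector<int64_t>& x_shape,
                                      const std::vector<int64_t>& starts,
                                      const std::vector<int64_t>& steps) {
  const size_t rank = x_shape.size();
  assert(dy_shape.size() == rank && starts.size() == rank && steps.size() == rank);

  // Row-major strides and total element count of the full-size gradient dX.
  std::vector<int64_t> x_strides(rank, 1);
  for (size_t d = rank; d-- > 1;) {
    x_strides[d - 1] = x_strides[d] * x_shape[d];
  }
  int64_t x_size = 1;
  for (int64_t dim : x_shape) x_size *= dim;

  // Step 1: zero-initialize dX, so positions outside the slice stay zero.
  std::vector<float> dX(static_cast<size_t>(x_size), 0.0f);

  // Step 2: scatter each dY element to the position the forward Slice read.
  for (size_t i = 0; i < dY.size(); ++i) {
    int64_t rem = static_cast<int64_t>(i);
    int64_t x_offset = 0;
    for (size_t d = rank; d-- > 0;) {
      const int64_t coord = rem % dy_shape[d];  // coordinate inside dY
      rem /= dy_shape[d];
      x_offset += (starts[d] + coord * steps[d]) * x_strides[d];
    }
    dX[static_cast<size_t>(x_offset)] = dY[i];
  }
  return dX;
}
```

For example, with x_shape {3, 4}, starts {1, 0}, steps {1, 2} and a 2x2 dY, the four gradient values land at (1,0), (1,2), (2,0) and (2,2), and the remaining eight entries of dX stay zero.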

Usage

Invoked during the backward pass of training, whenever a gradient has to flow back through a Slice operation in the model.

Code Reference

Source Location

Repository: Microsoft_Onnxruntime
File: orttraining/orttraining/training_ops/cuda/tensor/slice_grad.cc
Lines: 1-69

Signature

class SliceGrad : public CudaKernel {
  const Tensor* GetSlicedOrUnslicedTensor(OpKernelContext* ctx) const;
  Status FillInputVectors(OpKernelContext* ctx, TensorShapeVector& input_starts,
                          TensorShapeVector& input_ends, TensorShapeVector& input_axes,
                          TensorShapeVector& input_steps) const;
  Status CallSliceImp(size_t element_size, size_t dimension_count,
                      const TArray<int64_t>& starts_buffer, const TArray<int64_t>& steps_buffer,
                      const TArray<int64_t>& input_strides, const TArray<fast_divmod>& output_strides,
                      OpKernelContext* ctx, const TensorShape& output_shape) const;
};

Import

#include "orttraining/training_ops/cuda/tensor/slice_grad.h"

I/O Contract

Inputs

Name	Type	Required	Description
dY	Tensor(T)	Yes	Upstream gradient (sliced region shape)
shape	Tensor(int64_t)	Yes	Original data shape (CPU memory)
starts	Tensor(Tind)	Yes	Slice start indices (CPU memory)
ends	Tensor(Tind)	Yes	Slice end indices (CPU memory)
axes	Tensor(Tind)	No	Axes to slice (CPU memory)
steps	Tensor(Tind)	No	Step sizes (CPU memory)

Outputs

Name	Type	Description
dX	Tensor(T)	Gradient with respect to the full input: zero everywhere except the sliced region, which is filled from dY
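To make the relationship between the inputs concrete, the sketch below computes the shape that dY is expected to have for a given shape, starts, ends, axes and steps. It follows the index rules documented for the ONNX Slice operator (negative indices count from the end, out-of-range values are clamped, omitted axes default to the leading axes and omitted steps to 1). ExpectedDYShape is a hypothetical helper written for illustration and is not part of ONNX Runtime; it also assumes every sliced dimension is non-empty.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not ORT code): shape of the sliced region, i.e. of dY.
// Empty axes/steps vectors stand in for the optional inputs being absent.
std::vector<int64_t> ExpectedDYShape(const std::vector<int64_t>& x_shape,
                                     const std::vector<int64_t>& starts,
                                     const std::vector<int64_t>& ends,
                                     std::vector<int64_t> axes = {},
                                     std::vector<int64_t> steps = {}) {
  const int64_t rank = static_cast<int64_t>(x_shape.size());
  assert(starts.size() == ends.size());
  if (axes.empty()) {  // default: axes 0 .. len(starts) - 1
    for (size_t i = 0; i < starts.size(); ++i) axes.push_back(static_cast<int64_t>(i));
  }
  if (steps.empty()) steps.assign(starts.size(), 1);  // default: step 1
  assert(axes.size() == starts.size() && steps.size() == starts.size());

  std::vector<int64_t> dy_shape = x_shape;  // unsliced axes keep their size
  for (size_t i = 0; i < starts.size(); ++i) {
    const int64_t axis = axes[i] < 0 ? axes[i] + rank : axes[i];
    assert(axis >= 0 && axis < rank);
    const int64_t dim = x_shape[static_cast<size_t>(axis)];
    const int64_t step = steps[i];
    assert(dim > 0 && step != 0);  // assumption: non-empty dimension

    // Negative indices count from the end of the axis.
    int64_t start = starts[i] < 0 ? starts[i] + dim : starts[i];
    int64_t end = ends[i] < 0 ? ends[i] + dim : ends[i];
    if (step > 0) {
      start = std::max<int64_t>(0, std::min<int64_t>(start, dim));
      end = std::max<int64_t>(0, std::min<int64_t>(end, dim));
    } else {
      start = std::max<int64_t>(0, std::min<int64_t>(start, dim - 1));
      end = std::max<int64_t>(-1, std::min<int64_t>(end, dim - 1));
    }

    // Number of selected indices: ceil((end - start) / step), at least 0.
    const int64_t span = end - start;
    int64_t count = 0;
    if ((step > 0 && span > 0) || (step < 0 && span < 0)) {
      count = span / step + (span % step != 0 ? 1 : 0);
    }
    dy_shape[static_cast<size_t>(axis)] = count;
  }
  return dy_shape;
}
```

With shape {3, 4}, starts {1, 0}, ends {3, 4} and steps {1, 2}, the helper returns {2, 2}: dY carries four values and dX has twelve entries, eight of which remain zero.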

Usage Examples

ONNX_OPERATOR_KERNEL_EX(SliceGrad, kMSDomain, 1, kCudaExecutionProvider,
    (*KernelDefBuilder::Create())
        .InputMemoryType(OrtMemTypeCPUInput, 1)  // shape
        .InputMemoryType(OrtMemTypeCPUInput, 2)  // starts
        .InputMemoryType(OrtMemTypeCPUInput, 3)  // ends
        .InputMemoryType(OrtMemTypeCPUInput, 4)  // axes
        .InputMemoryType(OrtMemTypeCPUInput, 5)  // steps
        .TypeConstraint("T", DataTypeImpl::AllFixedSizeTensorTypes()),
    SliceGrad);

Related Pages

Environment:Microsoft_Onnxruntime_CUDA_GPU_Environment
