Implementation:Microsoft Onnxruntime CUDA View

Knowledge Sources	Microsoft_Onnxruntime
Domains	Training, CUDA_Kernels
Last Updated	2026-02-10 04:00 GMT

Overview

Concrete tool for creating multiple tensor views of a single contiguous buffer in the ONNX Runtime CUDA training framework.

Description

Implements the View operator for CUDA, which splits a single input tensor into multiple output tensors by reinterpreting its memory layout. At most 1024 output views are supported (view_count_limit). The shape of each output is given by its own 1-D shape input tensor, and all shape inputs are declared with CPU memory type. The operator computes a byte offset for each view and validates that the total size of all outputs equals the size of the input buffer. All outputs are declared as aliases of input[0] via GenerateAliasMapping. When the allocation planner honors that alias for an output, the view is created without copying by setting a byte offset into the input buffer (SetByteOffset); when aliasing is not possible, the data is copied with a device-to-device memcpy instead. Supports all fixed-size tensor types.
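The offset computation and size check described above can be sketched in plain C++ as follows. This is a minimal illustration under stated assumptions, not the code in view.cc: the helper name ComputeViewOffsets and its parameters are hypothetical, shapes are passed as host vectors rather than shape tensors, and overflow checks are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not an ONNX Runtime API). Walks the requested view
// shapes in order and records the byte offset at which each view starts
// inside the input buffer. Returns false if any dimension is negative or if
// the views do not cover the input buffer exactly.
bool ComputeViewOffsets(std::size_t input_bytes, std::size_t element_bytes,
                        const std::vector<std::vector<std::int64_t>>& shapes,
                        std::vector<std::size_t>* offsets) {
  assert(offsets != nullptr);
  offsets->clear();
  std::size_t running_bytes = 0;
  for (const auto& shape : shapes) {
    std::size_t element_count = 1;  // an empty shape denotes a scalar
    for (std::int64_t dim : shape) {
      if (dim < 0) return false;
      element_count *= static_cast<std::size_t>(dim);
    }
    offsets->push_back(running_bytes);  // this view starts where the last ended
    running_bytes += element_count * element_bytes;
  }
  // The views must consume the whole input buffer, no more and no less.
  return running_bytes == input_bytes;
}
```

For example, a 48-byte float buffer split into shapes {2, 3} and {6} yields two 24-byte views, the second starting at byte 24. In the operator, an offset like this is what would be applied with SetByteOffset when the output aliases the input, or used as the source address of the copy when it does not.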

Usage

Used during training to partition weight or activation buffers into multiple logical tensors without copying, enabling efficient memory reuse in optimizer and model parallelism scenarios.

Code Reference

Source Location

Repository: Microsoft_Onnxruntime
File: orttraining/orttraining/training_ops/cuda/tensor/view.cc
Lines: 1-90

Signature

class View : public CudaKernel {
  Status ComputeInternal(OpKernelContext* context) const;
};

Import

#include "orttraining/training_ops/cuda/tensor/view.h"

I/O Contract

Inputs

Name	Type	Required	Description
input	Tensor(T)	Yes	Input tensor (contiguous buffer)
shapes	Tensor(int64_t)...	Yes	1-D shape tensors for each view (CPU memory, up to 1024)

Outputs

Name	Type	Description
views	Tensor(T)...	Output view tensors (aliased to input buffer, up to 1024)

Usage Examples

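// Register the View kernel (MS domain, version 1) for the CUDA execution provider.
ONNX_OPERATOR_KERNEL_EX(View, kMSDomain, 1, kCudaExecutionProvider,
    (*KernelDefBuilder::Create())
        .TypeConstraint("T", DataTypeImpl::AllFixedSizeTensorTypes())
        .TypeConstraint("shapes", DataTypeImpl::GetTensorType<int64_t>())
        // All shape inputs live in CPU memory.
        .InputMemoryType(OrtMemTypeCPUInput, GenerateInputMemoryType())
        // Every output may share the buffer of input[0], at a different byte offset.
        .Alias(GenerateAliasMapping()),
    View);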
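The registration relies on two helpers, GenerateInputMemoryType and GenerateAliasMapping, whose bodies are not shown on this page. Based only on the description (shape inputs on CPU, every output aliased to input[0], limit of 1024 views), they plausibly look like the sketch below. The function names SketchAliasMapping and SketchCpuInputIndexes and the constant kViewCountLimit are hypothetical stand-ins, and the return types are assumed to match what the Alias and InputMemoryType builder calls accept.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Assumed to correspond to the view_count_limit named in the description.
constexpr int kViewCountLimit = 1024;

// Sketch: each pair (0, i) states that output i may reuse the buffer of input 0.
std::vector<std::pair<int, int>> SketchAliasMapping() {
  std::vector<std::pair<int, int>> alias_pairs;
  alias_pairs.reserve(kViewCountLimit);
  for (int i = 0; i < kViewCountLimit; ++i) {
    alias_pairs.emplace_back(0, i);
  }
  assert(alias_pairs.size() == static_cast<std::size_t>(kViewCountLimit));
  return alias_pairs;
}

// Sketch: input 0 is the data tensor, so the shape inputs occupy
// indexes 1..kViewCountLimit and are the ones marked as CPU inputs.
std::vector<int> SketchCpuInputIndexes() {
  std::vector<int> input_indexes;
  input_indexes.reserve(kViewCountLimit);
  for (int i = 1; i <= kViewCountLimit; ++i) {
    input_indexes.push_back(i);
  }
  return input_indexes;
}
```

Registering the mappings for the full limit up front lets a single kernel definition serve any node with up to 1024 views, since the actual number of views is only known per node.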

Related Pages

Environment:Microsoft_Onnxruntime_CUDA_GPU_Environment
