Implementation:Microsoft Onnxruntime CUDA View
| Knowledge Sources | |
|---|---|
| Domains | Training, CUDA_Kernels |
| Last Updated | 2026-02-10 04:00 GMT |
Overview
Concrete tool for creating multiple tensor views of a single contiguous buffer in the ONNX Runtime CUDA training framework.
Description
Implements the CUDA View operator, which splits a single input tensor into multiple output tensors by reinterpreting its memory layout. Up to 1024 output views are supported (view_count_limit). The shape of each output is given by its own 1-D shape input tensor, and all shape inputs are declared with CPU memory type. The operator computes a byte offset for each view and validates that the total size of the outputs matches the size of the input buffer. All outputs are declared as aliases of input[0] via GenerateAliasMapping: when the allocation planner honors the alias, each view is a zero-copy offset into the input buffer (SetByteOffset); when aliasing is not possible, the data is copied with a device-to-device memcpy. All fixed-size tensor types are supported.
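The offset computation and size check described above can be sketched in plain C++. This is a minimal illustration, not the code in view.cc: the helper names `ElementCount` and `ComputeViewByteOffsets` are hypothetical, shapes are passed as host vectors, and views are assumed to be packed back to back in input order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: number of elements described by a shape.
// Returns -1 if any dimension is negative.
inline std::int64_t ElementCount(const std::vector<std::int64_t>& shape) {
  std::int64_t count = 1;
  for (std::int64_t dim : shape) {
    if (dim < 0) return -1;
    count *= dim;
  }
  return count;
}

// Hypothetical helper: fills `offsets` with the starting byte offset of each
// view, assuming views are laid out consecutively in the input buffer.
// Returns false if a shape is invalid or if the views do not cover the
// input buffer exactly.
inline bool ComputeViewByteOffsets(
    const std::vector<std::vector<std::int64_t>>& view_shapes,
    std::size_t element_size,
    std::size_t input_bytes,
    std::vector<std::size_t>* offsets) {
  assert(element_size > 0);
  offsets->clear();
  offsets->reserve(view_shapes.size());
  std::size_t running = 0;
  for (const auto& shape : view_shapes) {
    const std::int64_t count = ElementCount(shape);
    if (count < 0) return false;
    offsets->push_back(running);  // this view starts where the previous one ended
    running += static_cast<std::size_t>(count) * element_size;
  }
  // The views must account for every byte of the input, no more and no less.
  return running == input_bytes;
}
```

For example, a 48-byte float buffer split into shapes [2, 3], [4], and [2] would give byte offsets 0, 24, and 40 under these assumptions.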
Usage
Used during training to partition weight or activation buffers into multiple logical tensors without copying, enabling efficient memory reuse in optimizer and model parallelism scenarios.
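The distinction between an aliased (zero-copy) view and the copy fallback can be illustrated on the host. This sketch is an assumption-laden stand-in rather than the kernel's code: `ViewSlot` and `MaterializeViews` are hypothetical names, host memory replaces device memory, and `std::memcpy` stands in for a device-to-device copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical description of one output view.
struct ViewSlot {
  unsigned char* data;  // buffer the (hypothetical) planner assigned to this output
  std::size_t offset;   // byte offset of the view within the input buffer
  std::size_t bytes;    // size of the view in bytes
};

// Makes every slot hold its slice of the input and returns how many slots
// needed a copy. A slot whose buffer already points at input + offset is
// treated as aliased and is left untouched.
inline int MaterializeViews(const unsigned char* input,
                            std::size_t input_bytes,
                            const std::vector<ViewSlot>& slots) {
  int copies = 0;
  for (const ViewSlot& slot : slots) {
    assert(slot.offset + slot.bytes <= input_bytes);
    const unsigned char* src = input + slot.offset;
    if (slot.data == src) continue;  // aliased: the view is just an offset
    // Not aliased: copy the slice. On a GPU this would be a device-to-device
    // copy (for example cudaMemcpyAsync with cudaMemcpyDeviceToDevice).
    std::memcpy(slot.data, src, slot.bytes);
    ++copies;
  }
  return copies;
}
```

When every output is aliased, the function performs no copies at all, which is the case the operator is designed for.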
Code Reference
Source Location
- Repository: Microsoft_Onnxruntime
- File: orttraining/orttraining/training_ops/cuda/tensor/view.cc
- Lines: 1-90
Signature
class View : public CudaKernel {
 public:
  // Computes the byte offset of each view and either aliases or copies the data.
  Status ComputeInternal(OpKernelContext* context) const override;
};
Import
#include "orttraining/training_ops/cuda/tensor/view.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| input | Tensor(T) | Yes | Input tensor (contiguous buffer) |
| shapes | Tensor(int64_t)... | Yes | 1-D shape tensors for each view (CPU memory, up to 1024) |
Outputs
| Name | Type | Description |
|---|---|---|
| views | Tensor(T)... | Output view tensors (aliased to input buffer, up to 1024) |
Usage Examples
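// Kernel registration for the CUDA execution provider (MS domain, opset version 1).
ONNX_OPERATOR_KERNEL_EX(View, kMSDomain, 1, kCudaExecutionProvider,
    (*KernelDefBuilder::Create())
        // The data input and all outputs accept any fixed-size tensor type.
        .TypeConstraint("T", DataTypeImpl::AllFixedSizeTensorTypes())
        // Shape inputs are int64 tensors.
        .TypeConstraint("shapes", DataTypeImpl::GetTensorType<int64_t>())
        // All shape inputs are placed in CPU memory.
        .InputMemoryType(OrtMemTypeCPUInput, GenerateInputMemoryType())
        // All outputs alias input[0]; only the byte offset differs.
        .Alias(GenerateAliasMapping()),
    View);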
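The registration relies on two helpers, GenerateInputMemoryType and GenerateAliasMapping. Based only on the Description (shape inputs in CPU memory, every output aliasing input[0], a limit of 1024 views), they could plausibly look like the sketch below. The `sketch` namespace and the constant name `kViewCountLimit` are introduced here for illustration, and the actual definitions in view.cc may differ.

```cpp
#include <utility>
#include <vector>

namespace sketch {

// Mirrors the view_count_limit of 1024 mentioned in the Description.
constexpr int kViewCountLimit = 1024;

// Indices of the inputs to place in CPU memory: every shape input,
// i.e. inputs 1..1024 (input 0 is the data buffer and stays on the device).
inline std::vector<int> GenerateInputMemoryType() {
  std::vector<int> input_indexes;
  input_indexes.reserve(kViewCountLimit);
  for (int i = 1; i <= kViewCountLimit; ++i) {
    input_indexes.push_back(i);
  }
  return input_indexes;
}

// (input index, output index) pairs: every possible output aliases input 0.
inline std::vector<std::pair<int, int>> GenerateAliasMapping() {
  std::vector<std::pair<int, int>> alias_pairs;
  alias_pairs.reserve(kViewCountLimit);
  for (int i = 0; i < kViewCountLimit; ++i) {
    alias_pairs.emplace_back(0, i);
  }
  return alias_pairs;
}

}  // namespace sketch
```

Declaring the mapping for the full limit up front, rather than per node, fits a variadic operator whose output count is not known when the kernel definition is built.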