

Implementation:Microsoft Onnxruntime CUDA View

From Leeroopedia


Knowledge Sources
Domains: Training, CUDA_Kernels
Last Updated: 2026-02-10 04:00 GMT

Overview

CUDA kernel that creates multiple tensor views of a single contiguous buffer in the ONNX Runtime training framework.

Description

Implements the View operator for the CUDA execution provider. The operator splits a single input tensor into multiple output tensors by reinterpreting its memory layout. Up to 1024 output views are supported (view_count_limit).

Each output shape is given by a separate 1-D shape input tensor that resides in CPU memory. The operator computes a byte offset for each view and validates that the combined byte size of all views equals the input buffer size.

When the allocation planner aliases an output's memory to the input buffer, the view is created as a zero-copy offset (SetByteOffset). When aliasing is not possible, a device-to-device memcpy is used instead.

All shape inputs are declared with CPU memory type, and all outputs alias input[0] via GenerateAliasMapping. All fixed-size tensor types are supported.
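The offset computation and size check described above can be sketched on the host as follows. This is a minimal illustration, not the kernel's code, and it makes three simplifications:

- `NumElements` and `ComputeViewByteOffsets` are hypothetical names.
- Shapes are plain `std::vector<std::int64_t>` values rather than ONNX Runtime `TensorShape` objects.
- A failed check returns `std::nullopt` instead of an error `Status`.

Each view starts where the previous one ended.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <numeric>
#include <optional>
#include <vector>

// Hypothetical helper: element count of a shape (product of its dimensions).
std::int64_t NumElements(const std::vector<std::int64_t>& shape) {
  return std::accumulate(shape.begin(), shape.end(), std::int64_t{1},
                         std::multiplies<std::int64_t>());
}

// Hypothetical helper: assigns each view a byte offset directly after the
// previous view, then checks that the views exactly cover the input buffer.
// Returns std::nullopt when the total size does not match.
std::optional<std::vector<std::size_t>> ComputeViewByteOffsets(
    const std::vector<std::vector<std::int64_t>>& view_shapes,
    std::size_t element_size, std::size_t input_size_in_bytes) {
  std::vector<std::size_t> offsets;
  offsets.reserve(view_shapes.size());
  std::size_t byte_offset = 0;
  for (const auto& shape : view_shapes) {
    offsets.push_back(byte_offset);  // this view begins at the running offset
    byte_offset += static_cast<std::size_t>(NumElements(shape)) * element_size;
  }
  if (byte_offset != input_size_in_bytes) {
    return std::nullopt;  // views do not add up to the input buffer size
  }
  return offsets;
}
```

Take a 48-byte float buffer split into shapes [2, 3] and [6]. The views get offsets 0 and 24. Shapes [2, 3] and [5] cover only 44 bytes, so they are rejected.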

Usage

Used during training to partition weight or activation buffers into multiple logical tensors without copying (whenever the outputs can alias the input buffer). This enables efficient memory reuse in optimizer and model-parallelism scenarios.
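Whether a partition is actually zero-copy depends on the per-output alias check described in the Description. The sketch below mimics that decision on host memory using a hypothetical `MaterializeView` helper:

- If the output buffer is the input buffer, the view is simply the input pointer advanced by its byte offset.
- Otherwise, the view's bytes are copied into the output buffer. Here `std::memcpy` stands in for the device-to-device copy the CUDA kernel performs.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical host-side stand-in for the per-output step of the View kernel.
// `input` is the shared buffer, `output` is the buffer assigned to this view,
// `byte_offset` is where the view starts inside `input`.
// Returns the address at which the view's data can be read.
const void* MaterializeView(const void* input, void* output,
                            std::size_t byte_offset, std::size_t view_bytes) {
  const char* source = static_cast<const char*>(input) + byte_offset;
  if (output == input) {
    // Aliased: the view is just an offset into the input buffer (zero-copy).
    return source;
  }
  // Not aliased: copy the view's bytes into its own buffer.
  // (std::memcpy stands in for a device-to-device copy on the GPU.)
  std::memcpy(output, source, view_bytes);
  return output;
}
```

In the aliased case no bytes move, which is what makes partitioning a flat parameter buffer cheap. The copy path only keeps the result correct when the planner could not share memory.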

Code Reference

Source Location

Signature

class View : public CudaKernel {
 public:
  View(const OpKernelInfo& info) : CudaKernel(info) {}
  Status ComputeInternal(OpKernelContext* context) const override;
};

Import

#include "orttraining/training_ops/cuda/tensor/view.h"

I/O Contract

Inputs

Name | Type | Required | Description
input | Tensor(T) | Yes | Input tensor (contiguous buffer)
shapes | Tensor(int64_t), variadic | Yes | One 1-D shape tensor per view, in CPU memory (up to 1024)

Outputs

Name | Type | Description
views | Tensor(T), variadic | Output view tensors, aliased to the input buffer (up to 1024)

Usage Examples

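// Kernel registration for the CUDA execution provider.
// GenerateInputMemoryType() lists the shape inputs (indexes 1..1024) as CPU inputs.
// GenerateAliasMapping() lets every output alias input 0.
ONNX_OPERATOR_KERNEL_EX(View, kMSDomain, 1, kCudaExecutionProvider,
    (*KernelDefBuilder::Create())
        .TypeConstraint("T", DataTypeImpl::AllFixedSizeTensorTypes())
        .TypeConstraint("shapes", DataTypeImpl::GetTensorType<int64_t>())
        .InputMemoryType(OrtMemTypeCPUInput, GenerateInputMemoryType())
        .Alias(GenerateAliasMapping()),
    View);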
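The two helpers used in the registration are not shown on this page. Based on the Description (all outputs alias input 0, and all shape inputs live in CPU memory), their results can be approximated as below. `MakeAliasPairs`, `MakeCpuInputIndexes` and `kViewCountLimit` are hypothetical names. The real helpers may differ in name and signature.

```cpp
#include <utility>
#include <vector>

// Hypothetical constant, assumed equal to view_count_limit (1024).
constexpr int kViewCountLimit = 1024;

// Hypothetical approximation of GenerateAliasMapping(): (input, output) index
// pairs stating that every possible output 0..limit-1 may reuse input 0's buffer.
std::vector<std::pair<int, int>> MakeAliasPairs(int limit = kViewCountLimit) {
  std::vector<std::pair<int, int>> pairs;
  pairs.reserve(limit);
  for (int i = 0; i < limit; ++i) pairs.emplace_back(0, i);
  return pairs;
}

// Hypothetical approximation of GenerateInputMemoryType(): indexes of the
// shape inputs 1..limit, which are read from CPU memory. Input 0, the data
// buffer, is not listed and keeps the default (device) memory type.
std::vector<int> MakeCpuInputIndexes(int limit = kViewCountLimit) {
  std::vector<int> indexes;
  indexes.reserve(limit);
  for (int i = 1; i <= limit; ++i) indexes.push_back(i);
  return indexes;
}
```

Declaring the full 1024-entry mapping up front lets one static kernel definition serve any number of views up to the limit.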
