Implementation: Microsoft Onnxruntime CPU MpiRecv
| Knowledge Sources | |
|---|---|
| Domains | Training, CPU_Kernels |
| Last Updated | 2026-02-10 04:00 GMT |
Overview
Concrete tool for receiving tensors via MPI during distributed training on CPU in the ONNX Runtime training framework.
Description
This file implements the Recv kernel, which receives tensor data from a remote MPI process. The kernel supports two modes: when output shapes can be inferred statically, output tensors are allocated before communication begins; otherwise, shape information is first received from the remote process in a separate MPI message. All received tensors are aggregated into a single contiguous buffer with alignment padding (via GetAggregatedAlignedAddress), the buffer is received in a single MPI_Recv call, and each tensor's data is then copied to its respective output. A boolean control-signal input must be true for the kernel to execute, and a boolean control-signal output is set to true upon completion. The kernel rejects same-rank communication.
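The aggregation step described above can be sketched as follows. This is a minimal, hypothetical illustration of how per-tensor offsets into one contiguous receive buffer might be computed with alignment padding; the 256-byte alignment constant and the helper names (AlignUp, ComputeAlignedOffsets) are assumptions for illustration, not the kernel's actual GetAggregatedAlignedAddress implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed alignment for illustration; the real kernel derives its
// alignment from ONNX Runtime's allocator utilities.
constexpr std::size_t kAlignment = 256;

// Round an offset up to the next multiple of kAlignment.
std::size_t AlignUp(std::size_t offset) {
  return (offset + kAlignment - 1) / kAlignment * kAlignment;
}

// Given per-tensor byte sizes, compute each tensor's aligned offset into a
// single aggregated buffer, and the total buffer size needed for one
// MPI_Recv of all tensors at once.
std::vector<std::size_t> ComputeAlignedOffsets(
    const std::vector<std::size_t>& sizes, std::size_t& total_bytes) {
  std::vector<std::size_t> offsets;
  std::size_t offset = 0;
  for (std::size_t s : sizes) {
    offset = AlignUp(offset);   // pad so this tensor starts aligned
    offsets.push_back(offset);
    offset += s;                // advance past the tensor's payload
  }
  total_bytes = offset;
  return offsets;
}
```

After the single receive completes, the kernel copies `sizes[i]` bytes starting at `buffer + offsets[i]` into output tensor `i`.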
Usage
This kernel is used in distributed training pipeline parallelism and model parallelism scenarios where tensors need to be transferred between different processes via MPI.
Code Reference
Source Location
- Repository: Microsoft_Onnxruntime
- File: orttraining/orttraining/training_ops/cpu/communication/recv.cc
- Lines: 1-173
Signature
void Recv::ReceiveData(const int num_tensors,
                       std::vector<Tensor*> received_tensors,
                       const int src,
                       const size_t aggregated_aligned_tensor_bytes,
                       std::vector<char>& buffer) const;

Status Recv::Compute(OpKernelContext* ctx) const;
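In the mode where output shapes are not statically inferable, the sender must transmit shape metadata before the data payload. The sketch below shows one plausible encoding for that preliminary message (a flat int64 array of `[rank, dims...]` per tensor); the exact wire layout and the helper names PackShapes/UnpackShapes are assumptions for illustration, not the kernel's actual protocol.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Sender side: flatten every tensor's shape into one int64 message,
// writing the rank first, then each dimension.
std::vector<int64_t> PackShapes(
    const std::vector<std::vector<int64_t>>& shapes) {
  std::vector<int64_t> msg;
  for (const auto& shape : shapes) {
    msg.push_back(static_cast<int64_t>(shape.size()));  // rank
    msg.insert(msg.end(), shape.begin(), shape.end());  // dimensions
  }
  return msg;
}

// Receiver side: parse the message back into per-tensor shapes so output
// tensors can be allocated before the aggregated data receive.
std::vector<std::vector<int64_t>> UnpackShapes(
    const std::vector<int64_t>& msg, int num_tensors) {
  std::vector<std::vector<int64_t>> shapes;
  std::size_t i = 0;
  for (int t = 0; t < num_tensors; ++t) {
    const int64_t rank = msg[i++];
    shapes.emplace_back(msg.begin() + i, msg.begin() + i + rank);
    i += static_cast<std::size_t>(rank);
  }
  return shapes;
}
```

In the real kernel, the packed message would travel in its own MPI message from `src`, after which Compute allocates the outputs and calls ReceiveData for the aggregated payload.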
Import
#include "orttraining/orttraining/training_ops/cpu/communication/recv.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| input_signal | Tensor(bool) | Yes | Control signal (must be true) |
| remote_rank | Tensor(int64) | Yes | MPI rank of the sender process |
Outputs
| Name | Type | Description |
|---|---|---|
| output_signal | Tensor(bool) | Completion signal (set to true) |
| received_tensors (variadic) | Tensor(V) | One or more received tensors |
Usage Examples
ONNX_OPERATOR_KERNEL_EX(
Recv, kMSDomain, 1, kCpuExecutionProvider,
KernelDefBuilder()
.InputMemoryType(OrtMemTypeDefault, 0)
.InputMemoryType(OrtMemTypeDefault, 1)
.OutputMemoryType(OrtMemTypeDefault, 0)
.TypeConstraint("TBool", DataTypeImpl::GetTensorType<bool>())
.TypeConstraint("TInt64", DataTypeImpl::GetTensorType<int64_t>())
.TypeConstraint("V", DataTypeImpl::AllFixedSizeTensorTypes()),
Recv);