Implementation: Microsoft Onnxruntime CPU MpiRecv
| Knowledge Sources | |
|---|---|
| Domains | Training, CPU_Kernels |
| Last Updated | 2026-02-10 04:00 GMT |
Overview
Concrete tool for receiving tensors via MPI during distributed training on CPU in the ONNX Runtime training framework.
Description
This file implements the Recv kernel, which receives tensor data from a remote MPI process. The kernel supports two modes: when output shapes can be inferred statically, output tensors are allocated before communication begins; otherwise, shape information is first received from the remote process in a separate MPI message. All received tensors are aggregated into a single contiguous buffer with alignment padding (via GetAggregatedAlignedAddress), the buffer is received in a single MPI_Recv call, and each tensor's data is then copied to its respective output. A boolean control-signal input must be true for the kernel to execute, and a boolean control-signal output is set to true upon completion. The kernel rejects same-rank communication.
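The aggregation step described above can be sketched as follows. This is a minimal, hypothetical illustration of how per-tensor offsets into one contiguous receive buffer might be computed with alignment padding; the 256-byte alignment constant and the helper names (AlignUp, ComputeAlignedOffsets) are assumptions for illustration, not the kernel's actual GetAggregatedAlignedAddress implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed alignment for illustration; the real kernel derives its
// alignment from ONNX Runtime's allocator utilities.
constexpr std::size_t kAlignment = 256;

// Round an offset up to the next multiple of kAlignment.
std::size_t AlignUp(std::size_t offset) {
  return (offset + kAlignment - 1) / kAlignment * kAlignment;
}

// Given per-tensor byte sizes, compute each tensor's aligned offset into a
// single aggregated buffer, and the total buffer size needed for one
// MPI_Recv of all tensors at once.
std::vector<std::size_t> ComputeAlignedOffsets(
    const std::vector<std::size_t>& sizes, std::size_t& total_bytes) {
  std::vector<std::size_t> offsets;
  std::size_t offset = 0;
  for (std::size_t s : sizes) {
    offset = AlignUp(offset);   // pad so this tensor starts aligned
    offsets.push_back(offset);
    offset += s;                // advance past the tensor's payload
  }
  total_bytes = offset;
  return offsets;
}
```

After the single receive completes, the kernel copies `sizes[i]` bytes starting at `buffer + offsets[i]` into output tensor `i`.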
Usage
This kernel is used in distributed training pipeline parallelism and model parallelism scenarios where tensors need to be transferred between different processes via MPI.
Code Reference
Source Location
- Repository: Microsoft_Onnxruntime
- File: orttraining/orttraining/training_ops/cpu/communication/recv.cc
- Lines: 1-173
Signature
void Recv::ReceiveData(const int num_tensors,
                       std::vector<Tensor*> received_tensors,
                       const int src,
                       const size_t aggregated_aligned_tensor_bytes,
                       std::vector<char>& buffer) const;

Status Recv::Compute(OpKernelContext* ctx) const;
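In the mode where output shapes are not statically inferable, the sender must transmit shape metadata before the data payload. The sketch below shows one plausible encoding for that preliminary message (a flat int64 array of `[rank, dims...]` per tensor); the exact wire layout and the helper names PackShapes/UnpackShapes are assumptions for illustration, not the kernel's actual protocol.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Sender side: flatten every tensor's shape into one int64 message,
// writing the rank first, then each dimension.
std::vector<int64_t> PackShapes(
    const std::vector<std::vector<int64_t>>& shapes) {
  std::vector<int64_t> msg;
  for (const auto& shape : shapes) {
    msg.push_back(static_cast<int64_t>(shape.size()));  // rank
    msg.insert(msg.end(), shape.begin(), shape.end());  // dimensions
  }
  return msg;
}

// Receiver side: parse the message back into per-tensor shapes so output
// tensors can be allocated before the aggregated data receive.
std::vector<std::vector<int64_t>> UnpackShapes(
    const std::vector<int64_t>& msg, int num_tensors) {
  std::vector<std::vector<int64_t>> shapes;
  std::size_t i = 0;
  for (int t = 0; t < num_tensors; ++t) {
    const int64_t rank = msg[i++];
    shapes.emplace_back(msg.begin() + i, msg.begin() + i + rank);
    i += static_cast<std::size_t>(rank);
  }
  return shapes;
}
```

In the real kernel, the packed message would travel in its own MPI message from `src`, after which Compute allocates the outputs and calls ReceiveData for the aggregated payload.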
Import
#include "orttraining/orttraining/training_ops/cpu/communication/recv.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| input_signal | Tensor(bool) | Yes | Control signal (must be true) |
| remote_rank | Tensor(int64) | Yes | MPI rank of the sender process |
Outputs
| Name | Type | Description |
|---|---|---|
| output_signal | Tensor(bool) | Completion signal (set to true) |
| received_tensors (variadic) | Tensor(V) | One or more received tensors |
Usage Examples
ONNX_OPERATOR_KERNEL_EX(
Recv, kMSDomain, 1, kCpuExecutionProvider,
KernelDefBuilder()
.InputMemoryType(OrtMemTypeDefault, 0)
.InputMemoryType(OrtMemTypeDefault, 1)
.OutputMemoryType(OrtMemTypeDefault, 0)
.TypeConstraint("TBool", DataTypeImpl::GetTensorType<bool>())
.TypeConstraint("TInt64", DataTypeImpl::GetTensorType<int64_t>())
.TypeConstraint("V", DataTypeImpl::AllFixedSizeTensorTypes()),
Recv);