Implementation:Microsoft Onnxruntime CPU MpiRecv

From Leeroopedia


Knowledge Sources
Domains: Training, CPU_Kernels
Last Updated: 2026-02-10 04:00 GMT

Overview

CPU kernel in the ONNX Runtime training framework that receives tensors over MPI during distributed training.

Description

This file implements the Recv kernel, which receives tensor data from a remote MPI process. The kernel supports two modes. When output shapes can be inferred statically, the output tensors are allocated before any communication takes place. Otherwise, the kernel first receives shape information from the remote process in a separate MPI message. In both modes the tensor payload travels as one contiguous buffer: each tensor occupies a region padded for alignment (addresses are computed via GetAggregatedAlignedAddress), the whole buffer arrives in a single MPI_Recv call, and the kernel then copies each tensor's data into its respective output. The boolean control-signal input must be true for the kernel to execute, and the boolean control-signal output is set to true on completion. The kernel rejects same-rank communication: the source rank must differ from the receiver's own rank.
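The aligned packing described above can be illustrated with a minimal sketch. It is not the ONNX Runtime implementation: the names AlignUp, PackedLayout and ComputePackedLayout are hypothetical, the 256-byte alignment is an assumed value, and whether the real buffer also pads its tail is not stated on this page.

```cpp
#include <cstddef>
#include <vector>

// Assumed alignment for illustration only; the value used by
// GetAggregatedAlignedAddress is not given on this page.
constexpr std::size_t kAlignment = 256;

// Round `value` up to the next multiple of `alignment` (alignment > 0).
inline std::size_t AlignUp(std::size_t value, std::size_t alignment) {
  return (value + alignment - 1) / alignment * alignment;
}

// Hypothetical description of the aggregated buffer.
struct PackedLayout {
  std::vector<std::size_t> offsets;  // starting byte of each tensor
  std::size_t total_bytes = 0;       // bytes needed for the whole buffer
};

// Place tensors back to back, starting each one on an aligned boundary.
inline PackedLayout ComputePackedLayout(
    const std::vector<std::size_t>& tensor_bytes,
    std::size_t alignment = kAlignment) {
  PackedLayout layout;
  layout.offsets.reserve(tensor_bytes.size());
  std::size_t cursor = 0;
  for (std::size_t bytes : tensor_bytes) {
    cursor = AlignUp(cursor, alignment);  // insert padding before the tensor
    layout.offsets.push_back(cursor);
    cursor += bytes;
  }
  layout.total_bytes = cursor;  // the tail is left unpadded in this sketch
  return layout;
}
```

With tensor sizes of 100, 300 and 8 bytes and a 256-byte alignment, this sketch yields offsets 0, 256 and 768 and a total of 776 bytes. Sender and receiver must compute the same layout for a single transfer to be unpacked correctly.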

Usage

This kernel is used in pipeline-parallel and model-parallel distributed training, where tensors must be transferred between processes via MPI.

Code Reference

Source Location

Signature

void Recv::ReceiveData(const int num_tensors,
    std::vector<Tensor*> received_tensors,
    const int src,
    const size_t aggregated_aligned_tensor_bytes,
    std::vector<char>& buffer) const;

Status Recv::Compute(OpKernelContext* ctx) const;
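As a rough illustration of what a function shaped like ReceiveData does, the sketch below receives one aggregated buffer and copies each tensor's bytes out of it. Everything here is an assumption made for the example: OutputSlot stands in for a Tensor, ReceiveFn stands in for the MPI_Recv call so the code runs without MPI, and ReceiveAndUnpack is a hypothetical name.

```cpp
#include <cstddef>
#include <cstring>
#include <functional>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for an output tensor: destination memory, its size,
// and where its data starts inside the aggregated buffer.
struct OutputSlot {
  void* data;
  std::size_t bytes;
  std::size_t offset;
};

// Hypothetical transport hook. A real kernel would call MPI_Recv here; the
// callback must write exactly `bytes` bytes sent by rank `src` into `dst`.
using ReceiveFn = std::function<void(char* dst, std::size_t bytes, int src)>;

inline void ReceiveAndUnpack(const std::vector<OutputSlot>& outputs, int src,
                             std::size_t aggregated_bytes,
                             std::vector<char>& buffer,
                             const ReceiveFn& receive) {
  // Reject layouts that would read past the end of the buffer.
  for (const OutputSlot& out : outputs) {
    if (out.offset > aggregated_bytes ||
        out.bytes > aggregated_bytes - out.offset) {
      throw std::out_of_range("tensor region exceeds aggregated buffer");
    }
  }
  buffer.resize(aggregated_bytes);
  receive(buffer.data(), aggregated_bytes, src);  // one transfer for all tensors
  for (const OutputSlot& out : outputs) {
    if (out.bytes == 0) continue;  // nothing to copy for empty tensors
    std::memcpy(out.data, buffer.data() + out.offset, out.bytes);
  }
}
```

Receiving once and copying afterwards trades an extra memory copy for fewer messages, which is usually the cheaper side of that trade when many small tensors are sent together.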

Import

#include "orttraining/orttraining/training_ops/cpu/communication/recv.h"

I/O Contract

Inputs

Name          Type           Required  Description
input_signal  Tensor(bool)   Yes       Control signal (must be true)
remote_rank   Tensor(int64)  Yes       MPI rank of the sender process
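The two input requirements, a true control signal and a sender rank different from the receiver's own, can be expressed as a small guard. This is a sketch only: CheckRecvPreconditions is a hypothetical helper, not an ONNX Runtime function, and the real kernel reports failures through its own error mechanisms.

```cpp
#include <cstdint>
#include <string>

// Returns an empty string when the receive may proceed, otherwise a short
// reason. `world_rank` is the receiver's own MPI rank.
inline std::string CheckRecvPreconditions(bool input_signal,
                                          std::int64_t remote_rank,
                                          int world_rank) {
  if (!input_signal) {
    return "input_signal must be true";
  }
  if (remote_rank == static_cast<std::int64_t>(world_rank)) {
    return "cannot receive from the same rank";
  }
  return "";
}
```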

Outputs

Name                         Type          Description
output_signal                Tensor(bool)  Completion signal (set to true)
received_tensors (variadic)  Tensor(V)     One or more received tensors
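When the shapes of received_tensors cannot be inferred statically, the description says shape information arrives first in a separate message. The sketch below shows one plausible way to decode such a message; the wire format (a running total of ranks plus a flat list of dimensions) and the name DecodeShapes are assumptions for illustration, not the documented format.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Assumed wire format for the shape message:
//   prefix_ranks[i] = total number of dimensions of tensors 0..i (inclusive)
//   flat_dims       = all dimensions, written tensor after tensor
inline std::vector<std::vector<std::int64_t>> DecodeShapes(
    const std::vector<std::size_t>& prefix_ranks,
    const std::vector<std::int64_t>& flat_dims) {
  std::vector<std::vector<std::int64_t>> shapes;
  shapes.reserve(prefix_ranks.size());
  std::size_t begin = 0;
  for (std::size_t end : prefix_ranks) {
    // Prefix sums must be non-decreasing and stay inside flat_dims.
    if (end < begin || end > flat_dims.size()) {
      throw std::invalid_argument("malformed shape message");
    }
    shapes.emplace_back(flat_dims.begin() + static_cast<std::ptrdiff_t>(begin),
                        flat_dims.begin() + static_cast<std::ptrdiff_t>(end));
    begin = end;
  }
  if (begin != flat_dims.size()) {
    throw std::invalid_argument("unused dimensions in shape message");
  }
  return shapes;
}
```

Once the shapes are known, the receiver can allocate each output and compute its byte size before the payload arrives. A scalar tensor appears in this format as a repeated prefix value, which decodes to an empty shape.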

Usage Examples

ONNX_OPERATOR_KERNEL_EX(
    Recv, kMSDomain, 1, kCpuExecutionProvider,
    KernelDefBuilder()
        .InputMemoryType(OrtMemTypeDefault, 0)
        .InputMemoryType(OrtMemTypeDefault, 1)
        .OutputMemoryType(OrtMemTypeDefault, 0)
        .TypeConstraint("TBool", DataTypeImpl::GetTensorType<bool>())
        .TypeConstraint("TInt64", DataTypeImpl::GetTensorType<int64_t>())
        .TypeConstraint("V", DataTypeImpl::AllFixedSizeTensorTypes()),
    Recv);

Related Pages
