Principle:Alibaba ROLL Logits Transfer Communication

From Leeroopedia


Knowledge Sources
Domains: Distributed_Systems, Knowledge_Distillation
Last Updated: 2026-02-07 20:00 GMT

Overview

Logits Transfer Communication is a distributed communication principle for efficiently transferring teacher-model logits to student workers that run on a different GPU cluster.

Description

Logits Transfer Communication solves the cross-cluster data transfer problem in distributed knowledge distillation. Teacher and student models run on separate GPU clusters, but the student needs the teacher's top-k logits for the distillation loss. Three backends are supported:

  • IPC+NCCL: shared memory (IPC) for transfers within a node, NCCL for transfers across nodes
  • NCCL-only: NCCL for every transfer, with a circular offset so that no transfer has the same GPU as both source and target
  • Ray: transfers through the Ray object store (the simplest backend, but the slowest)
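The choice of transport per transfer, and the circular offset used by the NCCL-only backend, can be sketched as follows. This is a minimal illustration, not the ROLL implementation: the `Worker` record, the backend strings, and the helper names `pick_transport`, `circular_targets`, and `first_safe_offset` are all hypothetical, and the offset rule (pair source i with target (i + offset) mod n and take the smallest offset with no shared GPU) is one plausible reading of "circular offset".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Worker:
    """Hypothetical description of where a rank runs (not a ROLL type)."""
    rank: int
    node: str  # host identifier
    gpu: int   # local GPU index on that host


def pick_transport(src: Worker, dst: Worker, backend: str) -> str:
    """Return the transport one transfer would use under each backend."""
    if backend == "ray":
        return "ray_object_store"
    if backend == "nccl_only":
        return "nccl"
    if backend == "ipc_nccl":
        # Shared memory is only usable when both processes are on one host.
        return "ipc" if src.node == dst.node else "nccl"
    raise ValueError(f"unknown backend: {backend}")


def circular_targets(sources, targets, offset):
    """Pair sources[i] with targets[(i + offset) % len(targets)]."""
    n = len(targets)
    return [(s, targets[(i + offset) % n]) for i, s in enumerate(sources)]


def first_safe_offset(sources, targets):
    """Smallest offset for which no pair shares a physical GPU, else None."""
    for offset in range(len(targets)):
        pairs = circular_targets(sources, targets, offset)
        if all((s.node, s.gpu) != (t.node, t.gpu) for s, t in pairs):
            return offset
    return None


# Two teacher ranks and two student ranks placed on the same two GPUs.
teachers = [Worker(0, "node0", 0), Worker(1, "node0", 1)]
students = [Worker(0, "node0", 0), Worker(1, "node0", 1)]
offset = first_safe_offset(teachers, students)
```

In this example an offset of 0 would pair each teacher rank with the student rank on its own GPU, so the sketch settles on an offset of 1.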

The transfer is organized in phases: when several source ranks send to the same target rank, those sends are placed in different phases so that they do not conflict at the target.
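One simple way to build such phases is a greedy pass that puts each transfer into the first phase whose target is still free. The sketch below assumes transfers are given as (source rank, target rank) pairs; the function name `schedule_phases` is hypothetical, and the actual scheduling logic in ROLL may differ.

```python
def schedule_phases(transfers):
    """Greedily group (src, dst) transfers so each phase has unique targets."""
    phases = []  # phases[i] is the list of transfers run in phase i
    busy = []    # busy[i] is the set of targets already used in phase i
    for src, dst in transfers:
        for i, used in enumerate(busy):
            if dst not in used:
                phases[i].append((src, dst))
                used.add(dst)
                break
        else:
            # Every existing phase already sends to this target.
            phases.append([(src, dst)])
            busy.append({dst})
    return phases


# Four source ranks feeding two target ranks.
plan = schedule_phases([(0, 0), (1, 0), (2, 1), (3, 1)])
# plan == [[(0, 0), (2, 1)], [(1, 0), (3, 1)]]
```

Each phase can then run its transfers concurrently, since no two of them share a target.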

Usage

Use this principle when the teacher and student models are on separate GPU clusters and the student needs the teacher's logits.

Theoretical Basis

The communication plan maps teacher data-parallel (DP) ranks to student DP ranks in two steps:

  • P2P transfers: direct point-to-point sends between corresponding teacher and student ranks
  • Broadcasts: after P2P delivery, the logits are broadcast within the student's TP/CP group
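The two steps above can be sketched as a small plan builder. Everything here is an assumption made for illustration: the name `build_plan`, the proportional mapping used when the teacher and student DP sizes differ, the choice of the first rank of each student replica as the P2P receiver, and the contiguous layout of the `group_size` (TP times CP) ranks within a replica.

```python
def build_plan(teacher_dp, student_dp, group_size):
    """Return (p2p, broadcasts) for a teacher-to-student logits transfer.

    p2p:        list of (teacher DP rank, student global rank) sends
    broadcasts: list of (root global rank, member global ranks) groups
    """
    n = max(teacher_dp, student_dp)
    p2p = []
    for i in range(n):
        src_dp = i * teacher_dp // n  # teacher DP rank that holds the logits
        dst_dp = i * student_dp // n  # student DP replica that needs them
        # Assumed: the first rank of the replica receives the P2P send.
        p2p.append((src_dp, dst_dp * group_size))
    broadcasts = [
        (dp * group_size, list(range(dp * group_size, (dp + 1) * group_size)))
        for dp in range(student_dp)
    ]
    return p2p, broadcasts


# 4 teacher DP ranks, 2 student DP replicas of 2 ranks each (TP * CP = 2).
p2p, broadcasts = build_plan(teacher_dp=4, student_dp=2, group_size=2)
# p2p        == [(0, 0), (1, 0), (2, 2), (3, 2)]
# broadcasts == [(0, [0, 1]), (2, [2, 3])]
```

Here student rank 0 receives from teacher DP ranks 0 and 1, which is exactly the case where the sends to one target need to be split across phases.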

Related Pages

Implemented By

Related Heuristics

No specific heuristics inform this principle.
