Principle: Alibaba ROLL Logits Transfer Communication
| Knowledge Sources | |
|---|---|
| Domains | Distributed_Systems, Knowledge_Distillation |
| Last Updated | 2026-02-07 20:00 GMT |
Overview
A distributed communication principle for efficiently transferring teacher model logits to student workers across different GPU clusters.
Description
Logits Transfer Communication solves the cross-cluster data transfer problem in distributed knowledge distillation. Teacher and student models run on separate GPU clusters, but the student needs the teacher's top-k logits to compute the distillation loss. Three transfer backends are supported:
- IPC+NCCL: Shared memory for same-node transfers, NCCL for cross-node
- NCCL-only: Pure NCCL with circular offset to avoid same-GPU transfers
- Ray: Ray-based object store transfers (simplest but slowest)
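As an illustration of how a backend might be chosen per transfer, here is a minimal sketch; the names (`TransferPair`, `choose_backend`) are hypothetical and not ROLL's actual API:

```python
# Hypothetical sketch of per-pair transport selection between teacher and
# student ranks; names are illustrative, not ROLL's API.
from dataclasses import dataclass


@dataclass(frozen=True)
class TransferPair:
    src_rank: int
    dst_rank: int
    src_node: int   # node (host) each rank lives on
    dst_node: int


def choose_backend(pair: TransferPair, mode: str = "ipc_nccl") -> str:
    """Pick a transport for one teacher->student logits transfer."""
    if mode == "ray":
        return "ray"    # object-store path: simplest, slowest
    if mode == "ipc_nccl" and pair.src_node == pair.dst_node:
        return "ipc"    # same node: shared-memory / CUDA IPC transfer
    return "nccl"       # cross-node, or NCCL-only mode
```

For example, a pair on the same node would take the IPC path under the IPC+NCCL backend, while a cross-node pair falls back to NCCL.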
Transfers are organized into phases so that no target rank receives from more than one source rank within the same phase, avoiding target conflicts when multiple sources send to one target.
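The phase scheduling can be sketched as follows: group transfers by target, then place each target's k-th incoming transfer into phase k. This is a simplified illustration under assumed names (`build_phases` is hypothetical), not the actual ROLL scheduler:

```python
# Hypothetical sketch of phase scheduling: partition a (src, dst) transfer
# list so that within a phase every dst rank appears at most once.
from collections import defaultdict


def build_phases(transfers):
    """transfers: list of (src_rank, dst_rank) pairs.
    Returns a list of phases; each phase is a list of (src, dst)
    in which all dst ranks are distinct."""
    by_dst = defaultdict(list)
    for src, dst in transfers:
        by_dst[dst].append(src)
    # Number of phases = max fan-in over any target rank.
    num_phases = max((len(v) for v in by_dst.values()), default=0)
    phases = []
    for p in range(num_phases):
        phase = [(srcs[p], dst) for dst, srcs in by_dst.items() if p < len(srcs)]
        phases.append(phase)
    return phases
```

With two sources sending to target 0 and one to target 1, the two conflicting transfers land in separate phases while the independent one proceeds in parallel.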
Usage
Use when teacher and student models are on separate GPU clusters and need to share logits.
Theoretical Basis
The communication plan maps teacher DP ranks to student DP ranks:
- P2P transfers: Direct point-to-point for corresponding ranks
- Broadcasts: TP/CP group broadcasts after P2P delivery
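The two-step delivery above (P2P to one rank per student DP group, then an intra-group broadcast) can be sketched like this. The mapping shown (round-robin over teacher DP ranks, broadcast root at TP rank 0) is an assumption for illustration; `build_plan` is not ROLL's actual function:

```python
# Hypothetical sketch of a communication plan: map teacher DP ranks onto
# student DP groups, deliver via P2P to each group's TP rank 0, then
# broadcast to the remaining TP ranks of that group.
def build_plan(num_teacher_dp, num_student_dp, student_tp):
    """Returns (p2p, broadcasts):
      p2p        - list of (teacher_dp_rank, student_global_rank)
      broadcasts - list of (root_global_rank, [group member global ranks])"""
    p2p, broadcasts = [], []
    for sdp in range(num_student_dp):
        tdp = sdp % num_teacher_dp       # teacher DP rank feeding this group
        root = sdp * student_tp          # global rank of the group's TP rank 0
        p2p.append((tdp, root))
        broadcasts.append((root, [root + t for t in range(student_tp)]))
    return p2p, broadcasts
```

For instance, with 2 teacher DP ranks, 4 student DP groups, and TP size 2, each teacher rank feeds two student groups, and each P2P delivery is followed by a broadcast to one TP peer.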
Related Pages
Implemented By
Related Heuristics
No specific heuristics inform this principle.