Implementation:Alibaba ROLL LogitsTransferGroup
| Knowledge Sources | |
|---|---|
| Domains | Distributed_Systems, Knowledge_Distillation |
| Last Updated | 2026-02-07 20:00 GMT |
Overview
LogitsTransferGroup is the concrete cross-cluster logits transfer group that the Alibaba ROLL library provides for knowledge distillation.
Description
The LogitsTransferGroup class plans and executes the transfer of teacher logits to student workers. It supports three backends ("ipc+nccl", "nccl-only", and "ray"), organizes transfers into phases so that they do not conflict with one another, and handles both point-to-point (P2P) and broadcast transfers.
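The idea of a phased plan can be illustrated with a small, self-contained sketch. This is not ROLL's actual planning algorithm; it is a generic greedy scheduler, and the function name `assign_phases` and the `(src_rank, tgt_rank)` pair representation are assumptions made for illustration. It covers only P2P transfers and packs them into phases so that no sender and no receiver appears twice in the same phase.

```python
def assign_phases(transfers):
    """Greedily pack (src_rank, tgt_rank) pairs into conflict-free phases.

    Hypothetical helper, not part of ROLL. Within one phase each source
    rank sends at most once and each target rank receives at most once.
    """
    phases = []  # each entry: (pairs, busy_sources, busy_targets)
    for src, tgt in transfers:
        for pairs, busy_src, busy_tgt in phases:
            # Reuse the first phase where both endpoints are still free.
            if src not in busy_src and tgt not in busy_tgt:
                pairs.append((src, tgt))
                busy_src.add(src)
                busy_tgt.add(tgt)
                break
        else:
            # No existing phase fits, so open a new one.
            phases.append(([(src, tgt)], {src}, {tgt}))
    return [pairs for pairs, _, _ in phases]


# Two teacher ranks each sending to two student ranks.
plan = assign_phases([(0, 0), (0, 1), (1, 0), (1, 1)])
for index, pairs in enumerate(plan):
    print(f"phase {index}: {pairs}")
```

Executing the phases one after another means that transfers running at the same time never share an endpoint, which is one plausible reading of "avoiding conflicts" here.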
Usage
Create a LogitsTransferGroup during distillation pipeline initialization, passing the teacher cluster as the source and the student cluster as the target.
Code Reference
Source Location
- Repository: Alibaba ROLL
- File: roll/pipeline/distill/logits_transfer_group.py
- Lines: L34-475
Signature
class LogitsTransferGroup:
    VALID_BACKENDS = {"ipc+nccl", "nccl-only", "ray"}

    def __init__(self, src_cluster, tgt_cluster, backend: str = "ipc+nccl") -> None:
        """Initialize with teacher (src) and student (tgt) clusters."""

    def make_comm_plan(self) -> None:
        """Create communication plan for logits transfer."""

    def make_collective_group(self) -> None:
        """Build collective groups for phased transfer."""

    def logits_transfer(self) -> dict:
        """Execute logits transfer for all tensor names. Returns timing metrics."""

    def logits_transfer_impl(self, tensor_name_for_transfer: str) -> dict:
        """Execute logits transfer in phase order."""
Import
from roll.pipeline.distill.logits_transfer_group import LogitsTransferGroup
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| src_cluster | Cluster | Yes | Teacher cluster (source of logits) |
| tgt_cluster | Cluster | Yes | Student cluster (target for logits) |
| backend | str | No (default "ipc+nccl") | Transfer backend: "ipc+nccl", "nccl-only", or "ray" |
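Because only three backend strings are valid, it can help to check the value before constructing the group. The sketch below is an assumption about how such a check could look; `check_backend` is a hypothetical helper, and whether the class itself raises on an invalid backend is not stated here. The set of names is copied from `VALID_BACKENDS` in the signature.

```python
# Mirrors LogitsTransferGroup.VALID_BACKENDS from the signature above.
VALID_BACKENDS = {"ipc+nccl", "nccl-only", "ray"}


def check_backend(backend: str = "ipc+nccl") -> str:
    """Return the backend name unchanged, or raise if it is not supported.

    Hypothetical pre-flight check, not part of ROLL.
    """
    if backend not in VALID_BACKENDS:
        raise ValueError(
            f"Unsupported backend {backend!r}; "
            f"expected one of {sorted(VALID_BACKENDS)}"
        )
    return backend


print(check_backend("ray"))
```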
Outputs
| Name | Type | Description |
|---|---|---|
| timing_metrics | dict | Transfer duration and communication statistics |
Usage Examples
logits_group = LogitsTransferGroup(
    src_cluster=teacher_cluster,
    tgt_cluster=student_cluster,
    backend="ipc+nccl",
)
logits_group.make_comm_plan()
logits_group.make_collective_group()

# During training loop:
metrics = logits_group.logits_transfer()
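Since logits_transfer() returns a dict of timing metrics on every call, one natural follow-up is to average those values across training steps. The sketch below assumes the dict is flat and holds numeric values. The key names in the example are invented for illustration and are not taken from ROLL, and `mean_metrics` is a hypothetical helper.

```python
from collections import defaultdict


def mean_metrics(history):
    """Average each numeric key over a list of per-step metric dicts.

    Hypothetical helper; non-numeric values are ignored.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for metrics in history:
        for key, value in metrics.items():
            # bool is a subclass of int, so exclude it explicitly.
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                totals[key] += value
                counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Example with made-up metric names standing in for real step outputs.
history = [{"transfer_time": 1.0}, {"transfer_time": 3.0}]
print(mean_metrics(history))
```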
Related Pages
Implements Principle
Requires Environment
Environment Dependencies
This implementation requires the following environment constraints:
Heuristics Applied
No specific heuristics apply to this implementation.