Implementation:Alibaba ROLL LogitsTransferGroup


Knowledge Sources
Domains Distributed_Systems, Knowledge_Distillation
Last Updated 2026-02-07 20:00 GMT

Overview

LogitsTransferGroup is the concrete class in the Alibaba ROLL library that transfers logits across clusters for knowledge distillation.

Description

The LogitsTransferGroup class manages the communication plan and execution for transferring teacher logits to student workers. It supports three backends (IPC+NCCL, NCCL-only, Ray), creates phased communication plans to avoid conflicts, and handles both P2P and broadcast transfers.
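The phased-plan idea can be illustrated with a minimal sketch. It is not ROLL's actual planning algorithm: the `(teacher_rank, student_rank)` pair representation, the `build_phases` helper, and the greedy packing rule are all assumptions made for illustration. The sketch assumes that a "conflict" means one rank taking part in two transfers in the same phase.

```python
from typing import List, Set, Tuple

# Hypothetical representation: one transfer is a (teacher_rank, student_rank) pair.
Transfer = Tuple[int, int]


def build_phases(transfers: List[Transfer]) -> List[List[Transfer]]:
    """Greedily pack transfers into phases so that no rank appears twice in a phase."""
    phases: List[List[Transfer]] = []
    busy: List[Set[Tuple[str, int]]] = []  # endpoints already used in each phase

    for src, tgt in transfers:
        endpoints = {("teacher", src), ("student", tgt)}
        for phase, used in zip(phases, busy):
            if not (endpoints & used):
                # Neither endpoint is busy in this phase, so the transfer fits here.
                phase.append((src, tgt))
                used |= endpoints  # in-place update of the set stored in `busy`
                break
        else:
            # Every existing phase conflicts, so open a new one.
            phases.append([(src, tgt)])
            busy.append(set(endpoints))
    return phases
```

Under these assumptions, `build_phases([(0, 0), (0, 1), (1, 1), (1, 2)])` yields two phases, `[[(0, 0), (1, 1)], [(0, 1), (1, 2)]]`, and no teacher or student rank is used twice within a phase.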

Usage

Create a LogitsTransferGroup during distillation pipeline initialization, passing the teacher cluster as the source and the student cluster as the target.

Code Reference

Source Location

  • Repository: Alibaba ROLL
  • File: roll/pipeline/distill/logits_transfer_group.py
  • Lines: L34-475

Signature

class LogitsTransferGroup:
    VALID_BACKENDS = {"ipc+nccl", "nccl-only", "ray"}

    def __init__(self, src_cluster, tgt_cluster, backend: str = "ipc+nccl") -> None:
        """Initialize with teacher (src) and student (tgt) clusters."""

    def make_comm_plan(self) -> None:
        """Create communication plan for logits transfer."""

    def make_collective_group(self) -> None:
        """Build collective groups for phased transfer."""

    def logits_transfer(self) -> dict:
        """Execute logits transfer for all tensor names. Returns timing metrics."""

    def logits_transfer_impl(self, tensor_name_for_transfer: str) -> dict:
        """Execute logits transfer in phase order."""

Import

from roll.pipeline.distill.logits_transfer_group import LogitsTransferGroup

I/O Contract

Inputs

Name         Type     Required                  Description
src_cluster  Cluster  Yes                       Teacher cluster (source of logits)
tgt_cluster  Cluster  Yes                       Student cluster (target for logits)
backend      str      No (default "ipc+nccl")   Transfer backend ("ipc+nccl", "nccl-only", "ray")
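A backend check along the following lines would be consistent with the `VALID_BACKENDS` set in the signature. This is a sketch only: the `check_backend` helper is hypothetical, and whether the real constructor validates its argument or raises `ValueError` is an assumption.

```python
# Copied from the class signature; everything else here is illustrative.
VALID_BACKENDS = {"ipc+nccl", "nccl-only", "ray"}


def check_backend(backend: str = "ipc+nccl") -> str:
    """Return the backend name unchanged, or raise if it is not a known backend."""
    if backend not in VALID_BACKENDS:
        raise ValueError(
            f"Unknown backend {backend!r}; expected one of {sorted(VALID_BACKENDS)}"
        )
    return backend
```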

Outputs

Name            Type  Description
timing_metrics  dict  Transfer duration and communication statistics
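The source does not list the keys of the returned dict. As a rough sketch of how per-tensor timings could be gathered into one dict, assuming a hypothetical `transfer_one` callable and made-up key names:

```python
import time
from typing import Callable, Dict, Iterable


def timed_transfer(
    tensor_names: Iterable[str],
    transfer_one: Callable[[str], object],
) -> Dict[str, float]:
    """Run transfer_one for each tensor name and record its wall-clock duration."""
    metrics: Dict[str, float] = {}
    for name in tensor_names:
        start = time.perf_counter()
        transfer_one(name)  # hypothetical stand-in for a single-tensor transfer
        # Key format is an assumption, not ROLL's actual metric naming.
        metrics[f"time/logits_transfer/{name}"] = time.perf_counter() - start
    # Total is computed from the per-tensor entries before it is added itself.
    metrics["time/logits_transfer/total"] = sum(metrics.values())
    return metrics
```

Each duration is measured around one call only, so slow tensors show up individually rather than being hidden in a single total.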

Usage Examples

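from roll.pipeline.distill.logits_transfer_group import LogitsTransferGroup

# teacher_cluster and student_cluster are assumed to be Cluster objects
# that the distillation pipeline has already created.
logits_group = LogitsTransferGroup(
    src_cluster=teacher_cluster,
    tgt_cluster=student_cluster,
    backend="ipc+nccl",  # one of "ipc+nccl", "nccl-only", "ray"
)
logits_group.make_comm_plan()         # create the communication plan
logits_group.make_collective_group()  # build collective groups for phased transfer

# During training loop:
metrics = logits_group.logits_transfer()  # returns a dict of timing metrics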

Related Pages

Implements Principle

Requires Environment

Environment Dependencies

This implementation requires the following environment constraints:

Heuristics Applied

No specific heuristics apply to this implementation.
