Implementation:Alibaba ROLL TeacherWorker Forward
| Property | Value |
|---|---|
| Domains | Knowledge_Distillation, LLM_Inference |
| Last Updated | 2026-02-07 20:00 GMT |
Overview
Concrete teacher forward pass with top-k logit extraction provided by the Alibaba ROLL library.
Description
The TeacherWorker.forward method runs the teacher model over a batch and extracts, for each token position, the top-k probabilities, log-probabilities, and vocabulary indices, along with a mask for infinite values. These results are then cached in the student's LogitsCache via the LogitsTransferGroup.
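A minimal sketch of the top-k step for a single token position follows. It assumes the teacher produces raw logits over the vocabulary and that log-probabilities come from a log-softmax. The function name `topk_from_logits` is hypothetical and not part of ROLL, and the real worker operates on batched `torch.Tensor`s rather than Python lists.

```python
import math

def topk_from_logits(logits, k):
    """Hypothetical helper: top-k probs, log-probs and indices for one position."""
    # Numerically stable log-softmax: subtract the max before exponentiating.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    log_probs = [x - log_z for x in logits]
    # Indices of the k largest log-probabilities, highest first.
    indices = sorted(range(len(logits)), key=lambda i: log_probs[i], reverse=True)[:k]
    topk_log_probs = [log_probs[i] for i in indices]
    topk_probs = [math.exp(lp) for lp in topk_log_probs]
    return topk_probs, topk_log_probs, indices

probs, log_probs, idx = topk_from_logits([2.0, 0.5, 1.0, -1.0], k=2)
print(idx)  # [0, 2]
```

In this sketch the probabilities are normalized over the full vocabulary before truncation, so the k retained probabilities sum to at most 1.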
Usage
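Called before each student training step, so that the teacher's top-k results are already in the student's LogitsCache when the student computes its distillation loss.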
Code Reference
Source Location
- Repository: Alibaba ROLL
- File: roll/pipeline/distill/distill_worker.py
- Lines: L477-584
Signature
class TeacherWorker(Worker):
    @register(dispatch_mode=Dispatch.DP_MP_DISPATCH_FIRST_COLLECT_ALL, clear_cache=False)
    def forward(self, data: DataProto) -> DataProto:
        """
        Teacher forward pass with top-k logit extraction.

        Args:
            data: DataProto with input_ids, attention_mask, labels
        Returns:
            DataProto (logits cached in student via LogitsTransferGroup)
        """

    def logits_transfer(self, tensor_name_for_transfer, model_update_name,
                        broadcast_comm_plan_args, p2p_tgt_workers, p2p_entry_list, backend):
        """Transfer teacher logits to student workers."""
Import
from roll.pipeline.distill.distill_worker import TeacherWorker
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| data | DataProto | Yes | Batch with input_ids, attention_mask, labels |
Outputs
| Name | Type | Description |
|---|---|---|
| topk_probs | torch.Tensor | Top-k teacher probabilities |
| topk_log_probs | torch.Tensor | Top-k teacher log probabilities |
| topk_indices | torch.Tensor | Top-k vocabulary indices |
| topk_inf_mask | torch.Tensor | Mask for infinite values |
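The table does not say how `topk_inf_mask` is consumed. One plausible reason for such a mask, offered here only as an assumption, is that a masked-out vocabulary entry has probability 0 and log-probability `-inf`, and in IEEE floating point `0 * -inf` is NaN, which would poison a loss term such as `p * log p`. The hypothetical helper `masked_plogp` below builds a mask of infinite log-probabilities and skips those terms.

```python
import math

def masked_plogp(topk_probs, topk_log_probs):
    """Hypothetical helper: sum of p * log p over top-k entries, skipping infinite log-probs."""
    # True where the log-probability is infinite (e.g. -inf for a masked token).
    inf_mask = [math.isinf(lp) for lp in topk_log_probs]
    total = 0.0
    for p, lp, is_inf in zip(topk_probs, topk_log_probs, inf_mask):
        # Without this check, 0.0 * -inf would evaluate to nan.
        if not is_inf:
            total += p * lp
    return total, inf_mask

naive = 0.0 * float("-inf")
print(math.isnan(naive))  # True
value, mask = masked_plogp([0.7, 0.3, 0.0], [math.log(0.7), math.log(0.3), float("-inf")])
print(mask)  # [False, False, True]
```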
Usage Examples
# Called by the distillation pipeline:
teacher_cluster.execute_all_sync("forward", batch)
# Then transfer the teacher's top-k results into the student's LogitsCache
logits_transfer_group.logits_transfer()
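The ordering in the usage example (teacher forward, then logits transfer, then a student step that reads its cache) can be mimicked with a toy, single-process mock. Every name here (`ToyTeacher`, `ToyStudent`, `transfer`, `logits_cache`, `train_step`) is hypothetical and stands in for the distributed teacher cluster, `LogitsTransferGroup`, and `LogitsCache`; it illustrates only the dataflow, not ROLL's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class ToyTeacher:
    """Hypothetical stand-in for TeacherWorker: keeps its last top-k result locally."""
    last_topk: dict = field(default_factory=dict)

    def forward(self, batch_id, topk_indices):
        # A real teacher would run the model here; the mock just stores a fake result.
        self.last_topk = {"batch_id": batch_id, "topk_indices": topk_indices}

@dataclass
class ToyStudent:
    """Hypothetical stand-in for a student worker with a LogitsCache-like dict."""
    logits_cache: dict = field(default_factory=dict)

    def train_step(self, batch_id):
        # The student can only train on a batch whose teacher results were transferred.
        if batch_id not in self.logits_cache:
            raise RuntimeError(f"no teacher logits cached for batch {batch_id}")
        return self.logits_cache.pop(batch_id)

def transfer(teacher, students):
    """Hypothetical stand-in for LogitsTransferGroup: copy the teacher result to every student."""
    for s in students:
        s.logits_cache[teacher.last_topk["batch_id"]] = teacher.last_topk["topk_indices"]

teacher, students = ToyTeacher(), [ToyStudent(), ToyStudent()]
for step in range(2):
    teacher.forward(step, [[0, 2]])                 # 1. teacher forward
    transfer(teacher, students)                     # 2. logits transfer
    used = [s.train_step(step) for s in students]   # 3. student step reads its cache
```

Skipping step 2 makes the student step fail, which is why the teacher forward and transfer must run before each student training step.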