Principle:Alibaba ROLL Teacher Forward Inference

Knowledge Sources	Knowledge Distillation Alibaba ROLL
Domains	Knowledge_Distillation, LLM_Inference
Last Updated	2026-02-07 20:00 GMT

Overview

An inference principle for extracting top-k softened probability distributions from a frozen teacher model for knowledge distillation.

Description

Teacher Forward Inference runs a forward pass through the frozen teacher model and extracts the top-k entries of its output distribution (probabilities, log-probabilities, and vocabulary indices), which are then transferred to the student. Only the top-k entries are extracted, rather than the full distribution over the vocabulary, to reduce communication bandwidth. Temperature scaling is applied to the logits before the softmax to soften the distribution.

Usage

Run this step before each student training step in the distillation pipeline, so that the teacher's top-k outputs are available when the student is updated.

Theoretical Basis

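The teacher produces softened probabilities: $p_{i}^{T} = \frac{\exp (z_{i} / T)}{\sum_{j} \exp (z_{j} / T)}$, where $z_{i}$ is the teacher logit for vocabulary entry $i$, $T$ is the temperature, and the sum runs over the whole vocabulary. A temperature $T > 1$ flattens the distribution; $T = 1$ recovers the ordinary softmax.

Only the $k$ largest values, together with their indices, are retained for efficiency.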
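The formula and the top-k truncation can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the ROLL implementation: the function name `teacher_topk`, its parameters, the default values, and the choice to handle a single vector of logits are all assumptions made for the example. It also assumes the retained probabilities are not renormalized after truncation, which the source does not specify.

```python
import math
from typing import List, Tuple


def teacher_topk(
    logits: List[float], temperature: float = 2.0, k: int = 4
) -> Tuple[List[float], List[float], List[int]]:
    """Hypothetical helper: top-k (probs, log-probs, indices) of softened logits.

    `logits` holds the teacher logits z_i for one position (an assumption of
    this sketch); `temperature` is T in the formula above.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Temperature scaling: z_i / T.
    scaled = [z / temperature for z in logits]

    # Log-sum-exp with the max subtracted, so exp() cannot overflow.
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scaled))

    # log p_i = z_i / T - log(sum_j exp(z_j / T)).
    log_probs = [s - log_norm for s in scaled]

    # Indices of the k largest entries, in descending order of probability.
    order = sorted(range(len(log_probs)), key=lambda i: log_probs[i], reverse=True)
    indices = order[:k]

    top_log_probs = [log_probs[i] for i in indices]
    top_probs = [math.exp(lp) for lp in top_log_probs]  # not renormalized
    return top_probs, top_log_probs, indices
```

Calling `teacher_topk(logits, temperature=2.0, k=4)` on a vocabulary-sized list returns three lists of length `k`, so the amount of data sent to the student scales with `k` rather than with the vocabulary size. Computing log-probabilities through log-sum-exp and exponentiating only the retained entries is one common way to keep the computation numerically stable; whether the actual worker does the same is not stated in the source.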

Related Pages

Implemented By

Implementation:Alibaba_ROLL_TeacherWorker_Forward

Related Heuristics

The following heuristics inform this principle:

Heuristic:Alibaba_ROLL_Numerical_Stability_Epsilon
