Principle:Alibaba ROLL Teacher Forward Inference
| Knowledge Sources | |
|---|---|
| Domains | Knowledge_Distillation, LLM_Inference |
| Last Updated | 2026-02-07 20:00 GMT |
Overview
An inference principle for extracting top-k softened probability distributions from a frozen teacher model for knowledge distillation.
Description
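Teacher Forward Inference runs a forward pass through the frozen teacher model and, for each token position, extracts the top-k entries of its output distribution (their probabilities, log-probabilities, and vocabulary indices) for transfer to the student. Keeping only the top-k entries reduces communication bandwidth: each position carries k values plus k indices instead of a vector the size of the full vocabulary. Temperature scaling is applied to the logits before the softmax to soften the distribution.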
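The extraction step can be sketched for a single token position in plain Python. This is a minimal illustration, not the ROLL implementation: the function name `teacher_topk` and its signature are hypothetical, and a real pipeline would run the frozen teacher without gradients on batched tensors (for example with `torch.topk`) rather than on Python lists.

```python
import math
from typing import List, Tuple


def teacher_topk(
    logits: List[float], k: int, temperature: float = 1.0
) -> Tuple[List[float], List[float], List[int]]:
    """Hypothetical helper: return (probs, logprobs, indices) of the k most
    likely tokens under a temperature-scaled softmax of one position's logits."""
    if not 1 <= k <= len(logits):
        raise ValueError("k must be in [1, len(logits)]")
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Temperature scaling: divide every logit by T before the softmax.
    scaled = [z / temperature for z in logits]

    # Log-sum-exp with max subtraction for numerical stability.
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    logprobs_full = [s - log_z for s in scaled]

    # Indices of the k largest log-probabilities, highest first.
    indices = sorted(
        range(len(logits)), key=lambda i: logprobs_full[i], reverse=True
    )[:k]

    # Only these k entries (values plus indices) would be sent to the student.
    top_logprobs = [logprobs_full[i] for i in indices]
    top_probs = [math.exp(lp) for lp in top_logprobs]
    return top_probs, top_logprobs, indices
```

For example, `teacher_topk([2.0, 1.0, 0.1, 3.0], k=2, temperature=2.0)` returns indices `[3, 0]`; the two returned probabilities sum to less than 1 because the mass of the truncated tail is not kept.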
Usage
Run teacher inference before each student training step in the distillation pipeline.
Theoretical Basis
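The teacher produces softened probabilities by applying a temperature-scaled softmax to its logits $z$ over the vocabulary $V$, where a temperature $\tau > 1$ flattens the distribution:

$$p_i = \frac{\exp(z_i / \tau)}{\sum_{j \in V} \exp(z_j / \tau)}$$

Only the k largest probabilities, together with their indices, are retained for efficiency.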
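The interaction between temperature and top-k truncation can be checked numerically. The sketch below is a plain-Python illustration, assuming a toy six-token vocabulary with made-up logits (not real teacher outputs); the helper names `softmax`, `topk_mass`, and `entropy` are hypothetical. It shows that raising the temperature increases the entropy of the softened distribution and lowers the share of probability mass that the retained top-k entries carry.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float) -> List[float]:
    """Temperature-scaled softmax: p_i = exp(z_i / T) / sum_j exp(z_j / T)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def topk_mass(logits: List[float], k: int, temperature: float) -> float:
    """Fraction of the softened probability mass carried by the k largest entries."""
    probs = softmax(logits, temperature)
    return sum(sorted(probs, reverse=True)[:k])


def entropy(probs: List[float]) -> float:
    """Shannon entropy in nats; higher means a softer distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical logits for one token position over a toy 6-token vocabulary.
logits = [4.0, 2.5, 1.0, 0.5, 0.0, -1.0]

for t in (0.5, 1.0, 2.0, 4.0):
    p = softmax(logits, t)
    print(f"T={t}: entropy={entropy(p):.3f}, top-2 mass={topk_mass(logits, 2, t):.3f}")
```

This illustrates why k and the temperature are not independent choices: a higher temperature exposes more of the teacher's relative preferences among lower-ranked tokens, but a fixed-size top-k slice then captures a smaller fraction of the full distribution.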
Related Pages
Implemented By
Related Heuristics
The following heuristics inform this principle: