Principle:Alibaba ROLL Teacher Forward Inference

From Leeroopedia


Knowledge Sources
Domains Knowledge_Distillation, LLM_Inference
Last Updated 2026-02-07 20:00 GMT

Overview

An inference principle for extracting the top-k entries of a temperature-softened probability distribution from a frozen teacher model, for use in knowledge distillation.

Description

Teacher Forward Inference runs a forward pass through the frozen teacher model and extracts the top-k entries of its output distribution: their probabilities, log-probabilities, and indices. These are the values transferred to the student. Only the top-k entries are extracted, which reduces communication bandwidth, and temperature scaling is applied to soften the distribution.
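The extraction step can be sketched for a single output distribution as follows. This is a minimal illustration in plain Python, not the ROLL implementation: the function name `teacher_topk` and its parameters are hypothetical, and it assumes the softmax is taken over the full vocabulary before selection, with the retained probabilities left unrenormalized. The source does not state either choice.

```python
import math
from typing import List, Tuple


def teacher_topk(
    logits: List[float], k: int, temperature: float = 1.0
) -> Tuple[List[float], List[float], List[int]]:
    """Return (probs, log_probs, indices) for the k most likely entries.

    Hypothetical helper: assumes a full-vocabulary softmax followed by
    top-k selection, without renormalizing the retained probabilities.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and the vocabulary size")

    # Temperature scaling: T > 1 flattens the distribution.
    scaled = [z / temperature for z in logits]

    # Log-softmax via log-sum-exp, subtracting the max for numerical stability.
    max_scaled = max(scaled)
    log_norm = max_scaled + math.log(sum(math.exp(s - max_scaled) for s in scaled))
    log_probs = [s - log_norm for s in scaled]

    # Indices of the k largest log-probabilities, in descending order.
    indices = sorted(range(len(logits)), key=lambda i: log_probs[i], reverse=True)[:k]
    top_log_probs = [log_probs[i] for i in indices]
    top_probs = [math.exp(lp) for lp in top_log_probs]
    return top_probs, top_log_probs, indices


probs, log_probs, indices = teacher_topk([2.0, 1.0, 0.0, -1.0], k=2, temperature=2.0)
```

A real pipeline would apply the same operations to batched tensors on the accelerator; the list-based version here only makes the order of operations explicit.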

Usage

Run teacher forward inference before each student training step in the distillation pipeline; its top-k outputs are what the student receives from the teacher.

Theoretical Basis

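The teacher produces softened probabilities:

$$p_i^T = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)}$$

where $z_i$ is the teacher logit for entry $i$ and $T$ is the temperature.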
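As a worked illustration of the softening effect, take three hypothetical logits $z = (2, 1, 0)$, chosen only for arithmetic convenience, and evaluate the temperature-scaled softmax $\exp(z_i/T) / \sum_j \exp(z_j/T)$ at $T = 1$ and $T = 2$:

```latex
\begin{aligned}
T = 1:&\quad p^{1} = \frac{(e^{2},\ e^{1},\ e^{0})}{e^{2} + e^{1} + e^{0}} \approx (0.665,\ 0.245,\ 0.090) \\
T = 2:&\quad p^{2} = \frac{(e^{1},\ e^{0.5},\ e^{0})}{e^{1} + e^{0.5} + e^{0}} \approx (0.506,\ 0.307,\ 0.186)
\end{aligned}
```

Raising the temperature moves probability mass from the most likely entry toward the less likely ones while preserving their ranking.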

Only top-k values are retained for efficiency.
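The saving can be estimated by counting the values sent per output distribution. The sketch below is only a back-of-the-envelope count: the helper name `values_per_position` and the figures (a vocabulary of 32,000 and k = 64) are hypothetical, and it ignores differences in the byte width of floats and integer indices.

```python
from typing import Tuple


def values_per_position(vocab_size: int, k: int) -> Tuple[int, int]:
    """Count values transferred for one distribution: (full, top-k)."""
    full = vocab_size  # one probability per vocabulary entry
    top_k = 3 * k      # k probabilities + k log-probabilities + k indices
    return full, top_k


# Hypothetical sizes, chosen for illustration only.
full, top_k = values_per_position(vocab_size=32_000, k=64)
print(full, top_k, round(full / top_k, 1))  # prints: 32000 192 166.7
```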

Related Pages

Implemented By

Related Heuristics

The following heuristics inform this principle:
