Principle:Alibaba ROLL Teacher Forward Inference

From Leeroopedia


Knowledge Sources
Domains: Knowledge_Distillation, LLM_Inference
Last Updated: 2026-02-07 20:00 GMT

Overview

An inference principle for extracting top-k softened probability distributions from a frozen teacher model for knowledge distillation.

Description

Teacher Forward Inference runs a forward pass through the frozen teacher model and extracts the top-k entries of its output distribution: their probabilities, log-probabilities, and indices. These values are what gets transferred to the student. Only the top-k entries are extracted, which reduces the communication bandwidth needed to pass teacher outputs to the student. Temperature scaling is applied to soften the distribution.
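The extraction step described above can be sketched in plain Python for a single logit vector. This is a minimal illustration, not the ROLL implementation: the function name `teacher_topk`, its parameters, and the choice to apply temperature before selecting the top-k entries are all assumptions made for the example.

```python
import math

def teacher_topk(logits, k, temperature=1.0):
    """Hypothetical helper: return (probs, log_probs, indices) for the
    k most likely entries of the temperature-softened distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Temperature scaling: larger values flatten the distribution.
    scaled = [z / temperature for z in logits]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scaled))
    log_probs = [s - log_norm for s in scaled]
    # Indices of the k largest log-probabilities, highest first.
    order = sorted(range(len(logits)), key=lambda i: log_probs[i], reverse=True)
    indices = order[:k]
    top_log_probs = [log_probs[i] for i in indices]
    top_probs = [math.exp(lp) for lp in top_log_probs]
    return top_probs, top_log_probs, indices

# Toy example: a 4-entry "vocabulary", keeping the 2 most likely entries.
probs, log_probs, indices = teacher_topk([2.0, 0.5, 1.0, -1.0], k=2, temperature=2.0)
```

The retained probabilities are not renormalized here, so they sum to less than 1. Whether the student loss renormalizes them is a design choice the source does not specify.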

Usage

Run this step before each student training step in the distillation pipeline, so that the teacher's top-k outputs are available as targets for the student update.

Theoretical Basis

The teacher produces softened probabilities: $p_i^T = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)}$

Only top-k values are retained for efficiency.
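As a rough illustration of how temperature and top-k truncation interact, the sketch below computes the softened probabilities for a toy logit vector at two temperatures and measures how much probability mass the top-k entries retain. The helper names `softened_probs` and `topk_mass`, the logit values, and the temperatures are illustrative assumptions, not values from ROLL.

```python
import math

def softened_probs(logits, temperature):
    """Softmax of logits / temperature (hypothetical helper)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def topk_mass(probs, k):
    """Total probability held by the k largest entries (hypothetical helper)."""
    return sum(sorted(probs, reverse=True)[:k])

logits = [4.0, 2.0, 1.0, 0.0, -1.0]               # made-up teacher logits
sharp = softened_probs(logits, temperature=1.0)   # unsoftened distribution
soft = softened_probs(logits, temperature=4.0)    # softened distribution

mass_sharp = topk_mass(sharp, k=2)
mass_soft = topk_mass(soft, k=2)
print(f"top-2 mass at T=1: {mass_sharp:.3f}, at T=4: {mass_soft:.3f}")
```

In this toy case the softened distribution spreads mass over more entries, so a fixed k captures less of it. That suggests k and T should be chosen together, though the source does not state how ROLL sets either value.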

Related Pages

Implemented By

Related Heuristics

The following heuristics inform this principle:
