Principle: OpenRLHF Knowledge Distillation Training
| Field | Value |
|---|---|
| Domains | NLP, Training, Model_Compression |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
A training technique that transfers knowledge from a large teacher model to a smaller student model by matching token-level probability distributions.
Description
Knowledge Distillation (KD) trains a student model to mimic a teacher model's output distribution. The student's loss combines a standard language modeling loss (cross-entropy with ground truth labels) and a distillation loss (KL divergence between teacher and student distributions). This allows the student to learn from both the explicit labels and the teacher's "dark knowledge" encoded in its soft probability distributions.
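The combined loss described above can be sketched in a few lines. This is a minimal single-token illustration, not OpenRLHF's actual implementation (which operates on full logit tensors with PyTorch autograd); the function name `kd_loss` and the default coefficient are illustrative choices.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the vocabulary axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, target, kd_coef=0.4):
    """Combined KD loss for one token position.

    L = (1 - kd_coef) * CE(student, target) + kd_coef * KL(p_teacher || p_student)

    `kd_coef` mirrors the coefficient named in this article; 0.4 is an
    illustrative default, not a value prescribed by OpenRLHF.
    """
    p_s = softmax(student_logits)
    p_t = softmax(teacher_logits)
    ce = -np.log(p_s[target])                        # cross-entropy with the hard label
    kl = np.sum(p_t * (np.log(p_t) - np.log(p_s)))   # forward KL, teacher as reference
    return (1 - kd_coef) * ce + kd_coef * kl
```

Setting `kd_coef=0` recovers plain language-model training on the labels, while `kd_coef=1` trains the student purely against the teacher's soft distribution.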
Usage
Use when you need to compress a large teacher model into a smaller student model while retaining as much capability as possible. The teacher model is frozen and used only for inference.
Theoretical Basis
The combined loss for knowledge distillation is:

$$\mathcal{L} = (1 - \alpha)\,\mathcal{L}_{\text{CE}} + \alpha\,\mathcal{L}_{\text{KD}}$$

where:
- $\mathcal{L}_{\text{CE}}$ is the standard cross-entropy loss with the ground-truth labels
- $\mathcal{L}_{\text{KD}} = \mathrm{KL}\big(p_{\text{teacher}} \,\|\, p_{\text{student}}\big)$ is the forward KL divergence between the token-level distributions
- $\alpha$ (kd_coef) controls the balance between the CE and KD losses
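Two properties of the forward KL term are worth checking numerically: it is zero exactly when the student reproduces the teacher's distribution, and it is asymmetric, so the direction (teacher as the reference distribution) matters. A minimal sketch, assuming a small toy vocabulary:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def forward_kl(teacher_logits, student_logits):
    """KL(p_T || p_S): an expectation under the teacher's distribution."""
    p_t, p_s = softmax(teacher_logits), softmax(student_logits)
    return float(np.sum(p_t * (np.log(p_t) - np.log(p_s))))
```

Because the expectation is taken under the teacher, forward KL heavily penalizes the student for assigning low probability to tokens the teacher considers likely ("mode covering"), which is the behavior wanted when matching a teacher's full distribution.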