Principle:FlagOpen FlagEmbedding Knowledge Distillation Scoring
Overview
A technique that uses a teacher reranker model to generate soft relevance scores for query-passage pairs, enabling knowledge distillation during embedding model fine-tuning.
Description
Knowledge distillation transfers the knowledge of a stronger teacher model (the reranker) to a student model (the embedder). The add_reranker_score.py script reads training data made of query/pos/neg records, scores every query-passage pair with the reranker, and writes the results back to the JSONL data as pos_scores and neg_scores. During training, the embedder is optimized with a KL divergence loss between its own similarity distribution and the teacher's score distribution, rather than against hard labels.
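The scoring step described above can be sketched as follows. This is a minimal illustration, not the add_reranker_score.py implementation: the helper name add_teacher_scores, the Scorer type, and toy_scorer (a word-overlap stand-in for a real cross-encoder reranker) are all hypothetical. Only the field names query, pos, neg, pos_scores, and neg_scores come from the description.

```python
import json
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical scorer interface: takes (query, passage) pairs and
# returns one relevance score per pair, in the same order.
Scorer = Callable[[Sequence[Tuple[str, str]]], List[float]]


def add_teacher_scores(record: Dict, score_pairs: Scorer) -> Dict:
    """Return a copy of one training record with pos_scores and neg_scores added."""
    query = record["query"]
    passages = list(record["pos"]) + list(record["neg"])
    # Score positives and negatives in a single batch, then split by position.
    scores = score_pairs([(query, passage) for passage in passages])
    n_pos = len(record["pos"])
    scored = dict(record)
    scored["pos_scores"] = scores[:n_pos]
    scored["neg_scores"] = scores[n_pos:]
    return scored


def toy_scorer(pairs: Sequence[Tuple[str, str]]) -> List[float]:
    """Stand-in for a teacher reranker (assumption): counts shared words."""
    return [
        float(len(set(q.lower().split()) & set(p.lower().split())))
        for q, p in pairs
    ]


# One JSONL line in, one JSONL line out.
line = (
    '{"query": "what is a cat", '
    '"pos": ["a cat is a small animal"], '
    '"neg": ["stock prices fell", "the cat sat"]}'
)
scored = add_teacher_scores(json.loads(line), toy_scorer)
print(json.dumps(scored))
```

In practice the toy scorer would be replaced by a call into the teacher reranker; scoring positives and negatives in one batch keeps the scores comparable within a record.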
Usage
Apply this step after hard negative mining and before training, when you want to improve embedder quality using a teacher reranker.
Theoretical Basis
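Knowledge distillation minimizes KL(P_teacher || P_student), where both distributions are defined over the candidate passages (positives and negatives) of a query. The teacher's cross-encoder scores provide a richer signal than binary labels because they are graded: they indicate how much more relevant one passage is than another. The student therefore learns to match the teacher's relative ranking of all candidates, not just to separate positives from negatives.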
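As a worked illustration of the objective, the sketch below computes KL(P_teacher || P_student) for a single query with three candidate passages. It assumes that both score lists are turned into distributions with a softmax, and that the student's similarities are divided by a temperature; the temperature value and the example numbers are hypothetical and are not taken from the training code.

```python
import math
from typing import List


def softmax(scores: List[float], temperature: float = 1.0) -> List[float]:
    """Normalize scores into a probability distribution (numerically stable)."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: List[float], q: List[float]) -> float:
    """KL(p || q) = sum_i p_i * log(p_i / q_i); terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


# Hypothetical scores for one query: [positive, negative 1, negative 2].
teacher_scores = [4.0, 1.0, -2.0]  # teacher reranker scores (pos_scores + neg_scores)
student_sims = [0.8, 0.6, 0.1]     # student embedding similarities

p_teacher = softmax(teacher_scores)
p_student = softmax(student_sims, temperature=0.1)  # temperature is an assumption

loss = kl_divergence(p_teacher, p_student)
print(f"distillation loss: {loss:.4f}")
```

The loss is zero only when the student's distribution equals the teacher's, so a student that ranks the candidates in the right order but with the wrong margins is still penalized. This is how the graded teacher signal carries more information than a binary label.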
Related Pages
Implementation:FlagOpen_FlagEmbedding_Add_Reranker_Score_Script