Principle:FlagOpen FlagEmbedding Knowledge Distillation Scoring

Overview

A technique that uses a teacher reranker model to generate soft relevance scores for query-passage pairs, enabling knowledge distillation during embedding model fine-tuning.

Description

Knowledge distillation transfers knowledge from a stronger teacher model (here, a cross-encoder reranker) to a student model (the embedder). The add_reranker_score.py script reads training data in which each JSONL record holds a query together with its positive (pos) and negative (neg) passages, scores every query-passage pair with the reranker, and writes the results back into each record as pos_scores and neg_scores. During training, the embedder then minimizes a KL divergence loss between its own similarity distribution over the candidate passages and the teacher's score distribution, instead of learning from hard labels.
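The scoring pass described above can be sketched as follows. This is a minimal illustration, not the actual add_reranker_score.py. It assumes each JSONL record has a query string and pos/neg lists of passage strings. The names toy_overlap_scorer, add_teacher_scores and score_jsonl are hypothetical, and toy_overlap_scorer is only a stand-in for a real reranker so that the example runs without downloading a model.

```python
import json
from typing import Callable, Dict, List

# Hypothetical scorer interface: maps a list of [query, passage] pairs
# to one float relevance score per pair (a real reranker would go here).
Scorer = Callable[[List[List[str]]], List[float]]


def toy_overlap_scorer(pairs: List[List[str]]) -> List[float]:
    """Illustrative stand-in for a reranker: fraction of query tokens found in the passage."""
    scores = []
    for query, passage in pairs:
        q_tokens = set(query.lower().split())
        p_tokens = set(passage.lower().split())
        scores.append(len(q_tokens & p_tokens) / max(len(q_tokens), 1))
    return scores


def add_teacher_scores(record: Dict, scorer: Scorer) -> Dict:
    """Return a copy of one {query, pos, neg} record with pos_scores/neg_scores attached."""
    query = record["query"]
    # Score positives and negatives in a single batch, positives first.
    pairs = [[query, p] for p in record["pos"]] + [[query, n] for n in record["neg"]]
    scores = scorer(pairs)
    n_pos = len(record["pos"])
    out = dict(record)  # shallow copy so the input record is left unchanged
    out["pos_scores"] = scores[:n_pos]
    out["neg_scores"] = scores[n_pos:]
    return out


def score_jsonl(in_path: str, out_path: str, scorer: Scorer) -> None:
    """Read JSONL training data, score every pair, and write the augmented records."""
    with open(in_path, encoding="utf-8") as fin, open(out_path, "w", encoding="utf-8") as fout:
        for line in fin:
            if line.strip():  # skip blank lines
                record = add_teacher_scores(json.loads(line), scorer)
                fout.write(json.dumps(record, ensure_ascii=False) + "\n")
```

To use a real teacher, pass any callable with the same pairs-to-scores shape. FlagEmbedding's FlagReranker exposes a compute_score method that takes [query, passage] pairs, but check its signature and return type in your installed version before wiring it in.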

Usage

Run this step after hard negative mining and before training, when you want to improve embedder quality with a teacher reranker.

Theoretical Basis

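Knowledge distillation minimizes KL(P_teacher || P_student), where each distribution is a softmax over one query's candidate passages (its positives and mined negatives). The teacher's cross-encoder scores provide a richer signal than binary labels: the student learns to match the teacher's relative ranking of the candidates, not just the split between positives and negatives.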
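The objective above can be written out for a single query as a short sketch. It assumes the teacher scores are the record's pos_scores followed by its neg_scores, and the student scores are the embedder's similarities for the same passages in the same order. The function names and the teacher_temp/student_temp parameters are illustrative assumptions, not FlagEmbedding's API, and a real trainer would compute this batched in a tensor library.

```python
import math
from typing import List


def log_softmax(scores: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable log-softmax over one query's candidate scores."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_z for s in scaled]


def kl_distill_loss(
    teacher_scores: List[float],
    student_scores: List[float],
    teacher_temp: float = 1.0,  # hypothetical temperature for the teacher distribution
    student_temp: float = 1.0,  # hypothetical temperature for the student distribution
) -> float:
    """KL(P_teacher || P_student) for one query.

    Both lists must be ordered identically, e.g. [pos..., neg...].
    """
    if len(teacher_scores) != len(student_scores):
        raise ValueError("teacher and student must score the same candidates")
    log_p_t = log_softmax(teacher_scores, teacher_temp)
    log_p_s = log_softmax(student_scores, student_temp)
    # sum_i p_t[i] * (log p_t[i] - log p_s[i]); staying in log space avoids log(0).
    return sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_t, log_p_s))
```

A student whose similarities rank the passages the way the teacher does gets a small loss, while one that reverses the ranking is penalized heavily, even though both might place the positive above some negatives. Minimizing the cross-entropy between P_teacher and P_student gives the same gradients for the student, because the two differ only by the teacher's entropy, which does not depend on the student.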
