Principle:FlagOpen FlagEmbedding Cross Encoder Reranking
Overview
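A technique that passes a query and a passage through one transformer together, so that attention runs between query and passage tokens and yields a fine-grained relevance score. This is typically more accurate than bi-encoder embedding similarity, at a higher compute cost per pair.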
Description
Cross-encoder reranking concatenates query and passage into a single sequence, allowing full token-level attention between them. This captures subtle semantic relationships that bi-encoders miss. FlagEmbedding supports four reranker architectures:
- Encoder-only using sequence classification heads
- Decoder-only LLM-based using next-token prediction
- Layerwise extracting scores from multiple transformer layers
- Lightweight with token compression
Batch scoring can be spread across multiple GPUs using process pools.
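The multi-GPU batch scoring mentioned above can be pictured as splitting the (query, passage) pairs into per-worker shards and writing the scores back in input order. The following is a minimal sketch of that bookkeeping only, not FlagEmbedding's implementation: `shard_pairs`, `score_sharded`, and `length_scorer` are hypothetical names, the shards are scored in a plain loop rather than in real worker processes, and the scorer is a stand-in for a model.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]  # (query, passage)


def shard_pairs(pairs: Sequence[Pair], num_workers: int) -> List[List[Tuple[int, Pair]]]:
    """Round-robin split into per-worker shards, remembering each pair's input index."""
    if num_workers < 1:
        raise ValueError("num_workers must be >= 1")
    shards: List[List[Tuple[int, Pair]]] = [[] for _ in range(num_workers)]
    for idx, pair in enumerate(pairs):
        shards[idx % num_workers].append((idx, pair))
    return shards


def score_sharded(
    pairs: Sequence[Pair],
    score_batch: Callable[[List[Pair]], List[float]],
    num_workers: int,
) -> List[float]:
    """Score every shard and merge the results back into input order."""
    scores = [0.0] * len(pairs)
    for shard in shard_pairs(pairs, num_workers):
        if not shard:
            continue
        indices = [idx for idx, _ in shard]
        batch = [pair for _, pair in shard]
        # Assumption: in a real setup this call would run in a worker
        # process bound to one GPU; here it runs inline for illustration.
        for idx, score in zip(indices, score_batch(batch)):
            scores[idx] = score
    return scores


def length_scorer(batch: List[Pair]) -> List[float]:
    """Stand-in scorer (hypothetical): passage length instead of a model score."""
    return [float(len(passage)) for _, passage in batch]


pairs = [("q", "a"), ("q", "bbb"), ("q", "cc"), ("q", "dddd"), ("q", "")]
scores = score_sharded(pairs, length_scorer, num_workers=2)
print(scores)
```

Carrying the input index through each shard is what keeps the merged scores aligned with the original pair order, whichever worker finishes first.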
Usage
Use this technique to re-score candidate passages retrieved by a first-stage bi-encoder when the ranking quality of that first stage needs to improve.
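The second-stage step described here amounts to scoring each (query, candidate) pair and sorting by score. A minimal sketch, assuming the candidates already come from a first-stage retriever: `rerank` and `toy_overlap_score` are hypothetical names, and the word-overlap scorer is only a placeholder for a real cross-encoder's pair score.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def rerank(
    query: str,
    candidates: Sequence[str],
    score_pair: Callable[[str, str], float],
    top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Score each (query, passage) pair jointly and sort by descending score."""
    scored = [(passage, score_pair(query, passage)) for passage in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored if top_k is None else scored[:top_k]


def toy_overlap_score(query: str, passage: str) -> float:
    """Placeholder scorer (not a cross-encoder): fraction of query words found in the passage."""
    query_words = set(query.lower().split())
    passage_words = set(passage.lower().split())
    return len(query_words & passage_words) / len(query_words) if query_words else 0.0


# Candidates as they might arrive from a first-stage retriever (made-up text).
candidates = [
    "The giant panda is a bear native to China.",
    "Paris is the capital of France.",
    "The giant panda diet is mostly bamboo.",
]
ranked = rerank("giant panda diet", candidates, toy_overlap_score, top_k=2)
for passage, score in ranked:
    print(f"{score:.2f}  {passage}")
```

Because the scorer is passed in as a callable, the same function works unchanged when the placeholder is swapped for a model-backed scorer.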
Theoretical Basis
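In a cross-encoder, each of the n query tokens can attend to each of the m passage tokens, giving O(n*m) query-passage token interactions. A bi-encoder encodes the n + m tokens in two independent passes and compares only the pooled embeddings, so no token-level interaction crosses the pair. The score is computed per architecture as: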
- Encoder-only: sigmoid(cls_logit)
- Decoder-only: P("Yes"|[query, passage, prompt])
- Layerwise: mean(layer_scores[cutoff_layers])
- Lightweight: score(compressed_passage_tokens)
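The first three scoring rules above can be written out numerically. This is a sketch under stated assumptions rather than the library's code: the function names are hypothetical, the decoder-only case applies a softmax over a toy next-token vocabulary, and the layerwise case takes a mapping from layer number to score. The lightweight variant is left out because the rule above does not specify how the compressed tokens are scored.

```python
import math
from typing import Dict, Sequence


def encoder_only_score(cls_logit: float) -> float:
    """sigmoid(cls_logit), written to avoid overflow for large negative logits."""
    if cls_logit >= 0:
        return 1.0 / (1.0 + math.exp(-cls_logit))
    e = math.exp(cls_logit)
    return e / (1.0 + e)


def decoder_only_score(next_token_logits: Dict[str, float], yes_token: str = "Yes") -> float:
    """P("Yes" | input) as a softmax over the next-token logits (toy vocabulary)."""
    max_logit = max(next_token_logits.values())
    exps = {tok: math.exp(v - max_logit) for tok, v in next_token_logits.items()}
    return exps[yes_token] / sum(exps.values())


def layerwise_score(layer_scores: Dict[int, float], cutoff_layers: Sequence[int]) -> float:
    """Mean of the scores read out at the selected layers."""
    selected = [layer_scores[layer] for layer in cutoff_layers]
    return sum(selected) / len(selected)


# Made-up numbers, for illustration only.
print(encoder_only_score(0.0))
print(decoder_only_score({"Yes": 2.0, "No": 0.0, "Maybe": 0.0}))
print(layerwise_score({8: 0.2, 16: 0.4, 24: 0.6}, cutoff_layers=[16, 24]))
```

All three map a model output to a single scalar per pair, which is what makes the architectures interchangeable behind one scoring interface.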
Related Pages
Implementation:FlagOpen_FlagEmbedding_AbsReranker_Compute_Score