Principle:FlagOpen FlagEmbedding Cross Encoder Reranking

Overview

Cross-encoder reranking is a technique that encodes a query and a passage jointly, so that attention runs between query tokens and passage tokens, and computes a fine-grained relevance score from the result. It is typically more accurate than bi-encoder similarity.

Description

Cross-encoder reranking concatenates query and passage into a single sequence, allowing full token-level attention between them. This captures subtle semantic relationships that bi-encoders miss. FlagEmbedding supports four reranker architectures:

Encoder-only: scores the pair with a sequence classification head
Decoder-only: LLM-based, scores the pair through next-token prediction
Layerwise: extracts scores from multiple transformer layers
Lightweight: applies token compression
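
The concatenation step described above can be sketched as follows. This is a minimal illustration, not FlagEmbedding's code: the `[CLS]`/`[SEP]` markers assume a BERT-style layout (the actual special tokens depend on the model's tokenizer), and `build_pair_input` and its passage-only truncation policy are hypothetical choices made for the example.

```python
from typing import List


def build_pair_input(
    query_tokens: List[str],
    passage_tokens: List[str],
    max_length: int = 512,
) -> List[str]:
    """Join query and passage into one sequence (BERT-style layout assumed)."""
    # Reserve three slots for [CLS] and the two [SEP] markers.
    budget = max_length - 3 - len(query_tokens)
    if budget < 0:
        raise ValueError("query alone exceeds max_length")
    # Truncate only the passage so the query is always kept whole
    # (an assumption of this sketch, not a documented library policy).
    kept_passage = passage_tokens[:budget]
    return ["[CLS]", *query_tokens, "[SEP]", *kept_passage, "[SEP]"]


sequence = build_pair_input(
    ["what", "is", "bm25"],
    ["bm25", "is", "a", "ranking", "function"],
    max_length=8,
)
```

Because both texts sit in one sequence, every transformer layer can attend from any query token to any passage token.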

Multi-GPU scoring is supported through process pools, which distribute batches of query-passage pairs across devices.
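
One way to picture process-pool batch scoring is to shard the pairs, score each shard in a worker, and merge the results in the original order. The sketch below is an assumption about the general pattern, not the library's implementation: `shard`, `score_sharded`, and `toy_score_batch` are hypothetical names, and the built-in `map` stands in for a real pool's `map` method so the example runs in a single process.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Pair = Tuple[str, str]


def shard(pairs: Sequence[Pair], num_workers: int) -> List[List[Pair]]:
    """Split pairs into at most num_workers contiguous, near-equal chunks."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    if not pairs:
        return []
    size = -(-len(pairs) // num_workers)  # ceiling division
    return [list(pairs[i:i + size]) for i in range(0, len(pairs), size)]


def score_sharded(
    pairs: Sequence[Pair],
    score_batch: Callable[[List[Pair]], List[float]],
    num_workers: int,
    pool_map: Callable[..., Iterable[List[float]]] = map,
) -> List[float]:
    """Score each shard via pool_map and merge results in input order."""
    scores: List[float] = []
    # pool_map must preserve chunk order, as map and Pool.map both do.
    for chunk_scores in pool_map(score_batch, shard(pairs, num_workers)):
        scores.extend(chunk_scores)
    return scores


def toy_score_batch(chunk: List[Pair]) -> List[float]:
    """Stand-in scorer: passage length in characters (not a real model)."""
    return [float(len(passage)) for _, passage in chunk]


pairs = [("q", "a"), ("q", "bb"), ("q", "ccc"), ("q", "dddd"), ("q", "eeeee")]
scores = score_sharded(pairs, toy_score_batch, num_workers=2)
```

In a multi-process setting, `pool_map` could be the `map` method of a `multiprocessing.Pool`, with each worker holding a model copy on its own device.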

Usage

Apply this principle when re-scoring candidate passages retrieved by a first-stage bi-encoder, in order to improve ranking quality.
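
The second-stage step can be sketched as a small function that pairs the query with each candidate, scores the pairs, and sorts. The names `rerank` and `overlap_scorer` are hypothetical, and the word-overlap scorer is only a toy stand-in for a real cross-encoder so the example runs without a model.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Pair = Tuple[str, str]


def rerank(
    query: str,
    candidates: Sequence[str],
    score_pairs: Callable[[List[Pair]], List[float]],
    top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Re-score first-stage candidates and return them best-first."""
    pairs = [(query, passage) for passage in candidates]
    scores = score_pairs(pairs)
    ranked = sorted(zip(candidates, scores), key=lambda item: item[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


def overlap_scorer(pairs: List[Pair]) -> List[float]:
    """Toy scorer: fraction of query words that appear in the passage."""
    scores = []
    for query, passage in pairs:
        query_words = set(query.lower().split())
        passage_words = set(passage.lower().split())
        scores.append(len(query_words & passage_words) / max(len(query_words), 1))
    return scores


results = rerank(
    "cross encoder reranking",
    [
        "bi-encoder retrieval with dense vectors",
        "a cross encoder for reranking passages",
        "cooking pasta",
    ],
    overlap_scorer,
    top_k=2,
)
```

With FlagEmbedding, `score_pairs` would presumably be replaced by a reranker object's `compute_score` method, which the project README shows taking query-passage pairs; verify the exact signature against the installed version.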

Theoretical Basis

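Because the query (n tokens) and the passage (m tokens) share one sequence, each attention layer relates every query token to every passage token, giving O(n*m) query-passage token interactions. A bi-encoder encodes the two texts independently, processing O(n+m) tokens with no token-level interaction between them, and compares only the resulting vectors. The score is computed as: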

Encoder-only: sigmoid(cls_logit)
Decoder-only: P("Yes"|[query, passage, prompt])
Layerwise: mean(layer_scores[cutoff_layers])
Lightweight: score(compressed_passage_tokens)
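
The first three formulas can be written out as a minimal sketch. It reads the formulas literally and makes several assumptions: the decoder-only probability is taken as a softmax over next-token logits, `cutoff_layers` is treated as a list of 0-based indices into `layer_scores`, and all function names are hypothetical. The library may instead use raw logits with optional normalization. The lightweight variant is omitted because the formula above does not specify how compression feeds the score.

```python
import math
from typing import Dict, Sequence


def encoder_only_score(cls_logit: float) -> float:
    """sigmoid(cls_logit): squash the classification logit into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-cls_logit))


def decoder_only_score(next_token_logits: Dict[str, float], yes_token: str = "Yes") -> float:
    """P("Yes" | input) as a softmax over the given next-token logits."""
    peak = max(next_token_logits.values())
    # Subtract the peak before exponentiating for numerical stability.
    exps = {tok: math.exp(logit - peak) for tok, logit in next_token_logits.items()}
    return exps[yes_token] / sum(exps.values())


def layerwise_score(layer_scores: Sequence[float], cutoff_layers: Sequence[int]) -> float:
    """mean(layer_scores[cutoff_layers]): average the selected layers' scores."""
    selected = [layer_scores[i] for i in cutoff_layers]
    return sum(selected) / len(selected)


enc = encoder_only_score(0.0)
dec = decoder_only_score({"Yes": 0.0, "No": 0.0})
lay = layerwise_score([0.1, 0.2, 0.3, 0.4], cutoff_layers=[1, 3])
```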
