

Principle:PacktPublishing LLM Engineers Handbook Cross Encoder Reranking

From Leeroopedia


| Field | Value |
| --- | --- |
| Concept | Reranking retrieved candidates using a cross-encoder model |
| Category | Retrieval / Reranking |
| Workflow | RAG_Inference |
| Repository | PacktPublishing/LLM-Engineers-Handbook |
| Implemented by | Implementation:PacktPublishing_LLM_Engineers_Handbook_Reranker_Generate |

Overview

Cross-encoder reranking is the second stage of a two-stage retrieval pipeline: candidates returned by a fast vector search are re-scored with a slower but more accurate cross-encoder model. A bi-encoder, used for the initial retrieval, encodes the query and each document independently. A cross-encoder encodes each (query, document) pair jointly, so it can capture fine-grained interactions between query tokens and document tokens. This typically improves precision, but because every score depends on the query, nothing can be pre-computed or indexed, so the cross-encoder is applied only to a small candidate set.

Theory

Mathematical Basis

The cross-encoder produces a single relevance score for each (query, document) pair:

score = CrossEncoder(query, document) -> R

The top-K candidates are then selected by descending score.
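
The scoring and selection step can be sketched in a few lines of Python. This is a minimal illustration, not the repository's implementation: `toy_pair_score` is a hypothetical stand-in for the cross-encoder (it only counts shared tokens), and `rerank` is an assumed helper name. In practice the scorer would be a trained model, for example the `predict` method of `CrossEncoder` in the sentence-transformers library.

```python
from typing import Callable, List, Tuple


def toy_pair_score(query: str, document: str) -> float:
    """Hypothetical stand-in for CrossEncoder(query, document).

    Returns the fraction of query tokens that also occur in the document.
    A real cross-encoder would run a transformer over the joint pair.
    """
    query_tokens = set(query.lower().split())
    document_tokens = set(document.lower().split())
    if not query_tokens:
        return 0.0
    return len(query_tokens & document_tokens) / len(query_tokens)


def rerank(
    query: str,
    documents: List[str],
    score_fn: Callable[[str, str], float],
    top_k: int,
) -> List[Tuple[str, float]]:
    """Score every (query, document) pair and keep the top_k by descending score."""
    scored = [(doc, score_fn(query, doc)) for doc in documents]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


candidates = [
    "vector databases store embeddings",
    "cross encoder reranking improves precision",
    "reranking with a neural model",
]
top = rerank("cross encoder reranking", candidates, toy_pair_score, top_k=2)
```

Because the scorer is passed in as a function, the selection logic stays the same when the toy scorer is replaced by a real model.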

Bi-Encoder vs. Cross-Encoder

| Property | Bi-Encoder | Cross-Encoder |
| --- | --- | --- |
| Encoding | Query and document encoded independently | Query and document encoded jointly |
| Interaction | Only at scoring time (dot product / cosine of the two embeddings) | Early interaction (full self-attention over the concatenated pair) |
| Indexability | Can pre-compute document embeddings | Cannot pre-compute; requires the query at inference |
| Speed | Fast (sub-linear with ANN index) | Slow (one forward pass per candidate, linear in the number of candidates) |
| Accuracy | Good | Typically higher (attention spans both query and document tokens) |
| Use case | Initial retrieval over a large collection | Reranking a small candidate set |
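
The speed and indexability difference can be made concrete by counting model forward passes per query. The sketch below is back-of-the-envelope arithmetic under stated assumptions: document embeddings are pre-computed offline, ANN lookup cost is ignored, and the corpus size and candidate count are illustrative values, not figures from the source.

```python
def forward_passes_per_query(corpus_size: int, candidate_count: int) -> dict:
    """Count online model forward passes for one query under each strategy."""
    return {
        # Only the query is encoded online; document embeddings are pre-computed.
        "bi_encoder_only": 1,
        # Every document must be paired with the query and scored online.
        "cross_encoder_over_corpus": corpus_size,
        # One query encoding plus one cross-encoder pass per retrieved candidate.
        "two_stage": 1 + candidate_count,
    }


# Illustrative numbers only: a corpus of one million documents, 50 candidates.
costs = forward_passes_per_query(corpus_size=1_000_000, candidate_count=50)
```

With these assumed numbers, the two-stage pipeline needs 51 online forward passes per query, against one million for a cross-encoder applied to the whole corpus.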

Two-Stage Retrieval

The two-stage approach combines the strengths of both models:

  1. Stage 1 (Recall) - The bi-encoder performs fast ANN search to retrieve a broad candidate set (e.g., top 50-100 documents)
  2. Stage 2 (Precision) - The cross-encoder re-scores the candidates with higher accuracy and selects the top-K (e.g., top 5-10) for final use

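With a small candidate set, this approaches cross-encoder accuracy at a latency much closer to that of the bi-encoder alone. Final quality is bounded by Stage 1 recall: a relevant document that the bi-encoder fails to retrieve cannot be recovered by the reranker.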
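
The two stages can be sketched end to end with toy components. Everything here is an assumption made for illustration: `embed` is a bag-of-words stand-in for a bi-encoder, a brute-force cosine scan stands in for the ANN index, and `pair_score` is a hypothetical joint scorer that rewards query word order, standing in for a cross-encoder.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy bi-encoder: a bag-of-words vector computed independently per text."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pair_score(query: str, document: str) -> float:
    """Toy joint scorer: shared tokens, plus a bonus for shared adjacent word pairs."""
    q, d = query.lower().split(), document.lower().split()
    shared_tokens = len(set(q) & set(d))
    shared_bigrams = len(set(zip(q, q[1:])) & set(zip(d, d[1:])))
    return shared_tokens + 2 * shared_bigrams


def two_stage_search(query: str, corpus: List[str], doc_vectors: List[Counter],
                     recall_k: int, top_k: int) -> List[str]:
    # Stage 1 (recall): rank the whole corpus using pre-computed document vectors.
    query_vector = embed(query)
    ranked_ids = sorted(range(len(corpus)),
                        key=lambda i: cosine(query_vector, doc_vectors[i]),
                        reverse=True)
    candidates = [corpus[i] for i in ranked_ids[:recall_k]]
    # Stage 2 (precision): re-score only the candidates with the joint scorer.
    reranked = sorted(candidates, key=lambda doc: pair_score(query, doc), reverse=True)
    return reranked[:top_k]


corpus = [
    "encoder cross",
    "the cross encoder scores each pair",
    "bi encoder embeddings are indexed offline",
    "gardening tips for spring",
]
doc_vectors = [embed(doc) for doc in corpus]  # computed once, offline
results = two_stage_search("cross encoder", corpus, doc_vectors, recall_k=3, top_k=1)
```

In this toy run, Stage 1 ranks "encoder cross" first because bag-of-words similarity ignores word order, while Stage 2 promotes "the cross encoder scores each pair" because the joint scorer sees the query phrase intact. The fourth document never reaches Stage 2, which illustrates why the reranker cannot fix a recall miss.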

When to Use

  • When vector search returns relevant candidates in a poor order and re-scoring them would improve retrieval precision
  • When the initial retrieval returns a manageable number of candidates (typically under 100)
  • When answer quality is more important than retrieval latency
  • When the application can afford the additional compute of running a cross-encoder model

Related Concepts

  • Multi-stage retrieval - cascading retrieval stages with increasing accuracy
  • ColBERT - late-interaction model that balances speed and accuracy
  • Two-tower models - architecture where query and document are encoded separately
  • Cross-attention - transformer mechanism that attends across two sequences
