Implementation:PacktPublishing LLM Engineers Handbook Reranker Generate
| Field | Value |
|---|---|
| Type | API Doc |
| Workflow | RAG_Inference |
| Repository | PacktPublishing/LLM-Engineers-Handbook |
| Source | reranking.py:L16-30 |
| Implements | Principle:PacktPublishing_LLM_Engineers_Handbook_Cross_Encoder_Reranking |
API Signature
Reranker.generate(
    self,
    query: Query,
    chunks: list[EmbeddedChunk],
    keep_top_k: int
) -> list[EmbeddedChunk]
Import
from llm_engineering.application.rag.reranking import Reranker
Key Code
class Reranker(RAGStep):
    @opik.track(name="Reranker.generate")
    def generate(
        self,
        query: Query,
        chunks: list[EmbeddedChunk],
        keep_top_k: int,
    ) -> list[EmbeddedChunk]:
        model = CrossEncoderModelSingleton()
        query_doc_pairs = [(query.content, chunk.content) for chunk in chunks]
        scores = model.predict(query_doc_pairs)
        scored_chunks = list(zip(scores, chunks))
        scored_chunks.sort(key=lambda x: x[0], reverse=True)
        return [chunk for _, chunk in scored_chunks[:keep_top_k]]
Parameters
| Parameter | Type | Description |
|---|---|---|
| query | Query | The user query used for scoring relevance |
| chunks | list[EmbeddedChunk] | The candidate chunks retrieved from vector search |
| keep_top_k | int | Number of top-scoring chunks to return |
Inputs and Outputs
Inputs:
- query: Query - The user query object (uses query.content for scoring)
- chunks: list[EmbeddedChunk] - Candidate document chunks from vector search
- keep_top_k: int - The number of highest-scoring chunks to retain
Outputs:
- list[EmbeddedChunk] - The top-K chunks re-ranked by cross-encoder relevance score, sorted in descending order of relevance
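The output contract follows from the sort-then-slice at the end of the method. The sketch below mirrors only that part, using precomputed scores and plain strings in place of EmbeddedChunk objects. The helper name select_top_k is hypothetical and not part of the repository. It illustrates what the slice does when keep_top_k is larger than the candidate list, zero, or negative; the method shown above performs no validation of keep_top_k, so these behaviours come from ordinary Python slicing.

```python
def select_top_k(scores: list[float], items: list[str], keep_top_k: int) -> list[str]:
    """Hypothetical helper mirroring the sort-then-slice step of Reranker.generate."""
    ranked = list(zip(scores, items))
    # Python's sort is stable, so items with equal scores keep their input order,
    # even with reverse=True.
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in ranked[:keep_top_k]]


scores = [0.2, 0.9, 0.9, 0.1]
items = ["a", "b", "c", "d"]

top_two = select_top_k(scores, items, 2)      # the two highest-scoring items
all_items = select_top_k(scores, items, 10)   # k larger than the list: everything
none = select_top_k(scores, items, 0)         # k of zero: empty list
negative = select_top_k(scores, items, -1)    # negative k: drops the last ranked item
```

If a negative keep_top_k should be treated as an error rather than sliced, a caller-side check before invoking the reranker is one option.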
How It Works
- A CrossEncoderModelSingleton is instantiated (or retrieved from the singleton cache), so the cross-encoder model is loaded once and reused across calls
- Query-document pairs are constructed by pairing the query content with each chunk's content
- The cross-encoder model predicts relevance scores for all pairs in a batch
- Scores are zipped with their corresponding chunks
- The scored chunks are sorted in descending order by score
- The top-K chunks are returned, discarding lower-scoring candidates
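The steps above can be sketched end to end without loading a model. In this minimal sketch, Chunk is a hypothetical stand-in for EmbeddedChunk (only a content field), and overlap_score is a toy token-overlap function that replaces the cross-encoder purely for illustration. A real cross-encoder scores each query-document pair jointly with a neural model and would generally produce different rankings.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical stand-in for EmbeddedChunk; only `content` is used here."""
    content: str


def overlap_score(query: str, document: str) -> float:
    """Toy relevance score: fraction of distinct query tokens present in the document."""
    query_tokens = set(query.lower().split())
    doc_tokens = set(document.lower().split())
    if not query_tokens:
        return 0.0
    return len(query_tokens & doc_tokens) / len(query_tokens)


def rerank(query: str, chunks: list[Chunk], keep_top_k: int) -> list[Chunk]:
    # Step 2: pair the query text with each chunk's text.
    pairs = [(query, chunk.content) for chunk in chunks]
    # Step 3: score every pair (the real method calls model.predict on all pairs).
    scores = [overlap_score(q, d) for q, d in pairs]
    # Steps 4-5: attach scores to chunks and sort by score, highest first.
    scored = sorted(zip(scores, chunks), key=lambda pair: pair[0], reverse=True)
    # Step 6: keep only the top-K chunks.
    return [chunk for _, chunk in scored[:keep_top_k]]


chunks = [
    Chunk("Vector databases store embeddings."),
    Chunk("A cross-encoder scores a query and a document together."),
    Chunk("Reranking with a cross-encoder improves retrieval quality."),
]
top = rerank("how does a cross-encoder score a query", chunks, keep_top_k=2)
```

Sorting with an explicit key on the score, as the original method does, avoids comparing the chunk objects themselves when two scores are equal.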
External Dependencies
- sentence_transformers.cross_encoder (via CrossEncoderModelSingleton) - The cross-encoder model for computing relevance scores
- opik - Observability and tracing decorator
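The load-once behaviour attributed to CrossEncoderModelSingleton can be illustrated with a generic singleton. The sketch below uses a thread-safe metaclass, which is one common way to get that behaviour; it is an assumption for illustration and may differ from how the repository implements it. SingletonMeta and FakeCrossEncoderSingleton are hypothetical names, and the predict method here returns placeholder scores rather than calling sentence_transformers.

```python
import threading


class SingletonMeta(type):
    """Metaclass that hands back the same instance on every constructor call."""

    _instances: dict = {}
    _lock = threading.Lock()

    def __call__(cls, *args, **kwargs):
        # The lock prevents two threads from building the instance at the same time.
        with cls._lock:
            if cls not in cls._instances:
                cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]


class FakeCrossEncoderSingleton(metaclass=SingletonMeta):
    """Hypothetical stand-in; a real class would load the model in __init__."""

    load_count = 0

    def __init__(self) -> None:
        # Stands in for the expensive model load; runs only on first construction.
        type(self).load_count += 1

    def predict(self, pairs: list[tuple[str, str]]) -> list[float]:
        # Placeholder scoring: document length, NOT a real relevance score.
        return [float(len(document)) for _, document in pairs]


first = FakeCrossEncoderSingleton()
second = FakeCrossEncoderSingleton()
scores = first.predict([("query", "abc")])
```

Because construction is cached, calling the constructor inside generate on every request, as the method above does, costs a dictionary lookup rather than a model load.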
Source File
llm_engineering/application/rag/reranking.py (lines 16-30)