Implementation:PacktPublishing LLM Engineers Handbook Reranker Generate

From Leeroopedia


Field       Value
Type        API Doc
Workflow    RAG_Inference
Repository  PacktPublishing/LLM-Engineers-Handbook
Source      reranking.py:L16-30
Implements  Principle:PacktPublishing_LLM_Engineers_Handbook_Cross_Encoder_Reranking

API Signature

Reranker.generate(
    self,
    query: Query,
    chunks: list[EmbeddedChunk],
    keep_top_k: int
) -> list[EmbeddedChunk]

Import

from llm_engineering.application.rag.reranking import Reranker

Key Code

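class Reranker(RAGStep):
    @opik.track(name="Reranker.generate")  # record each call as an Opik trace
    def generate(
        self,
        query: Query,
        chunks: list[EmbeddedChunk],
        keep_top_k: int,
    ) -> list[EmbeddedChunk]:
        # Get the shared cross-encoder instance
        model = CrossEncoderModelSingleton()
        # Pair the query text with the text of every candidate chunk
        query_doc_pairs = [(query.content, chunk.content) for chunk in chunks]
        # One relevance score per (query, chunk) pair
        scores = model.predict(query_doc_pairs)
        # Attach each score to its chunk and sort, highest score first
        scored_chunks = list(zip(scores, chunks))
        scored_chunks.sort(key=lambda x: x[0], reverse=True)
        # Drop the scores and keep only the best keep_top_k chunks
        return [chunk for _, chunk in scored_chunks[:keep_top_k]]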

Parameters

Parameter    Type                  Description
query        Query                 The user query used for scoring relevance
chunks       list[EmbeddedChunk]   The candidate chunks retrieved from vector search
keep_top_k   int                   Number of top-scoring chunks to return

Inputs and Outputs

Inputs:

  • query: Query - The user query object (uses query.content for scoring)
  • chunks: list[EmbeddedChunk] - Candidate document chunks from vector search
  • keep_top_k: int - The number of highest-scoring chunks to retain

Outputs:

  • list[EmbeddedChunk] - At most keep_top_k chunks, sorted in descending order of cross-encoder relevance score
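
Two properties of the returned list follow from the zip, sort, and slice calls in the Key Code and can be checked in isolation. The sketch below is illustrative only: the helper name top_k, the plain-string items, and the hand-written scores are assumptions made for this example, not part of the repository.

```python
def top_k(scores: list[float], items: list[str], keep_top_k: int) -> list[str]:
    # Same zip / sort / slice sequence as the Key Code, applied to plain strings.
    scored = list(zip(scores, items))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored[:keep_top_k]]


# Slicing never raises: asking for more items than exist returns all of them.
all_items = top_k([0.2, 0.9], ["a", "b"], keep_top_k=5)

# list.sort is stable even with reverse=True, so tied scores keep input order.
tied = top_k([0.5, 0.5, 0.8], ["first", "second", "third"], keep_top_k=3)

print(all_items)
print(tied)
```

In practice this means a keep_top_k larger than the number of retrieved chunks is harmless, and chunks with equal scores stay in the order in which vector search returned them.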

How It Works

  1. CrossEncoderModelSingleton() is called; because the class is a singleton, the cross-encoder model is loaded once and the same instance is reused on later calls
  2. Query-document pairs are constructed by pairing the query content with each chunk's content
  3. The cross-encoder model predicts relevance scores for all pairs in a batch
  4. Scores are zipped with their corresponding chunks
  5. The scored chunks are sorted in descending order by score
  6. The top-K chunks are returned, discarding lower-scoring candidates
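
The steps above can be traced end to end without loading a real model. The following is a minimal sketch under stated assumptions: the Query and Chunk dataclasses are hypothetical stand-ins for the repository's Query and EmbeddedChunk, and overlap_score is a toy word-overlap scorer used in place of the cross-encoder. Only the pair, score, sort, and truncate flow mirrors the Key Code.

```python
from dataclasses import dataclass


@dataclass
class Query:  # hypothetical stand-in for the repository's Query
    content: str


@dataclass
class Chunk:  # hypothetical stand-in for EmbeddedChunk
    content: str


def overlap_score(pairs: list[tuple[str, str]]) -> list[float]:
    """Toy scorer (NOT a cross-encoder): fraction of query words found in the document."""
    scores = []
    for query_text, doc_text in pairs:
        query_words = set(query_text.lower().split())
        doc_words = set(doc_text.lower().split())
        scores.append(len(query_words & doc_words) / max(len(query_words), 1))
    return scores


def rerank(query: Query, chunks: list[Chunk], keep_top_k: int) -> list[Chunk]:
    # Step 2: pair the query text with every candidate chunk.
    pairs = [(query.content, chunk.content) for chunk in chunks]
    # Step 3: score all pairs in one call.
    scores = overlap_score(pairs)
    # Steps 4-5: attach scores to chunks and sort, highest score first.
    scored = sorted(zip(scores, chunks), key=lambda item: item[0], reverse=True)
    # Step 6: keep only the best keep_top_k chunks.
    return [chunk for _, chunk in scored[:keep_top_k]]


query = Query("how does reranking work")
candidates = [
    Chunk("vector databases store embeddings"),
    Chunk("reranking scores each query and chunk pair"),
    Chunk("how does reranking work in a RAG pipeline"),
]
top = rerank(query, candidates, keep_top_k=2)
for chunk in top:
    print(chunk.content)
```

Swapping overlap_score for a real cross-encoder's batch prediction call would leave the rest of the function unchanged, which is the point of keeping scoring separate from sorting and truncation.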

External Dependencies

  • sentence_transformers.cross_encoder (via CrossEncoderModelSingleton) - The cross-encoder model for computing relevance scores
  • opik - Observability and tracing decorator
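
The source of CrossEncoderModelSingleton is not shown on this page. As a rough illustration of the load-once, reuse-everywhere idea it relies on, here is a minimal sketch assuming a hypothetical ModelSingleton class and a hypothetical _load_model helper that stands in for constructing the real cross-encoder. It is not the repository's implementation, and it is not thread-safe.

```python
class ModelSingleton:
    """Hypothetical sketch: every ModelSingleton() call returns the same instance."""

    _instance = None
    load_count = 0  # how many times the pretend model was loaded

    def __new__(cls):
        if cls._instance is None:
            instance = super().__new__(cls)
            instance._model = cls._load_model()
            cls._instance = instance
        return cls._instance

    @classmethod
    def _load_model(cls):
        # Stand-in for an expensive load, such as building a cross-encoder.
        cls.load_count += 1
        # Pretend scorer: the score is simply the document length.
        return lambda pairs: [float(len(doc)) for _, doc in pairs]

    def predict(self, pairs):
        return self._model(pairs)


first = ModelSingleton()
second = ModelSingleton()
print(first is second, ModelSingleton.load_count)
```

Because both names refer to one object, repeated calls to a method like generate pay the model-loading cost only on the first call.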

Source File

  • llm_engineering/application/rag/reranking.py (lines 16-30)
