Principle:Togethercomputer Together python Document Reranking
Overview
Document Reranking is the mechanism for reordering candidate documents by relevance to a query using a cross-encoder reranking model via the Together Python SDK.
Description
Document reranking takes a query and a set of candidate documents and produces a relevance-scored ordering. Unlike embedding-based retrieval (which uses a bi-encoder to independently encode queries and documents), reranking uses cross-encoders that jointly process query-document pairs for more accurate relevance assessment. This joint processing allows the model to capture fine-grained interactions between the query and document that bi-encoders miss.
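The structural difference between the two model families can be illustrated with toy scoring functions. These are hypothetical stand-ins, not neural models and not part of the Together SDK. The "bi-encoder" turns each text into a vector on its own and compares the vectors. The "cross-encoder" is a single function of the (query, document) pair. The specific features (bag-of-words, ordered word pairs) are arbitrary choices for the sketch. What matters is where the query and document meet: after encoding, or inside the scorer.

```python
import math
from collections import Counter


def encode(text: str) -> Counter:
    """Toy bi-encoder: a bag-of-words vector built from ONE text in isolation.

    Because it never sees the query, document vectors like this could be
    precomputed and stored ahead of time.
    """
    return Counter(text.lower().split())


def bi_encoder_score(query: str, doc: str) -> float:
    """Cosine similarity between independently encoded query and document."""
    q, d = encode(query), encode(doc)
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0


def cross_encoder_score(query: str, doc: str) -> float:
    """Toy cross-encoder: a single function of the (query, doc) pair.

    Seeing both texts at once lets it check whether query word pairs appear
    in the same order in the document, which the bag-of-words vectors discard.
    """
    q_tokens, d_tokens = query.lower().split(), doc.lower().split()
    q_bigrams = set(zip(q_tokens, q_tokens[1:]))
    d_bigrams = set(zip(d_tokens, d_tokens[1:]))
    if not q_bigrams:
        return 0.0
    return len(q_bigrams & d_bigrams) / len(q_bigrams)


# Both documents contain exactly the same words, so the bi-encoder ties them;
# the joint scorer separates them by word order.
for doc in ["man bites dog", "dog bites man"]:
    print(doc, round(bi_encoder_score("dog bites man", doc), 3), cross_encoder_score("dog bites man", doc))
```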
Reranking is typically used as a second stage after initial embedding-based retrieval. The first stage casts a wide net by retrieving a larger set of candidates from a vector database using embedding similarity. The second stage (reranking) refines this set by applying a more expensive but more accurate cross-encoder model to reorder the candidates by their relevance to the query.
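The two-stage pattern can be sketched independently of any particular model. In this minimal sketch both stages are pluggable scoring functions (the `Scorer` type and `retrieve_then_rerank` name are hypothetical). Stage 1 is a brute-force scan for simplicity, where a real system would query a vector index. Stage 2 stands in for the cross-encoder call.

```python
from typing import Callable, Sequence

# A scorer maps (query, document) to a relevance score; higher is better.
Scorer = Callable[[str, str], float]


def retrieve_then_rerank(
    query: str,
    corpus: Sequence[str],
    first_stage: Scorer,
    reranker: Scorer,
    k: int = 100,
    top_n: int = 5,
) -> list[tuple[int, float]]:
    """Two-stage ranking: cheap scorer over the corpus, expensive scorer over the shortlist.

    Returns (corpus_index, reranker_score) pairs, best first.
    """
    # Stage 1 (wide net): rank the whole corpus with the cheap scorer, keep the k best.
    # A real system would query a vector index here instead of scanning every document.
    shortlist = sorted(
        range(len(corpus)),
        key=lambda i: first_stage(query, corpus[i]),
        reverse=True,
    )[:k]
    # Stage 2 (refine): rescore only the shortlist with the expensive scorer.
    rescored = [(i, reranker(query, corpus[i])) for i in shortlist]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:top_n]
```

The reranker runs only on the k shortlisted candidates, so its cost scales with k rather than with the corpus size. The flip side is that a document dropped in stage 1 is never seen by the reranker.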
Usage
Use document reranking to improve the precision of retrieval results. The typical workflow is:
- Retrieve initial candidates from a vector database using embedding similarity (see Principle:Togethercomputer_Together_python_Embedding_Generation)
- Pass the query and candidate documents to `client.rerank.create()`
- Use the reranked, relevance-scored results for downstream tasks (e.g., feeding context into an LLM for RAG)
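The reranking step of this workflow might look like the following. This is a minimal sketch built around the `client.rerank.create()` call named above. The model name `Salesforce/Llama-Rank-V1` and the response shape (`response.results`, each result carrying `index` and `relevance_score`) are assumptions to check against the current Together SDK reference. The wrapper `rerank_candidates` is a hypothetical helper.

```python
from typing import Any

# Assumption: "Salesforce/Llama-Rank-V1" is a reranking model served by Together;
# check the current model list before relying on this name.
DEFAULT_RERANK_MODEL = "Salesforce/Llama-Rank-V1"


def rerank_candidates(
    client: Any,
    query: str,
    documents: list[str],
    top_n: int = 3,
    model: str = DEFAULT_RERANK_MODEL,
) -> list[tuple[int, float, str]]:
    """Rerank first-stage candidates; return (index, score, text) triples, best first.

    `client` is expected to be a `together.Together` instance (or anything exposing
    the same `client.rerank.create(...)` call).
    """
    response = client.rerank.create(
        model=model,
        query=query,
        documents=documents,
        top_n=top_n,
    )
    # Each result is assumed to carry `index` (position in `documents`) and
    # `relevance_score`; sort defensively instead of relying on response order.
    triples = [(r.index, r.relevance_score, documents[r.index]) for r in response.results]
    return sorted(triples, key=lambda t: t[1], reverse=True)


# Example call (needs `pip install together` and TOGETHER_API_KEY in the environment):
#   from together import Together
#   candidates = ["...", "..."]  # e.g. passages from a vector-database query
#   for index, score, text in rerank_candidates(Together(), "my question", candidates):
#       print(f"{score:.3f}  {text}")
```

Taking the client as a parameter keeps the function testable with a stub object. It also lets a pipeline reuse one configured client.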
Common scenarios include:
- RAG pipelines -- Rerank retrieved passages before feeding them as context to an LLM, improving generation quality
- Search applications -- Improve search result ordering beyond what embedding-based retrieval provides
- Document filtering -- Use relevance scores to threshold out irrelevant documents from the candidate set
- Multi-stage retrieval -- Combine fast embedding-based first-pass retrieval with accurate cross-encoder reranking
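For the RAG and document-filtering scenarios, the reranked output still has to be turned into LLM context. One hedged way to do that is sketched below with a hypothetical helper, `build_rag_context`. Its default threshold (0.5) and character budget (2000) are placeholders, not recommended values, because score scales differ between reranking models. The `[index]` prefix is likewise an arbitrary citation convention.

```python
def build_rag_context(
    ranked: list[tuple[int, float, str]],
    min_score: float = 0.5,
    max_chars: int = 2000,
) -> str:
    """Assemble LLM context from reranked passages.

    `ranked` holds (index, relevance_score, text) triples. Passages scoring below
    `min_score` are dropped, and passages are added best-first until the next one
    would push the total passage text past `max_chars`.
    """
    passages: list[str] = []
    used = 0
    for index, score, text in sorted(ranked, key=lambda t: t[1], reverse=True):
        if score < min_score:
            break  # sorted best-first, so every remaining passage is also below threshold
        if used + len(text) > max_chars:
            break  # keep strict relevance order rather than backfilling shorter passages
        passages.append(f"[{index}] {text}")  # tag with source index for citation
        used += len(text)
    return "\n\n".join(passages)
```

Stopping at the first passage that overflows the budget keeps the context in strict relevance order. A variant could instead skip it and try shorter, lower-ranked passages.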
Theoretical Basis
Cross-encoder reranking encodes the query and document together and produces a single relevance score for the pair. The key theoretical considerations are:
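- Cross-encoder vs. bi-encoder -- Bi-encoders encode the query and each document independently. Document embeddings can therefore be computed once, offline, and stored in a vector index. At query time only the query is encoded and compared against the stored vectors with a cheap similarity function, but the model never attends across query and document tokens. Cross-encoders run the model over the concatenated query-document pair, capturing fine-grained token interactions, but nothing can be precomputed. Scoring n candidates requires n full forward passes at query time, each growing with the combined query and document length. This makes cross-encoders more accurate but far more expensive per query.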
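- Retrieve-then-rerank pipeline -- The standard approach combines the efficiency of bi-encoders with the accuracy of cross-encoders. A bi-encoder retrieves the top-k candidates (e.g., top-100) from a large corpus, and a cross-encoder reranks only this smaller set. This typically comes close to the accuracy of applying the cross-encoder to the entire corpus at a fraction of the cost. The caveat is that reranking cannot recover a relevant document that the first stage failed to retrieve.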
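- Relevance scoring -- The cross-encoder produces a relevance score for each query-document pair (many reranking models output values in the 0 to 1 range). Scores for the same query are comparable across documents, which enables ordering, score-based thresholding, and confidence-based filtering. The exact scale and calibration depend on the model, so thresholds should be tuned per model rather than assumed.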
- Field-level ranking -- When documents are structured (e.g., dictionaries with multiple fields), reranking can be focused on specific fields (e.g., title, abstract, body) using `rank_fields`. This allows the model to score relevance based on the most informative parts of the document.
- Top-N selection -- The `top_n` parameter makes the API return only the N most relevant results, reducing response payload size and simplifying downstream processing.
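Field-level ranking and top-N selection can be combined in a single request. The following is a minimal sketch, assuming `client.rerank.create()` accepts a list of dictionaries as `documents` together with `rank_fields` and `top_n`, and that each result reports the `index` of the matching input. Verify those parameter names against the SDK reference. The helper `rerank_by_fields`, its up-front field check, and the default model name are hypothetical choices for the sketch.

```python
from typing import Any


def rerank_by_fields(
    client: Any,
    query: str,
    docs: list[dict[str, Any]],
    rank_fields: list[str],
    top_n: int,
    model: str = "Salesforce/Llama-Rank-V1",  # assumed model name
) -> list[tuple[dict[str, Any], float]]:
    """Rerank structured documents on selected fields; return (document, score), best first."""
    # Fail early on a field name that no document contains (most likely a typo),
    # instead of silently ranking on nothing.
    missing = [f for f in rank_fields if not any(f in d for d in docs)]
    if missing:
        raise ValueError(f"rank_fields not found in any document: {missing}")
    response = client.rerank.create(
        model=model,
        query=query,
        documents=docs,            # assumed: dict documents are accepted as-is
        rank_fields=rank_fields,   # only these fields are used for scoring
        top_n=top_n,               # ask the API for just the N best results
    )
    # Map each result back to the original dict via its `index`.
    ranked = [(docs[r.index], r.relevance_score) for r in response.results]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

Returning the original dictionaries rather than the ranked text keeps the other fields (IDs, URLs, metadata) available downstream. Those fields were excluded from scoring, but they are still needed for display or citation.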
Metadata
| Property | Value |
|---|---|
| Principle | Document Reranking |
| Domain | NLP, Information_Retrieval, RAG |
| Workflow | Embeddings_And_Reranking |
| Related Concepts | Cross-Encoder, Bi-Encoder, Retrieve-Then-Rerank, Relevance Scoring |
| Implementation | Implementation:Togethercomputer_Together_python_Rerank_Create |