Principle:Togethercomputer Together python Document Reranking

Overview

Document reranking reorders candidate documents by their relevance to a query, using a cross-encoder reranking model called through the Together Python SDK.

Description

Document reranking takes a query and a set of candidate documents and produces a relevance-scored ordering. Unlike embedding-based retrieval (which uses a bi-encoder to independently encode queries and documents), reranking uses cross-encoders that jointly process query-document pairs for more accurate relevance assessment. This joint processing allows the model to capture fine-grained interactions between the query and document that bi-encoders miss.

Reranking is typically used as a second stage after initial embedding-based retrieval. The first stage casts a wide net by retrieving a larger set of candidates from a vector database using embedding similarity. The second stage (reranking) refines this set by applying a more expensive but more accurate cross-encoder model to reorder the candidates by true relevance.

Usage

Use document reranking to improve the precision of retrieval results. The typical workflow is:

1. Retrieve initial candidates from a vector database using embedding similarity (see Principle:Togethercomputer_Together_python_Embedding_Generation)
2. Pass the query and candidate documents to client.rerank.create()
3. Use the reranked, relevance-scored results for downstream tasks (e.g., feeding context into an LLM for RAG)
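
The workflow above can be sketched as a small helper around client.rerank.create(). This is a minimal sketch rather than the SDK's prescribed usage: the helper name rerank_candidates is hypothetical, the model name is only an example and may not be available on every account, and the response is assumed to expose a results list whose entries carry index and relevance_score attributes.

```python
from typing import Any, List, Tuple


def rerank_candidates(
    client: Any,
    query: str,
    candidates: List[str],
    model: str = "Salesforce/Llama-Rank-V1",  # example model name (assumption)
) -> List[Tuple[str, float]]:
    """Return (document, score) pairs ordered from most to least relevant."""
    response = client.rerank.create(model=model, query=query, documents=candidates)
    # Assumption: each result carries the position of the document in the
    # input list (`index`) and its `relevance_score`.
    scored = [(candidates[r.index], r.relevance_score) for r in response.results]
    # Sort defensively so the ordering does not depend on the response order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Usage with the real SDK (needs the `together` package and an API key):
#   from together import Together
#   client = Together()  # reads TOGETHER_API_KEY from the environment
#   ranked = rerank_candidates(client, "capital of France?", first_stage_docs)
```

Taking the client as a parameter keeps the helper independent of how credentials are configured and makes it easy to exercise with a stub in tests.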

Common scenarios include:

RAG pipelines -- Rerank retrieved passages before feeding them as context to an LLM, improving generation quality
Search applications -- Improve search result ordering beyond what embedding-based retrieval provides
Document filtering -- Use relevance scores to threshold out irrelevant documents from the candidate set
Multi-stage retrieval -- Combine fast embedding-based first-pass retrieval with accurate cross-encoder reranking

Theoretical Basis

Cross-encoder reranking jointly encodes the query and document together, producing a single relevance score. The key theoretical considerations are:

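Cross-encoder vs. bi-encoder -- Bi-encoders encode query and document independently, so document embeddings can be computed once offline; at query time only the query is encoded and compared against the n stored vectors, which is cheap but misses cross-attention between query and document tokens. Cross-encoders process the concatenated query-document pair, capturing fine-grained token interactions, but they need one full model forward pass per pair (n passes for n candidates, each over the combined query and document length) and nothing can be precomputed. This makes cross-encoders more accurate but computationally more expensive.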

Retrieve-then-rerank pipeline -- The standard approach combines the efficiency of bi-encoders with the accuracy of cross-encoders. A bi-encoder retrieves the top-k candidates (e.g., top-100) from a large corpus, and a cross-encoder reranks this smaller set. Provided the first stage recalls the relevant documents, this approaches the accuracy of running the cross-encoder over the whole corpus at a fraction of the cost.
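
The two-stage structure can be illustrated without any network calls. The sketch below is only an illustration under stated assumptions: retrieve_then_rerank and cosine are hypothetical helpers, the vectors stand in for precomputed bi-encoder embeddings, and score_pair stands in for the cross-encoder (in practice, a batched call to the reranking API rather than a per-document function).

```python
import math
from typing import Callable, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_then_rerank(
    query: str,
    query_vec: Sequence[float],
    docs: List[str],
    doc_vecs: List[Sequence[float]],
    score_pair: Callable[[str, str], float],
    k: int = 100,
    n: int = 10,
) -> List[str]:
    """Shortlist k documents by vector similarity, then rerank and keep n."""
    # Stage 1: cheap similarity over precomputed document vectors.
    by_similarity = sorted(
        range(len(docs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    shortlist = by_similarity[:k]
    # Stage 2: expensive pairwise scoring, applied only to the shortlist.
    reranked = sorted(
        shortlist,
        key=lambda i: score_pair(query, docs[i]),
        reverse=True,
    )
    return [docs[i] for i in reranked[:n]]
```

A document that the first stage fails to shortlist can never be recovered by the second stage, which is why k is usually chosen generously relative to n.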

Relevance scoring -- The cross-encoder produces a relevance score (typically 0 to 1) for each query-document pair. Scores for the same query are directly comparable across documents, which enables score-based thresholding and confidence-based filtering; a suitable threshold depends on the model and should be tuned on representative data.
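
Score-based thresholding amounts to a simple filter over the scored results. In this sketch, filter_by_score is a hypothetical helper operating on (document, score) pairs, and any cutoff used with it (such as 0.3 in the comment) is an arbitrary illustration, not a recommended value.

```python
from typing import Iterable, List, Tuple


def filter_by_score(
    scored: Iterable[Tuple[str, float]], min_score: float
) -> List[Tuple[str, float]]:
    """Keep (document, score) pairs whose score is at least `min_score`."""
    kept = [(doc, score) for doc, score in scored if score >= min_score]
    # Return the survivors from most to least relevant.
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Example: filter_by_score(pairs, min_score=0.3) drops low-scoring documents
# before they are passed on as LLM context.
```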

Field-level ranking -- When documents are structured (e.g., dictionaries with multiple fields), reranking can be focused on specific fields (e.g., title, abstract, body) using rank_fields. This allows the model to score relevance based on the most informative parts of the document.
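
For structured documents, the call might look like the sketch below. The helper rerank_structured and the local check for missing fields are hypothetical additions, the model name is only an example, and the response is again assumed to expose results entries with index and relevance_score attributes.

```python
from typing import Any, Dict, List


def rerank_structured(
    client: Any,
    query: str,
    documents: List[Dict[str, Any]],
    fields: List[str],
    model: str = "Salesforce/Llama-Rank-V1",  # example model name (assumption)
) -> List[Dict[str, Any]]:
    """Rerank dict documents, scoring only the named fields."""
    # Fail early if a requested field is absent from any document.
    missing = [f for f in fields if any(f not in d for d in documents)]
    if missing:
        raise KeyError(f"fields absent from some documents: {missing}")
    response = client.rerank.create(
        model=model,
        query=query,
        documents=documents,
        rank_fields=fields,
    )
    ranked = sorted(response.results, key=lambda r: r.relevance_score, reverse=True)
    return [documents[r.index] for r in ranked]


# Usage with the real SDK (needs the `together` package and an API key):
#   from together import Together
#   papers = [{"title": "...", "abstract": "...", "body": "..."}]
#   best = rerank_structured(Together(), "sparse attention", papers, ["title", "abstract"])
```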

Top-N selection -- The top_n parameter allows the API to return only the most relevant results, reducing response payload size and simplifying downstream processing.

Metadata

Property	Value
Principle	Document Reranking
Domain	NLP, Information_Retrieval, RAG
Workflow	Embeddings_And_Reranking
Related Concepts	Cross-Encoder, Bi-Encoder, Retrieve-Then-Rerank, Relevance Scoring
Implementation	Implementation:Togethercomputer_Together_python_Rerank_Create

Knowledge Sources

2026-02-15 16:00 GMT
