Implementation:Togethercomputer Together python Rerank Create

Overview

Rerank Create implements the Principle:Togethercomputer_Together_python_Document_Reranking principle through the Rerank.create() method and its async counterpart, AsyncRerank.create(). The method sends a query and a list of candidate documents to the Together API, where a cross-encoder reranking model scores each document against the query, and returns the results ordered by relevance.

API

Sync: Rerank.create(*, model, query, documents, top_n=None, return_documents=False, rank_fields=None, **kwargs) -> RerankResponse

Async: AsyncRerank.create(*, model, query, documents, top_n=None, return_documents=False, rank_fields=None, **kwargs) -> RerankResponse

Source

Sync implementation: src/together/resources/rerank.py:L19-70
Async implementation: src/together/resources/rerank.py:L77-128
Request type: src/together/types/rerank.py:L9-21
Response types: src/together/types/rerank.py:L24-43

Import

from together import Together

client = Together()
response = client.rerank.create(
    model="Salesforce/Llama-Rank-V1",
    query="What is machine learning?",
    documents=["ML is a subset of AI.", "The weather is nice today."],
)

Key Parameters

Parameter	Type	Default	Description
`model`	`str`	(required)	The name of the reranking model to use
`query`	`str`	(required)	The query string to rank documents against
`documents`	`List[str] | List[Dict[str, Any]]`	(required)	List of documents to rerank (plain strings or structured dicts)
`top_n`	`int | None`	`None`	Number of top results to return (`None` returns all)
`return_documents`	`bool`	`False`	Whether to include document text in the response
`rank_fields`	`List[str] | None`	`None`	Fields to use for ranking when documents are dicts

Inputs and Outputs

Input Type: RerankRequest

class RerankRequest(BaseModel):
    # model to query
    model: str
    # input query string
    query: str
    # list of documents (strings or dicts)
    documents: List[str] | List[Dict[str, Any]]
    # return top_n results
    top_n: int | None = None
    # boolean to return documents in response
    return_documents: bool = False
    # field selector for dict documents
    rank_fields: List[str] | None = None

Output Type: RerankResponse

class RerankResponse(BaseModel):
    # job id
    id: str | None = None
    # object type (always "rerank")
    object: Literal["rerank"] | None = None
    # query model
    model: str | None = None
    # list of reranked results (sorted by relevance_score descending)
    results: List[RerankChoicesData] | None = None
    # usage statistics
    usage: UsageData | None = None

Rerank Choice: RerankChoicesData

class RerankChoicesData(BaseModel):
    # original index of the document in the input list
    index: int
    # relevance score (higher is more relevant)
    relevance_score: float
    # document content (only present if return_documents=True)
    document: Dict[str, Any] | None = None
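
Because each result carries the index of the document in the input list, scores can be joined back to the original documents even when return_documents is left at False. The following is a minimal sketch of that join, not part of the library: Choice is a hypothetical stand-in that only mirrors the fields of RerankChoicesData, join_results is an illustrative helper name, and the scores are made up so the snippet runs without an API call.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Sequence, Tuple, Union

Document = Union[str, Dict[str, Any]]


@dataclass
class Choice:
    """Hypothetical stand-in mirroring the fields of RerankChoicesData."""
    index: int
    relevance_score: float
    document: Optional[Dict[str, Any]] = None


def join_results(
    results: Optional[Sequence[Choice]],
    documents: Sequence[Document],
) -> List[Tuple[float, Document]]:
    """Pair each relevance score with the input document it refers to.

    `results` may be None (the response field is optional), in which
    case an empty list is returned.
    """
    return [(c.relevance_score, documents[c.index]) for c in (results or [])]


docs = ["ML is a subset of AI.", "The weather is nice today."]
# Made-up scores, standing in for response.results.
fake_results = [Choice(index=0, relevance_score=0.92), Choice(index=1, relevance_score=0.03)]

ranked = join_results(fake_results, docs)
for score, doc in ranked:
    print(f"{score:.2f}  {doc}")
```

With a real response, response.results could be passed in place of fake_results, since the helper only reads the index and relevance_score attributes.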

Usage Data: UsageData

class UsageData(BaseModel):
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int

Internal Flow

The create() method follows this sequence:

1. Constructs a RerankRequest from the provided parameters (model, query, documents, top_n, return_documents, rank_fields)
2. Serializes the request using .model_dump(exclude_none=True) to produce the API payload
3. Creates an APIRequestor with the client configuration
4. Sends a POST request to the rerank endpoint via requestor.request() (sync) or requestor.arequest() (async)
5. Deserializes the raw TogetherResponse into a RerankResponse object
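
The sequence above can be sketched with standard-library stand-ins. This is an illustration under stated assumptions, not the library's source: RequestSketch is a dataclass replacing the pydantic RerankRequest, build_payload imitates the effect of .model_dump(exclude_none=True), and the send callable is a hypothetical placeholder for the APIRequestor transport.

```python
from dataclasses import asdict, dataclass
from typing import Any, Callable, Dict, List, Optional, Union


@dataclass
class RequestSketch:
    """Stand-in for RerankRequest (the real class is a pydantic BaseModel)."""
    model: str
    query: str
    documents: Union[List[str], List[Dict[str, Any]]]
    top_n: Optional[int] = None
    return_documents: bool = False
    rank_fields: Optional[List[str]] = None


def build_payload(request: RequestSketch) -> Dict[str, Any]:
    """Imitate model_dump(exclude_none=True): drop fields whose value is None."""
    return {key: value for key, value in asdict(request).items() if value is not None}


def create_sketch(send: Callable[[str, str, Dict[str, Any]], Any], **params: Any) -> Any:
    request = RequestSketch(**params)       # build the request object
    payload = build_payload(request)        # serialize, omitting None fields
    raw = send("POST", "rerank", payload)   # transport is stubbed by `send`
    return raw                              # the real method parses this into RerankResponse


calls = []


def fake_send(method: str, endpoint: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical transport that records the call instead of hitting the network."""
    calls.append((method, endpoint, payload))
    return {"object": "rerank", "results": []}


raw = create_sketch(
    fake_send,
    model="Salesforce/Llama-Rank-V1",
    query="What is machine learning?",
    documents=["ML is a subset of AI."],
)
print(calls[0][2])
```

In this sketch top_n and rank_fields are absent from the recorded payload because they were left at None, while return_documents is still sent because False is not None.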

Usage Examples

Basic Reranking with String Documents

from together import Together

client = Together()

query = "What is retrieval-augmented generation?"
documents = [
    "The weather forecast predicts rain tomorrow.",
    "RAG combines retrieval with generation for more accurate LLM responses.",
    "Python is a popular programming language.",
    "Retrieval-augmented generation uses external knowledge to improve LLM outputs.",
    "The stock market closed higher today.",
]

response = client.rerank.create(
    model="Salesforce/Llama-Rank-V1",
    query=query,
    documents=documents,
    top_n=3,
)

for result in response.results or []:  # results is typed as optional
    print(f"Index: {result.index}, Score: {result.relevance_score:.4f}")
    print(f"  Document: {documents[result.index]}")

Reranking with Structured Documents

from together import Together

client = Together()

query = "deep learning frameworks"
documents = [
    {"title": "PyTorch Guide", "body": "PyTorch is an open-source deep learning framework."},
    {"title": "Cooking Recipes", "body": "Learn to make delicious pasta at home."},
    {"title": "TensorFlow Tutorial", "body": "TensorFlow provides tools for building neural networks."},
]

response = client.rerank.create(
    model="Salesforce/Llama-Rank-V1",
    query=query,
    documents=documents,
    rank_fields=["title", "body"],
    return_documents=True,
    top_n=2,
)

for result in response.results or []:  # results is typed as optional
    print(f"Score: {result.relevance_score:.4f}")
    print(f"  Document: {result.document}")

Retrieve-Then-Rerank Pipeline

from together import Together
import numpy as np

client = Together()

# Step 1: Embed query and corpus
query = "How does attention work in transformers?"
corpus = [
    "Attention mechanisms allow models to focus on relevant parts of the input.",
    "Transformers use self-attention to process sequences in parallel.",
    "Convolutional neural networks use filters for feature extraction.",
    "The attention mechanism computes weighted sums of value vectors.",
    "Recurrent neural networks process sequences one step at a time.",
]

# Embed everything
all_texts = [query] + corpus
embed_response = client.embeddings.create(
    input=all_texts,
    model="togethercomputer/m2-bert-80M-8k-retrieval",
)

vectors = [np.array(item.embedding) for item in embed_response.data]
query_vec = vectors[0]
doc_vecs = vectors[1:]

# Step 2: Retrieve top candidates by cosine similarity
similarities = [
    np.dot(query_vec, dv) / (np.linalg.norm(query_vec) * np.linalg.norm(dv))
    for dv in doc_vecs
]
top_indices = np.argsort(similarities)[::-1][:3]
candidates = [corpus[i] for i in top_indices]

# Step 3: Rerank candidates
rerank_response = client.rerank.create(
    model="Salesforce/Llama-Rank-V1",
    query=query,
    documents=candidates,
    top_n=2,
)

for result in rerank_response.results or []:  # results is typed as optional
    print(f"Score: {result.relevance_score:.4f} -> {candidates[result.index]}")
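
In the pipeline above, result.index refers to a position in candidates, not in corpus. When the original corpus position is needed (for example, to look up metadata stored alongside the corpus), the two index layers have to be composed. The helper below is a small sketch of that composition; to_corpus_indices is an illustrative name, and the index and score values are made up so the snippet runs offline.

```python
from typing import List, Sequence, Tuple


def to_corpus_indices(
    top_indices: Sequence[int],
    reranked: Sequence[Tuple[int, float]],
) -> List[Tuple[int, float]]:
    """Translate (candidate_index, score) pairs into (corpus_index, score) pairs.

    `top_indices[i]` is the corpus position of the i-th candidate that was
    passed to the reranker.
    """
    return [(int(top_indices[candidate_idx]), score) for candidate_idx, score in reranked]


# Made-up values: retrieval kept corpus items 3, 0 and 1 (in that order),
# and the reranker then preferred candidate 2 over candidate 0.
top_indices = [3, 0, 1]
reranked = [(2, 0.91), (0, 0.88)]

mapped = to_corpus_indices(top_indices, reranked)
print(mapped)
```

With a live response, the reranked pairs could be built as [(r.index, r.relevance_score) for r in rerank_response.results].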

Async Reranking

import asyncio
from together import AsyncTogether

async def rerank_documents():
    client = AsyncTogether()

    response = await client.rerank.create(
        model="Salesforce/Llama-Rank-V1",
        query="machine learning optimization",
        documents=[
            "Gradient descent is used to minimize loss functions.",
            "Tropical fish need warm water to survive.",
            "Adam optimizer adapts learning rates for each parameter.",
        ],
        top_n=2,
    )

    for result in response.results or []:  # results is typed as optional
        print(f"Index: {result.index}, Score: {result.relevance_score:.4f}")

asyncio.run(rerank_documents())
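
One reason to use the async client is to rerank the same candidates against several queries concurrently. The sketch below shows that pattern with asyncio.gather. rerank_many is an illustrative helper, not a library function, and the stub client is a hypothetical replacement for AsyncTogether so the snippet runs without network access or an API key.

```python
import asyncio
from types import SimpleNamespace
from typing import Any, List, Sequence


async def rerank_many(
    client: Any,
    model: str,
    queries: Sequence[str],
    documents: Sequence[str],
) -> List[Any]:
    """Issue one rerank call per query concurrently; responses keep query order."""
    tasks = [
        client.rerank.create(model=model, query=query, documents=list(documents))
        for query in queries
    ]
    return await asyncio.gather(*tasks)


async def _fake_create(*, model: str, query: str, documents: List[str], **kwargs: Any) -> Any:
    """Hypothetical stub standing in for AsyncRerank.create()."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return SimpleNamespace(model=model, query=query, results=[])


stub_client = SimpleNamespace(rerank=SimpleNamespace(create=_fake_create))

responses = asyncio.run(
    rerank_many(
        stub_client,
        "Salesforce/Llama-Rank-V1",
        ["machine learning optimization", "aquarium care"],
        ["Gradient descent is used to minimize loss functions.",
         "Tropical fish need warm water to survive."],
    )
)
print([r.query for r in responses])
```

Passing an AsyncTogether() instance instead of stub_client should work the same way, since the helper only relies on the keyword-only create() signature listed in the API section.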

Metadata

Property	Value
Implementation	Rerank Create
API	`Rerank.create()` / `AsyncRerank.create()`
Source	`src/together/resources/rerank.py:L19-70` (sync), `L77-128` (async)
HTTP Method	POST
Endpoint	`rerank`
Domain	NLP, Information_Retrieval, RAG
Workflow	Embeddings_And_Reranking
Principle	Principle:Togethercomputer_Together_python_Document_Reranking

Knowledge Sources

2026-02-15 16:00 GMT
