Implementation:Togethercomputer Together python Rerank Create

From Leeroopedia
Revision as of 13:56, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Togethercomputer_Together_python_Rerank_Create.md)

Overview

Rerank Create implements the Principle:Togethercomputer_Together_python_Document_Reranking principle. It provides the Rerank.create() method, which calls a cross-encoder reranking model through the Together API to reorder candidate documents by their relevance to a query.

API

Sync: Rerank.create(*, model, query, documents, top_n=None, return_documents=False, rank_fields=None, **kwargs) -> RerankResponse

Async: AsyncRerank.create(*, model, query, documents, top_n=None, return_documents=False, rank_fields=None, **kwargs) -> RerankResponse

Source

  • Sync implementation: src/together/resources/rerank.py:L19-70
  • Async implementation: src/together/resources/rerank.py:L77-128
  • Request type: src/together/types/rerank.py:L9-21
  • Response types: src/together/types/rerank.py:L24-43

Import

from together import Together

client = Together()
response = client.rerank.create(
    model="Salesforce/Llama-Rank-V1",
    query="What is machine learning?",
    documents=["ML is a subset of AI.", "The weather is nice today."],
)

Key Parameters

Parameter Type Default Description
model str (required) The name of the reranking model to use
query str (required) The query string to rank documents against
documents List[str] | List[Dict[str, Any]] (required) List of documents to rerank (plain strings or structured dicts)
top_n int | None None Number of top results to return (None returns all)
return_documents bool False Whether to include document text in the response
rank_fields List[str] | None None Fields to use for ranking when documents are dicts

Inputs and Outputs

Input Type: RerankRequest

class RerankRequest(BaseModel):
    # model to query
    model: str
    # input query string
    query: str
    # list of documents (strings or dicts)
    documents: List[str] | List[Dict[str, Any]]
    # return top_n results
    top_n: int | None = None
    # boolean to return documents in response
    return_documents: bool = False
    # field selector for dict documents
    rank_fields: List[str] | None = None

Output Type: RerankResponse

class RerankResponse(BaseModel):
    # job id
    id: str | None = None
    # object type (always "rerank")
    object: Literal["rerank"] | None = None
    # query model
    model: str | None = None
    # list of reranked results (sorted by relevance_score descending)
    results: List[RerankChoicesData] | None = None
    # usage statistics
    usage: UsageData | None = None

Rerank Choice: RerankChoicesData

class RerankChoicesData(BaseModel):
    # original index of the document in the input list
    index: int
    # relevance score (higher is more relevant)
    relevance_score: float
    # document content (only present if return_documents=True)
    document: Dict[str, Any] | None = None
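
Because index points back into the input list, results can be joined with the original documents even when return_documents is left at False. The following is a minimal sketch of that join. Choice is a local stand-in for RerankChoicesData and attach_documents is a hypothetical helper; neither is part of the library, and the scores are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Choice:
    """Local stand-in for RerankChoicesData (illustration only)."""
    index: int
    relevance_score: float


def attach_documents(
    results: Sequence[Choice], documents: Sequence[str]
) -> List[Tuple[float, str]]:
    """Pair each score with the input document that `index` points to."""
    return [(r.relevance_score, documents[r.index]) for r in results]


documents = ["doc a", "doc b", "doc c"]
# Made-up scores, listed most relevant first, with top_n=2 dropping one document.
results = [
    Choice(index=2, relevance_score=0.9),
    Choice(index=0, relevance_score=0.4),
]
ranked = attach_documents(results, documents)
for score, text in ranked:
    print(f"{score:.2f} -> {text}")
```

The same lookup works for dict documents, since index is a position in whatever list was passed as documents.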

Usage Data: UsageData

class UsageData(BaseModel):
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int

Internal Flow

The create() method follows this sequence:

  1. Constructs a RerankRequest from the provided parameters (model, query, documents, top_n, return_documents, rank_fields)
  2. Serializes the request using .model_dump(exclude_none=True) to produce the API payload
  3. Creates an APIRequestor with the client configuration
  4. Sends a POST request to the rerank endpoint via requestor.request() (sync) or requestor.arequest() (async)
  5. Deserializes the raw TogetherResponse into a RerankResponse object
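
The sequence above can be approximated offline with the standard library alone. This is a rough sketch under stated assumptions, not the library's code: RequestSketch stands in for the pydantic RerankRequest, build_payload imitates only the top-level effect of .model_dump(exclude_none=True), and fake_post is a hypothetical transport that replaces APIRequestor and makes no network call.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional, Union


@dataclass
class RequestSketch:
    """Stand-in for RerankRequest (the real class is a pydantic model)."""
    model: str
    query: str
    documents: Union[List[str], List[Dict[str, Any]]]
    top_n: Optional[int] = None
    return_documents: bool = False
    rank_fields: Optional[List[str]] = None


def build_payload(request: RequestSketch) -> Dict[str, Any]:
    """Drop top-level fields that are None, similar to exclude_none=True."""
    return {k: v for k, v in asdict(request).items() if v is not None}


def fake_post(url: str, params: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical transport: returns a response-shaped dict, no network call."""
    n = params.get("top_n", len(params["documents"]))
    return {
        "object": "rerank",
        "model": params["model"],
        "results": [{"index": i, "relevance_score": 0.0} for i in range(n)],
    }


def create(**kwargs: Any) -> Dict[str, Any]:
    payload = build_payload(RequestSketch(**kwargs))  # steps 1-2
    return fake_post("rerank", payload)               # steps 3-5, collapsed


payload = build_payload(RequestSketch(model="m", query="q", documents=["a", "b"]))
response = create(model="m", query="q", documents=["a", "b", "c"], top_n=2)
```

Note that top_n and rank_fields are absent from payload when left at None, whereas return_documents=False is still sent because False is not None.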

Usage Examples

Basic Reranking with String Documents

from together import Together

client = Together()

query = "What is retrieval-augmented generation?"
documents = [
    "The weather forecast predicts rain tomorrow.",
    "RAG combines retrieval with generation for more accurate LLM responses.",
    "Python is a popular programming language.",
    "Retrieval-augmented generation uses external knowledge to improve LLM outputs.",
    "The stock market closed higher today.",
]

response = client.rerank.create(
    model="Salesforce/Llama-Rank-V1",
    query=query,
    documents=documents,
    top_n=3,
)

for result in response.results:
    print(f"Index: {result.index}, Score: {result.relevance_score:.4f}")
    print(f"  Document: {documents[result.index]}")

Reranking with Structured Documents

from together import Together

client = Together()

query = "deep learning frameworks"
documents = [
    {"title": "PyTorch Guide", "body": "PyTorch is an open-source deep learning framework."},
    {"title": "Cooking Recipes", "body": "Learn to make delicious pasta at home."},
    {"title": "TensorFlow Tutorial", "body": "TensorFlow provides tools for building neural networks."},
]

response = client.rerank.create(
    model="Salesforce/Llama-Rank-V1",
    query=query,
    documents=documents,
    rank_fields=["title", "body"],
    return_documents=True,
    top_n=2,
)

for result in response.results:
    print(f"Score: {result.relevance_score:.4f}")
    print(f"  Document: {result.document}")

Retrieve-Then-Rerank Pipeline

from together import Together
import numpy as np

client = Together()

# Step 1: Embed query and corpus
query = "How does attention work in transformers?"
corpus = [
    "Attention mechanisms allow models to focus on relevant parts of the input.",
    "Transformers use self-attention to process sequences in parallel.",
    "Convolutional neural networks use filters for feature extraction.",
    "The attention mechanism computes weighted sums of value vectors.",
    "Recurrent neural networks process sequences one step at a time.",
]

# Embed everything
all_texts = [query] + corpus
embed_response = client.embeddings.create(
    input=all_texts,
    model="togethercomputer/m2-bert-80M-8k-retrieval",
)

vectors = [np.array(item.embedding) for item in embed_response.data]
query_vec = vectors[0]
doc_vecs = vectors[1:]

# Step 2: Retrieve top candidates by cosine similarity
similarities = [
    np.dot(query_vec, dv) / (np.linalg.norm(query_vec) * np.linalg.norm(dv))
    for dv in doc_vecs
]
top_indices = np.argsort(similarities)[::-1][:3]
candidates = [corpus[i] for i in top_indices]

# Step 3: Rerank candidates
rerank_response = client.rerank.create(
    model="Salesforce/Llama-Rank-V1",
    query=query,
    documents=candidates,
    top_n=2,
)

for result in rerank_response.results:
    print(f"Score: {result.relevance_score:.4f} -> {candidates[result.index]}")
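
In this pipeline, result.index is a position in candidates, not in corpus. When the original corpus position is needed (for example, to look up metadata stored alongside the corpus), the two index mappings have to be composed. A minimal sketch with plain lists follows; to_corpus_indices is a hypothetical helper and the index values are invented for illustration.

```python
from typing import List, Sequence


def to_corpus_indices(
    top_indices: Sequence[int], reranked_indices: Sequence[int]
) -> List[int]:
    """Translate positions in the candidate list back to positions in the corpus."""
    return [int(top_indices[i]) for i in reranked_indices]


# Retrieval picked corpus items 3, 0 and 1 (in that order) as candidates.
top_indices = [3, 0, 1]
# The reranker put candidate 2 first, then candidate 0.
reranked_indices = [2, 0]

corpus_indices = to_corpus_indices(top_indices, reranked_indices)
print(corpus_indices)  # [1, 3]
```

With the variables from the example above, this would be called as to_corpus_indices(top_indices, [r.index for r in rerank_response.results]).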

Async Reranking

import asyncio
from together import AsyncTogether

async def rerank_documents():
    client = AsyncTogether()

    response = await client.rerank.create(
        model="Salesforce/Llama-Rank-V1",
        query="machine learning optimization",
        documents=[
            "Gradient descent is used to minimize loss functions.",
            "Tropical fish need warm water to survive.",
            "Adam optimizer adapts learning rates for each parameter.",
        ],
        top_n=2,
    )

    for result in response.results:
        print(f"Index: {result.index}, Score: {result.relevance_score:.4f}")

asyncio.run(rerank_documents())
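
One reason to prefer the async client is to issue several rerank calls concurrently. The sketch below shows the pattern with asyncio.gather. rerank_many is a hypothetical helper that takes the call to make as a parameter, and fake_create is a stand-in so the example runs without network access or an API key; its return shape is invented and does not match RerankResponse.

```python
import asyncio
from typing import Any, Awaitable, Callable, List, Sequence


async def rerank_many(
    create: Callable[..., Awaitable[Any]],
    model: str,
    queries: Sequence[str],
    documents: Sequence[str],
) -> List[Any]:
    """Issue one rerank call per query concurrently; results keep query order."""
    calls = [
        create(model=model, query=q, documents=list(documents)) for q in queries
    ]
    return await asyncio.gather(*calls)


async def fake_create(*, model: str, query: str, documents: List[str]) -> dict:
    """Hypothetical stand-in for an async rerank call; makes no network request."""
    await asyncio.sleep(0)  # yield control, as a real I/O call would
    return {"query": query, "n_documents": len(documents)}


responses = asyncio.run(
    rerank_many(fake_create, "some-model", ["q1", "q2"], ["a", "b", "c"])
)
```

Against the real API, the assumption is that client.rerank.create from an AsyncTogether client can be passed as create, since it accepts the same keyword arguments.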

Metadata

Property Value
Implementation Rerank Create
API Rerank.create() / AsyncRerank.create()
Source src/together/resources/rerank.py:L19-70 (sync), L77-128 (async)
HTTP Method POST
Endpoint rerank
Domain NLP, Information_Retrieval, RAG
Workflow Embeddings_And_Reranking
Principle Principle:Togethercomputer_Together_python_Document_Reranking

Knowledge Sources

2026-02-15 16:00 GMT
