Implementation:Togethercomputer Together python Result Integration Pattern

Overview

The Result Integration Pattern implements the Principle:Togethercomputer_Together_python_Result_Integration principle. It describes user-defined logic that combines embedding and reranking results into a final ranked document list for downstream application use.

This is a Pattern Doc -- the Together Python SDK does not provide built-in result integration utilities. Instead, this documents the recommended patterns for post-processing retrieval and reranking outputs on the user side.

Pattern Structure

The result integration pattern follows these stages:

  1. Map -- Map reranked indices back to original document objects and metadata
  2. Filter -- Apply relevance score thresholds to remove low-confidence results
  3. Combine -- Optionally fuse scores from multiple retrieval signals
  4. Select -- Choose the final set of documents respecting context window or display constraints
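
The four stages above can be sketched end to end without calling the API. This is a minimal illustration, not SDK code: `StubResult` is a hypothetical stand-in for one rerank result (it only mimics the `index` and `relevance_score` attributes used elsewhere on this page), and `integrate`, its threshold, and its document cap are assumed names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class StubResult:
    """Hypothetical stand-in for one rerank result (not an SDK type)."""
    index: int
    relevance_score: float


def integrate(results, documents, threshold=0.3, max_docs=2):
    """Run Map -> Filter -> Combine -> Select over a single rerank signal."""
    # 1. Map: attach each score to the original document object
    mapped = [
        {**documents[r.index], "relevance_score": r.relevance_score}
        for r in results
    ]
    # 2. Filter: drop low-confidence results
    kept = [d for d in mapped if d["relevance_score"] >= threshold]
    # 3. Combine: only one signal here, so ordering by its score is enough
    kept.sort(key=lambda d: d["relevance_score"], reverse=True)
    # 4. Select: respect a display or context constraint
    return kept[:max_docs]


documents = [
    {"id": "doc_1", "text": "RAG combines retrieval with generation."},
    {"id": "doc_2", "text": "The forecast shows sunny skies."},
    {"id": "doc_3", "text": "Dense retrieval uses embeddings."},
]
# Made-up scores, ordered by rank as a reranker would return them
results = [StubResult(0, 0.92), StubResult(2, 0.41), StubResult(1, 0.03)]

final = integrate(results, documents)
final_ids = [d["id"] for d in final]
print(final_ids)
```

With more than one retrieval signal, the Combine step would be replaced by a fusion method such as the Reciprocal Rank Fusion pattern shown later on this page.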

Example Patterns

Mapping Reranked Indices to Original Documents

from together import Together

client = Together()

# Original documents with metadata
documents = [
    {"id": "doc_1", "title": "RAG Overview", "text": "RAG combines retrieval with generation..."},
    {"id": "doc_2", "title": "Weather Report", "text": "The forecast shows sunny skies..."},
    {"id": "doc_3", "title": "LLM Techniques", "text": "Retrieval-augmented generation improves..."},
    {"id": "doc_4", "title": "Cooking Tips", "text": "Use fresh ingredients for best results..."},
    {"id": "doc_5", "title": "NLP Methods", "text": "Dense retrieval uses embeddings to find..."},
]

# Rerank using document text
response = client.rerank.create(
    model="Salesforce/Llama-Rank-V1",
    query="How does RAG improve LLM accuracy?",
    documents=[doc["text"] for doc in documents],
    top_n=3,
)

# Map reranked indices back to full document objects
ranked_documents = []
for result in response.results:
    original_doc = documents[result.index]
    ranked_documents.append({
        **original_doc,
        "relevance_score": result.relevance_score,
    })

for doc in ranked_documents:
    print(f"[{doc['relevance_score']:.4f}] {doc['id']}: {doc['title']}")
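
One detail worth making explicit: `result.index` is a position in the list passed to the reranker, not a corpus-wide identifier. When only a candidate subset from a first-stage retriever is reranked, the index has to be mapped through that candidate list. The sketch below assumes a hypothetical `StubResult` stand-in and made-up IDs and scores so that it runs offline.

```python
from dataclasses import dataclass


@dataclass
class StubResult:
    """Hypothetical stand-in for one rerank result (not an SDK type)."""
    index: int
    relevance_score: float


corpus = {
    "doc_1": "RAG combines retrieval with generation.",
    "doc_3": "Retrieval-augmented generation improves factuality.",
    "doc_5": "Dense retrieval uses embeddings to find documents.",
}

# Candidate IDs as a first-stage retriever might return them
candidate_ids = ["doc_5", "doc_1", "doc_3"]
candidate_texts = [corpus[doc_id] for doc_id in candidate_ids]

# Stand-in for the results of reranking candidate_texts
results = [StubResult(2, 0.88), StubResult(1, 0.71)]

# index 2 refers to candidate_ids[2], i.e. "doc_3", not to a corpus position
ranked = [(candidate_ids[r.index], r.relevance_score) for r in results]
print(ranked)
```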

Relevance Score Thresholding

def filter_by_relevance(rerank_results, documents, threshold=0.5):
    """Filter reranked results by a minimum relevance score.

    Args:
        rerank_results: The results list from RerankResponse.
        documents: The original documents list (same order as rerank input).
        threshold: Minimum relevance score to include a document.

    Returns:
        List of (document, score) tuples above the threshold.
    """
    filtered = []
    for result in rerank_results:
        if result.relevance_score >= threshold:
            filtered.append((documents[result.index], result.relevance_score))
    return filtered


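# Usage (assumes `client` from the previous example)
# Illustrative inputs; document_texts must keep the same order as original_documents
original_documents = [
    {"id": "doc_1", "text": "Gradient descent iteratively minimizes a loss function."},
    {"id": "doc_2", "text": "Adam adapts per-parameter learning rates during training."},
    {"id": "doc_3", "text": "The forecast shows sunny skies this weekend."},
]
document_texts = [doc["text"] for doc in original_documents]

response = client.rerank.create(
    model="Salesforce/Llama-Rank-V1",
    query="machine learning optimization techniques",
    documents=document_texts,
)

relevant_docs = filter_by_relevance(response.results, original_documents, threshold=0.3)
print(f"Kept {len(relevant_docs)} of {len(document_texts)} documents above threshold")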
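
Because a suitable threshold depends on the model and the task, it can help to see how many documents survive at several candidate values before fixing one. The helper below is a hypothetical sketch (the name `survivors_per_threshold` and the scores are made up for illustration) that counts survivors for each threshold.

```python
def survivors_per_threshold(scores, thresholds):
    """Count how many relevance scores meet or exceed each threshold."""
    return {t: sum(1 for s in scores if s >= t) for t in thresholds}


# Made-up relevance scores for one query
scores = [0.91, 0.64, 0.38, 0.22, 0.05]

counts = survivors_per_threshold(scores, [0.2, 0.3, 0.5])
for threshold, kept in counts.items():
    print(f"threshold={threshold}: kept {kept} of {len(scores)}")
```

Running the same sweep over a sample of real queries gives a rough picture of the recall and precision trade-off at each setting.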

Reciprocal Rank Fusion (Combining Multiple Signals)

def reciprocal_rank_fusion(ranked_lists: list[list[int]], k: int = 60) -> list[tuple[int, float]]:
    """Combine multiple ranked lists using Reciprocal Rank Fusion (RRF).

    Args:
        ranked_lists: List of ranked lists, where each list contains document indices
                      ordered by relevance (most relevant first).
        k: RRF constant (default 60, as recommended in the original paper).

    Returns:
        List of (doc_index, rrf_score) tuples sorted by RRF score descending.
    """
    rrf_scores = {}
    for ranked_list in ranked_lists:
        for rank, doc_index in enumerate(ranked_list):
            if doc_index not in rrf_scores:
                rrf_scores[doc_index] = 0.0
            rrf_scores[doc_index] += 1.0 / (k + rank + 1)

    # Sort by RRF score descending
    return sorted(rrf_scores.items(), key=lambda x: x[1], reverse=True)


# Example: combine embedding retrieval with reranking
import numpy as np
from together import Together

client = Together()
query = "How does attention work in transformers?"
corpus = [
    "Attention mechanisms allow models to focus on relevant input parts.",
    "Transformers use self-attention for parallel sequence processing.",
    "CNNs use convolutional filters for feature extraction.",
    "The attention mechanism computes weighted sums of value vectors.",
    "RNNs process sequences one step at a time.",
]

# Signal 1: Embedding-based ranking
all_texts = [query] + corpus
embed_resp = client.embeddings.create(
    input=all_texts,
    model="togethercomputer/m2-bert-80M-8k-retrieval",
)
vectors = [np.array(item.embedding) for item in embed_resp.data]
query_vec = vectors[0]
doc_vecs = vectors[1:]
similarities = [
    np.dot(query_vec, dv) / (np.linalg.norm(query_vec) * np.linalg.norm(dv))
    for dv in doc_vecs
]
embedding_ranking = np.argsort(similarities)[::-1].tolist()  # plain ints, most similar first

# Signal 2: Reranking-based ordering
rerank_resp = client.rerank.create(
    model="Salesforce/Llama-Rank-V1",
    query=query,
    documents=corpus,
)
rerank_ranking = [result.index for result in rerank_resp.results]

# Fuse the two signals
fused = reciprocal_rank_fusion([embedding_ranking, rerank_ranking])

print("Fused ranking:")
for doc_index, rrf_score in fused[:3]:
    print(f"  RRF={rrf_score:.4f}: {corpus[doc_index]}")
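
As a worked check of the fusion arithmetic, the sketch below recomputes RRF scores for two small hand-written rankings. Each list contributes 1 / (k + rank) with 1-based ranks, which is the same quantity as `1.0 / (k + rank + 1)` over the 0-based `enumerate` used above. The rankings are made up, and `rrf_scores` is a compact illustrative helper rather than a replacement for `reciprocal_rank_fusion`.

```python
def rrf_scores(ranked_lists, k=60):
    """Return {doc_index: RRF score} using 1-based ranks."""
    scores = {}
    for ranked in ranked_lists:
        for rank, doc in enumerate(ranked, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return scores


embedding_ranking = [0, 3, 1]  # doc 0 is the top embedding match
rerank_ranking = [3, 1, 2]     # doc 3 is the top reranker match

scores = rrf_scores([embedding_ranking, rerank_ranking])
order = sorted(scores, key=scores.get, reverse=True)

# doc 3: 1/62 + 1/61 ~ 0.0325   (ranked 2nd and 1st)
# doc 1: 1/63 + 1/62 ~ 0.0320   (ranked 3rd and 2nd)
# doc 0: 1/61        ~ 0.0164   (ranked 1st in one list only)
# doc 2: 1/63        ~ 0.0159   (ranked 3rd in one list only)
print(order)
```

Documents 3 and 1 appear in both lists and outrank document 0, even though document 0 is first in the embedding ranking. With k=60, agreement between signals counts for more than a single top position.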

Building RAG Context from Reranked Results

def build_rag_context(
    rerank_results,
    documents: list[str],
    max_context_chars: int = 4000,
    min_score: float = 0.2,
) -> str:
    """Build a RAG context string from reranked results.

    Selects documents by relevance score and fits them within a character budget.

    Args:
        rerank_results: The results list from RerankResponse.
        documents: The original documents list.
        max_context_chars: Maximum total characters for the context.
        min_score: Minimum relevance score to include a document.

    Returns:
        Concatenated context string with document separators.
    """
    context_parts = []
    total_chars = 0

    for result in rerank_results:
        if result.relevance_score < min_score:
            continue

        doc_text = documents[result.index]

        # Check if adding this document exceeds the budget
        separator = f"\n\n---\n[Document {result.index} | Relevance: {result.relevance_score:.3f}]\n"
        addition_length = len(separator) + len(doc_text)

        if total_chars + addition_length > max_context_chars:
            # Truncate the last document to fit, leaving room for the "..." marker
            ellipsis = "..."
            remaining = max_context_chars - total_chars - len(separator) - len(ellipsis)
            if remaining > 100:  # Only include if meaningful content fits
                context_parts.append(separator + doc_text[:remaining] + ellipsis)
            break

        context_parts.append(separator + doc_text)
        total_chars += addition_length

    return "".join(context_parts).strip()


# Usage in a RAG pipeline
from together import Together

client = Together()

query = "Explain retrieval-augmented generation"
candidate_docs = [...]  # Retrieved from vector database

# Rerank candidates
rerank_resp = client.rerank.create(
    model="Salesforce/Llama-Rank-V1",
    query=query,
    documents=candidate_docs,
    top_n=5,
)

# Build context for LLM
context = build_rag_context(rerank_resp.results, candidate_docs, max_context_chars=3000)

# Use context in LLM prompt
chat_response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    messages=[
        {"role": "system", "content": f"Answer based on this context:\n{context}"},
        {"role": "user", "content": query},
    ],
)

print(chat_response.choices[0].message.content)

Diversity-Aware Selection (Maximal Marginal Relevance)

import numpy as np

def mmr_select(
    query_embedding: np.ndarray,
    doc_embeddings: list[np.ndarray],
    relevance_scores: list[float],
    k: int = 5,
    lambda_param: float = 0.7,
) -> list[int]:
    """Select documents using Maximal Marginal Relevance (MMR).

    Balances relevance with diversity to avoid redundant results.

    Args:
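        query_embedding: The query embedding vector. Not used by this variant,
            which takes relevance from relevance_scores; kept for callers that
            prefer to substitute query-document similarity.
        doc_embeddings: List of document embedding vectors.
        relevance_scores: Relevance scores from reranking, indexed by document
            position (relevance_scores[i] belongs to doc_embeddings[i]).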
        k: Number of documents to select.
        lambda_param: Trade-off between relevance (1.0) and diversity (0.0).

    Returns:
        List of selected document indices.
    """
    selected = []
    candidates = list(range(len(doc_embeddings)))

    for _ in range(min(k, len(candidates))):
        best_score = -float("inf")
        best_idx = -1

        for idx in candidates:
            if idx in selected:
                continue

            # Relevance component
            relevance = relevance_scores[idx]

            # Diversity component: max similarity to already-selected docs
            if selected:
                similarities = [
                    np.dot(doc_embeddings[idx], doc_embeddings[s]) / (
                        np.linalg.norm(doc_embeddings[idx]) * np.linalg.norm(doc_embeddings[s])
                    )
                    for s in selected
                ]
                max_sim = max(similarities)
            else:
                max_sim = 0.0

            # MMR score
            mmr_score = lambda_param * relevance - (1 - lambda_param) * max_sim

            if mmr_score > best_score:
                best_score = mmr_score
                best_idx = idx

        if best_idx >= 0:
            selected.append(best_idx)
            candidates.remove(best_idx)

    return selected
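
The effect of `lambda_param` is easiest to see on toy data. The sketch below is a compact pure-Python variant of the same selection rule, written so that it runs without NumPy. The 2-D vectors, the scores, `StubResult`, `align_scores`, and `mmr` are all hypothetical names and values for illustration. It also shows one way to turn rank-ordered rerank results into a score list indexed by document position, which is the layout the selection rule expects.

```python
import math
from dataclasses import dataclass


@dataclass
class StubResult:
    """Hypothetical stand-in for one rerank result (not an SDK type)."""
    index: int
    relevance_score: float


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def align_scores(results, n_docs):
    """Convert rank-ordered results into scores indexed by document position."""
    scores = [0.0] * n_docs
    for r in results:
        scores[r.index] = r.relevance_score
    return scores


def mmr(doc_vecs, relevance, k, lam):
    """Greedy MMR: trade relevance against similarity to already-selected docs."""
    selected, remaining = [], list(range(len(doc_vecs)))
    while remaining and len(selected) < k:
        def score(i):
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[s]) for s in selected), default=0.0
            )
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Docs 1 and 2 are near-duplicates; doc 0 covers a different direction
doc_vecs = [(0.0, 1.0), (1.0, 0.0), (0.99, 0.14)]
results = [StubResult(1, 0.90), StubResult(2, 0.85), StubResult(0, 0.60)]
relevance = align_scores(results, len(doc_vecs))

diverse = mmr(doc_vecs, relevance, k=2, lam=0.7)
relevance_only = mmr(doc_vecs, relevance, k=2, lam=1.0)
print(diverse, relevance_only)
```

At lambda 0.7 the near-duplicate (doc 2) is skipped in favor of the less relevant but distinct doc 0; at lambda 1.0 the selection reduces to plain relevance order.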

Design Decisions

Decision | Recommendation | Rationale
Score threshold value | Start with 0.2-0.5, tune per model and use case | Thresholds are model-dependent; use lower thresholds for recall-oriented tasks and higher ones for precision-oriented tasks
RRF constant (k) | Use k=60 as default | Recommended by the original RRF paper; provides stable fusion across ranking depths
Context window budget | Reserve 20-30% of total context for the query and instructions | Ensures the LLM has room for the user query and system instructions alongside retrieved context
Relevance vs. diversity trade-off | lambda=0.7 for most RAG tasks | Biases toward relevance while still penalizing highly redundant documents
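
The context window recommendation can be turned into a character budget for `max_context_chars`. The sketch below is a rough estimate under stated assumptions: `context_char_budget` is a hypothetical helper, the 4 characters per token figure is a common approximation for English text rather than a property of any particular tokenizer, and the 8192-token window is only an example value.

```python
def context_char_budget(context_window_tokens, reserve_fraction=0.25, chars_per_token=4):
    """Estimate how many characters of retrieved context fit in the window.

    reserve_fraction is the share of the window held back for the query and
    instructions; chars_per_token is an assumed average, not a measured value.
    """
    if not 0.0 <= reserve_fraction < 1.0:
        raise ValueError("reserve_fraction must be in [0, 1)")
    usable_tokens = int(context_window_tokens * (1 - reserve_fraction))
    return usable_tokens * chars_per_token


budget = context_char_budget(8192)  # 25% reserved, 6144 tokens usable
print(budget)
```

The estimate does not account for tokens the model needs for its answer, so a real pipeline would likely reserve more or count tokens with the model's own tokenizer.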

Source Files

This is a user-side pattern. No SDK source files implement this functionality. The relevant SDK outputs consumed by this pattern are:

  • src/together/types/embeddings.py -- EmbeddingResponse, EmbeddingChoicesData
  • src/together/types/rerank.py -- RerankResponse, RerankChoicesData

Metadata

Property | Value
Implementation | Result Integration Pattern
Type | Pattern Doc (user-defined logic)
Domain | NLP, Information_Retrieval, RAG
Workflow | Embeddings_And_Reranking
Principle | Principle:Togethercomputer_Together_python_Result_Integration

Knowledge Sources

2026-02-15 16:00 GMT
