Implementation:Togethercomputer Together python Result Integration Pattern
Overview
The Result Integration Pattern implements Principle:Togethercomputer_Together_python_Result_Integration: user-defined logic that combines embedding and reranking results into a final ranked document list for downstream application use.
This is a Pattern Doc -- the Together Python SDK does not provide built-in result integration utilities. Instead, this page documents recommended patterns for post-processing retrieval and reranking outputs in user code.
Pattern Structure
The result integration pattern follows these stages:
- Map -- Map reranked indices back to original document objects and metadata
- Filter -- Apply relevance score thresholds to remove low-confidence results
- Combine -- Optionally fuse scores from multiple retrieval signals
- Select -- Choose the final set of documents respecting context window or display constraints
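The stages above can be sketched offline, without calling the API. This is a minimal illustration under stated assumptions: `StubResult` and `integrate` are hypothetical names, not part of the Together SDK, and the stub only mirrors the `index` and `relevance_score` fields that the examples on this page read from rerank results.

```python
from dataclasses import dataclass


@dataclass
class StubResult:
    """Hypothetical stand-in for one rerank result (not an SDK class)."""
    index: int
    relevance_score: float


def integrate(results, documents, min_score=0.3, max_docs=2):
    # Map: attach each score to its original document via the result index.
    mapped = [
        {**documents[r.index], "relevance_score": r.relevance_score}
        for r in results
    ]
    # Filter: drop results below the relevance threshold.
    kept = [d for d in mapped if d["relevance_score"] >= min_score]
    # Combine is skipped here because only one ranking signal is available.
    # Select: keep the highest-scoring documents up to the display limit.
    kept.sort(key=lambda d: d["relevance_score"], reverse=True)
    return kept[:max_docs]


documents = [
    {"id": "doc_1", "title": "RAG Overview"},
    {"id": "doc_2", "title": "Weather Report"},
    {"id": "doc_3", "title": "LLM Techniques"},
]
results = [StubResult(2, 0.91), StubResult(0, 0.74), StubResult(1, 0.05)]

final = integrate(results, documents)
print([d["id"] for d in final])
```

The threshold and limit values here are placeholders; in practice they would be tuned per model and use case.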
Example Patterns
Mapping Reranked Indices to Original Documents
from together import Together

client = Together()

# Original documents with metadata
documents = [
    {"id": "doc_1", "title": "RAG Overview", "text": "RAG combines retrieval with generation..."},
    {"id": "doc_2", "title": "Weather Report", "text": "The forecast shows sunny skies..."},
    {"id": "doc_3", "title": "LLM Techniques", "text": "Retrieval-augmented generation improves..."},
    {"id": "doc_4", "title": "Cooking Tips", "text": "Use fresh ingredients for best results..."},
    {"id": "doc_5", "title": "NLP Methods", "text": "Dense retrieval uses embeddings to find..."},
]

# Rerank using document text
response = client.rerank.create(
    model="Salesforce/Llama-Rank-V1",
    query="How does RAG improve LLM accuracy?",
    documents=[doc["text"] for doc in documents],
    top_n=3,
)

# Map reranked indices back to full document objects
ranked_documents = []
for result in response.results:
    original_doc = documents[result.index]
    ranked_documents.append({
        **original_doc,
        "relevance_score": result.relevance_score,
    })

for doc in ranked_documents:
    print(f"[{doc['relevance_score']:.4f}] {doc['id']}: {doc['title']}")
Relevance Score Thresholding
def filter_by_relevance(rerank_results, documents, threshold=0.5):
    """Filter reranked results by a minimum relevance score.

    Args:
        rerank_results: The results list from RerankResponse.
        documents: The original documents list (same order as rerank input).
        threshold: Minimum relevance score to include a document.

    Returns:
        List of (document, score) tuples at or above the threshold.
    """
    filtered = []
    for result in rerank_results:
        if result.relevance_score >= threshold:
            filtered.append((documents[result.index], result.relevance_score))
    return filtered

# Usage
# Assumes `client` from the previous example, `original_documents` (the full
# document records), and `document_texts` (their text fields, in the same order).
response = client.rerank.create(
    model="Salesforce/Llama-Rank-V1",
    query="machine learning optimization techniques",
    documents=document_texts,
)
relevant_docs = filter_by_relevance(response.results, original_documents, threshold=0.3)
print(f"Kept {len(relevant_docs)} of {len(document_texts)} documents above threshold")
Reciprocal Rank Fusion (Combining Multiple Signals)
def reciprocal_rank_fusion(ranked_lists: list[list[int]], k: int = 60) -> list[tuple[int, float]]:
    """Combine multiple ranked lists using Reciprocal Rank Fusion (RRF).

    Args:
        ranked_lists: List of ranked lists, where each list contains document indices
            ordered by relevance (most relevant first).
        k: RRF constant (default 60, as recommended in the original paper).

    Returns:
        List of (doc_index, rrf_score) tuples sorted by RRF score descending.
    """
    rrf_scores = {}
    for ranked_list in ranked_lists:
        # enumerate() is 0-based, so rank + 1 is the 1-based rank used by RRF
        for rank, doc_index in enumerate(ranked_list):
            if doc_index not in rrf_scores:
                rrf_scores[doc_index] = 0.0
            rrf_scores[doc_index] += 1.0 / (k + rank + 1)
    # Sort by RRF score descending
    return sorted(rrf_scores.items(), key=lambda x: x[1], reverse=True)
# Example: combine embedding retrieval with reranking
import numpy as np
from together import Together

client = Together()

query = "How does attention work in transformers?"
corpus = [
    "Attention mechanisms allow models to focus on relevant input parts.",
    "Transformers use self-attention for parallel sequence processing.",
    "CNNs use convolutional filters for feature extraction.",
    "The attention mechanism computes weighted sums of value vectors.",
    "RNNs process sequences one step at a time.",
]

# Signal 1: Embedding-based ranking
all_texts = [query] + corpus
embed_resp = client.embeddings.create(
    input=all_texts,
    model="togethercomputer/m2-bert-80M-8k-retrieval",
)
vectors = [np.array(item.embedding) for item in embed_resp.data]
query_vec = vectors[0]
doc_vecs = vectors[1:]

# Cosine similarity between the query and each document
similarities = [
    np.dot(query_vec, dv) / (np.linalg.norm(query_vec) * np.linalg.norm(dv))
    for dv in doc_vecs
]
# Document indices sorted by similarity, most similar first (plain Python ints)
embedding_ranking = np.argsort(similarities)[::-1].tolist()

# Signal 2: Reranking-based ordering
rerank_resp = client.rerank.create(
    model="Salesforce/Llama-Rank-V1",
    query=query,
    documents=corpus,
)
rerank_ranking = [result.index for result in rerank_resp.results]

# Fuse the two signals
fused = reciprocal_rank_fusion([embedding_ranking, rerank_ranking])

print("Fused ranking:")
for doc_index, rrf_score in fused[:3]:
    print(f"  RRF={rrf_score:.4f}: {corpus[doc_index]}")
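For reference, the score accumulated by `reciprocal_rank_fusion` can be written as the formula below. The notation is an assumption introduced here for illustration: m is the number of ranked lists, r_i(d) is the 1-based rank of document d in list i, and a list that does not contain d contributes nothing to the sum.

```latex
\mathrm{RRF}(d) = \sum_{i=1}^{m} \frac{1}{k + r_i(d)}
```

As a worked check with k = 60 and two lists: a document ranked first in both lists scores 1/61 + 1/61 ≈ 0.0328, while a document ranked first in one list and second in the other scores 1/61 + 1/62 ≈ 0.0325. The small gap shows how a large k flattens differences between adjacent ranks.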
Building RAG Context from Reranked Results
def build_rag_context(
    rerank_results,
    documents: list[str],
    max_context_chars: int = 4000,
    min_score: float = 0.2,
) -> str:
    """Build a RAG context string from reranked results.

    Selects documents by relevance score and fits them within a character budget.

    Args:
        rerank_results: The results list from RerankResponse.
        documents: The original documents list.
        max_context_chars: Maximum total characters for the context.
        min_score: Minimum relevance score to include a document.

    Returns:
        Concatenated context string with document separators.
    """
    context_parts = []
    total_chars = 0
    suffix = "..."  # Marks a truncated document
    for result in rerank_results:
        if result.relevance_score < min_score:
            continue
        doc_text = documents[result.index]
        separator = f"\n\n---\n[Document {result.index} | Relevance: {result.relevance_score:.3f}]\n"
        addition_length = len(separator) + len(doc_text)
        # Check if adding this document exceeds the budget
        if total_chars + addition_length > max_context_chars:
            # Truncate the last document to fit, leaving room for the suffix
            remaining = max_context_chars - total_chars - len(separator) - len(suffix)
            if remaining > 100:  # Only include if meaningful content fits
                context_parts.append(separator + doc_text[:remaining] + suffix)
            break  # Budget is exhausted either way
        context_parts.append(separator + doc_text)
        total_chars += addition_length
    return "".join(context_parts).strip()
# Usage in a RAG pipeline
from together import Together

client = Together()

query = "Explain retrieval-augmented generation"
candidate_docs = [...]  # Retrieved from vector database

# Rerank candidates
rerank_resp = client.rerank.create(
    model="Salesforce/Llama-Rank-V1",
    query=query,
    documents=candidate_docs,
    top_n=5,
)

# Build context for LLM
context = build_rag_context(rerank_resp.results, candidate_docs, max_context_chars=3000)

# Use context in LLM prompt
chat_response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    messages=[
        {"role": "system", "content": f"Answer based on this context:\n{context}"},
        {"role": "user", "content": query},
    ],
)
print(chat_response.choices[0].message.content)
Diversity-Aware Selection (Maximal Marginal Relevance)
import numpy as np

def mmr_select(
    query_embedding: np.ndarray,
    doc_embeddings: list[np.ndarray],
    relevance_scores: list[float],
    k: int = 5,
    lambda_param: float = 0.7,
) -> list[int]:
    """Select documents using Maximal Marginal Relevance (MMR).

    Balances relevance with diversity to avoid redundant results.

    Args:
        query_embedding: The query embedding vector. Not used by this
            implementation; relevance is taken from relevance_scores.
        doc_embeddings: List of document embedding vectors.
        relevance_scores: Relevance scores from reranking, indexed like
            doc_embeddings (relevance_scores[i] belongs to doc_embeddings[i]).
        k: Number of documents to select.
        lambda_param: Trade-off between relevance (1.0) and diversity (0.0).

    Returns:
        List of selected document indices, in selection order.
    """
    selected = []
    candidates = list(range(len(doc_embeddings)))
    for _ in range(min(k, len(candidates))):
        best_score = -float("inf")
        best_idx = -1
        for idx in candidates:
            if idx in selected:
                continue
            # Relevance component
            relevance = relevance_scores[idx]
            # Diversity component: max cosine similarity to already-selected docs
            if selected:
                similarities = [
                    np.dot(doc_embeddings[idx], doc_embeddings[s]) / (
                        np.linalg.norm(doc_embeddings[idx]) * np.linalg.norm(doc_embeddings[s])
                    )
                    for s in selected
                ]
                max_sim = max(similarities)
            else:
                max_sim = 0.0
            # MMR score
            mmr_score = lambda_param * relevance - (1 - lambda_param) * max_sim
            if mmr_score > best_score:
                best_score = mmr_score
                best_idx = idx
        if best_idx >= 0:
            selected.append(best_idx)
            candidates.remove(best_idx)
    return selected
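Because `mmr_select` reads `relevance_scores[idx]` by document index, scores taken from rerank results (which the examples above treat as ordered by relevance, and which may omit documents when `top_n` is set) need to be realigned to document order first. The sketch below shows one way to do that; `scores_by_document_index` and `StubResult` are hypothetical names used for illustration, and the default of 0.0 for documents missing from the results is an assumption.

```python
from dataclasses import dataclass


@dataclass
class StubResult:
    """Hypothetical stand-in for one rerank result (not an SDK class)."""
    index: int
    relevance_score: float


def scores_by_document_index(results, num_documents, default=0.0):
    """Return a list where position i holds the score of document i."""
    scores = [default] * num_documents
    for result in results:
        scores[result.index] = result.relevance_score
    return scores


# Results arrive ordered by relevance and cover only two of four documents.
results = [StubResult(2, 0.9), StubResult(0, 0.6)]
aligned = scores_by_document_index(results, num_documents=4)
print(aligned)
```

The aligned list can then be passed as `relevance_scores` alongside `doc_embeddings` in the same document order.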
Design Decisions
| Decision | Recommendation | Rationale |
|---|---|---|
| Score threshold value | Start with 0.2-0.5, tune per model and use case | Thresholds are model-dependent; lower thresholds for recall-oriented tasks, higher for precision-oriented |
| RRF constant (k) | Use k=60 as default | Recommended by the original RRF paper; provides stable fusion across ranking depths |
| Context window budget | Reserve 20-30% of total context for the query and instructions | Ensures the LLM has room for the user query and system instructions alongside retrieved context |
| Relevance vs. diversity trade-off | lambda=0.7 for most RAG tasks | Biases toward relevance while still penalizing highly redundant documents |
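The context window recommendation in the table can be turned into a character budget for `max_context_chars`. The sketch below is a rough estimate under stated assumptions: `context_char_budget` is a hypothetical helper, the 4-characters-per-token ratio is a common heuristic rather than a tokenizer measurement, and the calculation does not set aside room for the model's generated output.

```python
def context_char_budget(context_window_tokens, reserve_fraction=0.25, chars_per_token=4):
    """Estimate how many characters of retrieved context fit in the window.

    reserve_fraction is the share of the window kept for the query and
    instructions; chars_per_token is an assumed average, not a measurement.
    """
    if not 0.0 <= reserve_fraction < 1.0:
        raise ValueError("reserve_fraction must be in [0, 1)")
    usable_tokens = int(context_window_tokens * (1.0 - reserve_fraction))
    return usable_tokens * chars_per_token


# An 8192-token window with 25% reserved leaves 6144 tokens for context.
budget = context_char_budget(8192, reserve_fraction=0.25)
print(budget)
```

For tighter control, counting tokens with the target model's tokenizer would be more accurate than a fixed ratio.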
Source Files
This is a user-side pattern. No SDK source files implement this functionality. The relevant SDK outputs consumed by this pattern are:
- src/together/types/embeddings.py -- EmbeddingResponse, EmbeddingChoicesData
- src/together/types/rerank.py -- RerankResponse, RerankChoicesData
Metadata
| Property | Value |
|---|---|
| Implementation | Result Integration Pattern |
| Type | Pattern Doc (user-defined logic) |
| Domain | NLP, Information_Retrieval, RAG |
| Workflow | Embeddings_And_Reranking |
| Principle | Principle:Togethercomputer_Together_python_Result_Integration |