Implementation:BerriAI Litellm Rerank API

From Leeroopedia
Property | Value
sources | litellm/rerank_api/main.py
domains | Reranking, Information Retrieval, Search
last_updated | 2026-02-15 16:00 GMT

Overview

The Rerank API module provides a unified interface for reranking a list of documents by relevance to a query. It supports more than 12 LLM providers, including Cohere, Together AI, Azure AI, Bedrock, and Jina AI.

Description

This module implements the reranking API through a single rerank/arerank function pair decorated with @client. The sync function contains all the core logic: it resolves the provider via litellm.get_llm_provider(), loads the provider-specific BaseRerankConfig, maps optional parameters, and dispatches to the appropriate handler. Most providers use base_llm_http_handler.rerank(), while Together AI and Bedrock have dedicated handler classes (TogetherAIRerank and BedrockRerankHandler). Provider-specific API key and base URL resolution is handled inline with fallback to environment variables. The module supports configurable parameters including top_n, rank_fields, return_documents, max_chunks_per_doc, and max_tokens_per_doc.
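The resolve-then-dispatch flow described above can be sketched in isolation. This is a simplified illustration, not the code in litellm/rerank_api/main.py: `resolve_provider`, `dispatch`, `DEDICATED_HANDLERS`, and both handler functions are hypothetical stand-ins, and splitting the model string on its first "/" is an assumption about how a prefix such as "cohere/" maps to a provider.

```python
from typing import Callable, Dict, List, Optional, Tuple

Handler = Callable[[str, str, List[str]], str]


# Hypothetical stand-ins for the real handlers. They only report which path
# was taken; they do not call any provider.
def generic_http_handler(model: str, query: str, documents: List[str]) -> str:
    return f"generic-http:{model}"


def together_ai_handler(model: str, query: str, documents: List[str]) -> str:
    return f"together-ai:{model}"


# Providers with a dedicated handler; every other provider takes the generic path.
DEDICATED_HANDLERS: Dict[str, Handler] = {"together_ai": together_ai_handler}


def resolve_provider(
    model: str, custom_llm_provider: Optional[str] = None
) -> Tuple[str, str]:
    """Return (provider, bare_model); an explicitly passed provider wins."""
    if custom_llm_provider is not None:
        prefix = custom_llm_provider + "/"
        bare = model[len(prefix):] if model.startswith(prefix) else model
        return custom_llm_provider, bare
    provider, sep, bare = model.partition("/")
    if not sep:
        raise ValueError(f"cannot infer provider from model {model!r}")
    return provider, bare


def dispatch(
    model: str,
    query: str,
    documents: List[str],
    custom_llm_provider: Optional[str] = None,
) -> str:
    """Resolve the provider, then route to a dedicated or generic handler."""
    provider, bare_model = resolve_provider(model, custom_llm_provider)
    handler = DEDICATED_HANDLERS.get(provider, generic_http_handler)
    return handler(bare_model, query, documents)
```

Keeping the dedicated handlers in a lookup table with a generic fallback mirrors the description: most providers share one HTTP path, and only a few need their own handler.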

Usage

Import this module when you need to reorder a set of documents by their relevance to a search query. It is typically used in RAG (Retrieval-Augmented Generation) pipelines between the retrieval and generation stages.
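To illustrate where reranking sits in such a pipeline, the sketch below narrows a list of retrieved candidates before they are passed to a generator. `toy_rerank` and `build_context` are hypothetical names; the word-overlap scorer is only a stand-in so the example runs offline, and in practice a call to the rerank function would take its place.

```python
from typing import Callable, List


def toy_rerank(query: str, documents: List[str], top_n: int) -> List[int]:
    """Toy scorer (assumption): rank documents by how many query words they share."""
    terms = set(query.lower().split())
    scores = [len(terms & set(doc.lower().split())) for doc in documents]
    # sorted() is stable, so ties keep their original retrieval order.
    order = sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)
    return order[:top_n]


def build_context(
    query: str,
    candidates: List[str],
    rerank_fn: Callable[[str, List[str], int], List[int]],
    top_n: int = 2,
) -> str:
    """Keep only the top_n reranked candidates and join them into a prompt context."""
    keep = rerank_fn(query, candidates, top_n)
    return "\n".join(candidates[i] for i in keep)


# Stage 1 (retrieval) is assumed to have produced these candidates.
candidates = [
    "The weather today is sunny.",
    "Machine learning is a subset of artificial intelligence.",
    "Deep learning is a branch of machine learning.",
]

# Stage 2 (reranking) trims the candidates; stage 3 (generation) would consume `context`.
context = build_context("what is machine learning", candidates, toy_rerank)
print(context)
```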

Code Reference

Source Location

Property | Value
Repository | github.com/BerriAI/litellm
File | litellm/rerank_api/main.py
Lines | 535
Module | litellm.rerank_api.main

Signature

@client
def rerank(
    model: str,
    query: str,
    documents: List[Union[str, Dict[str, Any]]],
    custom_llm_provider: Optional[Literal[
        "cohere", "together_ai", "azure_ai", "infinity",
        "litellm_proxy", "hosted_vllm", "deepinfra",
        "fireworks_ai", "voyage",
    ]] = None,
    top_n: Optional[int] = None,
    rank_fields: Optional[List[str]] = None,
    return_documents: Optional[bool] = True,
    max_chunks_per_doc: Optional[int] = None,
    max_tokens_per_doc: Optional[int] = None,
    **kwargs,
) -> Union[RerankResponse, Coroutine[Any, Any, RerankResponse]]

@client
async def arerank(model, query, documents, ...) -> RerankResponse

Import

from litellm.rerank_api.main import rerank, arerank

I/O Contract

Inputs

Parameter | Type | Required | Description
model | str | Yes | The reranking model identifier (e.g., "cohere/rerank-english-v3.0")
query | str | Yes | The search query to rank documents against
documents | List[Union[str, Dict]] | Yes | Documents to rerank (strings or dicts with text fields)
custom_llm_provider | Optional[str] | No | Provider name; auto-detected from model if not set
top_n | Optional[int] | No | Number of top results to return
rank_fields | Optional[List[str]] | No | Fields to use for ranking when documents are dicts
return_documents | Optional[bool] | No | Whether to include document text in results (default: True)
max_chunks_per_doc | Optional[int] | No | Maximum chunks per document
max_tokens_per_doc | Optional[int] | No | Maximum tokens per document
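As a sketch of how these inputs might be assembled before a call, the helper below builds a keyword-argument dict and leaves out optional parameters that were not set. `build_rerank_kwargs` is a hypothetical helper, and its validation rules (non-empty documents, positive top_n, rank_fields only with dict documents) are assumptions chosen for illustration rather than checks the library is known to perform.

```python
from typing import Any, Dict, List, Optional, Union

Document = Union[str, Dict[str, Any]]


def build_rerank_kwargs(
    model: str,
    query: str,
    documents: List[Document],
    top_n: Optional[int] = None,
    rank_fields: Optional[List[str]] = None,
    return_documents: Optional[bool] = True,
) -> Dict[str, Any]:
    """Assemble rerank keyword arguments, omitting optional values left as None."""
    # Assumed sanity checks; the library's own validation may differ.
    if not documents:
        raise ValueError("documents must not be empty")
    if top_n is not None and top_n < 1:
        raise ValueError("top_n must be a positive integer")
    if rank_fields and not all(isinstance(d, dict) for d in documents):
        raise ValueError("rank_fields only applies when documents are dicts")

    kwargs: Dict[str, Any] = {
        "model": model,
        "query": query,
        "documents": documents,
        "return_documents": return_documents,
    }
    optional = {"top_n": top_n, "rank_fields": rank_fields}
    kwargs.update({k: v for k, v in optional.items() if v is not None})
    return kwargs


kwargs = build_rerank_kwargs(
    model="cohere/rerank-english-v3.0",
    query="What is machine learning?",
    documents=[
        {"title": "ML basics", "text": "Machine learning is a subset of AI."},
        {"title": "Forecast", "text": "The weather today is sunny."},
    ],
    rank_fields=["title", "text"],
)
print(sorted(kwargs))
```

The resulting dict could then be unpacked into the call, for example `rerank(**kwargs)`.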

Outputs

Output | Type | Description
Response | RerankResponse | Contains ranked results with scores, indices, and optionally documents
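Because each result carries an index into the original documents list, a common follow-up step is mapping results back to their source text. The sketch below does this with hand-written stand-in results. `ranked_documents` and `_field` are hypothetical helpers, the scores are invented illustrative values rather than model output, and the helper accepts both dict-style and attribute-style entries because the concrete shape of a result entry may vary between library versions.

```python
from typing import Any, List, Tuple


def _field(result: Any, name: str) -> Any:
    """Read a field from a result that may be a dict or an attribute-style object."""
    if isinstance(result, dict):
        return result[name]
    return getattr(result, name)


def ranked_documents(documents: List[str], results: List[Any]) -> List[Tuple[float, str]]:
    """Pair each result's score with its source document, highest score first."""
    pairs = [
        (_field(r, "relevance_score"), documents[_field(r, "index")])
        for r in results
    ]
    return sorted(pairs, key=lambda pair: pair[0], reverse=True)


documents = [
    "Machine learning is a subset of artificial intelligence.",
    "The weather today is sunny.",
    "Deep learning uses neural networks.",
]

# Hand-written stand-in for response.results; the scores are illustrative only.
results = [
    {"index": 2, "relevance_score": 0.41},
    {"index": 0, "relevance_score": 0.98},
]

for score, text in ranked_documents(documents, results):
    print(f"{score:.2f}  {text}")
```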

Usage Examples

import litellm

response = litellm.rerank(
    model="cohere/rerank-english-v3.0",
    query="What is machine learning?",
    documents=[
        "Machine learning is a subset of artificial intelligence.",
        "The weather today is sunny.",
        "Deep learning uses neural networks.",
    ],
    top_n=2,
)

for result in response.results:
    print(f"Index: {result.index}, Score: {result.relevance_score}")

# Async variant using arerank
import asyncio
import litellm

async def main():
    response = await litellm.arerank(
        model="together_ai/rerank-model",
        query="Python programming",
        documents=["Python is a language", "Java is a language", "Python for data science"],
    )
    print(response)

asyncio.run(main())
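One reason to reach for the async variant is to rerank the same documents against several queries concurrently. The sketch below shows that pattern with asyncio.gather. `fake_arerank` is a hypothetical offline stand-in (a simple substring match) so the example runs without network access or API keys, and `rerank_many` is likewise an illustrative name.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def fake_arerank(query: str, documents: List[str]) -> List[int]:
    """Offline stand-in (assumption): documents containing the query come first."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    hits = [i for i, doc in enumerate(documents) if query.lower() in doc.lower()]
    misses = [i for i in range(len(documents)) if i not in hits]
    return hits + misses


async def rerank_many(
    queries: List[str],
    documents: List[str],
    rerank_fn: Callable[[str, List[str]], Awaitable[List[int]]],
) -> Dict[str, List[int]]:
    """Run one rerank call per query concurrently and collect the orderings."""
    orders = await asyncio.gather(*(rerank_fn(q, documents) for q in queries))
    return dict(zip(queries, orders))


docs = ["Python is a language", "Java is a language", "Python for data science"]
orders = asyncio.run(rerank_many(["python", "java"], docs, fake_arerank))
print(orders)
```

asyncio.gather returns results in the order the awaitables were passed, which is what lets the orderings be zipped back onto their queries.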
