Heuristic: Explodinggradients Ragas Retry and Backoff Configuration
| Knowledge Sources | |
|---|---|
| Domains | LLM_Evaluation, Optimization, Debugging |
| Last Updated | 2026-02-10 12:00 GMT |
Overview
Configuration guide for Ragas retry logic: understanding the multi-layer retry system with RunConfig defaults (10 retries, 60s max wait, exponential backoff) and provider-specific exception narrowing.
Description
Ragas implements a multi-layer retry system using the `tenacity` library. The `RunConfig` dataclass defines global defaults for timeout, retries, backoff, and concurrency. By default, all exceptions trigger retries (`exception_types=(Exception,)`), but for OpenAI-based LLMs, this is automatically narrowed to `RateLimitError` only. On top of the tenacity retries, prompt output parsing has its own retry layer (3 retries with LLM-based output fixing), and NVIDIA/collection metrics have per-judge retry loops (5 retries each).
Usage
Apply this heuristic when you are configuring Ragas for production use or debugging slow or failing evaluations. The default retry settings are tuned for interactive use but may cause excessive delays in CI/CD pipelines or high-volume evaluation runs. Understanding the retry layers helps diagnose why evaluations take longer than expected or why API costs are higher than anticipated.
The Insight (Rule of Thumb)
- Action: Customize `RunConfig` for your use case instead of relying on defaults.
- Value: For CI/CD, reduce `max_retries` to 3 and `timeout` to 60. For rate-limited APIs, increase `max_wait` to 120.
- Trade-off: Fewer retries = faster failure but more NaN results. More retries = more robust but slower and costlier.
- Critical: For OpenAI, the system auto-narrows retries to `RateLimitError` only. For other providers, the default `(Exception,)` retries on all errors including auth failures -- consider narrowing `exception_types`.
- Retry layers stack: a single metric evaluation can compound (1) 3 prompt retries, (2) up to 2 LLM calls per prompt attempt (the generation plus an output-fix call), and (3) 10 tenacity API retries per call: 3 x 2 x 10 = up to 60 API calls in the worst case.
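As a concrete sketch of the recommended overrides, the dataclass below is a local mirror of the `RunConfig` fields quoted later in this note (in real code you would import `RunConfig` from `ragas.run_config` instead):

```python
from dataclasses import dataclass


# Local mirror of ragas.run_config.RunConfig defaults (sketch only;
# in production, import RunConfig from ragas.run_config).
@dataclass
class RunConfig:
    timeout: int = 180     # seconds before a single operation is abandoned
    max_retries: int = 10  # tenacity stop_after_attempt budget
    max_wait: int = 60     # cap on exponential backoff, in seconds
    max_workers: int = 16  # concurrency limit
    seed: int = 42


# CI/CD profile: fail fast instead of waiting out long backoff chains.
ci_config = RunConfig(timeout=60, max_retries=3)

# Rate-limited-API profile: tolerate longer waits between attempts.
rate_limited_config = RunConfig(max_wait=120)
```

The same two profiles can then be passed wherever Ragas accepts a run configuration.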
Reasoning
The default `max_retries=10` with `max_wait=60` means a single failing operation can take up to ~10 minutes before giving up (exponential backoff sums). The blanket `(Exception,)` default means transient errors AND permanent errors (like invalid API keys) are retried identically, wasting time. The OpenAI-specific narrowing to `RateLimitError` is the most important optimization -- it only retries rate limits, letting auth errors fail immediately.
The prompt output parsing retry layer adds another dimension: when an LLM returns malformed JSON, Ragas makes an additional LLM call (`fix_output_format_prompt`) asking the LLM to fix its own output. This can multiply API costs by 2-4x per prompt evaluation.
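The cost and latency bounds above can be made concrete. The helpers below are illustrative upper-bound arithmetic derived from the defaults quoted in this note, not measured figures:

```python
def worst_case_api_calls(prompt_retries: int = 3,
                         calls_per_attempt: int = 2,
                         api_retries: int = 10) -> int:
    """Upper bound on API calls for one prompt evaluation: each prompt
    retry may issue a generation plus an output-fix call, and each of
    those calls carries its own tenacity retry budget."""
    return prompt_retries * calls_per_attempt * api_retries


def worst_case_backoff_seconds(max_retries: int = 10,
                               max_wait: int = 60) -> int:
    """Upper bound on pure backoff time for one operation: up to
    (max_retries - 1) waits, each capped at max_wait seconds."""
    return (max_retries - 1) * max_wait
```

With the defaults this gives 60 calls and 540 seconds of backoff alone (before per-call timeouts), which is where the "up to ~10 minutes" figure comes from.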
Code Evidence
RunConfig defaults from `src/ragas/run_config.py:51-60`:
timeout: int = 180
max_retries: int = 10
max_wait: int = 60
max_workers: int = 16
exception_types: t.Union[
    t.Type[BaseException],
    t.Tuple[t.Type[BaseException], ...],
] = (Exception,)
seed: int = 42
Tenacity retry configuration from `src/ragas/run_config.py:87-94`:
r = Retrying(
    wait=wait_random_exponential(multiplier=1, max=run_config.max_wait),
    stop=stop_after_attempt(run_config.max_retries),
    retry=retry_if_exception_type(run_config.exception_types),
    reraise=True,
    after=tenacity_logger,
)
OpenAI exception narrowing from `src/ragas/llms/base.py:336-346`:
def set_run_config(self, run_config: RunConfig):
    self.run_config = run_config
    if isinstance(self.langchain_llm, BaseOpenAI) or isinstance(self.langchain_llm, ChatOpenAI):
        from openai import RateLimitError

        self.langchain_llm.request_timeout = run_config.timeout
        self.run_config.exception_types = RateLimitError
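Non-OpenAI providers get no such narrowing automatically, so it is worth reproducing by hand. The sketch below uses stand-in exception classes (any real provider SDK's rate-limit exception would take their place) to show why narrowing matters, mirroring tenacity's `retry_if_exception_type` predicate:

```python
class RateLimitError(Exception):
    """Stand-in for a provider's transient rate-limit exception."""


class AuthError(Exception):
    """Stand-in for a permanent error that should not be retried."""


def should_retry(exc: BaseException, exception_types) -> bool:
    # Mirrors tenacity's retry_if_exception_type predicate.
    return isinstance(exc, exception_types)


narrowed = (RateLimitError,)  # OpenAI-style narrowing
blanket = (Exception,)        # the Ragas RunConfig default

# With the blanket default, even permanent auth failures are retried:
assert should_retry(AuthError(), blanket)
# After narrowing, only rate limits are retried; auth errors fail fast:
assert should_retry(RateLimitError(), narrowed)
assert not should_retry(AuthError(), narrowed)
```

In practice the narrowing is a one-line assignment to `run_config.exception_types` before the config is handed to the executor.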
Prompt output parsing retry from `src/ragas/prompt/pydantic_prompt.py:136-143`:
async def generate(
    self,
    llm: t.Union[BaseRagasLLM, InstructorBaseRagasLLM, BaseLanguageModel],
    data: InputModel,
    temperature: t.Optional[float] = None,
    stop: t.Optional[t.List[str]] = None,
    callbacks: t.Optional[Callbacks] = None,
    retries_left: int = 3,
) -> OutputModel: