Heuristic: Explodinggradients Ragas Retry and Backoff Configuration
| Knowledge Sources | |
|---|---|
| Domains | LLM_Evaluation, Optimization, Debugging |
| Last Updated | 2026-02-10 12:00 GMT |
Overview
Configuration guide for Ragas retry logic: understanding the multi-layer retry system with RunConfig defaults (10 retries, 60s max wait, exponential backoff) and provider-specific exception narrowing.
Description
Ragas implements a multi-layer retry system using the `tenacity` library. The `RunConfig` dataclass defines global defaults for timeout, retries, backoff, and concurrency. By default, all exceptions trigger retries (`exception_types=(Exception,)`), but for OpenAI-based LLMs, this is automatically narrowed to `RateLimitError` only. On top of the tenacity retries, prompt output parsing has its own retry layer (3 retries with LLM-based output fixing), and NVIDIA/collection metrics have per-judge retry loops (5 retries each).
Usage
Apply this heuristic when you are configuring Ragas for production use or debugging slow or failing evaluations. The default retry settings are tuned for interactive use but may cause excessive delays in CI/CD pipelines or high-volume evaluation runs. Understanding the retry layers helps diagnose why evaluations take longer than expected or why API costs are higher than anticipated.
The Insight (Rule of Thumb)
- Action: Customize `RunConfig` for your use case instead of relying on defaults.
- Value: For CI/CD, reduce `max_retries` to 3 and `timeout` to 60. For rate-limited APIs, increase `max_wait` to 120.
- Trade-off: Fewer retries = faster failure but more NaN results. More retries = more robust but slower and costlier.
- Critical: For OpenAI, the system auto-narrows retries to `RateLimitError` only. For other providers, the default `(Exception,)` retries on all errors including auth failures -- consider narrowing `exception_types`.
- Retry layers stack: a single metric evaluation can compound (1) 3 prompt retries, (2) up to 2 LLM calls per prompt attempt (the generation plus an output-fix call), and (3) 10 tenacity API retries per call: 3 x 2 x 10 = up to 60 API calls in the worst case.
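As a concrete sketch of the recommended overrides, the dataclass below is a local mirror of the `RunConfig` fields quoted later in this note (in real code you would import `RunConfig` from `ragas.run_config` instead):

```python
from dataclasses import dataclass


# Local mirror of ragas.run_config.RunConfig defaults (sketch only;
# in production, import RunConfig from ragas.run_config).
@dataclass
class RunConfig:
    timeout: int = 180     # seconds before a single operation is abandoned
    max_retries: int = 10  # tenacity stop_after_attempt budget
    max_wait: int = 60     # cap on exponential backoff, in seconds
    max_workers: int = 16  # concurrency limit
    seed: int = 42


# CI/CD profile: fail fast instead of waiting out long backoff chains.
ci_config = RunConfig(timeout=60, max_retries=3)

# Rate-limited-API profile: tolerate longer waits between attempts.
rate_limited_config = RunConfig(max_wait=120)
```

The same two profiles can then be passed wherever Ragas accepts a run configuration.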
Reasoning
The default `max_retries=10` with `max_wait=60` means a single failing operation can take up to ~10 minutes before giving up (exponential backoff sums). The blanket `(Exception,)` default means transient errors AND permanent errors (like invalid API keys) are retried identically, wasting time. The OpenAI-specific narrowing to `RateLimitError` is the most important optimization -- it only retries rate limits, letting auth errors fail immediately.
The prompt output parsing retry layer adds another dimension: when an LLM returns malformed JSON, Ragas makes an additional LLM call (`fix_output_format_prompt`) asking the LLM to fix its own output. This can multiply API costs by 2-4x per prompt evaluation.
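The cost and latency bounds above can be made concrete. The helpers below are illustrative upper-bound arithmetic derived from the defaults quoted in this note, not measured figures:

```python
def worst_case_api_calls(prompt_retries: int = 3,
                         calls_per_attempt: int = 2,
                         api_retries: int = 10) -> int:
    """Upper bound on API calls for one prompt evaluation: each prompt
    retry may issue a generation plus an output-fix call, and each of
    those calls carries its own tenacity retry budget."""
    return prompt_retries * calls_per_attempt * api_retries


def worst_case_backoff_seconds(max_retries: int = 10,
                               max_wait: int = 60) -> int:
    """Upper bound on pure backoff time for one operation: up to
    (max_retries - 1) waits, each capped at max_wait seconds."""
    return (max_retries - 1) * max_wait
```

With the defaults this gives 60 calls and 540 seconds of backoff alone (before per-call timeouts), which is where the "up to ~10 minutes" figure comes from.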
Code Evidence
RunConfig defaults from `src/ragas/run_config.py:51-60`:
timeout: int = 180
max_retries: int = 10
max_wait: int = 60
max_workers: int = 16
exception_types: t.Union[
    t.Type[BaseException],
    t.Tuple[t.Type[BaseException], ...],
] = (Exception,)
seed: int = 42
Tenacity retry configuration from `src/ragas/run_config.py:87-94`:
r = Retrying(
    wait=wait_random_exponential(multiplier=1, max=run_config.max_wait),
    stop=stop_after_attempt(run_config.max_retries),
    retry=retry_if_exception_type(run_config.exception_types),
    reraise=True,
    after=tenacity_logger,
)
OpenAI exception narrowing from `src/ragas/llms/base.py:336-346`:
def set_run_config(self, run_config: RunConfig):
    self.run_config = run_config
    if isinstance(self.langchain_llm, BaseOpenAI) or isinstance(self.langchain_llm, ChatOpenAI):
        from openai import RateLimitError

        self.langchain_llm.request_timeout = run_config.timeout
        self.run_config.exception_types = RateLimitError
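Non-OpenAI providers get no such narrowing automatically, so it is worth reproducing by hand. The sketch below uses stand-in exception classes (any real provider SDK's rate-limit exception would take their place) to show why narrowing matters, mirroring tenacity's `retry_if_exception_type` predicate:

```python
class RateLimitError(Exception):
    """Stand-in for a provider's transient rate-limit exception."""


class AuthError(Exception):
    """Stand-in for a permanent error that should not be retried."""


def should_retry(exc: BaseException, exception_types) -> bool:
    # Mirrors tenacity's retry_if_exception_type predicate.
    return isinstance(exc, exception_types)


narrowed = (RateLimitError,)  # OpenAI-style narrowing
blanket = (Exception,)        # the Ragas RunConfig default

# With the blanket default, even permanent auth failures are retried:
assert should_retry(AuthError(), blanket)
# After narrowing, only rate limits are retried; auth errors fail fast:
assert should_retry(RateLimitError(), narrowed)
assert not should_retry(AuthError(), narrowed)
```

In practice the narrowing is a one-line assignment to `run_config.exception_types` before the config is handed to the executor.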
Prompt output parsing retry from `src/ragas/prompt/pydantic_prompt.py:136-143`:
async def generate(
    self,
    llm: t.Union[BaseRagasLLM, InstructorBaseRagasLLM, BaseLanguageModel],
    data: InputModel,
    temperature: t.Optional[float] = None,
    stop: t.Optional[t.List[str]] = None,
    callbacks: t.Optional[Callbacks] = None,
    retries_left: int = 3,
) -> OutputModel: