Heuristic:Vibrantlabsai Ragas Concurrency And Retry Configuration
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Infrastructure |
| Last Updated | 2026-02-12 10:00 GMT |
Overview
Default configuration heuristic for Ragas concurrent execution: 16 workers, 10 retries with exponential backoff (max 60s wait), 180-second timeout, and seed 42 for reproducibility.
Description
The `RunConfig` dataclass defines the runtime behavior for all Ragas LLM operations. It controls concurrency (how many API calls run simultaneously), retry logic (how failures are handled), timeouts, and the random seed. The defaults are tuned for typical cloud LLM API rate limits. The Tenacity library provides the retry mechanism, using randomized exponential backoff to avoid thundering herd problems.
Usage
Use this heuristic when:
- Hitting rate limits: Reduce `max_workers` below 16 to lower concurrent API calls.
- Experiencing timeouts: Increase `timeout` beyond 180s for slow models or large prompts.
- Needing faster evaluation: Increase `max_workers` if your API quota allows (set to -1 for unlimited).
- Debugging flaky tests: Reduce `max_retries` to fail fast, or increase for unreliable endpoints.
- Requiring reproducibility: The `seed=42` default initializes a NumPy random generator for consistent behavior.
The Insight (Rule of Thumb)
- Action: Instantiate `RunConfig` with custom values and pass it to `evaluate()` or to individual metrics.
- Values:
  - `max_workers=16`: safe default for most API rate limits
  - `max_retries=10`: handles transient failures without excessive delay; the value is passed to `stop_after_attempt`, so it counts total attempts, including the first call
  - `max_wait=60`: caps each exponential backoff wait at 60 seconds
  - `timeout=180`: 3 minutes per operation is generous for most LLM calls
  - `seed=42`: standard reproducibility seed
  - `exception_types=(Exception,)`: retries on any `Exception` subclass (broad by design)
- Trade-off: More workers finish faster but risk rate limiting. More retries are more resilient but delay failure reporting. Setting `max_workers=-1` removes the concurrency limit by bypassing the `asyncio.Semaphore` entirely.
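The adjustments above can be sketched as small overrides of the defaults. The block below is only an illustration: `RunConfigSketch` is a hypothetical stand-in that mirrors the default values listed here so the example runs without Ragas installed, and the override values (4 workers, 600 seconds, 1 attempt) are assumptions, not recommendations from the library.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RunConfigSketch:
    """Hypothetical stand-in mirroring the RunConfig defaults listed above."""

    timeout: int = 180      # seconds allowed per operation
    max_retries: int = 10   # attempts before giving up
    max_wait: int = 60      # cap on a single backoff wait, in seconds
    max_workers: int = 16   # concurrent calls; -1 removes the limit
    seed: int = 42


DEFAULTS = RunConfigSketch()

# Hitting rate limits: fewer concurrent calls, everything else unchanged.
rate_limited = replace(DEFAULTS, max_workers=4)

# Slow model or large prompts: allow more time per call.
slow_model = replace(DEFAULTS, timeout=600)

# Debugging a flaky test: fail fast instead of waiting through retries.
fail_fast = replace(DEFAULTS, max_retries=1)

for name, cfg in [
    ("rate_limited", rate_limited),
    ("slow_model", slow_model),
    ("fail_fast", fail_fast),
]:
    print(name, cfg)
```

With Ragas installed, the same keyword arguments would go to the real `RunConfig` from `ragas/run_config.py` rather than to this stand-in.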
Reasoning
Cloud LLM APIs (OpenAI, Anthropic, etc.) typically have rate limits of 10-60 requests per minute for standard tiers. The default of 16 concurrent workers provides good throughput without immediately hitting most rate limits. The exponential backoff with randomized jitter (`wait_random_exponential(multiplier=1, max=60)`) prevents synchronized retry storms. Catching the broad `(Exception,)` tuple ensures that transient network errors, timeouts, and rate limit responses all trigger retries. The seed of 42 is the conventional choice (a nod to Douglas Adams's The Hitchhiker's Guide to the Galaxy) and keeps Ragas's own random choices consistent across runs on the same dataset; scores can still vary if the underlying LLM is itself nondeterministic.
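Whether a given worker count fits under a requests-per-minute quota depends on how long each call takes. The back-of-envelope sketch below uses Little's law (throughput = concurrency / latency) to relate the two. The function names and the 8-second average latency are assumptions chosen for illustration; real latency should be measured for your model and prompts.

```python
import math


def steady_state_rpm(max_workers: int, avg_latency_s: float) -> float:
    """Requests per minute when every worker is continuously busy."""
    return max_workers * 60.0 / avg_latency_s


def workers_for_quota(rpm_limit: float, avg_latency_s: float) -> int:
    """Largest worker count whose steady-state rate stays within rpm_limit."""
    return max(1, math.floor(rpm_limit * avg_latency_s / 60.0))


# Assumed average latency of 8 seconds per LLM call.
print(steady_state_rpm(16, 8.0))   # 120.0
print(workers_for_quota(60, 8.0))  # 8
```

Under these assumed numbers, 16 busy workers would issue about 120 requests per minute, so an account limited to 60 requests per minute would want roughly 8 workers. Slower calls leave more headroom; faster calls leave less.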
Code Evidence
RunConfig defaults from `src/ragas/run_config.py:51-60`:
    timeout: int = 180
    max_retries: int = 10
    max_wait: int = 60
    max_workers: int = 16
    exception_types: t.Union[
        t.Type[BaseException],
        t.Tuple[t.Type[BaseException], ...],
    ] = (Exception,)
    log_tenacity: bool = False
    seed: int = 42
Retry with exponential backoff from `src/ragas/run_config.py:87-94`:
    r = Retrying(
        wait=wait_random_exponential(multiplier=1, max=run_config.max_wait),
        stop=stop_after_attempt(run_config.max_retries),
        retry=retry_if_exception_type(run_config.exception_types),
        reraise=True,
        after=tenacity_logger,
    )
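To make the retry behavior concrete, here is a minimal standard-library sketch of the same idea: retry on matching exceptions, wait a random time below an exponentially growing cap, and re-raise the last error once the attempt budget is spent. It is not Tenacity's implementation. `backoff_cap` and `call_with_retry` are hypothetical names, and the schedule assumes the cap starts at `multiplier` seconds and doubles after each failed attempt.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def backoff_cap(attempt: int, multiplier: float = 1.0, max_wait: float = 60.0) -> float:
    """Upper bound of the wait after failed attempt number `attempt` (1-based).

    Assumption: the bound starts at `multiplier` and doubles on each failure.
    """
    return min(max_wait, multiplier * 2 ** (attempt - 1))


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 10,
    max_wait: float = 60.0,
    exception_types: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn until it succeeds or max_attempts is reached, then re-raise."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except exception_types:
            if attempt == max_attempts:
                raise  # surface the last error, like reraise=True
            # Jitter: wait a uniform random time below the current cap.
            sleep(random.uniform(0, backoff_cap(attempt, max_wait=max_wait)))
    raise AssertionError("unreachable")


# Ten attempts leave room for nine waits; these are their upper bounds.
caps = [backoff_cap(a) for a in range(1, 10)]
print(caps)
print(sum(caps))
```

Under the assumed schedule, the caps reach 60 seconds on the seventh wait, so a call that fails every time could spend up to 243 seconds waiting before the final error surfaces, in addition to the time taken by the calls themselves.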
Semaphore-based concurrency control from `src/ragas/async_utils.py:70-79`:
    if max_workers == -1:
        tasks = [asyncio.create_task(coro) for coro in coroutines]
    else:
        semaphore = asyncio.Semaphore(max_workers)
        async def sema_coro(coro):
            async with semaphore:
                return await coro
        tasks = [asyncio.create_task(sema_coro(coro)) for coro in coroutines]
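The effect of the semaphore can be checked with a small runnable experiment. The sketch below wraps the same branching logic in a hypothetical helper, `run_limited`, and uses a second hypothetical helper, `measure_peak`, to count how many jobs are in flight at once. The 10 ms `asyncio.sleep` stands in for an LLM API call; none of these names come from Ragas.

```python
import asyncio
from typing import Any, Coroutine, List


async def run_limited(
    coroutines: List[Coroutine[Any, Any, Any]], max_workers: int = -1
) -> List[Any]:
    """Run coroutines concurrently, at most max_workers at a time (-1 = no limit)."""
    if max_workers == -1:
        tasks = [asyncio.create_task(coro) for coro in coroutines]
    else:
        semaphore = asyncio.Semaphore(max_workers)

        async def sema_coro(coro):
            # Each task must hold a semaphore slot while its coroutine runs.
            async with semaphore:
                return await coro

        tasks = [asyncio.create_task(sema_coro(coro)) for coro in coroutines]
    return await asyncio.gather(*tasks)


async def measure_peak(n_jobs: int, max_workers: int) -> int:
    """Return the highest number of jobs that were running at the same time."""
    active = 0
    peak = 0

    async def job(i: int) -> int:
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)  # stands in for an LLM API call
        active -= 1
        return i

    results = await run_limited([job(i) for i in range(n_jobs)], max_workers)
    assert results == list(range(n_jobs))  # gather preserves input order
    return peak


print(asyncio.run(measure_peak(n_jobs=20, max_workers=4)))   # 4
print(asyncio.run(measure_peak(n_jobs=20, max_workers=-1)))  # 20
```

With a limit of 4, no more than four jobs overlap; with `-1`, all twenty start at once, which is the situation most likely to trigger provider rate limits.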