Heuristic: explodinggradients/ragas Concurrency and Rate Limiting
| Knowledge Sources | |
|---|---|
| Domains | LLM_Evaluation, Optimization, Infrastructure |
| Last Updated | 2026-02-10 12:00 GMT |
Overview
Concurrency tuning guide for Ragas: semaphore-based throttling (default 16 workers), uvloop incompatibility, and Jupyter notebook thread-based workaround.
Description
Ragas uses `asyncio.Semaphore` to control how many metric evaluations run in parallel. The default `max_workers=16` provides a reasonable balance between throughput and API pressure. When `max_workers=-1`, all tasks start immediately with no throttling. The concurrency system has two important edge cases: uvloop (used by FastAPI/Starlette) is incompatible with `nest_asyncio`, and Jupyter notebooks use a thread-based workaround to run async code from synchronous contexts.
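The Jupyter workaround mentioned above can be sketched in plain asyncio (an illustration of the pattern, not Ragas's actual helper): run the coroutine on a fresh event loop in a separate thread, so the notebook's already-running loop is never nested.

```python
import asyncio
import threading


def run_async_from_sync(coro):
    """Run a coroutine from a synchronous context, even if an event loop
    is already running in the calling thread (illustrative sketch only)."""
    result = {}

    def runner():
        # A fresh thread has no running loop, so asyncio.run() is safe here.
        result["value"] = asyncio.run(coro)

    t = threading.Thread(target=runner)
    t.start()
    t.join()
    return result["value"]


async def score():
    await asyncio.sleep(0.01)  # stand-in for an async metric evaluation
    return 0.9


print(run_async_from_sync(score()))  # → 0.9
```

The cost of this pattern is one thread (and one event loop) per sync call, which is why the note below recommends async methods for high-throughput notebook usage.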
Usage
Apply this heuristic when evaluating large datasets (100+ samples) or when hitting API rate limits (429 errors). Reducing `max_workers` prevents rate limiting; increasing it speeds up evaluation on high-throughput APIs. Also critical when deploying Ragas in production web servers using uvloop.
The Insight (Rule of Thumb)
- Action: Set `RunConfig(max_workers=N)` based on your API rate limit.
- Value: Default is 16. For rate-limited free-tier APIs, use 2-4. For high-throughput enterprise APIs, use 32-64.
- Trade-off: Lower concurrency = slower evaluation but fewer 429 errors. Higher concurrency = faster runs but a greater risk of rate limiting.
- uvloop warning: Applications using uvloop (FastAPI, Starlette, Sanic) cannot use sync `evaluate()`. They must use `aevaluate()` or the `@experiment` decorator in a proper async context.
- Jupyter: Sync calls in Jupyter create a new thread per invocation. For high-throughput Jupyter usage, prefer async methods (`agenerate()`, `aevaluate()`).
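One way to pick N, as a rough sizing rule (an assumption of this note, not a Ragas API): by Little's law, steady-state concurrency ≈ throughput × latency, so with a limit of R requests/minute and an average metric-call latency of L seconds, about R/60 × L workers saturate the limit without exceeding it. A minimal sketch with a hypothetical helper:

```python
import math


def suggest_max_workers(requests_per_minute: int, avg_latency_s: float) -> int:
    """Estimate a max_workers value that saturates, but does not exceed,
    an API rate limit (hypothetical helper, not part of Ragas)."""
    # Little's law: concurrency ~= throughput * latency.
    workers = math.floor(requests_per_minute / 60 * avg_latency_s)
    return max(1, workers)


# A free-tier API at 60 req/min with ~3 s per metric call:
print(suggest_max_workers(60, 3.0))    # → 3
# An enterprise endpoint at 2000 req/min with ~2 s calls:
print(suggest_max_workers(2000, 2.0))  # → 66
```

The result feeds directly into `RunConfig(max_workers=N)`; the 2-4 and 32-64 ranges above correspond to typical free-tier and enterprise limits under this formula.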
Reasoning
The semaphore approach creates all tasks upfront but only allows `max_workers` to be actively running at any time. This is efficient because task creation is cheap (just coroutine objects), and the semaphore naturally provides backpressure. When `max_workers=-1`, there is no throttling -- which can overwhelm APIs with hundreds of concurrent requests on large datasets.
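The backpressure behavior can be verified with a small self-contained demo (plain asyncio, mirroring the pattern described above, not Ragas code): 20 tasks are created up front, but a semaphore of 4 caps how many ever run at once.

```python
import asyncio


async def main(n_tasks: int = 20, max_workers: int = 4) -> int:
    semaphore = asyncio.Semaphore(max_workers)
    running = 0
    peak = 0

    async def work():
        nonlocal running, peak
        async with semaphore:  # acquired before the real work starts
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.01)  # stand-in for an LLM metric call
            running -= 1

    # All tasks are created immediately (cheap coroutine objects)...
    tasks = [asyncio.create_task(work()) for _ in range(n_tasks)]
    await asyncio.gather(*tasks)
    return peak  # ...but at most max_workers ran concurrently


print(asyncio.run(main()))  # → 4
```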
The uvloop incompatibility exists because `nest_asyncio` monkey-patches CPython's event loop internals, and uvloop uses a completely different C-based implementation. When uvloop is detected, Ragas skips the `nest_asyncio.apply()` call and raises a `RuntimeError` if nested async execution is attempted.
Code Evidence
Semaphore-based concurrency from `src/ragas/async_utils.py:58-79`:
```python
def as_completed(
    coroutines: t.Sequence[t.Coroutine],
    max_workers: int = -1,
    *,
    cancel_check: t.Optional[t.Callable[[], bool]] = None,
    cancel_pending: bool = True,
) -> t.Iterator[asyncio.Future]:
    if max_workers == -1:
        tasks = [asyncio.create_task(coro) for coro in coroutines]
    else:
        semaphore = asyncio.Semaphore(max_workers)

        async def sema_coro(coro):
            async with semaphore:
                return await coro

        tasks = [asyncio.create_task(sema_coro(coro)) for coro in coroutines]
```
uvloop detection and skip from `src/ragas/async_utils.py:39-55`:
```python
loop = asyncio.get_running_loop()
loop_type = type(loop).__name__
if "uvloop" in loop_type.lower() or "uvloop" in str(type(loop)):
    logger.debug(
        f"Skipping nest_asyncio.apply() for incompatible loop type: {loop_type}"
    )
    return False
```
RuntimeError for nested async with uvloop from `src/ragas/async_utils.py:146-154`:
```python
if is_event_loop_running() and not nest_asyncio_applied:
    loop = asyncio.get_running_loop()
    loop_type = type(loop).__name__
    raise RuntimeError(
        f"Cannot execute nested async code with {loop_type}. "
        f"uvloop does not support nested event loop execution. "
        f"Please use asyncio's standard event loop in Jupyter environments, "
        f"or refactor your code to avoid nested async calls."
    )
```
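This error mirrors a standard asyncio restriction that can be demonstrated without uvloop: calling `asyncio.run()` while a loop is already running raises `RuntimeError`. That is exactly the situation `nest_asyncio` patches around on CPython's pure-Python loop, and cannot patch on uvloop's C implementation.

```python
import asyncio


async def main():
    # A nested, blocking event-loop run from inside a running loop fails:
    try:
        asyncio.run(asyncio.sleep(0))
    except RuntimeError as exc:
        return type(exc).__name__


print(asyncio.run(main()))  # → RuntimeError
```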
max_workers default from `src/ragas/run_config.py:54`:
```python
max_workers: int = 16
```