

Heuristic:Explodinggradients Ragas Concurrency And Rate Limiting

From Leeroopedia



Knowledge Sources
Domains: LLM_Evaluation, Optimization, Infrastructure
Last Updated: 2026-02-10 12:00 GMT

Overview

Concurrency tuning guide for Ragas: semaphore-based throttling (default 16 workers), uvloop incompatibility, and Jupyter notebook thread-based workaround.

Description

Ragas uses `asyncio.Semaphore` to control how many metric evaluations run in parallel. The default `max_workers=16` is a reasonable balance between throughput and API pressure. When `max_workers=-1`, all tasks start immediately with no throttling. The concurrency system has two important edge cases. First, uvloop (which uvicorn, the ASGI server commonly used to serve FastAPI and Starlette apps, picks up automatically when it is installed) is incompatible with `nest_asyncio`. Second, in Jupyter notebooks, where an event loop is already running, Ragas uses a thread-based workaround to run async code from synchronous calls.

Usage

Apply this heuristic when evaluating large datasets (100+ samples) or when hitting API rate limits (HTTP 429 errors). Reducing `max_workers` lowers the risk of rate limiting; increasing it speeds up evaluation on high-throughput APIs. The heuristic is also critical when deploying Ragas in production web servers that run on uvloop.
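To illustrate why lowering `max_workers` reduces 429s, here is a toy simulation. It assumes a hypothetical endpoint (`FakeRateLimitedAPI`, not part of Ragas) that rejects any request arriving while `api_limit` others are already being served, with a fixed latency for every response. Calls are throttled with an `asyncio.Semaphore`, as Ragas does. Real providers usually enforce per-minute quotas rather than a pure concurrency cap, so treat the results as qualitative.

```python
import asyncio

LATENCY_S = 0.01  # assumed round-trip time for every response, including 429s


class FakeRateLimitedAPI:
    """Hypothetical endpoint: rejects (HTTP 429) any request that arrives
    while `limit` other requests are already being served."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.in_flight = 0
        self.rejected = 0

    async def call(self, i: int) -> int:
        if self.in_flight >= self.limit:
            self.rejected += 1
            await asyncio.sleep(LATENCY_S)  # a 429 still costs a round trip
            return -1
        self.in_flight += 1
        try:
            await asyncio.sleep(LATENCY_S)  # simulated model latency
            return i
        finally:
            self.in_flight -= 1


async def _evaluate(n_samples: int, max_workers: int, api_limit: int) -> int:
    api = FakeRateLimitedAPI(api_limit)
    if max_workers == -1:
        # No throttling: every request is launched immediately.
        coros = [api.call(i) for i in range(n_samples)]
    else:
        semaphore = asyncio.Semaphore(max_workers)

        async def throttled(i: int) -> int:
            async with semaphore:  # at most max_workers calls in flight
                return await api.call(i)

        coros = [throttled(i) for i in range(n_samples)]
    await asyncio.gather(*coros)
    return api.rejected


def count_rejections(n_samples: int, max_workers: int, api_limit: int) -> int:
    """Return how many simulated requests were rejected with a 429."""
    return asyncio.run(_evaluate(n_samples, max_workers, api_limit))


if __name__ == "__main__":
    for workers in (-1, 16, 4):
        print(f"max_workers={workers}: {count_rejections(100, workers, 8)} rejected")
```

With `api_limit=8`, the unthrottled run starts all 100 calls at once, so every call beyond the first 8 is rejected. `max_workers=4` never exceeds the cap and sees no rejections, while `max_workers=16` lands in between.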

The Insight (Rule of Thumb)

  • Action: Set `RunConfig(max_workers=N)` based on your API rate limit.
  • Value: Default is 16. For rate-limited free-tier APIs, use 2-4. For high-throughput enterprise APIs, use 32-64.
  • Trade-off: Lower concurrency is slower but less likely to trigger 429 errors; higher concurrency is faster but risks rate limiting.
  • uvloop warning: Applications running on uvloop (for example FastAPI or Starlette served by uvicorn with uvloop installed, or Sanic) cannot use the sync `evaluate()`. They must use `aevaluate()` or the `@experiment` decorator from inside an async context, such as an `async def` request handler.
  • Jupyter: Sync calls in Jupyter create a new thread per invocation. For high-throughput Jupyter usage, prefer async methods (`agenerate()`, `aevaluate()`).
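Here is a minimal sketch of the thread-based pattern. It assumes nothing about Ragas's internal helper beyond what the Jupyter bullet states, and `run_sync` is a hypothetical name. When no loop is running, it simply calls `asyncio.run()`. When one is running (as in a Jupyter cell), it runs the coroutine on a fresh event loop in a worker thread, so each sync call pays for a new thread and a new loop.

```python
import asyncio
import concurrent.futures
import typing as t

T = t.TypeVar("T")


def run_sync(coro: t.Coroutine[t.Any, t.Any, T]) -> T:
    """Hypothetical helper: run `coro` to completion from synchronous code."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop running in this thread: a plain asyncio.run() is enough.
        return asyncio.run(coro)
    # A loop is already running (e.g. a notebook cell), so asyncio.run() here
    # would raise. Run the coroutine on a new loop in a separate thread instead.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()


async def score(x: int) -> int:
    await asyncio.sleep(0)  # stand-in for an async metric call
    return x * 2


async def notebook_cell() -> int:
    # Simulates a Jupyter cell, where an event loop is already running.
    return run_sync(score(5))


if __name__ == "__main__":
    print(run_sync(score(21)))         # called from plain sync code
    print(asyncio.run(notebook_cell()))  # called while a loop is running
```

While the worker thread runs, `.result()` blocks the notebook's own loop, so nothing else on that loop makes progress. Awaiting the async methods directly in a cell (IPython supports top-level `await`) avoids both the blocking and the per-call thread.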

Reasoning

The semaphore approach creates all tasks upfront but lets at most `max_workers` of them run at any moment. This is efficient because creating a task is cheap: each task only wraps a coroutine, and the expensive API call does not start until the task acquires the semaphore. The semaphore therefore caps the number of in-flight requests while the remaining tasks wait their turn. When `max_workers=-1`, there is no throttling, which can overwhelm APIs with hundreds of concurrent requests on large datasets.
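The pattern in the `as_completed` excerpt under Code Evidence can be reproduced in a self-contained sketch. `bounded_as_completed`, `Probe`, and `demo` are hypothetical names. Unlike the Ragas function, this version awaits the results itself (in completion order) and omits the cancellation options. `Probe` records the peak number of simultaneous calls, which makes the semaphore's cap observable.

```python
import asyncio
import typing as t


async def bounded_as_completed(
    coroutines: t.Sequence[t.Coroutine[t.Any, t.Any, t.Any]],
    max_workers: int = -1,
) -> list:
    """Create every task upfront; allow at most `max_workers` to run at once.

    max_workers == -1 disables throttling. Results come back in completion order.
    """
    if max_workers == -1:
        tasks = [asyncio.create_task(coro) for coro in coroutines]
    else:
        semaphore = asyncio.Semaphore(max_workers)

        async def sema_coro(coro: t.Coroutine[t.Any, t.Any, t.Any]) -> t.Any:
            async with semaphore:  # waits here until a slot frees up
                return await coro  # the real work starts only after acquiring

        tasks = [asyncio.create_task(sema_coro(coro)) for coro in coroutines]
    return [await fut for fut in asyncio.as_completed(tasks)]


class Probe:
    """Stand-in for an API call that tracks how many calls overlap."""

    def __init__(self) -> None:
        self.running = 0
        self.peak = 0

    async def call(self, i: int) -> int:
        self.running += 1
        self.peak = max(self.peak, self.running)
        await asyncio.sleep(0.001)  # simulated latency
        self.running -= 1
        return i


async def demo(n: int, max_workers: int) -> tuple[int, list]:
    probe = Probe()
    results = await bounded_as_completed([probe.call(i) for i in range(n)], max_workers)
    return probe.peak, sorted(results)
```

`asyncio.run(demo(50, 4))` reports a peak of 4 even though all 50 tasks exist from the start, while `demo(50, -1)` reports 50, the unthrottled case.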

The uvloop incompatibility exists because `nest_asyncio` works by monkey-patching asyncio's own pure-Python event loop so that it becomes reentrant, whereas uvloop replaces that loop with a separate implementation written in Cython on top of libuv, which the patch cannot modify. When uvloop is detected, Ragas skips the `nest_asyncio.apply()` call and raises a `RuntimeError` if nested async execution is attempted.
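The underlying failure is easy to reproduce with the standard library. This sketch assumes a plain asyncio loop (not uvloop) and that `nest_asyncio` has not been applied. It shows that `asyncio.run()` refuses to start while another loop is running in the same thread, which is the situation a sync wrapper hits inside a web handler or notebook cell. `running_loop_is_uvloop` is a hypothetical helper that repeats the name-based check from the Code Evidence excerpt.

```python
import asyncio


def running_loop_is_uvloop() -> bool:
    """Hypothetical helper mirroring the name-based uvloop check.
    Must be called from inside a running event loop."""
    loop = asyncio.get_running_loop()
    return "uvloop" in type(loop).__name__.lower() or "uvloop" in str(type(loop))


async def handler() -> tuple[bool, str]:
    # Inside a running loop (a request handler or a notebook cell), a sync
    # wrapper that calls asyncio.run() fails unless nest_asyncio is applied.
    is_uvloop = running_loop_is_uvloop()
    inner = asyncio.sleep(0)  # stand-in for the coroutine a sync API would run
    try:
        asyncio.run(inner)
        outcome = "nested run succeeded"
    except RuntimeError as exc:
        inner.close()  # avoid a "coroutine was never awaited" warning
        outcome = f"RuntimeError: {exc}"
    # The fix in async code: await directly (stand-in for `await aevaluate(...)`).
    await asyncio.sleep(0)
    return is_uvloop, outcome


if __name__ == "__main__":
    print(asyncio.run(handler()))
```

On uvloop, Ragas cannot fall back to `nest_asyncio`, so the only fix is the one at the end of `handler`: await the async API instead of wrapping it in a nested loop.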

Code Evidence

Semaphore-based concurrency from `src/ragas/async_utils.py:58-79`:

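import asyncio
import typing as t

def as_completed(
    coroutines: t.Sequence[t.Coroutine],
    max_workers: int = -1,
    *,
    cancel_check: t.Optional[t.Callable[[], bool]] = None,
    cancel_pending: bool = True,
) -> t.Iterator[asyncio.Future]:
    if max_workers == -1:
        # No throttling: every coroutine becomes a running task immediately.
        tasks = [asyncio.create_task(coro) for coro in coroutines]
    else:
        # Throttled: each task must acquire a semaphore slot before its coroutine runs.
        semaphore = asyncio.Semaphore(max_workers)
        async def sema_coro(coro):
            async with semaphore:
                return await coro
        tasks = [asyncio.create_task(sema_coro(coro)) for coro in coroutines]
    # ... (excerpt ends here; the rest of the function is omitted)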

uvloop detection and skip from `src/ragas/async_utils.py:39-55`:

loop = asyncio.get_running_loop()
loop_type = type(loop).__name__

if "uvloop" in loop_type.lower() or "uvloop" in str(type(loop)):
    logger.debug(
        f"Skipping nest_asyncio.apply() for incompatible loop type: {loop_type}"
    )
    return False

RuntimeError for nested async with uvloop from `src/ragas/async_utils.py:146-154`:

if is_event_loop_running() and not nest_asyncio_applied:
    loop = asyncio.get_running_loop()
    loop_type = type(loop).__name__
    raise RuntimeError(
        f"Cannot execute nested async code with {loop_type}. "
        f"uvloop does not support nested event loop execution. "
        f"Please use asyncio's standard event loop in Jupyter environments, "
        f"or refactor your code to avoid nested async calls."
    )

max_workers default from `src/ragas/run_config.py:54`:

max_workers: int = 16
