Heuristic: Confident AI DeepEval Async Concurrency Tuning
| Knowledge Sources | |
|---|---|
| Domains | Optimization, LLM_Evaluation |
| Last Updated | 2026-02-14 10:00 GMT |
Overview
Tuning guide for async evaluation concurrency settings to balance throughput against LLM provider rate limits and resource consumption.
Description
DeepEval runs metric evaluations asynchronously by default, using `nest_asyncio` to allow nested event loops. The `AsyncConfig` dataclass controls concurrency with its `max_concurrent` (default 20) and `throttle_value` (default 0) parameters. The `evaluate()` function defaults to `run_async=True` with a cap of 20 concurrently evaluated test cases, while `assert_test()` allows up to 100 concurrent metric evaluations. Understanding these defaults and how to tune them prevents rate limiting, memory issues, and premature timeouts during large-scale evaluations.
Usage
Use this heuristic when you encounter rate limit errors from LLM providers during batch evaluation, when memory usage spikes with many concurrent metric evaluations, or when you want to maximize throughput for a provider with generous rate limits.
The Insight (Rule of Thumb)
- Action: Pass a custom `AsyncConfig` to `evaluate()` to control concurrency.
- Default Values:
  - `max_concurrent = 20` for `evaluate()` (20 test cases evaluated concurrently)
  - `max_concurrent = 100` for `assert_test()` (single test, all metrics run concurrently)
  - `throttle_value = 0` (no delay between task dispatches)
  - `run_async = True` (async mode on by default)
  - `DEEPEVAL_MAX_CONCURRENT_DOC_PROCESSING = 2` (for synthesizer document pipelines)
  - `DEEPEVAL_TIMEOUT_THREAD_LIMIT = 128` (thread pool for timeout enforcement)
- Trade-off: Lower `max_concurrent` reduces rate limit errors but increases total evaluation time. Higher `throttle_value` adds delay between dispatches but smooths API load.
- Tip: For OpenAI with default rate limits, `max_concurrent=10` with `throttle_value=1` is a safe starting point. For batch evaluations with 100+ test cases, reduce `max_concurrent` to 5-10.
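The dispatch pattern these two knobs control can be sketched with a plain `asyncio` semaphore. This is an illustrative model of the semantics, not DeepEval's actual implementation; in practice you would pass the real config, e.g. `evaluate(test_cases, metrics, async_config=AsyncConfig(max_concurrent=10, throttle_value=1))`:

```python
import asyncio

async def evaluate_all(tasks, max_concurrent=20, throttle_value=0.0):
    """Bounded-concurrency dispatch: a semaphore caps in-flight tasks,
    and an optional sleep spaces out each dispatch (the throttle)."""
    semaphore = asyncio.Semaphore(max_concurrent)
    in_flight = 0
    peak = 0

    async def run_one(coro):
        nonlocal in_flight, peak
        async with semaphore:              # at most max_concurrent at once
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await coro
            finally:
                in_flight -= 1

    futures = []
    for coro in tasks:
        futures.append(asyncio.ensure_future(run_one(coro)))
        if throttle_value:
            await asyncio.sleep(throttle_value)  # smooth the API load
    results = await asyncio.gather(*futures)
    return results, peak

async def fake_metric(i):
    await asyncio.sleep(0.01)  # stands in for an LLM API call
    return i

results, peak = asyncio.run(
    evaluate_all([fake_metric(i) for i in range(12)], max_concurrent=3)
)
print(peak)  # never exceeds 3
```

Lowering `max_concurrent` here directly bounds the peak, while `throttle_value` spreads dispatches out in time without changing the bound.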
Reasoning
The evaluation pipeline dispatches test cases concurrently up to `max_concurrent`, and each metric evaluation may make 1-3 LLM calls (generation, self-reflection, etc.). With 50 test cases and 3 metrics at `max_concurrent=20`, up to 20 test cases are in flight at once; because each test case's metrics are themselves evaluated concurrently, the number of simultaneous LLM API calls can reach roughly 20 × 3 = 60.
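Because metrics within a test case are also dispatched concurrently, the peak number of simultaneous LLM calls can exceed the test-case cap. A back-of-envelope check, assuming one active LLM call per metric at a time (an assumption, since a metric's 1-3 calls are typically sequential):

```python
# Worst-case in-flight LLM call estimate for the scenario above.
test_cases = 50
metrics_per_case = 3
max_concurrent = 20  # test cases admitted at once

# Only max_concurrent test cases run simultaneously, but each one
# evaluates all of its metrics concurrently.
concurrent_cases = min(test_cases, max_concurrent)
peak_llm_calls = concurrent_cases * metrics_per_case
print(peak_llm_calls)  # 60
```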
Code evidence from `deepeval/evaluate/configs.py:8-17`:

```python
@dataclass
class AsyncConfig:
    run_async: bool = True
    throttle_value: float = 0
    max_concurrent: int = 20

    def __post_init__(self):
        if self.max_concurrent < 1:
            raise ValueError("'max_concurrent' must be at least 1")
        if self.throttle_value < 0:
            raise ValueError("'throttle_value' must be at least 0")
```
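The `__post_init__` guards can be exercised directly. This sketch re-declares the dataclass locally as shown above, so it runs without importing deepeval:

```python
from dataclasses import dataclass

# Local copy of the AsyncConfig shown above, for demonstration only.
@dataclass
class AsyncConfig:
    run_async: bool = True
    throttle_value: float = 0
    max_concurrent: int = 20

    def __post_init__(self):
        if self.max_concurrent < 1:
            raise ValueError("'max_concurrent' must be at least 1")
        if self.throttle_value < 0:
            raise ValueError("'throttle_value' must be at least 0")

# The conservative starting point from the tip above.
conservative = AsyncConfig(max_concurrent=10, throttle_value=1)

# Invalid values are rejected at construction time.
caught = None
try:
    AsyncConfig(max_concurrent=0)
except ValueError as err:
    caught = str(err)
print(caught)  # 'max_concurrent' must be at least 1
```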
`assert_test` uses higher concurrency, from `deepeval/evaluate/evaluate.py:92`:

```python
async_config = AsyncConfig(throttle_value=0, max_concurrent=100)
```
`evaluate` defaults, from `deepeval/evaluate/evaluate.py:199`:

```python
async_config: Optional[AsyncConfig] = AsyncConfig(),  # max_concurrent=20
```
`nest_asyncio` for nested event loops, from `deepeval/utils.py:194-205`:

```python
def get_or_create_event_loop() -> asyncio.AbstractEventLoop:
    try:
        loop = asyncio.get_event_loop()
        if loop.is_running():
            nest_asyncio.apply()
        if loop.is_closed():
            raise RuntimeError
    except RuntimeError:
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
    return loop
```
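Why `nest_asyncio` is needed at all can be reproduced with stdlib asyncio alone: calling `asyncio.run()` from inside an already-running loop (as happens in Jupyter cells or async test runners) raises, which is exactly what `nest_asyncio.apply()` patches around. A minimal reproduction, with no deepeval involved:

```python
import asyncio

async def nested_call():
    # Without nest_asyncio.apply(), a nested asyncio.run() is rejected
    # by the already-running event loop.
    coro = asyncio.sleep(0)
    try:
        asyncio.run(coro)
    except RuntimeError as err:
        coro.close()  # silence the "never awaited" warning
        return str(err)
    return "no error"

message = asyncio.run(nested_call())
print(message)  # asyncio.run() cannot be called from a running event loop
```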
Document processing concurrency cap, from `deepeval/config/settings.py:847-848`:

```python
DEEPEVAL_MAX_CONCURRENT_DOC_PROCESSING: conint(ge=1) = Field(
    2, description="Max concurrent async document processing tasks."
)
```
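Since this is a pydantic settings field, it can presumably be overridden through the environment. An illustrative override; that the settings object reads the environment at import time is an assumption, so set the variable before deepeval is imported:

```python
import os

# Assumed override: raise the synthesizer's document-processing cap
# from its default of 2 before deepeval reads its settings.
os.environ["DEEPEVAL_MAX_CONCURRENT_DOC_PROCESSING"] = "4"
print(os.environ["DEEPEVAL_MAX_CONCURRENT_DOC_PROCESSING"])  # 4
```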