Heuristic: Confident AI DeepEval Async Concurrency Tuning
| Knowledge Sources | |
|---|---|
| Domains | Optimization, LLM_Evaluation |
| Last Updated | 2026-02-14 10:00 GMT |
Overview
Tuning guide for async evaluation concurrency settings to balance throughput against LLM provider rate limits and resource consumption.
Description
DeepEval runs metric evaluations asynchronously by default, using `nest_asyncio` to allow nested event loops. The `AsyncConfig` dataclass controls concurrency with its `max_concurrent` (default 20) and `throttle_value` (default 0) parameters. The `evaluate()` function defaults to `run_async=True` with a cap of 20 concurrently evaluated test cases, while `assert_test()` allows up to 100 concurrent metric evaluations. Understanding these defaults and how to tune them prevents rate limiting, memory issues, and premature timeouts during large-scale evaluations.
Usage
Use this heuristic when you encounter rate limit errors from LLM providers during batch evaluation, when memory usage spikes with many concurrent metric evaluations, or when you want to maximize throughput for a provider with generous rate limits.
The Insight (Rule of Thumb)
- Action: Pass a custom `AsyncConfig` to `evaluate()` to control concurrency.
- Default Values:
  - `max_concurrent = 20` for `evaluate()` (20 test cases evaluated concurrently)
  - `max_concurrent = 100` for `assert_test()` (single test, all metrics run concurrently)
  - `throttle_value = 0` (no delay between task dispatches)
  - `run_async = True` (async mode on by default)
  - `DEEPEVAL_MAX_CONCURRENT_DOC_PROCESSING = 2` (for synthesizer document pipelines)
  - `DEEPEVAL_TIMEOUT_THREAD_LIMIT = 128` (thread pool for timeout enforcement)
- Trade-off: Lower `max_concurrent` reduces rate limit errors but increases total evaluation time. Higher `throttle_value` adds delay between dispatches but smooths API load.
- Tip: For OpenAI with default rate limits, `max_concurrent=10` with `throttle_value=1` is a safe starting point. For batch evaluations with 100+ test cases, reduce `max_concurrent` to 5-10.
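The dispatch pattern these two knobs control can be sketched with a plain `asyncio` semaphore. This is an illustrative model of the semantics, not DeepEval's actual implementation; in practice you would pass the real config, e.g. `evaluate(test_cases, metrics, async_config=AsyncConfig(max_concurrent=10, throttle_value=1))`:

```python
import asyncio

async def evaluate_all(tasks, max_concurrent=20, throttle_value=0.0):
    """Bounded-concurrency dispatch: a semaphore caps in-flight tasks,
    and an optional sleep spaces out each dispatch (the throttle)."""
    semaphore = asyncio.Semaphore(max_concurrent)
    in_flight = 0
    peak = 0

    async def run_one(coro):
        nonlocal in_flight, peak
        async with semaphore:              # at most max_concurrent at once
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await coro
            finally:
                in_flight -= 1

    futures = []
    for coro in tasks:
        futures.append(asyncio.ensure_future(run_one(coro)))
        if throttle_value:
            await asyncio.sleep(throttle_value)  # smooth the API load
    results = await asyncio.gather(*futures)
    return results, peak

async def fake_metric(i):
    await asyncio.sleep(0.01)  # stands in for an LLM API call
    return i

results, peak = asyncio.run(
    evaluate_all([fake_metric(i) for i in range(12)], max_concurrent=3)
)
print(peak)  # never exceeds 3
```

Lowering `max_concurrent` here directly bounds the peak, while `throttle_value` spreads dispatches out in time without changing the bound.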
Reasoning
The evaluation pipeline dispatches test cases concurrently up to `max_concurrent`, and each metric evaluation may make 1-3 LLM calls (generation, self-reflection, etc.). With 50 test cases and 3 metrics at `max_concurrent=20`, up to 20 test cases are in flight at once; because each test case's metrics are themselves evaluated concurrently, the number of simultaneous LLM API calls can reach roughly 20 × 3 = 60.
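Because metrics within a test case are also dispatched concurrently, the peak number of simultaneous LLM calls can exceed the test-case cap. A back-of-envelope check, assuming one active LLM call per metric at a time (an assumption, since a metric's 1-3 calls are typically sequential):

```python
# Worst-case in-flight LLM call estimate for the scenario above.
test_cases = 50
metrics_per_case = 3
max_concurrent = 20  # test cases admitted at once

# Only max_concurrent test cases run simultaneously, but each one
# evaluates all of its metrics concurrently.
concurrent_cases = min(test_cases, max_concurrent)
peak_llm_calls = concurrent_cases * metrics_per_case
print(peak_llm_calls)  # 60
```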
Code evidence from `deepeval/evaluate/configs.py:8-17`:

```python
@dataclass
class AsyncConfig:
    run_async: bool = True
    throttle_value: float = 0
    max_concurrent: int = 20

    def __post_init__(self):
        if self.max_concurrent < 1:
            raise ValueError("'max_concurrent' must be at least 1")
        if self.throttle_value < 0:
            raise ValueError("'throttle_value' must be at least 0")
```
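The `__post_init__` guards can be exercised directly. This sketch re-declares the dataclass locally as shown above, so it runs without importing deepeval:

```python
from dataclasses import dataclass

# Local copy of the AsyncConfig shown above, for demonstration only.
@dataclass
class AsyncConfig:
    run_async: bool = True
    throttle_value: float = 0
    max_concurrent: int = 20

    def __post_init__(self):
        if self.max_concurrent < 1:
            raise ValueError("'max_concurrent' must be at least 1")
        if self.throttle_value < 0:
            raise ValueError("'throttle_value' must be at least 0")

# The conservative starting point from the tip above.
conservative = AsyncConfig(max_concurrent=10, throttle_value=1)

# Invalid values are rejected at construction time.
caught = None
try:
    AsyncConfig(max_concurrent=0)
except ValueError as err:
    caught = str(err)
print(caught)  # 'max_concurrent' must be at least 1
```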
`assert_test` uses higher concurrency, from `deepeval/evaluate/evaluate.py:92`:

```python
async_config = AsyncConfig(throttle_value=0, max_concurrent=100)
```
`evaluate` defaults, from `deepeval/evaluate/evaluate.py:199`:

```python
async_config: Optional[AsyncConfig] = AsyncConfig(),  # max_concurrent=20
```
`nest_asyncio` for nested event loops, from `deepeval/utils.py:194-205`:

```python
def get_or_create_event_loop() -> asyncio.AbstractEventLoop:
    try:
        loop = asyncio.get_event_loop()
        if loop.is_running():
            nest_asyncio.apply()
        if loop.is_closed():
            raise RuntimeError
    except RuntimeError:
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
    return loop
```
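Why `nest_asyncio` is needed at all can be reproduced with stdlib asyncio alone: calling `asyncio.run()` from inside an already-running loop (as happens in Jupyter cells or async test runners) raises, which is exactly what `nest_asyncio.apply()` patches around. A minimal reproduction, with no deepeval involved:

```python
import asyncio

async def nested_call():
    # Without nest_asyncio.apply(), a nested asyncio.run() is rejected
    # by the already-running event loop.
    coro = asyncio.sleep(0)
    try:
        asyncio.run(coro)
    except RuntimeError as err:
        coro.close()  # silence the "never awaited" warning
        return str(err)
    return "no error"

message = asyncio.run(nested_call())
print(message)  # asyncio.run() cannot be called from a running event loop
```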
Document processing concurrency cap, from `deepeval/config/settings.py:847-848`:

```python
DEEPEVAL_MAX_CONCURRENT_DOC_PROCESSING: conint(ge=1) = Field(
    2, description="Max concurrent async document processing tasks."
)
```
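Since this is a pydantic settings field, it can presumably be overridden through the environment. An illustrative override; that the settings object reads the environment at import time is an assumption, so set the variable before deepeval is imported:

```python
import os

# Assumed override: raise the synthesizer's document-processing cap
# from its default of 2 before deepeval reads its settings.
os.environ["DEEPEVAL_MAX_CONCURRENT_DOC_PROCESSING"] = "4"
print(os.environ["DEEPEVAL_MAX_CONCURRENT_DOC_PROCESSING"])  # 4
```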