
Heuristic:Confident AI DeepEval Timeout and Retry Tuning

From Leeroopedia
Knowledge Sources
Domains Optimization, LLM_Evaluation, Reliability
Last Updated 2026-02-14 10:00 GMT

Overview

A configuration strategy for DeepEval's layered timeout and retry system, intended to prevent premature task cancellation and improve the reliability of LLM provider calls.

Description

DeepEval uses a three-layer timeout and retry system built on Tenacity. It distinguishes between per-attempt timeouts (a single LLM call), per-task timeouts (a full metric evaluation, including retries), and gather timeouts (collecting all async results). When not set explicitly, these values are auto-computed from each other, with a default outer budget of 180 seconds. Understanding how the layers interact is critical for tuning evaluation reliability, especially with slow providers or large evaluation batches.

Usage

Use this heuristic when metric evaluations are timing out, when you see warnings about truncated retries, or when evaluating with slow LLM providers (e.g., rate-limited APIs or large models). It also applies to large batch evaluations, where the default 180-second budget may be insufficient.

The Insight (Rule of Thumb)

  • Action: Configure timeout and retry settings via environment variables, setting either `DEEPEVAL_PER_ATTEMPT_TIMEOUT_SECONDS_OVERRIDE` OR `DEEPEVAL_PER_TASK_TIMEOUT_SECONDS_OVERRIDE`, but generally not both.
  • Default Values:
    • Per-task outer budget: 180 seconds (when no overrides set)
    • Per-attempt timeout: auto-derived from outer budget / attempts (accounting for backoff and 1s safety margin)
    • Retry attempts: 2 (default, meaning 1 retry)
    • Initial backoff: 1.0 seconds
    • Exponential base: 2.0
    • Jitter: 2.0 seconds
    • Backoff cap: 5.0 seconds
    • Gather buffer: 15% of per-task timeout (clamped between 10s and 60s)
  • Trade-off: Higher timeouts increase reliability but slow down failure detection. Setting `DEEPEVAL_DISABLE_TIMEOUTS=1` removes DeepEval timeouts entirely but provider SDK timeouts still apply.
  • Tip: Set `DEEPEVAL_PER_TASK_TIMEOUT_SECONDS_OVERRIDE` alone for a simple approach. The per-attempt timeout and gather buffer will auto-derive sensibly.
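The simple approach from the tip above can be sketched as environment configuration. A minimal example using the variable names documented on this page; the 300-second value is an arbitrary illustration, and the variables must be set before DeepEval reads its settings:

```python
import os

# Give each metric evaluation a 300s overall budget; the per-attempt
# timeout and gather buffer are then auto-derived from this value.
os.environ["DEEPEVAL_PER_TASK_TIMEOUT_SECONDS_OVERRIDE"] = "300"

# Optional: surface warnings about truncated retries while tuning.
os.environ["DEEPEVAL_VERBOSE_MODE"] = "1"
```

Setting only the per-task override keeps the three layers consistent, which is why the tip recommends it over configuring both overrides at once.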

Reasoning

The auto-derivation logic works as follows:

When only outer override is set:

per_attempt = (outer - expected_backoff - 1s_safety) / attempts

When only per-attempt override is set:

outer = ceil(attempts * per_attempt + expected_backoff + 1s_safety)

When neither is set:

outer = 180s (default)
per_attempt = (180 - expected_backoff - 1s_safety) / attempts  (floored at 1s)

The gather buffer ensures async tasks have time to drain: `buffer = constrain(0.15 * outer, 10, 60)`.

If the outer timeout is too small for the configured attempts and backoff, a warning is logged (visible when `DEEPEVAL_VERBOSE_MODE=1`).
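Plugging the default values into the formulas above makes the derivation concrete. A rough numerical trace (not DeepEval's code, just the arithmetic described here) for the case where neither override is set:

```python
# Defaults from the table above.
attempts = 2
initial, base, cap, jitter = 1.0, 2.0, 5.0, 2.0

# Expected backoff: one sleep between the two attempts (capped exponential),
# plus the expected value of the jitter.
sleeps = attempts - 1
backoff = sum(min(cap, initial * base**i) for i in range(sleeps))
backoff += sleeps * (jitter / 2.0)   # 1.0 + 1.0 = 2.0s

outer = 180.0                                      # default outer budget
per_attempt = (outer - backoff - 1.0) / attempts   # (180 - 2 - 1) / 2 = 88.5s
buffer = min(60.0, max(10.0, 0.15 * outer))        # clamp(27, 10, 60) = 27.0s

print(per_attempt, buffer)  # 88.5 27.0
```

So under the defaults, each of the two attempts gets roughly 88.5 seconds, and the gather stage adds a 27-second drain buffer on top of the 180-second outer budget.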

Code evidence from `deepeval/config/settings.py:897-960`:

def _calc_auto_outer_timeout(self) -> float:
    attempts = self.DEEPEVAL_RETRY_MAX_ATTEMPTS or 1
    timeout_seconds = float(self.DEEPEVAL_PER_ATTEMPT_TIMEOUT_SECONDS_OVERRIDE or 0)
    if timeout_seconds <= 0:
        return 180  # default outer budget
    backoff = self._expected_backoff(attempts)
    safety_overhead = 1.0
    return float(math.ceil(attempts * timeout_seconds + backoff + safety_overhead))

Thread limit for timeout enforcement from `deepeval/config/settings.py:854`:

DEEPEVAL_TIMEOUT_THREAD_LIMIT: conint(ge=1) = Field(
    128,
    description="Max worker threads used for timeout enforcement in async execution.",
)

Retry backoff computation from `deepeval/config/settings.py:1525-1538`:

def _expected_backoff(self, attempts: int) -> float:
    sleeps = max(0, attempts - 1)
    cur = float(self.DEEPEVAL_RETRY_INITIAL_SECONDS)
    cap = float(self.DEEPEVAL_RETRY_CAP_SECONDS)
    base = float(self.DEEPEVAL_RETRY_EXP_BASE)
    jitter = float(self.DEEPEVAL_RETRY_JITTER)
    backoff = 0.0
    for _ in range(sleeps):
        backoff += min(cap, cur)
        cur *= base
    backoff += sleeps * (jitter / 2.0)  # expected jitter
    return backoff
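Tracing `_expected_backoff` with the default retry settings, and feeding the result into the outer-timeout formula from `_calc_auto_outer_timeout`, shows the per-attempt-override direction numerically. A hedged walk-through; the 60-second per-attempt value is a hypothetical choice:

```python
import math

# Default retry settings from the table above.
attempts, initial, base, cap, jitter = 2, 1.0, 2.0, 5.0, 2.0

def expected_backoff(attempts: int) -> float:
    # Mirrors _expected_backoff above: sum of capped exponential sleeps
    # plus the expected value of the jitter.
    sleeps = max(0, attempts - 1)
    cur, backoff = initial, 0.0
    for _ in range(sleeps):
        backoff += min(cap, cur)
        cur *= base
    return backoff + sleeps * (jitter / 2.0)

# With a 60s per-attempt override, the outer budget auto-derives to:
per_attempt = 60.0
outer = math.ceil(attempts * per_attempt + expected_backoff(attempts) + 1.0)
print(outer)  # 123
```

In other words, a 60-second per-attempt override with two attempts yields a 123-second outer budget: 120 seconds of attempts, 2 seconds of expected backoff, and the 1-second safety overhead, rounded up.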
