Heuristic:Promptfoo Promptfoo Retry With Jitter

Knowledge Sources	Promptfoo, AWS Architecture Blog on Jitter
Domains	Optimization, Error_Handling
Last Updated	2026-02-14 08:00 GMT

Overview

Promptfoo retries failed API requests using exponential backoff with a 20% jitter factor, at most 3 retries, a 1-second base delay, and a 60-second cap. The jitter prevents thundering herd problems when multiple clients hit rate limits simultaneously.

Description

Promptfoo implements a retry policy for failed API requests that uses exponential backoff with randomized jitter. The jitter prevents synchronized retry storms that occur when multiple clients back off for the exact same duration and then all retry simultaneously. The policy also respects server-specified `Retry-After` headers when available, adding jitter to those as well.

Importantly, the retry logic distinguishes between transient errors (worth retrying) and permanent errors (will never succeed). This classification is critical to avoid wasting time on misconfiguration errors.
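
The classification can be pictured with a minimal sketch. The status-code choices below are illustrative assumptions, not promptfoo's actual rules, and `classifyStatus` and `retry5xx` are hypothetical names. The `retry5xx` flag stands in for the `PROMPTFOO_RETRY_5XX` opt-in described under Usage.

```typescript
// Hypothetical classifier: the names and the status-code choices are
// assumptions for illustration, not promptfoo's actual implementation.
type ErrorClass = "transient" | "permanent";

function classifyStatus(status: number, retry5xx: boolean = false): ErrorClass {
  // Rate limiting is the canonical transient failure: waiting helps.
  if (status === 429) {
    return "transient";
  }
  // Server errors are treated as transient only when the caller opts in.
  if (status >= 500 && status <= 599) {
    return retry5xx ? "transient" : "permanent";
  }
  // Everything else (bad request, invalid API key, not found, ...) is
  // treated as permanent: repeating the same request cannot succeed.
  return "permanent";
}
```

Under these assumptions, `classifyStatus(401)` yields `"permanent"`, so a misconfigured API key fails immediately instead of consuming the full retry budget.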

Usage

This heuristic applies to all LLM API calls made through promptfoo providers. It is the default behavior and requires no configuration. Set `PROMPTFOO_RETRY_5XX=true` to also retry 5xx server errors. Set `PROMPTFOO_REQUEST_BACKOFF_MS` to use a custom base delay in milliseconds.

The Insight (Rule of Thumb)

Action: Use exponential backoff with jitter for all retryable API failures.
Value: `maxRetries: 3`, `baseDelayMs: 1000`, `maxDelayMs: 60000`, `jitterFactor: 0.2`
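Trade-off: Jitter adds up to 20% to each computed delay (at most 800 ms at attempt 2 with the defaults). Because the result is capped at `maxDelayMs`, jitter never pushes a backoff delay past 60 seconds. This small, bounded cost prevents cascading retry storms.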
Formula: `delay = min(baseDelay * 2^attempt + (baseDelay * 2^attempt * 0.2 * random()), 60000)`
Server Retry-After: When present, used as base delay (with jitter added). `Retry-After: 0` means immediate retry.
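
The rules above can be combined into one delay function, sketched below. `computeDelayMs` and `parseRetryAfterMs` are hypothetical names, only the delta-seconds form of `Retry-After` is parsed, and leaving the server-supplied delay uncapped is an assumption (the text above does not say whether `maxDelayMs` applies to it). The `random` parameter is injected so the result can be checked deterministically.

```typescript
interface RetryPolicy {
  maxRetries: number;
  baseDelayMs: number;
  maxDelayMs: number;
  jitterFactor: number;
}

const policy: RetryPolicy = {
  maxRetries: 3,
  baseDelayMs: 1000,
  maxDelayMs: 60000,
  jitterFactor: 0.2,
};

// Hypothetical helper: handles only "Retry-After: <seconds>".
// The HTTP-date form is out of scope for this sketch.
function parseRetryAfterMs(header?: string): number | undefined {
  if (header === undefined) return undefined;
  const trimmed = header.trim();
  if (!/^\d+$/.test(trimmed)) return undefined;
  return Number(trimmed) * 1000;
}

// Hypothetical helper combining the Formula and Server Retry-After rules.
function computeDelayMs(
  attempt: number,
  p: RetryPolicy,
  retryAfterHeader?: string,
  random: () => number = Math.random,
): number {
  const serverMs = parseRetryAfterMs(retryAfterHeader);
  if (serverMs !== undefined) {
    // Server delay is the base; jitter is proportional, so 0 stays 0.
    // Assumption: no maxDelayMs cap is applied to the server's value.
    return serverMs + serverMs * p.jitterFactor * random();
  }
  const exponentialDelay = p.baseDelayMs * Math.pow(2, attempt);
  const jitter = exponentialDelay * p.jitterFactor * random();
  return Math.min(exponentialDelay + jitter, p.maxDelayMs);
}
```

With `random` fixed at 0.5, attempt 1 gives 2000 + 200 = 2200 ms, and `Retry-After: 0` gives 0 ms regardless of the random draw.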

Reasoning

Without jitter, N clients hitting a rate limit at the same time will all retry at exactly `baseDelay * 2^attempt` milliseconds, creating a perfectly synchronized spike. With 20% jitter:

Attempt 0: 1000ms + (0-200ms jitter) = 1.0-1.2s
Attempt 1: 2000ms + (0-400ms jitter) = 2.0-2.4s
Attempt 2: 4000ms + (0-800ms jitter) = 4.0-4.8s

The 20% factor is chosen as a balance: enough randomness to spread retries over a meaningful window, but not so much as to cause unpredictable delays.
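
A small simulation makes the spreading effect concrete. This is a sketch only: `retryTimes` and `spreadMs` are hypothetical helpers, the client count is arbitrary, and the 60-second cap is omitted because it does not come into play at these attempt numbers.

```typescript
// Hypothetical helper: the retry delay each of `clients` independent
// clients would pick for the same attempt number.
function retryTimes(
  clients: number,
  attempt: number,
  baseDelayMs: number,
  jitterFactor: number,
  random: () => number = Math.random,
): number[] {
  const exponentialDelay = baseDelayMs * Math.pow(2, attempt);
  return Array.from(
    { length: clients },
    () => exponentialDelay + exponentialDelay * jitterFactor * random(),
  );
}

// Width of the window over which the retries land.
function spreadMs(times: number[]): number {
  return Math.max(...times) - Math.min(...times);
}

// Without jitter every client picks the same delay (spread 0);
// with 20% jitter at attempt 1 they land somewhere in 2000-2400 ms.
const synchronized = retryTimes(100, 1, 1000, 0);
const jittered = retryTimes(100, 1, 1000, 0.2);
console.log(spreadMs(synchronized), spreadMs(jittered));
```

The first printed value is always 0. The second varies from run to run but stays within the 400 ms window listed for attempt 1.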

From `src/scheduler/retryPolicy.ts:10-15`:

export const DEFAULT_RETRY_POLICY: RetryPolicy = {
  maxRetries: 3,
  baseDelayMs: 1000,
  maxDelayMs: 60000,
  jitterFactor: 0.2,
};

Delay calculation from `src/scheduler/retryPolicy.ts:36-39`:

// Exponential backoff with jitter
const exponentialDelay = policy.baseDelayMs * Math.pow(2, attempt);
const jitter = exponentialDelay * policy.jitterFactor * Math.random();
return Math.min(exponentialDelay + jitter, policy.maxDelayMs);
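
For context, a retry loop built around this calculation might look like the sketch below. `withRetry`, `backoffDelayMs`, `isRetryable`, and the injectable `sleep` are hypothetical names, not promptfoo's API. The sketch also assumes that `maxRetries: 3` means up to three retries after the initial call, which matches the three delays listed under Reasoning.

```typescript
interface RetryPolicy {
  maxRetries: number;
  baseDelayMs: number;
  maxDelayMs: number;
  jitterFactor: number;
}

const DEFAULT_RETRY_POLICY: RetryPolicy = {
  maxRetries: 3,
  baseDelayMs: 1000,
  maxDelayMs: 60000,
  jitterFactor: 0.2,
};

// Same calculation as the excerpt above, wrapped in a function.
function backoffDelayMs(attempt: number, policy: RetryPolicy): number {
  const exponentialDelay = policy.baseDelayMs * Math.pow(2, attempt);
  const jitter = exponentialDelay * policy.jitterFactor * Math.random();
  return Math.min(exponentialDelay + jitter, policy.maxDelayMs);
}

// Hypothetical wrapper: retries `fn` on errors the caller deems transient.
async function withRetry<T>(
  fn: () => Promise<T>,
  isRetryable: (err: unknown) => boolean,
  policy: RetryPolicy = DEFAULT_RETRY_POLICY,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(() => resolve(), ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Give up on permanent errors or once the retry budget is spent.
      if (attempt >= policy.maxRetries || !isRetryable(err)) {
        throw err;
      }
      await sleep(backoffDelayMs(attempt, policy));
    }
  }
}
```

Passing `sleep` as a parameter keeps the loop testable without real waiting. A permanent error is rethrown after a single call, while a transient one is attempted at most four times in total under the stated assumption.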
