Heuristic:Openclaw Openclaw Retry With Exponential Backoff
| Knowledge Sources | |
|---|---|
| Domains | Reliability, Networking |
| Last Updated | 2026-02-06 12:00 GMT |
Overview
Retry strategy for external API calls and provider interactions: exponential backoff with optional jitter, defaulting to 3 attempts and a delay range of 300ms to 30s.
Description
OpenClaw implements a centralized retry utility (`retryAsync`) that applies exponential backoff to any asynchronous operation. The default configuration uses 3 attempts, a minimum delay of 300ms, a maximum delay of 30 seconds, and no jitter. The delay doubles after each failed attempt (300ms, 600ms, 1200ms, ...) and is capped at the maximum. An optional `retryAfterMs` callback extracts vendor-specific retry hints (e.g., the HTTP Retry-After header on a 429 response) so that server-mandated delays are honored. When enabled, jitter scales each delay by a random factor in the range 1 +/- `jitter`, which prevents thundering-herd effects when many clients retry simultaneously.
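The doubling-and-capping behaviour described above can be illustrated with a small sketch. The helper name `backoffDelayMs` is hypothetical and is not part of OpenClaw; it only reproduces the un-jittered schedule under the default 300ms minimum and 30s maximum.

```typescript
// Hypothetical helper (not OpenClaw code): the un-jittered wait that follows
// failed attempt number `attempt`, counting attempts from 1.
function backoffDelayMs(
  attempt: number,
  minDelayMs = 300,
  maxDelayMs = 30_000,
): number {
  const uncapped = minDelayMs * 2 ** (attempt - 1);
  return Math.min(maxDelayMs, uncapped);
}

// Delays after failed attempts 1 through 8.
const backoffSchedule = [1, 2, 3, 4, 5, 6, 7, 8].map((attempt) =>
  backoffDelayMs(attempt),
);
console.log(backoffSchedule);
// 300, 600, 1200, 2400, 4800, 9600, 19200, then 30000 (38400 capped)
```

With these defaults the cap is first reached after the eighth failed attempt, so it only matters when `attempts` is raised well above 3 or when a server hint asks for a long wait.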
Usage
Apply this heuristic when implementing any external API call (model providers, channel APIs, webhook delivery). The default 3-attempt configuration is suitable for most operations. Increase attempts for critical operations (e.g., message delivery); decrease for latency-sensitive operations (e.g., health checks).
The Insight (Rule of Thumb)
- Action: Wrap external calls with `retryAsync(fn, { attempts: 3, minDelayMs: 300, maxDelayMs: 30_000 })`.
- Value: 3 attempts, 300ms base delay, 30s cap, delay doubling on each retry (`minDelayMs * 2 ** (attempt - 1)`).
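- Trade-off: Retries add latency. With the defaults and no Retry-After hint, the backoff waits total about 0.9s (300ms + 600ms, assuming no wait follows the final attempt); a server-supplied Retry-After value can make individual waits much longer. For real-time operations, reduce maxDelayMs or attempts.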
- Jitter: Enable `jitter: 0.5` (each delay varies randomly by up to +/-50%) when multiple concurrent clients may retry the same endpoint, to avoid a thundering herd.
- Vendor Hints: Use `retryAfterMs` callback to extract and honor HTTP Retry-After headers (e.g., 429 responses).
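The vendor-hint rule above can be sketched as a callback that turns a Retry-After value into milliseconds. This is a minimal sketch under stated assumptions: the error shape (`statusCode`, `retryAfterHeader`) and the function names are hypothetical, since real provider SDKs expose status codes and headers in their own ways. The header itself may carry either a delay in seconds or an HTTP date.

```typescript
// Hypothetical error shape; real provider errors will differ.
interface HttpLikeError {
  statusCode: number;
  retryAfterHeader?: string;
}

function isHttpLikeError(err: unknown): err is HttpLikeError {
  return (
    typeof err === "object" &&
    err !== null &&
    typeof (err as { statusCode?: unknown }).statusCode === "number"
  );
}

// Converts a Retry-After value (delay in seconds, or an HTTP date) into
// milliseconds. Returns undefined when there is no usable hint, so the caller
// can fall back to plain exponential backoff.
function retryAfterMsFromError(
  err: unknown,
  nowMs: number = Date.now(),
): number | undefined {
  if (!isHttpLikeError(err) || err.retryAfterHeader === undefined) {
    return undefined;
  }
  const raw = err.retryAfterHeader.trim();
  if (/^\d+$/.test(raw)) {
    return Number(raw) * 1000; // delay-seconds form
  }
  const dateMs = Date.parse(raw); // HTTP-date form
  return Number.isNaN(dateMs) ? undefined : Math.max(0, dateMs - nowMs);
}

const hintFromSeconds = retryAfterMsFromError({
  statusCode: 429,
  retryAfterHeader: "2",
});
console.log(hintFromSeconds); // 2000
```

A function of this shape could be passed as the `retryAfterMs` option; errors without a hint return `undefined` and leave the backoff schedule unchanged.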
Reasoning
Exponential backoff is the industry-standard approach for transient failures. The 300ms base delay avoids overwhelming services during brief outages while keeping recovery fast. The 30s cap prevents excessively long waits. The configurable `shouldRetry` predicate allows skipping retries for permanent errors (most 4xx status codes) while retrying transient ones (5xx, network errors, and rate-limit responses such as 429). The `retryAfterMs` integration ensures compliance with rate-limiting APIs like Microsoft Teams and Telegram.
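One way such a `shouldRetry` predicate might classify errors is sketched below. The `statusCode` field and the function name are assumptions made for illustration, and the classification is deliberately coarse; a real predicate would likely be tuned per provider.

```typescript
// Hypothetical predicate: decides whether a failed call is worth retrying.
// Assumes HTTP errors carry a numeric `statusCode` field.
function shouldRetryHttpSketch(err: unknown): boolean {
  const statusCode = (err as { statusCode?: unknown } | null)?.statusCode;
  if (typeof statusCode !== "number") {
    // No HTTP status: treated here as a network-level failure and retried.
    // Simplification: this also retries programming errors such as TypeError.
    return true;
  }
  if (statusCode === 429) {
    return true; // rate limited: retry, ideally honoring Retry-After
  }
  if (statusCode >= 500) {
    return true; // server-side failure, likely transient
  }
  return false; // remaining statuses (e.g., 400, 401, 404): treated as permanent
}

console.log(shouldRetryHttpSketch({ statusCode: 503 })); // true
console.log(shouldRetryHttpSketch({ statusCode: 404 })); // false
```

Treating 429 separately from other 4xx codes matters because it is the one client-error status that explicitly invites a later retry.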
Code Evidence
Default configuration from `src/infra/retry.ts:25-30`:

    const DEFAULT_RETRY_CONFIG = {
      attempts: 3,
      minDelayMs: 300,
      maxDelayMs: 30_000,
      jitter: 0,
    };
Jitter application from `src/infra/retry.ts:62-68`:

    function applyJitter(delayMs: number, jitter: number): number {
      if (jitter <= 0) {
        return delayMs;
      }
      const offset = (Math.random() * 2 - 1) * jitter;
      return Math.max(0, Math.round(delayMs * (1 + offset)));
    }
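To make the jitter range concrete, the sketch below repeats the same arithmetic with the random source passed in as a parameter. That parameter and the name `applyJitterWith` are assumptions added for illustration; the original function calls `Math.random()` directly.

```typescript
// Variant of applyJitter with an injectable random source (hypothetical),
// so the extremes of the jitter range can be shown deterministically.
function applyJitterWith(
  delayMs: number,
  jitter: number,
  random: () => number,
): number {
  if (jitter <= 0) {
    return delayMs;
  }
  const offset = (random() * 2 - 1) * jitter;
  return Math.max(0, Math.round(delayMs * (1 + offset)));
}

const jitterLow = applyJitterWith(1200, 0.5, () => 0); // random at its minimum
const jitterMid = applyJitterWith(1200, 0.5, () => 0.5); // no offset
const jitterHigh = applyJitterWith(1200, 0.5, () => 1); // limit only; Math.random() is always below 1
console.log(jitterLow, jitterMid, jitterHigh); // 600 1200 1800
```

With `jitter: 0.5`, a nominal 1200ms delay therefore lands somewhere between 600ms and just under 1800ms.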
Retry-After integration from `src/infra/retry.ts:115-119`:

    const retryAfterMs = options.retryAfterMs?.(err);
    const hasRetryAfter = typeof retryAfterMs === "number" && Number.isFinite(retryAfterMs);
    const baseDelay = hasRetryAfter
      ? Math.max(retryAfterMs, minDelayMs)
      : minDelayMs * 2 ** (attempt - 1);
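The excerpts above show only fragments of the utility. As a rough illustration of how they might fit together, here is a self-contained sketch of a retry loop. It is not the OpenClaw implementation: the name `retryAsyncSketch`, the injectable `sleep` option, the order of capping and jitter, and the choice not to wait after the final attempt are all assumptions.

```typescript
// Hypothetical option shape, modelled on the option names used in this article.
interface RetrySketchOptions {
  attempts?: number;
  minDelayMs?: number;
  maxDelayMs?: number;
  jitter?: number;
  shouldRetry?: (err: unknown) => boolean;
  retryAfterMs?: (err: unknown) => number | undefined;
  // Assumption: injectable sleep so the sketch can run without real waits.
  sleep?: (ms: number) => Promise<void>;
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function retryAsyncSketch<T>(
  fn: () => Promise<T>,
  options: RetrySketchOptions = {},
): Promise<T> {
  const attempts = options.attempts ?? 3;
  const minDelayMs = options.minDelayMs ?? 300;
  const maxDelayMs = options.maxDelayMs ?? 30_000;
  const jitter = options.jitter ?? 0;
  const sleep = options.sleep ?? defaultSleep;

  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt += 1) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      const retryable = options.shouldRetry?.(err) ?? true;
      if (attempt >= attempts || !retryable) {
        break; // out of attempts, or a permanent error
      }
      const hinted = options.retryAfterMs?.(err);
      const baseDelay =
        typeof hinted === "number" && Number.isFinite(hinted)
          ? Math.max(hinted, minDelayMs) // honor the server hint
          : minDelayMs * 2 ** (attempt - 1); // exponential backoff
      // Assumption: the cap is applied before jitter.
      const capped = Math.min(baseDelay, maxDelayMs);
      const offset = jitter > 0 ? (Math.random() * 2 - 1) * jitter : 0;
      await sleep(Math.max(0, Math.round(capped * (1 + offset))));
    }
  }
  throw lastError;
}

// Usage: a call that fails twice and then succeeds. Delays are recorded
// instead of awaited so the example finishes immediately.
const recordedDelays: number[] = [];
let flakyCalls = 0;
const sketchResult = retryAsyncSketch(
  async () => {
    flakyCalls += 1;
    if (flakyCalls < 3) {
      throw new Error("transient failure");
    }
    return "ok";
  },
  {
    sleep: async (ms) => {
      recordedDelays.push(ms);
    },
  },
);
sketchResult.then((value) => console.log(value, recordedDelays));
```

In the usage above the third attempt succeeds, so the recorded delays are 300ms and 600ms, matching the default schedule with jitter disabled.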