# Heuristic: ucbepic/docetl Rate Limit Exponential Backoff
| Knowledge Sources | |
|---|---|
| Domains | LLM_Pipelines, Optimization, Debugging |
| Last Updated | 2026-02-08 01:00 GMT |
## Overview
Exponential backoff strategy (4s base, 120s cap) for handling LLM API rate limits, with separate 1-second fixed retry for connection errors.
## Description
DocETL handles three types of API failures differently:
- Rate Limits (429): Exponential backoff starting at 4 seconds, doubling each attempt, capped at 120 seconds. This gives the provider time to reset quotas.
- Connection Errors: Fixed 1-second retry, as these are typically transient network issues.
- Service Unavailable (503): Fixed 1-second retry, typically indicates temporary server overload.
The rate limit handler is acknowledged in the codebase as a "hacky" solution (TODO comment), but it is effective in practice for most LLM providers.
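Put together, the three branches can be sketched as one retry loop. This is a minimal sketch, not DocETL's actual implementation: the exception classes here are local stand-ins for the provider-SDK errors DocETL catches, and `make_request` is a hypothetical callable.

```python
import time

# Stand-ins for the provider-SDK exceptions (e.g. litellm/OpenAI error types).
class RateLimitError(Exception): ...          # HTTP 429
class APIConnectionError(Exception): ...      # transient network failure
class ServiceUnavailableError(Exception): ... # HTTP 503

def call_with_retries(make_request):
    """Retry `make_request()` using the three-branch policy described above."""
    rate_limited_attempt = 0
    while True:
        try:
            return make_request()
        except RateLimitError:
            # Exponential backoff: 4s base, doubling each attempt, capped at 120s.
            sleep_time = min(4 * (2 ** rate_limited_attempt), 120)
            time.sleep(sleep_time)
            rate_limited_attempt += 1
        except (APIConnectionError, ServiceUnavailableError):
            # Transient failures get a fixed 1-second retry.
            time.sleep(1)
```

Note that, as in DocETL, only rate-limit retries escalate their wait; connection and 503 retries stay at one second.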
## Usage
Use this heuristic when debugging slow pipeline execution or encountering rate limit errors. If your pipeline is spending excessive time in backoff, consider reducing `max_threads` in your pipeline config or configuring explicit rate limits via `rate_limits` in the YAML config.
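For illustration, a `rate_limits` block in the pipeline YAML might look like the sketch below. The key names (`llm_call`, `count`, `per`, `unit`) and values are assumptions for this example; consult the DocETL configuration documentation for the authoritative schema.

```yaml
# Hypothetical sketch of proactive rate limiting in a DocETL pipeline config.
rate_limits:
  llm_call:
    - count: 60     # assumed: at most 60 LLM calls
      per: 1
      unit: minute
```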
## The Insight (Rule of Thumb)
- Action: Rate limit retries use exponential backoff with formula `4 * (2 ^ attempt)`, capped at 120 seconds.
- Value:
  - Attempt 0: 4 seconds
  - Attempt 1: 8 seconds
  - Attempt 2: 16 seconds
  - Attempt 3: 32 seconds
  - Attempt 4: 64 seconds
  - Attempt 5+: 120 seconds (cap)
- Trade-off: Aggressive enough to recover quickly from brief rate limits, but the 120s cap prevents indefinite waiting on sustained rate limits.
- Parallel limit: The `max_threads` parameter (default: `os.cpu_count() * 4`) controls how many concurrent LLM calls are made. Reducing this is the primary lever for avoiding rate limits.
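The schedule above falls out of the formula directly; a one-line sketch (not DocETL's actual helper):

```python
def backoff_seconds(attempt: int) -> int:
    # 4s base, doubling each attempt, capped at 120s.
    return min(4 * (2 ** attempt), 120)

print([backoff_seconds(a) for a in range(7)])  # → [4, 8, 16, 32, 64, 120, 120]
```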
## Reasoning
LLM API providers enforce rate limits to manage server load. The exponential backoff pattern is industry-standard because:
- Short initial wait (4s): Most rate limit windows are short (per-minute quotas), so a brief pause often succeeds on retry.
- Doubling: Progressively longer waits handle sustained rate limiting without flooding the API.
- 120s cap: Prevents pathological waits. If the API is still rate-limited after 2 minutes, the issue is likely quota exhaustion rather than temporary throttling.
- No max retry count: Rate limit retries continue indefinitely. The assumption is that rate limits are always temporary and the pipeline should eventually complete.
Additionally, DocETL provides a `pyrate-limiter` integration for proactive rate limiting, which can prevent hitting provider limits entirely when configured.
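To illustrate what proactive limiting buys you, here is a minimal token-bucket limiter in plain Python. This is a generic sketch of the idea, not the `pyrate-limiter` API DocETL integrates with; see that library's documentation for the real interface.

```python
import time

class TokenBucket:
    """Block callers so that at most `rate` calls happen per `per` seconds."""

    def __init__(self, rate: int, per: float):
        self.capacity = rate
        self.tokens = float(rate)        # start with a full bucket
        self.fill_rate = rate / per      # tokens regained per second
        self.last = time.monotonic()

    def acquire(self) -> None:
        while True:
            now = time.monotonic()
            # Refill tokens based on elapsed time, never beyond capacity.
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.fill_rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            # Sleep just long enough for one token to accumulate.
            time.sleep((1 - self.tokens) / self.fill_rate)
```

Calling `bucket.acquire()` before each LLM request smooths traffic to the configured rate, so the reactive 429 backoff rarely triggers at all.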
## Code Evidence
Exponential backoff from `docetl/operations/utils/api.py:629-638` (note the stale inline comment: the cap is 120 seconds, not 60):
```python
except RateLimitError:
    # TODO: this is a really hacky way to handle rate limits
    # we should implement a more robust retry mechanism
    backoff_time = 4 * (2**rate_limited_attempt)  # Exponential backoff
    max_backoff = 120  # Maximum backoff time of 60 seconds
    sleep_time = min(backoff_time, max_backoff)
    self.runner.console.log(
        f"[yellow]Rate limit hit. Retrying in {sleep_time:.2f} seconds...[/yellow]"
    )
    time.sleep(sleep_time)
    rate_limited_attempt += 1
```
Connection error retry from `docetl/operations/utils/api.py:640-644`:
```python
except APIConnectionError as e:
    self.runner.console.log(
        f"[bold red]API connection error. Retrying...[/bold red] {e}"
    )
    time.sleep(1)
```
Max threads default from `docetl/config_wrapper.py:54`:
```python
self.max_threads = max_threads or (os.cpu_count() or 1) * 4
```