Heuristic: PrefectHQ Prefect Retry Backoff Strategy
| Knowledge Sources | |
|---|---|
| Domains | Reliability, Optimization |
| Last Updated | 2026-02-09 22:00 GMT |
Overview
Prefect caps `retry_delay_seconds` at a maximum of 50 entries per task; use escalating delays such as `[2, 5, 15]` for HTTP tasks or exponential backoff like `[1, 2, 4]` for LLM API calls.
Description
Prefect tasks support configurable retry delays via the `retry_delay_seconds` parameter. The framework enforces a hard limit of 50 retry delay entries to prevent memory issues from exponential growth. The codebase examples demonstrate two distinct retry patterns: fixed delays for simple HTTP operations and exponential backoff for LLM API calls. The `exponential_backoff` utility generates power-of-2 delays from a base factor but is capped at 50 entries regardless of the configured `retries` count.
Usage
Apply this heuristic when configuring retry behavior for Prefect tasks. Use shorter fixed delays for fast operations (web scraping, file I/O) and exponential backoff for rate-limited external APIs (LLM providers, cloud services). Be aware of the 50-retry cap when using `exponential_backoff()`.
The Insight (Rule of Thumb)
- Action: Set `retry_delay_seconds` explicitly rather than relying on defaults. Match the delay pattern to the failure type.
- Value:
- HTTP/API extraction: `retries=3, retry_delay_seconds=[2, 5, 15]`
- Web scraping: `retries=3, retry_delay_seconds=2` (fixed)
- LLM API calls: `retries=3, retry_delay_seconds=[1.0, 2.0, 4.0]`
- LLM tool calls: `retries=2, retry_delay_seconds=[0.5, 1.0]`
- Human approval timeouts: `timeout=3600` seconds (1 hour)
- Trade-off: More retries increase resilience but delay failure detection. The 50-retry cap prevents runaway delay list generation.
- Hard limit: Maximum 50 retry delays per task, enforced in `tasks.py`.
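The delay lists above also determine how long a task waits in total before finally failing. A quick sketch of the worst-case added latency per recommended pattern (the pattern names here are illustrative, not Prefect identifiers):

```python
# Worst-case extra latency (seconds) before a task finally fails:
# the sum of all retry delays for each recommended pattern.
patterns = {
    "http_api": [2, 5, 15],        # retries=3, escalating delays
    "web_scraping": [2, 2, 2],     # retries=3, fixed delay of 2
    "llm_api": [1.0, 2.0, 4.0],    # retries=3, exponential backoff
    "llm_tool": [0.5, 1.0],        # retries=2, short delays
}
worst_case = {name: sum(delays) for name, delays in patterns.items()}
# http_api waits up to 22 s, web_scraping 6 s, llm_api 7 s, llm_tool 1.5 s
```

This is why failure detection slows as retries grow: every added delay entry is time a downstream consumer waits before seeing the task fail.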
Reasoning
The retry delay patterns in the Prefect examples encode learned behavior about different failure modes:
- HTTP APIs (Dev.to, etc.) have rate limits and transient errors that resolve within seconds. The `[2, 5, 15]` pattern gives the API time to recover while not waiting excessively.
- LLM APIs (OpenAI, Anthropic) may have longer recovery times due to rate limiting or model loading. The `[1, 2, 4]` exponential pattern is appropriate.
- Tool calls within AI agents use shorter delays `[0.5, 1.0]` because the tool itself (data processing) is unlikely to have transient infrastructure issues.
The 50-retry hard cap exists because `exponential_backoff` generates `2^n` delays. At `n=50`, the delay would be `2^50 ≈ 1.1 quadrillion` seconds, which is both useless and memory-wasteful to store.
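The magnitude of that uncapped term can be checked directly:

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # ~3.16e7 seconds

# Without the cap, the delay entry for r=50 would be 2**50 seconds
# times the backoff factor -- roughly 35 million years even at factor 1.
uncapped_delay = 2 ** 50
years = uncapped_delay / SECONDS_PER_YEAR
```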
Code evidence from `src/prefect/tasks.py:203-207`, shown here with the enclosing `exponential_backoff` factory that supplies `backoff_factor`:

```python
def exponential_backoff(backoff_factor: float) -> Callable[[int], list[float]]:
    def retry_backoff_callable(retries: int) -> list[float]:
        # no more than 50 retry delays can be configured on a task
        retries = min(retries, 50)
        return [backoff_factor * max(0, 2**r) for r in range(retries)]

    return retry_backoff_callable
```
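A standalone sketch of the same cap logic (the helper name `capped_backoff` is hypothetical; Prefect's real entry point is `exponential_backoff`) makes the behavior easy to verify without a Prefect installation:

```python
def capped_backoff(backoff_factor: float, retries: int) -> list[float]:
    # Mirror of Prefect's internal logic: clamp the retry count to 50,
    # then emit backoff_factor * 2**r for each retry attempt r.
    retries = min(retries, 50)
    return [backoff_factor * (2 ** r) for r in range(retries)]

delays = capped_backoff(2, 3)    # [2, 4, 8]
capped = capped_backoff(1, 100)  # only 50 entries despite retries=100
```

Note that passing `retries=100` to a task silently truncates the generated delay list at 50 entries; Prefect does not raise an error.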
Example usage from `examples/run_api_sourced_etl.py:40-44`:

```python
@task(retries=3, retry_delay_seconds=[2, 5, 15])
def fetch_page(url: str, params: dict[str, Any]) -> list[dict[str, Any]]:
    response = httpx.get(url, params=params, timeout=30)
    response.raise_for_status()
    return response.json()
```