Heuristic: PrefectHQ Prefect Retry Backoff Strategy
| Knowledge Sources | |
|---|---|
| Domains | Reliability, Optimization |
| Last Updated | 2026-02-09 22:00 GMT |
Overview
Prefect caps `retry_delay_seconds` at a maximum of 50 entries per task; use escalating delays such as `[2, 5, 15]` for HTTP tasks or exponential backoff like `[1, 2, 4]` for LLM API calls.
Description
Prefect tasks support configurable retry delays via the `retry_delay_seconds` parameter. The framework enforces a hard limit of 50 retry delay entries to prevent memory issues from exponential growth. The codebase examples demonstrate two distinct retry patterns: fixed delays for simple HTTP operations and exponential backoff for LLM API calls. The `exponential_backoff` utility generates power-of-2 delays from a base factor but is capped at 50 entries regardless of the configured `retries` count.
Usage
Apply this heuristic when configuring retry behavior for Prefect tasks. Use shorter fixed delays for fast operations (web scraping, file I/O) and exponential backoff for rate-limited external APIs (LLM providers, cloud services). Be aware of the 50-retry cap when using `exponential_backoff()`.
The Insight (Rule of Thumb)
- Action: Set `retry_delay_seconds` explicitly rather than relying on defaults. Match the delay pattern to the failure type.
- Value:
- HTTP/API extraction: `retries=3, retry_delay_seconds=[2, 5, 15]`
- Web scraping: `retries=3, retry_delay_seconds=2` (fixed)
- LLM API calls: `retries=3, retry_delay_seconds=[1.0, 2.0, 4.0]`
- LLM tool calls: `retries=2, retry_delay_seconds=[0.5, 1.0]`
- Human approval timeouts: `timeout=3600` seconds (1 hour)
- Trade-off: More retries increase resilience but delay failure detection. The 50-retry cap prevents runaway delay list generation.
- Hard limit: Maximum 50 retry delays per task, enforced in `tasks.py`.
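The delay lists above also determine how long a task waits in total before finally failing. A quick sketch of the worst-case added latency per recommended pattern (the pattern names here are illustrative, not Prefect identifiers):

```python
# Worst-case extra latency (seconds) before a task finally fails:
# the sum of all retry delays for each recommended pattern.
patterns = {
    "http_api": [2, 5, 15],        # retries=3, escalating delays
    "web_scraping": [2, 2, 2],     # retries=3, fixed delay of 2
    "llm_api": [1.0, 2.0, 4.0],    # retries=3, exponential backoff
    "llm_tool": [0.5, 1.0],        # retries=2, short delays
}
worst_case = {name: sum(delays) for name, delays in patterns.items()}
# http_api waits up to 22 s, web_scraping 6 s, llm_api 7 s, llm_tool 1.5 s
```

This is why failure detection slows as retries grow: every added delay entry is time a downstream consumer waits before seeing the task fail.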
Reasoning
The retry delay patterns in the Prefect examples encode learned behavior about different failure modes:
- HTTP APIs (Dev.to, etc.) have rate limits and transient errors that resolve within seconds. The `[2, 5, 15]` pattern gives the API time to recover while not waiting excessively.
- LLM APIs (OpenAI, Anthropic) may have longer recovery times due to rate limiting or model loading. The `[1, 2, 4]` exponential pattern is appropriate.
- Tool calls within AI agents use shorter delays `[0.5, 1.0]` because the tool itself (data processing) is unlikely to have transient infrastructure issues.
The 50-retry hard cap exists because `exponential_backoff` generates `2^n` delays. At `n=50`, the delay would be `2^50 ≈ 1.1 quadrillion` seconds, which is both useless and memory-wasteful to store.
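The magnitude of that uncapped term can be checked directly:

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # ~3.16e7 seconds

# Without the cap, the delay entry for r=50 would be 2**50 seconds
# times the backoff factor -- roughly 35 million years even at factor 1.
uncapped_delay = 2 ** 50
years = uncapped_delay / SECONDS_PER_YEAR
```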
Code evidence from `src/prefect/tasks.py:203-207`, shown here with the enclosing `exponential_backoff` factory that supplies `backoff_factor`:

```python
def exponential_backoff(backoff_factor: float) -> Callable[[int], list[float]]:
    def retry_backoff_callable(retries: int) -> list[float]:
        # no more than 50 retry delays can be configured on a task
        retries = min(retries, 50)
        return [backoff_factor * max(0, 2**r) for r in range(retries)]

    return retry_backoff_callable
```
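A standalone sketch of the same cap logic (the helper name `capped_backoff` is hypothetical; Prefect's real entry point is `exponential_backoff`) makes the behavior easy to verify without a Prefect installation:

```python
def capped_backoff(backoff_factor: float, retries: int) -> list[float]:
    # Mirror of Prefect's internal logic: clamp the retry count to 50,
    # then emit backoff_factor * 2**r for each retry attempt r.
    retries = min(retries, 50)
    return [backoff_factor * (2 ** r) for r in range(retries)]

delays = capped_backoff(2, 3)    # [2, 4, 8]
capped = capped_backoff(1, 100)  # only 50 entries despite retries=100
```

Note that passing `retries=100` to a task silently truncates the generated delay list at 50 entries; Prefect does not raise an error.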
Example usage from `examples/run_api_sourced_etl.py:40-44`:

```python
@task(retries=3, retry_delay_seconds=[2, 5, 15])
def fetch_page(url: str, params: dict[str, Any]) -> list[dict[str, Any]]:
    response = httpx.get(url, params=params, timeout=30)
    response.raise_for_status()
    return response.json()
```