
Implementation:PrefectHQ Prefect Fetch Page Task

From Leeroopedia


Metadata

Last Updated: 2026-02-09 00:00 GMT

Overview

A concrete task, drawn from the Prefect ETL example, for fetching paginated JSON data from REST APIs. The fetch_page task wraps an httpx.get call with Prefect's retry mechanism for resilient HTTP data extraction.

Description

The fetch_page task wraps an httpx.get call with Prefect's @task(retries=3, retry_delay_seconds=[2, 5, 15]) for resilient HTTP data extraction. It fetches a single page of results from a REST API and returns the parsed JSON response.

Key characteristics:

  • 3 retries with escalating backoff (2s, 5s, 15s) for transient HTTP failures
  • 30-second timeout on the HTTP request itself via httpx
  • raise_for_status() to convert HTTP error codes (4xx, 5xx) into exceptions that trigger retries
  • Independent execution -- each page fetch is a separate task run, so failures are isolated
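The retry schedule can be made concrete with a small stdlib-only sketch (no Prefect required): the first attempt runs immediately, and each subsequent retry waits the corresponding entry of retry_delay_seconds. The retry_schedule helper below is illustrative, not part of Prefect's API.

```python
# Stdlib-only sketch of how retries=3 with retry_delay_seconds=[2, 5, 15]
# maps onto attempts. Attempt numbers and waits illustrate the schedule;
# this is not Prefect's internal retry machinery.
def retry_schedule(retries: int, delays: list[int]) -> list[tuple[int, int]]:
    # Each entry is (attempt_number, seconds_waited_before_that_attempt).
    plan = [(1, 0)]  # the first attempt runs immediately
    for n in range(1, retries + 1):
        plan.append((n + 1, delays[n - 1]))
    return plan

print(retry_schedule(3, [2, 5, 15]))
# → [(1, 0), (2, 2), (3, 5), (4, 15)]
```

So a page that keeps failing is attempted four times in total, with escalating pauses between attempts.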

Code Reference

Signature:

@task(retries=3, retry_delay_seconds=[2, 5, 15])
def fetch_page(page: int, api_base: str, per_page: int) -> list[dict[str, Any]]:
    """Return a list of article dicts for a given page number."""
    url = f"{api_base}/articles"
    params = {"page": page, "per_page": per_page}
    print(f"Fetching page {page} …")
    response = httpx.get(url, params=params, timeout=30)
    response.raise_for_status()
    return response.json()

I/O Contract

Direction | Parameter | Type | Description
Input | page | int, required | The page number to fetch.
Input | api_base | str, required | The base URL of the REST API (e.g., https://dev.to/api).
Input | per_page | int, required | Number of results to request per page.
Output | -- | list[dict[str, Any]] | List of article dictionaries parsed from the JSON API response.
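Because the task returns response.json() unchecked, the list[dict[str, Any]] annotation is a contract with the API rather than an enforced guarantee. A hedged sketch of an explicit shape check (validate_articles is a hypothetical helper, not part of the example):

```python
from typing import Any

def validate_articles(payload: Any) -> list[dict[str, Any]]:
    # Enforce the declared output contract: a JSON array of objects.
    if not isinstance(payload, list) or not all(
        isinstance(item, dict) for item in payload
    ):
        raise TypeError("expected the API to return a JSON array of objects")
    return payload
```

A check like this could be called on the task's return value before the transform step, turning a surprising API response into an immediate, descriptive failure.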

Error behavior:

Scenario | Behavior
HTTP 4xx/5xx response | raise_for_status() raises httpx.HTTPStatusError, triggering a retry
Network timeout | httpx raises httpx.TimeoutException after 30 seconds, triggering a retry
Connection error | httpx raises httpx.ConnectError, triggering a retry
All retries exhausted | The task transitions to Failed state and the exception propagates to the parent flow
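The interaction between these exceptions and the retry loop can be sketched in plain Python. This illustrates the mechanism only; fetch_with_retries and TransientError are hypothetical stand-ins, not Prefect or httpx APIs.

```python
import time

class TransientError(Exception):
    """Stand-in for httpx.HTTPStatusError / TimeoutException / ConnectError."""

def fetch_with_retries(fetch, retries=3, delays=(2, 5, 15), sleep=time.sleep):
    for attempt in range(retries + 1):
        try:
            return fetch()
        except TransientError:
            if attempt == retries:
                raise  # all retries exhausted: propagate to the caller
            sleep(delays[attempt])

# A fetch that fails twice and then succeeds; record waits instead of sleeping.
calls, waits = {"n": 0}, []

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError
    return ["article"]

result = fetch_with_retries(flaky, sleep=waits.append)
# → result == ["article"], waits == [2, 5]
```

The key point the table describes: any of the listed exceptions is caught and retried with the escalating delays, and only once the delay list is exhausted does the exception escape to the parent flow.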

Usage Example

from pathlib import Path

from prefect import flow, task
import httpx

@task(retries=3, retry_delay_seconds=[2, 5, 15])
def fetch_page(page: int, api_base: str, per_page: int) -> list[dict]:
    url = f"{api_base}/articles"
    params = {"page": page, "per_page": per_page}
    response = httpx.get(url, params=params, timeout=30)
    response.raise_for_status()
    return response.json()

@flow(name="devto_etl", log_prints=True)
def etl(api_base: str, pages: int, per_page: int, output_file: Path) -> None:
    raw_pages = []
    for page_number in range(1, pages + 1):
        raw_pages.append(fetch_page(page_number, api_base, per_page))
    # ... transform and load

In this example, the flow iterates through the requested number of pages. Each call to fetch_page creates an independent task run. If page 3 fails due to a network timeout, Prefect automatically retries it (waiting 2s, then 5s, then 15s) without re-fetching pages 1 and 2.
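A fixed page count works when the total is known in advance; a common variant instead stops when the API returns an empty page. A plain-Python sketch of that termination condition (no Prefect here; fetch and fetch_until_empty are illustrative stand-ins for the task and flow):

```python
def fetch_until_empty(fetch, max_pages=100):
    # Accumulate results until the API returns an empty page
    # (with a safety cap on the number of requests).
    collected = []
    for page in range(1, max_pages + 1):
        batch = fetch(page)
        if not batch:
            break
        collected.extend(batch)
    return collected

# Demo against an in-memory "API" holding 5 articles, 2 per page.
articles = [{"id": i} for i in range(5)]

def fake_fetch(page, per_page=2):
    start = (page - 1) * per_page
    return articles[start:start + per_page]

print(len(fetch_until_empty(fake_fetch)))
# → 5
```

In a real flow the same loop shape applies, with each fetch still being an independent, retryable task run.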
