Implementation: PrefectHQ Prefect Fetch HTML Task
| Metadata | |
|---|---|
| Sources | Prefect |
| Domains | Web_Scraping |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Concrete task for downloading HTML content from URLs using requests with Prefect retries.
Description
The fetch_html task wraps a requests.get call with @task(retries=3, retry_delay_seconds=2) for resilient HTML downloading. It fetches the raw HTML text from a given URL with a 10-second timeout.
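Prefect's `@task(retries=3, retry_delay_seconds=2)` semantics can be illustrated with a plain-Python loop, with no Prefect dependency. This is only a sketch of the retry behavior; `call_with_retries` and `flaky_fetch` are illustrative names, not part of Prefect's API:

```python
import time

def call_with_retries(fn, retries=3, retry_delay_seconds=2, _sleep=time.sleep):
    """Call fn(); on any exception, retry up to `retries` more times,
    sleeping `retry_delay_seconds` between attempts."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == retries:
                raise  # retries exhausted: propagate the last error
            _sleep(retry_delay_seconds)

# A flaky function that fails twice, then succeeds on the third attempt.
attempts = {"n": 0}

def flaky_fetch():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("transient network error")
    return "<html>ok</html>"

result = call_with_retries(flaky_fetch, retries=3, retry_delay_seconds=0)
```

With `retries=3`, the task body may run up to four times in total (one initial attempt plus three retries) before the failure is surfaced.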
Code Reference
- Repository: https://github.com/PrefectHQ/prefect
- File: examples/simple_web_scraper.py (L43-52)
- Signature:
  @task(retries=3, retry_delay_seconds=2)
  def fetch_html(url: str) -> str:
      """Download page HTML (with retries)."""
      print(f"Fetching {url} …")
      response = requests.get(url, timeout=10)
      response.raise_for_status()
      return response.text
- Import: from prefect import task; import requests
I/O Contract
Inputs
- url (str, required) — URL to download
Outputs
- str — Raw HTML text of the page
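The error path is part of this contract: a non-2xx response never yields a string. `raise_for_status()` raises an exception, the task run fails, and Prefect schedules a retry. A minimal stand-in sketches the two outcomes (`FakeResponse` is hypothetical, not part of requests; the real `requests.Response.raise_for_status` raises `requests.HTTPError`):

```python
class FakeResponse:
    """Minimal stand-in for requests.Response (illustrative only)."""
    def __init__(self, status_code: int, text: str = ""):
        self.status_code = status_code
        self.text = text

    def raise_for_status(self):
        # Mimics requests: raise on 4xx/5xx, return None otherwise.
        if self.status_code >= 400:
            raise RuntimeError(f"HTTP {self.status_code}")

# Success path: the task returns response.text as a str.
ok = FakeResponse(200, "<html>ok</html>")
ok.raise_for_status()

# Failure path: the exception propagates, failing the task run.
err = FakeResponse(404)
try:
    err.raise_for_status()
    raised = False
except RuntimeError:
    raised = True
```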
Usage Example
from prefect import flow, task
import requests

@task(retries=3, retry_delay_seconds=2)
def fetch_html(url: str) -> str:
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text

@flow(log_prints=True)
def scrape(urls: list[str] | None = None) -> None:
    if urls:
        for url in urls:
            html = fetch_html(url)
            # parse_article is defined elsewhere in examples/simple_web_scraper.py
            content = parse_article(html)
            print(content if content else "No article content found.")