Overview
Scrapes individual web pages using the Firecrawl v2 API to extract clean, structured content.
Description
FirecrawlScrapeWebsiteTool extends BaseTool and wraps FirecrawlApp from the firecrawl-py package. It accepts a single URL and scrapes it according to a set of configurable options: content formats (default: markdown), only_main_content filtering, include_tags / exclude_tags, caching via max_age, mobile emulation, skip_tls_verification, block_ads, proxy configuration, and store_in_cache. On initialization the tool creates a FirecrawlApp with the API key and handles automatic package installation if firecrawl-py is missing. The _run() method then calls scrape() on the FirecrawlApp instance with the configured options.
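The flow described above (build a client from an API key, then forward the URL and the stored options to scrape()) can be sketched without network access. This is a minimal illustration, not the tool's actual source: FakeScrapeClient and ScrapeToolSketch are hypothetical names, and passing the options as keyword arguments is an assumption.

```python
from __future__ import annotations

from typing import Any


class FakeScrapeClient:
    """Hypothetical stand-in for FirecrawlApp; records calls instead of using the network."""

    def __init__(self, api_key: str) -> None:
        self.api_key = api_key
        self.calls: list[tuple[str, dict[str, Any]]] = []

    def scrape(self, url: str, **options: Any) -> dict[str, Any]:
        # Record what was requested and return a canned result.
        self.calls.append((url, options))
        return {"markdown": f"# stub content for {url}"}


class ScrapeToolSketch:
    """Hypothetical mirror of the described flow, not the real tool class."""

    def __init__(self, api_key: str, config: dict[str, Any] | None = None) -> None:
        self.config = config if config is not None else {"formats": ["markdown"]}
        self._client = FakeScrapeClient(api_key)

    def _run(self, url: str) -> Any:
        # Assumption: the stored options are forwarded as keyword arguments.
        return self._client.scrape(url, **self.config)


sketch = ScrapeToolSketch(
    api_key="test-key",
    config={"formats": ["markdown"], "mobile": True},
)
result = sketch._run("https://example.com/article")
```

Swapping the stub for a real client is the only network-dependent step, which keeps the option-forwarding logic easy to test in isolation.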
Usage
Use this tool when a CrewAI agent needs to extract content from a single web page with fine-grained control over scraping behavior, including anti-bot bypass features and content filtering.
Code Reference
Source Location
- Repository: CrewAI
- File: lib/crewai-tools/src/crewai_tools/tools/firecrawl_scrape_website_tool/firecrawl_scrape_website_tool.py
- Lines: 1-125
Signature
class FirecrawlScrapeWebsiteTool(BaseTool):
    name: str = "Firecrawl web scrape tool"
    description: str = "Scrape webpages using Firecrawl and return the contents"
    args_schema: type[BaseModel] = FirecrawlScrapeWebsiteToolSchema
    api_key: str | None = None
    config: dict[str, Any] = Field(default_factory=lambda: {...})

    def __init__(self, api_key: str | None = None, **kwargs): ...
    def _run(self, url: str): ...
Import
from crewai_tools import FirecrawlScrapeWebsiteTool
I/O Contract
Inputs
| Name | Type | Required | Description |
| url | str | Yes | Website URL to scrape |
| api_key | str | No | Firecrawl API key (constructor param; falls back to FIRECRAWL_API_KEY env var) |
| config | dict | No | Scrape configuration overriding defaults (constructor param) |
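The api_key fallback in the table above can be illustrated with a small helper. This is a sketch of the lookup order only: resolve_api_key is a hypothetical function, and raising ValueError when no key is found is a choice made for this example rather than documented tool behavior.

```python
from __future__ import annotations

import os


def resolve_api_key(api_key: str | None = None) -> str:
    """Return the explicit key if given, else the FIRECRAWL_API_KEY env var."""
    key = api_key or os.environ.get("FIRECRAWL_API_KEY")
    if not key:
        # Assumption: fail loudly here; the real tool may behave differently.
        raise ValueError("No Firecrawl API key provided or found in FIRECRAWL_API_KEY")
    return key
```

An explicit argument takes precedence, so a key passed to the constructor is not silently replaced by whatever is set in the environment.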
Default Configuration
| Parameter | Default | Description |
| formats | ["markdown"] | Content formats to return |
| only_main_content | True | Exclude headers, navs, footers |
| max_age | 172800000 (2 days) | Cache age limit in milliseconds |
| mobile | False | Emulate mobile device |
| skip_tls_verification | True | Skip TLS certificate verification |
| block_ads | True | Enable ad and cookie popup blocking |
| proxy | "auto" | Proxy type (basic, stealth, auto) |
| store_in_cache | True | Store page in Firecrawl cache |
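This page does not say whether a config passed to the constructor is merged with these defaults or replaces them outright. One cautious approach, sketched below, is to build the full dictionary explicitly so that unspecified options keep the documented values. DEFAULT_CONFIG, build_config, and max_age_ms are hypothetical helpers for illustration; the values are copied from the table above.

```python
from datetime import timedelta
from typing import Any

# Defaults as listed in the table above (copied by hand, not read from the library).
DEFAULT_CONFIG: dict[str, Any] = {
    "formats": ["markdown"],
    "only_main_content": True,
    "max_age": 172_800_000,  # 2 days in milliseconds
    "mobile": False,
    "skip_tls_verification": True,
    "block_ads": True,
    "proxy": "auto",
    "store_in_cache": True,
}


def build_config(**overrides: Any) -> dict[str, Any]:
    """Return a new dict: documented defaults with the given overrides applied."""
    return {**DEFAULT_CONFIG, **overrides}


def max_age_ms(age: timedelta) -> int:
    """Convert a timedelta to the millisecond value that max_age expects."""
    return int(age.total_seconds() * 1000)


# Emulate a mobile device and accept cached pages up to one hour old.
config = build_config(mobile=True, max_age=max_age_ms(timedelta(hours=1)))
```

The resulting dictionary could then be passed as config= to the tool. Expressing max_age through timedelta avoids hand-counting zeros in a millisecond value.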
Outputs
| Name | Type | Description |
| _run() returns | Any | Firecrawl scrape result containing extracted page content |
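Because the return type is Any, calling code may prefer to read the content defensively. The helper below is a sketch that assumes the result exposes the page text under the name markdown, either as a dictionary key or as an attribute. That shape is an assumption, and extract_markdown is a hypothetical helper, not part of the tool.

```python
from __future__ import annotations

from types import SimpleNamespace
from typing import Any


def extract_markdown(result: Any) -> str | None:
    """Return the markdown text from a scrape result, or None if it is absent."""
    if isinstance(result, dict):
        return result.get("markdown")
    # Assumption: non-dict results expose the content as an attribute.
    return getattr(result, "markdown", None)


# Stand-in results for illustration only.
dict_result = {"markdown": "# Title"}
object_result = SimpleNamespace(markdown="# Title")
```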
Usage Examples
Basic Usage
from crewai_tools import FirecrawlScrapeWebsiteTool
tool = FirecrawlScrapeWebsiteTool(api_key="your-firecrawl-key")
result = tool.run(url="https://example.com/article")
Custom Configuration
from crewai_tools import FirecrawlScrapeWebsiteTool

tool = FirecrawlScrapeWebsiteTool(
    api_key="your-firecrawl-key",
    config={
        "formats": ["markdown", "html"],
        "only_main_content": True,
        "mobile": True,
        "block_ads": True,
    },
)