Overview
Crawls websites using the Firecrawl v2 API to discover and extract content from multiple pages.
Description
FirecrawlCrawlWebsiteTool extends BaseTool and integrates with FirecrawlApp from the firecrawl-py package. It accepts a URL and crawls the site with configurable parameters including max_discovery_depth (default: 2), limit (default: 10 pages), ignore_sitemap, allow_external_links, allow_subdomains, and scrape_options (formats, main content filtering, timeout). The tool initializes FirecrawlApp with an API key from the constructor or the FIRECRAWL_API_KEY environment variable, and handles automatic package installation if firecrawl-py is missing. The _run() method calls crawl() with poll-based result retrieval.
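The key fallback and dependency check described above can be sketched in plain Python. This is a minimal illustration, not the tool's actual source: the helper names `resolve_api_key` and `firecrawl_installed` are hypothetical, and `firecrawl` is assumed to be the import name of the firecrawl-py package.

```python
from __future__ import annotations

import importlib.util
import os


def resolve_api_key(api_key: str | None = None) -> str | None:
    """Prefer the explicit argument, else fall back to the environment variable."""
    # Hypothetical helper: mirrors only the documented fallback order.
    return api_key or os.environ.get("FIRECRAWL_API_KEY")


def firecrawl_installed() -> bool:
    """Report whether a top-level module named 'firecrawl' can be found."""
    # Assumption: firecrawl-py is imported as `firecrawl`.
    return importlib.util.find_spec("firecrawl") is not None
```

A caller could use `firecrawl_installed()` to fail early with a clear message instead of relying on the tool's own installation handling.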
Usage
Use this tool when a CrewAI agent needs to discover and extract content from multiple pages within a website, such as site-wide data extraction, content indexing, or documentation collection.
Code Reference
Source Location
- Repository: CrewAI
- File: lib/crewai-tools/src/crewai_tools/tools/firecrawl_crawl_website_tool/firecrawl_crawl_website_tool.py
- Lines: 1-125
Signature
class FirecrawlCrawlWebsiteTool(BaseTool):
    name: str = "Firecrawl web crawl tool"
    description: str = "Crawl webpages using Firecrawl and return the contents"
    args_schema: type[BaseModel] = FirecrawlCrawlWebsiteToolSchema
    api_key: str | None = None
    config: dict[str, Any] | None = Field(default_factory=lambda: {...})

    def __init__(self, api_key: str | None = None, **kwargs): ...
    def _run(self, url: str): ...
Import
from crewai_tools import FirecrawlCrawlWebsiteTool
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| url | str | Yes | Website URL to crawl |
| api_key | str | No | Firecrawl API key (constructor param; falls back to FIRECRAWL_API_KEY env var) |
| config | dict | No | Crawl configuration overriding defaults (constructor param) |
Default Configuration
| Parameter | Default | Description |
|---|---|---|
| max_discovery_depth | 2 | Maximum depth for page discovery |
| ignore_sitemap | True | Whether to ignore the sitemap |
| limit | 10 | Maximum number of pages to crawl |
| allow_external_links | False | Allow crawling external links |
| allow_subdomains | False | Allow crawling subdomains |
| scrape_options.formats | ["markdown"] | Content formats to return |
| scrape_options.only_main_content | True | Only return main content |
| scrape_options.timeout | 10000 | Timeout in milliseconds |
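The defaults in the table above can be written out as a nested dictionary. The sketch below also shows one way to layer partial overrides on top of them. Whether the tool itself merges a partial `config` with its defaults is not stated here, so the `merge_config` helper and the `DEFAULT_CONFIG` name are hypothetical conveniences for building a complete dictionary explicitly.

```python
from __future__ import annotations

import copy
from typing import Any

# Defaults transcribed from the Default Configuration table.
DEFAULT_CONFIG: dict[str, Any] = {
    "max_discovery_depth": 2,
    "ignore_sitemap": True,
    "limit": 10,
    "allow_external_links": False,
    "allow_subdomains": False,
    "scrape_options": {
        "formats": ["markdown"],
        "only_main_content": True,
        "timeout": 10000,  # milliseconds
    },
}


def merge_config(overrides: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of DEFAULT_CONFIG with overrides applied.

    Nested dictionaries (such as scrape_options) are merged one level deep
    rather than replaced wholesale.
    """
    merged = copy.deepcopy(DEFAULT_CONFIG)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = {**merged[key], **value}
        else:
            merged[key] = value
    return merged


# Raise the page limit and the scrape timeout; keep every other default.
config = merge_config({"limit": 50, "scrape_options": {"timeout": 30000}})
```

The resulting `config` dictionary could then be passed as the `config` constructor argument, so every parameter sent to the crawl is spelled out rather than left to implicit behavior.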
Outputs
| Name | Type | Description |
|---|---|---|
| _run() returns | Any | Firecrawl crawl results containing extracted content from discovered pages |
Usage Examples
Basic Usage
from crewai_tools import FirecrawlCrawlWebsiteTool
tool = FirecrawlCrawlWebsiteTool(api_key="your-firecrawl-key")
result = tool.run(url="https://docs.example.com")
Custom Configuration
from crewai_tools import FirecrawlCrawlWebsiteTool
tool = FirecrawlCrawlWebsiteTool(
    api_key="your-firecrawl-key",
    config={
        "max_discovery_depth": 3,
        "limit": 50,
        "allow_subdomains": True,
        "scrape_options": {"formats": ["markdown"], "only_main_content": True},
    },
)