Implementation:CrewAIInc CrewAI Firecrawl Crawl Tool

From Leeroopedia
Knowledge Sources
Domains: Tools, Web_Scraping, Firecrawl
Last Updated: 2026-02-11 00:00 GMT

Overview

Crawls websites using the Firecrawl v2 API to discover and extract content from multiple pages.

Description

FirecrawlCrawlWebsiteTool extends BaseTool and wraps FirecrawlApp from the firecrawl-py package. Given a URL, it crawls the site using a configurable set of parameters: max_discovery_depth (default: 2), limit (default: 10 pages), ignore_sitemap, allow_external_links, allow_subdomains, and scrape_options (output formats, main-content filtering, and a timeout). FirecrawlApp is initialized with the API key passed to the constructor or, if none is given, the FIRECRAWL_API_KEY environment variable. If firecrawl-py is missing, the tool attempts to install it automatically. The _run() method calls crawl(), which retrieves the crawl results by polling.
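Poll-based result retrieval means the crawl runs as a server-side job and the client checks its status at intervals until the job finishes. The loop below is a generic sketch of that pattern under stated assumptions, not firecrawl-py's implementation: `poll_until_complete`, the `fetch_status` callable, and the status strings "completed" and "failed" are hypothetical stand-ins for whatever the real client uses.

```python
import time
from typing import Any, Callable


def poll_until_complete(
    fetch_status: Callable[[], dict[str, Any]],
    poll_interval: float = 2.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll a (hypothetical) job-status callable until the job reaches a terminal state."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        state = status.get("status")
        if state == "completed":  # assumed success state name
            return status
        if state == "failed":  # assumed failure state name
            raise RuntimeError("crawl job failed")
        if time.monotonic() >= deadline:
            raise TimeoutError("crawl job did not finish before the timeout")
        sleep(poll_interval)  # wait before asking again


# Demo with a fake status source: two in-progress responses, then a finished job.
responses = iter([
    {"status": "scraping"},
    {"status": "scraping"},
    {"status": "completed", "data": ["page-1", "page-2"]},
])
job = poll_until_complete(lambda: next(responses), sleep=lambda _: None)
print(job["data"])  # ['page-1', 'page-2']
```

Injecting `sleep` keeps the demo instant; in real use the default `time.sleep` spaces out the status requests, and the timeout bounds how long a large crawl can block the caller.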

Usage

Use this tool when a CrewAI agent needs to discover and extract content from multiple pages within a website, such as site-wide data extraction, content indexing, or documentation collection.

Code Reference

Source Location

  • Repository: CrewAI
  • File: lib/crewai-tools/src/crewai_tools/tools/firecrawl_crawl_website_tool/firecrawl_crawl_website_tool.py
  • Lines: 1-125

Signature

class FirecrawlCrawlWebsiteTool(BaseTool):
    name: str = "Firecrawl web crawl tool"
    description: str = "Crawl webpages using Firecrawl and return the contents"
    args_schema: type[BaseModel] = FirecrawlCrawlWebsiteToolSchema
    api_key: str | None = None
    config: dict[str, Any] | None = Field(default_factory=lambda: {...})

    def __init__(self, api_key: str | None = None, **kwargs): ...
    def _run(self, url: str): ...

Import

from crewai_tools import FirecrawlCrawlWebsiteTool

I/O Contract

Inputs

Name     Type  Required  Description
url      str   Yes       Website URL to crawl
api_key  str   No        Firecrawl API key (constructor param; falls back to FIRECRAWL_API_KEY env var)
config   dict  No        Crawl configuration overriding defaults (constructor param)

Default Configuration

Parameter                         Default       Description
max_discovery_depth               2             Maximum depth for page discovery
ignore_sitemap                    True          Whether to ignore the sitemap
limit                             10            Maximum number of pages to crawl
allow_external_links              False         Allow crawling external links
allow_subdomains                  False         Allow crawling subdomains
scrape_options.formats            ["markdown"]  Content formats to return
scrape_options.only_main_content  True          Only return main content
scrape_options.timeout            10000         Timeout in milliseconds

Outputs

Name            Type  Description
_run() returns  Any   Firecrawl crawl results containing extracted content from discovered pages
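The return value is typed only as Any. Assuming the crawl result exposes a `data` list of pages, each carrying a `markdown` field (an assumption to verify against your installed firecrawl-py version, which may return objects or plain dicts), a small helper can flatten it into a list of page texts for an agent to consume. `collect_markdown` and `_field` are hypothetical names, and the demo uses fake data built with SimpleNamespace.

```python
from types import SimpleNamespace
from typing import Any


def _field(obj: Any, name: str) -> Any:
    """Read a field from either a dict or an attribute-style object."""
    if isinstance(obj, dict):
        return obj.get(name)
    return getattr(obj, name, None)


def collect_markdown(result: Any) -> list[str]:
    """Gather each page's markdown, skipping pages that have none (assumed result shape)."""
    pages = _field(result, "data") or []
    texts = []
    for page in pages:
        markdown = _field(page, "markdown")
        if markdown:
            texts.append(markdown)
    return texts


fake_result = SimpleNamespace(data=[
    SimpleNamespace(markdown="# Home"),
    {"markdown": "# About"},
    {"html": "<p>no markdown here</p>"},
])
print(collect_markdown(fake_result))  # ['# Home', '# About']
```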

Usage Examples

Basic Usage

from crewai_tools import FirecrawlCrawlWebsiteTool

tool = FirecrawlCrawlWebsiteTool(api_key="your-firecrawl-key")
result = tool.run(url="https://docs.example.com")

Custom Configuration

from crewai_tools import FirecrawlCrawlWebsiteTool

tool = FirecrawlCrawlWebsiteTool(
    api_key="your-firecrawl-key",
    config={
        "max_discovery_depth": 3,
        "limit": 50,
        "allow_subdomains": True,
        "scrape_options": {"formats": ["markdown"], "only_main_content": True},
    },
)
result = tool.run(url="https://docs.example.com")
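The Inputs table describes config as overriding the defaults but does not say whether a partial config is merged into them or replaces them outright. If you want to change a few keys and keep the rest, one option is to merge explicitly and pass the result as config=.... In this sketch, `DEFAULT_CONFIG` is a transcription of the Default Configuration table (the tool's own default dict is elided in the signature) and `merge_config` is a hypothetical helper.

```python
import copy
from typing import Any

# Transcribed from the Default Configuration table; not read from the tool itself.
DEFAULT_CONFIG: dict[str, Any] = {
    "max_discovery_depth": 2,
    "ignore_sitemap": True,
    "limit": 10,
    "allow_external_links": False,
    "allow_subdomains": False,
    "scrape_options": {
        "formats": ["markdown"],
        "only_main_content": True,
        "timeout": 10000,  # milliseconds
    },
}


def merge_config(defaults: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict: defaults with overrides applied, recursing into nested dicts."""
    merged = copy.deepcopy(defaults)  # never mutate the caller's defaults
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


config = merge_config(DEFAULT_CONFIG, {"limit": 50, "scrape_options": {"timeout": 30000}})
print(config["limit"], config["scrape_options"])
# 50 {'formats': ['markdown'], 'only_main_content': True, 'timeout': 30000}
```

Recursing into nested dicts matters for scrape_options: a shallow update would replace the whole sub-dict and silently drop formats and only_main_content.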
