Implementation:CrewAIInc CrewAI Firecrawl Scrape Tool

From Leeroopedia
Knowledge Sources
Domains: Tools, Web_Scraping, Firecrawl
Last Updated: 2026-02-11 00:00 GMT

Overview

Scrapes individual web pages using the Firecrawl v2 API to extract clean, structured content.

Description

FirecrawlScrapeWebsiteTool extends BaseTool and wraps FirecrawlApp from the firecrawl-py package. It accepts a single URL and scrapes it according to a configurable set of options: content formats (default: markdown), only_main_content filtering, include_tags / exclude_tags, cache freshness via max_age, mobile emulation, skip_tls_verification, block_ads, proxy selection, and store_in_cache. On construction the tool initializes FirecrawlApp with an API key; if firecrawl-py is not installed, it handles installing the package automatically. The _run() method calls scrape() on the FirecrawlApp instance with the configured options.
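The flow described above (construct a client with an API key, then forward the URL and the configured options to scrape()) can be sketched without the real dependencies. This is a minimal illustration, not the CrewAI source: FakeFirecrawlClient and ScrapeToolSketch are hypothetical names, and the fake client simply echoes what it receives instead of contacting Firecrawl.

```python
from __future__ import annotations

from typing import Any


class FakeFirecrawlClient:
    """Hypothetical stand-in for the Firecrawl client; it performs no network I/O."""

    def __init__(self, api_key: str) -> None:
        self.api_key = api_key

    def scrape(self, url: str, **options: Any) -> dict[str, Any]:
        # A real client would send a request; this one echoes its inputs.
        return {"url": url, "options": options}


class ScrapeToolSketch:
    """Illustrative only: mirrors the described flow, not the actual tool."""

    def __init__(self, api_key: str, config: dict[str, Any] | None = None) -> None:
        # Assumption: fall back to a markdown-only config when none is given.
        self.config = config if config is not None else {"formats": ["markdown"]}
        self._client = FakeFirecrawlClient(api_key=api_key)

    def run(self, url: str) -> dict[str, Any]:
        # Forward the URL plus every configured option to scrape().
        return self._client.scrape(url, **self.config)


result = ScrapeToolSketch(api_key="test-key").run("https://example.com/article")
print(result)
```

Keeping the options in a single dict and unpacking them at call time means new scrape options can be supported without changing the run method.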

Usage

Use this tool when a CrewAI agent needs to extract content from a single web page with fine-grained control over scraping behavior, such as proxy selection (including the stealth proxy type), ad blocking, and content filtering.

Code Reference

Source Location

  • Repository: CrewAI
  • File: lib/crewai-tools/src/crewai_tools/tools/firecrawl_scrape_website_tool/firecrawl_scrape_website_tool.py
  • Lines: 1-125

Signature

class FirecrawlScrapeWebsiteTool(BaseTool):
    name: str = "Firecrawl web scrape tool"
    description: str = "Scrape webpages using Firecrawl and return the contents"
    args_schema: type[BaseModel] = FirecrawlScrapeWebsiteToolSchema
    api_key: str | None = None
    config: dict[str, Any] = Field(default_factory=lambda: {...})

    def __init__(self, api_key: str | None = None, **kwargs): ...
    def _run(self, url: str): ...

Import

from crewai_tools import FirecrawlScrapeWebsiteTool

I/O Contract

Inputs

| Name | Type | Required | Description |
|------|------|----------|-------------|
| url | str | Yes | Website URL to scrape |
| api_key | str | No | Firecrawl API key (constructor param; falls back to FIRECRAWL_API_KEY env var) |
| config | dict | No | Scrape configuration overriding defaults (constructor param) |
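The api_key fallback listed above can be illustrated with a small helper. This is a sketch under assumptions: resolve_api_key is a hypothetical function, not part of crewai_tools, and raising ValueError when no key is found is a choice made here for illustration, since the source does not say how the tool reacts to a missing key.

```python
from __future__ import annotations

import os


def resolve_api_key(api_key: str | None = None) -> str:
    """Hypothetical helper: prefer the explicit key, else FIRECRAWL_API_KEY."""
    # An empty string is treated the same as a missing key.
    key = api_key or os.environ.get("FIRECRAWL_API_KEY")
    if not key:
        # Assumption: fail loudly rather than send an unauthenticated request.
        raise ValueError("Pass api_key or set the FIRECRAWL_API_KEY environment variable")
    return key
```

Resolving the key in one place keeps the precedence (constructor argument first, environment variable second) easy to see and to test.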

Default Configuration

| Parameter | Default | Description |
|-----------|---------|-------------|
| formats | ["markdown"] | Content formats to return |
| only_main_content | True | Exclude headers, navs, footers |
| max_age | 172800000 (2 days) | Cache age limit in milliseconds |
| mobile | False | Emulate mobile device |
| skip_tls_verification | True | Skip TLS certificate verification |
| block_ads | True | Enable ad and cookie popup blocking |
| proxy | "auto" | Proxy type (basic, stealth, auto) |
| store_in_cache | True | Store page in Firecrawl cache |
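For reference, the defaults listed above can be written as a Python dict, with max_age computed from its unit so the 2-day figure is easy to verify. The merged_config helper is hypothetical: the source does not state whether a custom config is merged with the defaults or replaces them outright, so this sketch merges explicitly to keep unspecified options at their documented values.

```python
from typing import Any

# Defaults as listed in this section; only the parameters shown there are included.
DEFAULT_CONFIG: dict[str, Any] = {
    "formats": ["markdown"],
    "only_main_content": True,
    "max_age": 2 * 24 * 60 * 60 * 1000,  # 2 days expressed in milliseconds
    "mobile": False,
    "skip_tls_verification": True,
    "block_ads": True,
    "proxy": "auto",
    "store_in_cache": True,
}


def merged_config(overrides: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: apply overrides on top of the listed defaults."""
    # Later keys win, so any option present in overrides replaces the default.
    return {**DEFAULT_CONFIG, **overrides}


config = merged_config({"formats": ["markdown", "html"], "mobile": True})
print(config["max_age"], config["mobile"], config["proxy"])
```

A dict produced this way can be passed as the config constructor argument when only a few options need to differ from the defaults.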

Outputs

| Name | Type | Description |
|------|------|-------------|
| _run() returns | Any | Firecrawl scrape result containing extracted page content |
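Because the return type is Any, calling code may want to read the result defensively. The helper below is a sketch under assumptions: extract_markdown is a hypothetical name, and the idea that the content is exposed under a "markdown" key or attribute is an assumption based on the default markdown format, not something this page specifies.

```python
from __future__ import annotations

from typing import Any


def extract_markdown(result: Any) -> str | None:
    """Hypothetical helper: read markdown from a dict-like or object-like result."""
    if isinstance(result, dict):
        value = result.get("markdown")
    else:
        # Assumption: object-style results expose the content as an attribute.
        value = getattr(result, "markdown", None)
    # Return None rather than a non-string value so callers can check one case.
    return value if isinstance(value, str) else None
```

Returning None for unexpected shapes lets the caller decide whether to retry, log, or fall back to another format.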

Usage Examples

Basic Usage

from crewai_tools import FirecrawlScrapeWebsiteTool

tool = FirecrawlScrapeWebsiteTool(api_key="your-firecrawl-key")
result = tool.run(url="https://example.com/article")

Custom Configuration

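from crewai_tools import FirecrawlScrapeWebsiteTool

# Pass a config dict to change the scrape options the tool uses.
tool = FirecrawlScrapeWebsiteTool(
    api_key="your-firecrawl-key",
    config={
        "formats": ["markdown", "html"],  # return both Markdown and HTML
        "only_main_content": True,        # exclude headers, navs, footers
        "mobile": True,                   # emulate a mobile device
        "block_ads": True,                # block ads and cookie popups
    },
)
result = tool.run(url="https://example.com/article")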
