Implementation:CrewAIInc CrewAI Firecrawl Scrape Tool

Knowledge Sources	CrewAI
Domains	Tools, Web_Scraping, Firecrawl
Last Updated	2026-02-11 00:00 GMT

Overview

Scrapes individual web pages using the Firecrawl v2 API to extract clean, structured content.

Description

FirecrawlScrapeWebsiteTool extends BaseTool and wraps FirecrawlApp from the firecrawl-py package. It accepts a single URL and scrapes it using a configurable set of options: content formats (default: markdown), only_main_content filtering, include_tags / exclude_tags, cache freshness via max_age, mobile emulation, skip_tls_verification, block_ads, proxy selection, and store_in_cache. On construction, the tool initializes FirecrawlApp with an API key and handles installation of firecrawl-py if the package is missing. The _run() method then calls scrape() on the FirecrawlApp instance with the configured options.
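The flow described above (store options at construction, forward them to scrape() at run time) can be sketched offline. This is a simplified model, not the actual class: StubFirecrawlClient and ScrapeToolSketch are hypothetical names, and the stub only echoes its arguments instead of contacting Firecrawl.

```python
from typing import Any, Optional


class StubFirecrawlClient:
    """Hypothetical stand-in for FirecrawlApp so the sketch runs offline."""

    def __init__(self, api_key: str) -> None:
        self.api_key = api_key

    def scrape(self, url: str, **options: Any) -> dict:
        # Echo the call back instead of contacting any service.
        return {"url": url, "options": options}


class ScrapeToolSketch:
    """Hypothetical, simplified model of the tool's flow (not the real class)."""

    def __init__(self, api_key: str, config: Optional[dict] = None) -> None:
        # Assumed fallback for illustration only: a single default option.
        self.config = config if config is not None else {"formats": ["markdown"]}
        self._client = StubFirecrawlClient(api_key=api_key)

    def _run(self, url: str) -> Any:
        # Forward the single URL plus the stored options to scrape().
        return self._client.scrape(url, **self.config)


sketch = ScrapeToolSketch(
    api_key="test-key",
    config={"formats": ["markdown"], "mobile": True},
)
result = sketch._run("https://example.com/article")
```

Keeping the options on the instance rather than in the _run() arguments means an agent only has to supply the URL; the scraping policy is fixed when the tool is built.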

Usage

Use this tool when a CrewAI agent needs to extract content from a single web page with fine-grained control over scraping behavior, including anti-bot bypass features and content filtering.

Code Reference

Source Location

Repository: CrewAI
File: lib/crewai-tools/src/crewai_tools/tools/firecrawl_scrape_website_tool/firecrawl_scrape_website_tool.py
Lines: 1-125

Signature

class FirecrawlScrapeWebsiteTool(BaseTool):
    name: str = "Firecrawl web scrape tool"
    description: str = "Scrape webpages using Firecrawl and return the contents"
    args_schema: type[BaseModel] = FirecrawlScrapeWebsiteToolSchema
    api_key: str | None = None
    config: dict[str, Any] = Field(default_factory=lambda: {...})

    def __init__(self, api_key: str | None = None, **kwargs): ...
    def _run(self, url: str): ...

Import

from crewai_tools import FirecrawlScrapeWebsiteTool

I/O Contract

Inputs

Name	Type	Required	Description
url	str	Yes	Website URL to scrape
api_key	str	No	Firecrawl API key (constructor param; falls back to FIRECRAWL_API_KEY env var)
config	dict	No	Scrape configuration overriding defaults (constructor param)
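The api_key fallback in the table can be illustrated with a small helper. This is a sketch under assumptions: resolve_api_key is a hypothetical function, not part of crewai_tools, and treating an empty string as "not provided" is an illustrative choice.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    explicit: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Hypothetical helper: prefer the explicit key, else FIRECRAWL_API_KEY."""
    # Allow a custom mapping so the lookup can be exercised without
    # touching the real process environment.
    source = os.environ if env is None else env
    return explicit if explicit else source.get("FIRECRAWL_API_KEY")


from_argument = resolve_api_key("abc", {"FIRECRAWL_API_KEY": "from-env"})
from_environment = resolve_api_key(None, {"FIRECRAWL_API_KEY": "from-env"})
missing = resolve_api_key(None, {})
```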

Default Configuration

Parameter	Default	Description
formats	["markdown"]	Content formats to return
only_main_content	True	Exclude headers, navs, footers
max_age	172800000 (2 days)	Cache age limit in milliseconds
mobile	False	Emulate mobile device
skip_tls_verification	True	Skip TLS certificate verification
block_ads	True	Enable ad and cookie popup blocking
proxy	"auto"	Proxy type ("basic", "stealth", or "auto")
store_in_cache	True	Store page in Firecrawl cache
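The defaults in this table can be written out as a Python dictionary, which also makes the max_age arithmetic explicit. The dictionary covers only the rows listed above, and with_overrides is a hypothetical helper. Whether the tool merges a caller-supplied config with its defaults or replaces them outright is not stated here, so this sketch builds the complete dictionary itself.

```python
from typing import Any, Dict

# 2 days expressed in milliseconds: 2 * 24 h * 60 min * 60 s * 1000 ms.
TWO_DAYS_MS = 2 * 24 * 60 * 60 * 1000

# Defaults copied from the table above (only the rows it lists).
DEFAULT_CONFIG: Dict[str, Any] = {
    "formats": ["markdown"],
    "only_main_content": True,
    "max_age": TWO_DAYS_MS,
    "mobile": False,
    "skip_tls_verification": True,
    "block_ads": True,
    "proxy": "auto",
    "store_in_cache": True,
}


def with_overrides(overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: copy the defaults, then apply caller overrides."""
    return {**DEFAULT_CONFIG, **overrides}


config = with_overrides({"mobile": True, "formats": ["markdown", "html"]})
```

Passing a fully specified dictionary such as config to the constructor gives the same result under either merge or replace semantics, which avoids depending on behavior this page does not document.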

Outputs

Name	Type	Description
_run() returns	Any	Firecrawl scrape result containing extracted page content
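Because the return type is Any, calling code may want to read the scraped content defensively. The sketch below assumes the result exposes the content under the name markdown, either as a dictionary key or as an attribute. That shape is an assumption that depends on the installed firecrawl-py version, and extract_markdown is a hypothetical helper.

```python
from types import SimpleNamespace
from typing import Any, Optional


def extract_markdown(result: Any) -> Optional[str]:
    """Hypothetical helper: read a 'markdown' field from a dict or an object."""
    if isinstance(result, dict):
        return result.get("markdown")
    # Fall back to attribute access; returns None when the field is absent.
    return getattr(result, "markdown", None)


# Stand-in results for illustration; these are not real Firecrawl responses.
from_dict = extract_markdown({"markdown": "# Title"})
from_object = extract_markdown(SimpleNamespace(markdown="# Title"))
from_other = extract_markdown("unexpected")
```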

Usage Examples

Basic Usage

from crewai_tools import FirecrawlScrapeWebsiteTool

tool = FirecrawlScrapeWebsiteTool(api_key="your-firecrawl-key")
result = tool.run(url="https://example.com/article")

Custom Configuration

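from crewai_tools import FirecrawlScrapeWebsiteTool

# Options passed in config override the tool's default scrape configuration.
tool = FirecrawlScrapeWebsiteTool(
    api_key="your-firecrawl-key",
    config={
        "formats": ["markdown", "html"],  # request both markdown and HTML
        "only_main_content": True,        # exclude headers, navs, footers
        "mobile": True,                   # emulate a mobile device
        "block_ads": True,                # block ads and cookie popups
    },
)
result = tool.run(url="https://example.com/article")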
