Implementation:CrewAIInc CrewAI Scrape Website Tool

Knowledge Sources	CrewAI
Domains	Tools, Web_Scraping
Last Updated	2026-02-11 00:00 GMT

Overview

ScrapeWebsiteTool fetches and extracts clean text content from web pages using HTTP requests and BeautifulSoup.

Description

ScrapeWebsiteTool extends BaseTool and follows the dual-schema pattern: ScrapeWebsiteToolSchema requires a website_url argument, while FixedScrapeWebsiteToolSchema is empty and is used when the URL is pre-configured. The tool sets browser-like default headers to reduce the chance of requests being blocked.

On initialization, the tool checks that BeautifulSoup is available and raises an ImportError with install instructions if it is missing. If a website_url is provided at init time, the tool is locked to that URL and switches to the fixed schema. When cookies are supplied, their values are read from environment variables.

The _run() method performs an HTTP GET request with a 15-second timeout, sets the response encoding from apparent_encoding so that characters are decoded correctly, parses the HTML with BeautifulSoup, and extracts all text using get_text(" "). It then applies two regex substitutions to clean up whitespace: the first collapses runs of spaces and tabs, and the second condenses blank-line-heavy sections.
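The whitespace clean-up at the end of _run() can be sketched with the standard library alone. This is a minimal illustration, not the tool's source: the function name clean_scraped_text and the exact regex patterns are assumptions chosen to match the description above, and the input string stands in for the output of get_text(" ").

```python
import re

# Prefix documented in the I/O Contract section of this page.
PREFIX = "The following text is scraped website content:\n\n"


def clean_scraped_text(raw_text: str) -> str:
    """Hypothetical helper: apply the two whitespace passes described above."""
    text = PREFIX + raw_text
    # Pass 1: collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Pass 2: condense whitespace wrapped around a newline
    # (blank-line-heavy sections) into a single newline.
    text = re.sub(r"\s+\n\s+", "\n", text)
    return text


# Stand-in for text extracted from a page, with ragged spacing and blank lines.
sample = "Title \t\t here\n\n\n   \n  Body   text\twith   gaps\n"
cleaned = clean_scraped_text(sample)
print(cleaned)
```

With these assumed patterns, the blank line after the prefix survives when the page text starts with a non-whitespace character, because the second pattern needs whitespace on both sides of a newline.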

Usage

Use this tool as the primary lightweight web scraping option when agents need to read website content without a browser engine. It is simpler than the Selenium, Scrapfly, and Scrapegraph alternatives, but because it only issues a plain HTTP request, content rendered client-side by JavaScript is not captured.

Code Reference

Source Location

Repository: CrewAI
File: lib/crewai-tools/src/crewai_tools/tools/scrape_website_tool/scrape_website_tool.py
Lines: 1-89

Signature

class FixedScrapeWebsiteToolSchema(BaseModel):
    pass

class ScrapeWebsiteToolSchema(FixedScrapeWebsiteToolSchema):
    website_url: str = Field(..., description="Mandatory website url to read the file")

class ScrapeWebsiteTool(BaseTool):
    name: str = "Read website content"
    description: str = "A tool that can be used to read a website content."
    args_schema: type[BaseModel] = ScrapeWebsiteToolSchema
    website_url: str | None = None
    cookies: dict | None = None
    headers: dict | None  # default browser-like headers

    def __init__(self, website_url=None, cookies=None, **kwargs): ...
    def _run(self, **kwargs) -> Any: ...
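The init-time behavior behind this signature (locking the URL, switching schemas, resolving cookies from the environment) can be sketched without any CrewAI dependency. Everything below is a hypothetical stand-in: the class UrlLockedToolSketch, the required_args attribute (playing the role of args_schema), the cookie dictionary shape with "name" and "value" keys, the precedence of a call-time URL over the init-time one, and the SKETCH_SESSION_COOKIE variable are all assumptions for illustration.

```python
import os
from typing import Optional


class UrlLockedToolSketch:
    """Hypothetical stand-in for the init logic; not the real ScrapeWebsiteTool."""

    def __init__(self, website_url: Optional[str] = None,
                 cookies: Optional[dict] = None):
        self.website_url = website_url
        # With a URL fixed at init, callers supply no arguments at run time
        # (the role of FixedScrapeWebsiteToolSchema); otherwise website_url
        # is required (the role of ScrapeWebsiteToolSchema).
        self.required_args = () if website_url is not None else ("website_url",)
        # Assumed cookie shape: {"name": <cookie name>,
        #                        "value": <env var that holds the cookie value>}.
        self.cookies = None
        if cookies is not None:
            self.cookies = {cookies["name"]: os.getenv(cookies["value"])}

    def resolve_url(self, **kwargs) -> str:
        # Assumption: a URL passed at call time wins over the init-time URL.
        url = kwargs.get("website_url", self.website_url)
        if url is None:
            raise ValueError("website_url must be given at init or at call time")
        return url


os.environ["SKETCH_SESSION_COOKIE"] = "abc123"  # hypothetical variable name

dynamic = UrlLockedToolSketch()
fixed = UrlLockedToolSketch(
    website_url="https://example.com",
    cookies={"name": "session", "value": "SKETCH_SESSION_COOKIE"},
)
print(dynamic.required_args, fixed.required_args, fixed.cookies)
```

Keeping the cookie value in an environment variable means the secret never has to appear in the code that constructs the tool.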

Import

from crewai_tools import ScrapeWebsiteTool

I/O Contract

Inputs

Name	Type	Required	Description
website_url	str	Conditional	URL of the website to scrape; required at call time unless a URL was set at init

Outputs

Name	Type	Description
_run() returns	str	Cleaned text content of the website prefixed with "The following text is scraped website content:\n\n"
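Because the returned string always starts with a fixed marker, downstream code that wants only the page text may need to strip it. A minimal sketch, assuming the prefix is exactly as listed in the table above; strip_scrape_prefix is a hypothetical helper, not part of crewai_tools.

```python
# Prefix as documented in the Outputs table.
PREFIX = "The following text is scraped website content:\n\n"


def strip_scrape_prefix(result: str) -> str:
    """Hypothetical helper: return the page text without the leading marker."""
    if result.startswith(PREFIX):
        return result[len(PREFIX):]
    # Leave the string untouched if the marker is absent.
    return result


page_text = strip_scrape_prefix(PREFIX + "Example Domain\nMore information...")
print(page_text)
```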

Usage Examples

Basic Usage

from crewai_tools import ScrapeWebsiteTool

# Dynamic URL: the URL is passed at call time
tool = ScrapeWebsiteTool()
result = tool._run(website_url="https://example.com")

# Pre-configured URL: the tool is locked to this URL at init, so no argument is needed
tool = ScrapeWebsiteTool(website_url="https://example.com")
result = tool._run()

Related Pages

Principle:CrewAIInc_CrewAI_Built_In_Tool_Selection
