Implementation:CrewAIInc CrewAI Scrape Element Tool
| Knowledge Sources | |
|---|---|
| Domains | Tools, Web_Scraping |
| Last Updated | 2026-02-11 00:00 GMT |
Overview
ScrapeElementFromWebsiteTool extracts specific HTML elements from web pages using CSS selectors via BeautifulSoup.
Description
ScrapeElementFromWebsiteTool extends BaseTool and uses a dual-schema pattern. ScrapeElementFromWebsiteToolSchema requires both website_url and css_element as inputs, while FixedScrapeElementFromWebsiteToolSchema is an empty schema used when the URL and CSS selector are pre-configured at initialization. The tool sends default browser-like HTTP headers (User-Agent, Accept, and others) to reduce the chance of requests being blocked. Cookie values are read from environment variables. The _run() method issues an HTTP GET request with the requests library, parses the response HTML with BeautifulSoup, selects the elements matching the CSS selector via parsed.select(), and returns the concatenated text content of all matched elements. BeautifulSoup availability is checked at runtime, with an explanatory error message if it is missing.
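The fetch, parse, select, and join steps described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual source: the helper names load_beautifulsoup, extract_text, and scrape_element, the html.parser backend, the newline separator, and the timeout value are all assumptions made for the example.

```python
def load_beautifulsoup():
    """Import BeautifulSoup lazily and fail with an actionable message."""
    try:
        from bs4 import BeautifulSoup
    except ImportError as exc:
        raise ImportError(
            "beautifulsoup4 is required for this sketch; "
            "install it with `pip install beautifulsoup4`."
        ) from exc
    return BeautifulSoup


def extract_text(html: str, css_element: str) -> str:
    """Return the text of every element matching the CSS selector."""
    BeautifulSoup = load_beautifulsoup()
    parsed = BeautifulSoup(html, "html.parser")
    elements = parsed.select(css_element)
    # Assumption: matched elements are joined with newlines.
    return "\n".join(element.get_text() for element in elements)


def scrape_element(website_url: str, css_element: str,
                   headers=None, cookies=None) -> str:
    """Fetch a page and extract the text of the matching elements."""
    import requests  # imported here so extract_text also works offline

    # Assumption: a 15-second timeout; the real tool may use another value.
    response = requests.get(
        website_url, headers=headers, cookies=cookies, timeout=15
    )
    return extract_text(response.text, css_element)
```

Keeping the parsing step in its own function makes the selector logic easy to exercise against a literal HTML string without any network access.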
Usage
Use this tool for targeted element extraction from web pages when you know the CSS selector of the content you need. It complements the full-page ScrapeWebsiteTool by allowing precise selection of specific page elements, which is valuable for structured data extraction from known page layouts.
Code Reference
Source Location
- Repository: CrewAI
- File: lib/crewai-tools/src/crewai_tools/tools/scrape_element_from_website/scrape_element_from_website.py
- Lines: 1-92
Signature
class FixedScrapeElementFromWebsiteToolSchema(BaseModel):
    pass

class ScrapeElementFromWebsiteToolSchema(FixedScrapeElementFromWebsiteToolSchema):
    website_url: str = Field(..., description="Mandatory website url to read the file")
    css_element: str = Field(..., description="Mandatory css reference for element to scrape from the website")

class ScrapeElementFromWebsiteTool(BaseTool):
    name: str = "Read a website content"
    description: str = "A tool that can be used to read a website content."
    args_schema: type[BaseModel] = ScrapeElementFromWebsiteToolSchema
    website_url: str | None = None
    cookies: dict | None = None
    css_element: str | None = None
    headers: dict | None  # default browser-like headers

    def __init__(self, website_url=None, cookies=None, css_element=None, **kwargs)
    def _run(self, **kwargs) -> Any
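The cookies parameter is described as reading its values from environment variables. One plausible shape for that lookup is sketched below, assuming the cookies argument is a dict with a name key (the cookie name) and a value key (the name of the environment variable that holds the cookie value). The helper resolve_cookie and the variable SITE_SESSION_TOKEN are hypothetical and exist only for this example; check the source file for the exact format the tool expects.

```python
import os


def resolve_cookie(cookies):
    """Turn a {"name": ..., "value": ENV_VAR_NAME} spec into a cookie dict."""
    if cookies is None:
        return None
    # Assumption: "value" names an environment variable, not the secret itself.
    return {cookies["name"]: os.getenv(cookies["value"])}


# Hypothetical usage: the secret lives in the environment, not in the code.
os.environ["SITE_SESSION_TOKEN"] = "abc123"
print(resolve_cookie({"name": "session", "value": "SITE_SESSION_TOKEN"}))
```

Resolving the value at construction time keeps secrets out of agent prompts and configuration files.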
Import
from crewai_tools import ScrapeElementFromWebsiteTool
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| website_url | str | Yes, unless set at init | URL of the web page to scrape |
| css_element | str | Yes, unless set at init | CSS selector for the element(s) to extract |
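Because both inputs may come either from the call or from initialization, the tool has to decide which value wins. A minimal sketch of one such resolution rule follows, assuming call-time arguments take precedence over pre-configured ones and that a missing value is treated as an error. The function resolve_inputs is hypothetical and is not part of crewai_tools.

```python
def resolve_inputs(kwargs, default_url=None, default_css=None):
    """Pick call-time values first, then fall back to init-time defaults."""
    website_url = kwargs.get("website_url", default_url)
    css_element = kwargs.get("css_element", default_css)
    if website_url is None or css_element is None:
        # Assumption: the sketch rejects incomplete input with a ValueError.
        raise ValueError("Both website_url and css_element must be provided.")
    return website_url, css_element


# Call-time selector overrides the pre-configured one; the URL falls back.
print(resolve_inputs({"css_element": "p"}, "https://example.com", "h1"))
```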
Outputs
| Name | Type | Description |
|---|---|---|
| _run() returns | str | Concatenated text content of all HTML elements matching the CSS selector |
Usage Examples
Basic Usage
from crewai_tools import ScrapeElementFromWebsiteTool
# Dynamic URL and selector
tool = ScrapeElementFromWebsiteTool()
result = tool._run(website_url="https://example.com", css_element="h1.title")
# Pre-configured URL and selector
tool = ScrapeElementFromWebsiteTool(
    website_url="https://example.com",
    css_element="div.article-content"
)
result = tool._run()