Implementation:CrewAIInc CrewAI Scrape Element Tool

From Leeroopedia
Revision as of 11:09, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/CrewAIInc_CrewAI_Scrape_Element_Tool.md)
Knowledge Sources
Domains: Tools, Web_Scraping
Last Updated: 2026-02-11 00:00 GMT

Overview

ScrapeElementFromWebsiteTool extracts specific HTML elements from web pages using CSS selectors via BeautifulSoup.

Description

ScrapeElementFromWebsiteTool extends BaseTool and uses a dual-schema pattern. ScrapeElementFromWebsiteToolSchema requires both website_url and css_element as inputs; FixedScrapeElementFromWebsiteToolSchema is an empty schema used when the URL and CSS element are pre-configured at initialization. The tool sends default browser-like HTTP headers (User-Agent, Accept, etc.) to reduce the chance of requests being blocked. Cookie values are read from environment variables. The _run() method issues an HTTP GET request with the requests library, parses the response HTML with BeautifulSoup, selects the elements matching the CSS selector via parsed.select(), and returns the concatenated text content of all matched elements. BeautifulSoup availability is checked at runtime, and the tool reports a descriptive error message if the package is missing.
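The fetch, parse, select, and join steps described above can be sketched roughly as follows. This is a minimal illustration and not the tool's actual source: the function names extract_elements_text and scrape_element, the timeout value, the raise_for_status() call, and the newline separator between matched elements are all assumptions made for the example.

```python
from typing import Optional

try:
    from bs4 import BeautifulSoup  # third-party package: beautifulsoup4
    BS4_AVAILABLE = True
except ImportError:
    BS4_AVAILABLE = False


def extract_elements_text(html: str, css_element: str) -> str:
    """Return the text of every element matching css_element, one per line."""
    if not BS4_AVAILABLE:
        # Runtime availability check with an actionable message
        raise ImportError("beautifulsoup4 is required: pip install beautifulsoup4")
    parsed = BeautifulSoup(html, "html.parser")
    elements = parsed.select(css_element)
    # Assumption: matched elements are joined with newlines
    return "\n".join(element.get_text() for element in elements)


def scrape_element(
    website_url: str,
    css_element: str,
    headers: Optional[dict] = None,
    cookies: Optional[dict] = None,
    timeout: float = 15.0,  # hypothetical default, not taken from the source
) -> str:
    """Fetch a page over HTTP GET and extract the matching elements' text."""
    import requests  # imported lazily so the parsing helper also works offline

    response = requests.get(
        website_url, headers=headers, cookies=cookies, timeout=timeout
    )
    response.raise_for_status()  # assumption: fail loudly on HTTP errors
    return extract_elements_text(response.text, css_element)
```

Keeping the parsing step separate from the network call makes the selector logic easy to exercise against a static HTML string.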

Usage

Use this tool for targeted element extraction from web pages when you know the CSS selector of the content you need. It complements the full-page ScrapeWebsiteTool by allowing precise selection of specific page elements, which is valuable for structured data extraction from known page layouts.

Code Reference

Source Location

  • Repository: CrewAI
  • File: lib/crewai-tools/src/crewai_tools/tools/scrape_element_from_website/scrape_element_from_website.py
  • Lines: 1-92

Signature

class FixedScrapeElementFromWebsiteToolSchema(BaseModel):
    pass

class ScrapeElementFromWebsiteToolSchema(FixedScrapeElementFromWebsiteToolSchema):
    website_url: str = Field(..., description="Mandatory website url to read the file")
    css_element: str = Field(..., description="Mandatory css reference for element to scrape from the website")

class ScrapeElementFromWebsiteTool(BaseTool):
    name: str = "Read a website content"
    description: str = "A tool that can be used to read a website content."
    args_schema: type[BaseModel] = ScrapeElementFromWebsiteToolSchema
    website_url: str | None = None
    cookies: dict | None = None
    css_element: str | None = None
    headers: dict | None  # default browser-like headers

    def __init__(self, website_url=None, cookies=None, css_element=None, **kwargs): ...
    def _run(self, **kwargs) -> Any: ...
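The cookies field, which the Description says is populated from environment variables, might be resolved along the following lines. This is a sketch under stated assumptions: the helper name resolve_cookie is hypothetical, and the dict shape (a "name" key for the cookie name and a "value" key naming the environment variable to read) is an assumption about the expected input, not something this page confirms.

```python
import os
from typing import Optional


def resolve_cookie(cookies: Optional[dict]) -> Optional[dict]:
    """Turn a cookie spec into a dict suitable for an HTTP client.

    Assumed spec shape (hypothetical):
        {"name": "<cookie name>", "value": "<ENV_VAR holding the cookie value>"}
    """
    if cookies is None:
        return None
    # Look up the cookie value in the environment instead of hard-coding it
    return {cookies["name"]: os.getenv(cookies["value"])}
```

Reading the value from the environment keeps session tokens out of source code and agent configuration files.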

Import

from crewai_tools import ScrapeElementFromWebsiteTool

I/O Contract

Inputs

Name         Type  Required                  Description
website_url  str   Yes (unless set at init)  URL of the website to scrape
css_element  str   Yes (unless set at init)  CSS selector for the element(s) to extract
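The way call-time inputs and init-time values could combine can be illustrated with a small stand-in class. ElementScraperConfig and its resolve() method are hypothetical names introduced only for this sketch, and raising ValueError when a value is missing is an assumption; the real tool may handle that case differently.

```python
from typing import Any, Optional, Tuple


class ElementScraperConfig:
    """Hypothetical stand-in showing init-time defaults with call-time overrides."""

    def __init__(
        self,
        website_url: Optional[str] = None,
        css_element: Optional[str] = None,
    ) -> None:
        self.website_url = website_url
        self.css_element = css_element

    def resolve(self, **kwargs: Any) -> Tuple[str, str]:
        # Call-time arguments take precedence; otherwise use the init-time values
        website_url = kwargs.get("website_url", self.website_url)
        css_element = kwargs.get("css_element", self.css_element)
        if website_url is None or css_element is None:
            # Assumption: a missing input is treated as an error
            raise ValueError(
                "website_url and css_element must be set at init or call time"
            )
        return website_url, css_element
```

With both values fixed at construction, resolve() needs no arguments, which corresponds to the empty fixed schema; otherwise both must be passed on each call.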

Outputs

Name            Type  Description
_run() returns  str   Concatenated text content of all HTML elements matching the CSS selector

Usage Examples

Basic Usage

from crewai_tools import ScrapeElementFromWebsiteTool

# Dynamic mode: URL and selector are supplied on each call
tool = ScrapeElementFromWebsiteTool()
result = tool._run(website_url="https://example.com", css_element="h1.title")
print(result)

# Pre-configured mode: URL and selector are fixed at initialization
fixed_tool = ScrapeElementFromWebsiteTool(
    website_url="https://example.com",
    css_element="div.article-content",
)
fixed_result = fixed_tool._run()
print(fixed_result)
