

Implementation:CrewAIInc CrewAI Tavily Extractor Tool

From Leeroopedia
Knowledge Sources
Domains: Tools, Web_Extraction
Last Updated: 2026-02-11 00:00 GMT

Overview

TavilyExtractorTool extracts structured content from one or more web pages using the Tavily API.

Description

TavilyExtractorTool extends BaseTool and wraps both the synchronous (TavilyClient) and asynchronous (AsyncTavilyClient) Tavily clients. On initialization, it creates both clients using the provided API key (falling back to the TAVILY_API_KEY environment variable) and the optional proxies setting. If the tavily-python package is not installed, it interactively asks the user, via click.confirm, whether to install it. The _run (sync) and _arun (async) methods accept a single URL or a list of URLs, call the Tavily extract API with the configured extract_depth ("basic" or "advanced"), include_images flag, and timeout, and return the results as a formatted JSON string.
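The flow described above can be sketched with stand-in clients so that it runs without network access or an API key. Everything here is hypothetical: StubTavilyClient, StubAsyncTavilyClient, and ExtractorSketch are illustrative names, not part of crewai_tools or tavily-python, and the stubs only mimic the general shape of an extract call. Normalizing a single URL into a one-element list is a choice made by this sketch, not documented behavior of the real tool.

```python
from __future__ import annotations

import asyncio
import json
from typing import Any


class StubTavilyClient:
    """Hypothetical stand-in for TavilyClient: returns canned data, makes no network call."""

    def extract(self, urls: list[str], **options: Any) -> dict[str, Any]:
        results = [{"url": u, "raw_content": f"content of {u}"} for u in urls]
        # Echo the options back so the caller can see what was forwarded.
        return {"results": results, "options": options}


class StubAsyncTavilyClient:
    """Hypothetical stand-in for AsyncTavilyClient."""

    async def extract(self, urls: list[str], **options: Any) -> dict[str, Any]:
        await asyncio.sleep(0)  # yield to the event loop, as a real network call would
        return StubTavilyClient().extract(urls, **options)


class ExtractorSketch:
    """Hypothetical sketch of the sync/async extract flow; not the crewai_tools source."""

    def __init__(self, client: Any, async_client: Any, extract_depth: str = "basic",
                 include_images: bool = False, timeout: int = 60) -> None:
        self.client = client
        self.async_client = async_client
        self.extract_depth = extract_depth
        self.include_images = include_images
        self.timeout = timeout

    def _options(self) -> dict[str, Any]:
        # The configurable settings named in the description, forwarded on every call.
        return {"extract_depth": self.extract_depth,
                "include_images": self.include_images,
                "timeout": self.timeout}

    @staticmethod
    def _normalize(urls: list[str] | str) -> list[str]:
        # Accept a single URL or a list of URLs, as the input schema does.
        return [urls] if isinstance(urls, str) else list(urls)

    def run(self, urls: list[str] | str) -> str:
        raw = self.client.extract(self._normalize(urls), **self._options())
        return json.dumps(raw, indent=2)  # formatted JSON string

    async def arun(self, urls: list[str] | str) -> str:
        raw = await self.async_client.extract(self._normalize(urls), **self._options())
        return json.dumps(raw, indent=2)


tool = ExtractorSketch(StubTavilyClient(), StubAsyncTavilyClient(), extract_depth="advanced")
print(tool.run("https://example.com"))
print(asyncio.run(tool.arun(["https://example.com/page1", "https://example.com/page2"])))
```

Passing the clients in through the constructor keeps the sketch testable offline; the real tool instead builds its own clients during initialization.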

Usage

Use this tool when a CrewAI agent needs to extract and structure content from specific web pages. It supports both single-URL and multi-URL (batch) extraction for web content analysis workflows.

Code Reference

Source Location

  • Repository: CrewAI
  • File: lib/crewai-tools/src/crewai_tools/tools/tavily_extractor_tool/tavily_extractor_tool.py
  • Lines: 1-176

Signature

class TavilyExtractorToolSchema(BaseModel):
    urls: list[str] | str = Field(..., description="The URL(s) to extract data from. ...")

class TavilyExtractorTool(BaseTool):
    name: str = "TavilyExtractorTool"
    description: str = "Extracts content from one or more web pages using the Tavily API. ..."
    args_schema: type[BaseModel] = TavilyExtractorToolSchema
    api_key: str | None = Field(default_factory=lambda: os.getenv("TAVILY_API_KEY"))
    proxies: dict[str, str] | None = None
    include_images: bool = False
    extract_depth: Literal["basic", "advanced"] = "basic"
    timeout: int = 60
    env_vars: list[EnvVar]  # TAVILY_API_KEY

    def _run(self, urls: list[str] | str) -> str:
        ...
    async def _arun(self, urls: list[str] | str) -> str:
        ...
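The api_key field above uses a default_factory, so TAVILY_API_KEY is read when the tool is instantiated rather than when the module is imported, and an explicit api_key argument takes precedence. The dataclass below is a hypothetical stand-in (ExtractorConfigSketch is not a crewai_tools name) that mirrors the configuration fields using only the standard library. Because BaseTool is a Pydantic model, the real tool gets Literal validation of extract_depth from Pydantic; the explicit check in __post_init__ only approximates that.

```python
from __future__ import annotations

import os
from dataclasses import dataclass, field
from typing import Literal, get_args

ExtractDepth = Literal["basic", "advanced"]


@dataclass
class ExtractorConfigSketch:
    """Hypothetical mirror of the tool's configuration fields; not the crewai_tools class."""

    # default_factory runs at instantiation time, so the env var is read when the
    # object is created, not when this module is imported.
    api_key: str | None = field(default_factory=lambda: os.getenv("TAVILY_API_KEY"))
    proxies: dict[str, str] | None = None
    include_images: bool = False
    extract_depth: ExtractDepth = "basic"
    timeout: int = 60

    def __post_init__(self) -> None:
        # A plain dataclass does not enforce Literal types, so check explicitly.
        allowed = get_args(ExtractDepth)
        if self.extract_depth not in allowed:
            raise ValueError(f"extract_depth must be one of {allowed}, got {self.extract_depth!r}")


os.environ["TAVILY_API_KEY"] = "tvly-example"  # hypothetical placeholder, set after the class is defined
cfg = ExtractorConfigSketch()  # picks up the env var at instantiation time
explicit = ExtractorConfigSketch(api_key="tvly-explicit", extract_depth="advanced")
print(cfg.api_key, explicit.api_key)
```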

Import

from crewai_tools import TavilyExtractorTool

I/O Contract

Inputs

  • urls (list[str] or str, required): The URL or list of URLs to extract data from.

Outputs

  • _run() return value (str): A JSON string containing the extracted structured data from the provided URLs.
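Because the tool returns a JSON string rather than a Python object, downstream code has to parse it. The helper below is a hypothetical sketch: summarize_extraction is an illustrative name, and it assumes the payload follows the Tavily extract response shape, with a top-level "results" list whose items carry "url" and "raw_content" keys. Verify those key names against the Tavily API reference before relying on them; the sample payload is made-up illustrative data.

```python
import json
from typing import List, Tuple


def summarize_extraction(result_json: str, preview_chars: int = 80) -> List[Tuple[str, str]]:
    """Return (url, content preview) pairs; key names are assumptions about the Tavily payload."""
    payload = json.loads(result_json)
    pairs = []
    for item in payload.get("results", []):
        # raw_content may be missing or null; treat both as empty text.
        content = item.get("raw_content") or ""
        pairs.append((item.get("url", "<unknown>"), content[:preview_chars]))
    return pairs


# Hypothetical sample payload, formatted as indented JSON like the tool's output.
sample = json.dumps(
    {"results": [{"url": "https://example.com",
                  "raw_content": "Example Domain. This domain is for use in illustrative examples."}]},
    indent=2,
)

for url, preview in summarize_extraction(sample, preview_chars=14):
    print(url, "->", preview)  # prints: https://example.com -> Example Domain
```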

Usage Examples

Basic Usage

from crewai_tools import TavilyExtractorTool

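# Requires TAVILY_API_KEY in the environment (or pass api_key=...).
tool = TavilyExtractorTool(extract_depth="basic")
result = tool.run(urls="https://example.com")  # public BaseTool entry point; calls _run internally
print(result)  # formatted JSON string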

Multiple URLs

from crewai_tools import TavilyExtractorTool

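# Requires TAVILY_API_KEY in the environment (or pass api_key=...).
tool = TavilyExtractorTool(extract_depth="advanced", include_images=True)
result = tool.run(urls=["https://example.com/page1", "https://example.com/page2"])
print(result)  # one JSON string covering both URLs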
