Implementation:CrewAIInc CrewAI PDF Search Tool

From Leeroopedia
Knowledge Sources
Domains: Tools, RAG, Search
Last Updated: 2026-02-11 00:00 GMT

Overview

PDFSearchTool performs semantic search over the content of PDF files using RAG-based retrieval.

Description

PDFSearchTool extends RagTool and specializes it for PDF content. It defines two Pydantic schemas: PDFSearchToolSchema, which requires both a query and a PDF path or URL, and FixedPDFSearchToolSchema, which requires only a query and is used when a PDF is pre-configured.

During initialization, a model_validator checks whether a pdf path was provided. If so, it calls self.add() to ingest the PDF into the knowledge base, updates the description to reference that specific PDF, and switches args_schema to the fixed schema. The add() method delegates to RagTool.add() with DataType.PDF_FILE. The _run() method adds a new PDF if one is passed at runtime, then delegates to RagTool._run() for similarity-based retrieval.
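
The initialization and runtime flow described above can be illustrated with a dependency-free sketch. This is not the CrewAI source: SketchRagTool and SketchPDFSearchTool are hypothetical stand-ins, the schemas are represented by their names as plain strings, the wording of the updated description is invented, and the constructor stands in for the model_validator.

```python
from __future__ import annotations


class SketchRagTool:
    """Hypothetical stand-in for RagTool; it only records what was ingested."""

    def __init__(self) -> None:
        self.sources: list[tuple[str, str]] = []

    def add(self, source: str, data_type: str) -> None:
        self.sources.append((source, data_type))

    def _run(self, query: str) -> str:
        # A real RAG tool would run a similarity search here.
        names = ", ".join(source for source, _ in self.sources)
        return f"would search [{names}] for {query!r}"


class SketchPDFSearchTool(SketchRagTool):
    """Hypothetical mirror of the flow described for PDFSearchTool."""

    def __init__(self, pdf: str | None = None) -> None:
        super().__init__()
        self.pdf = pdf
        self.description = "Searches a PDF's content."
        self.args_schema = "PDFSearchToolSchema"  # query and pdf both expected
        if pdf is not None:
            # Mirrors the validator step: ingest, re-describe, switch schema.
            self.add(pdf)
            self.description = f"Searches the content of {pdf}."  # assumed wording
            self.args_schema = "FixedPDFSearchToolSchema"  # query only

    def add(self, pdf: str) -> None:
        # Delegate to the base class with a PDF-specific data type.
        super().add(pdf, data_type="PDF_FILE")

    def _run(self, query: str, pdf: str | None = None) -> str:
        if pdf is not None:
            self.add(pdf)  # optional ingestion at call time
        return super()._run(query)


fixed = SketchPDFSearchTool(pdf="report.pdf")
dynamic = SketchPDFSearchTool()
answer = dynamic._run("What is the revenue?", pdf="q3.pdf")
```

In this sketch, fixed starts out with the query-only schema and one ingested source, while dynamic keeps the two-field schema and ingests a PDF only when one is passed to _run().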

Usage

Use this tool when agents need to perform semantic search over PDF documents, such as extracting information from reports, contracts, or research papers. A specific PDF can be locked at initialization time, or the agent can specify different PDFs at runtime.

Code Reference

Source Location

  • Repository: CrewAI
  • File: lib/crewai-tools/src/crewai_tools/tools/pdf_search_tool/pdf_search_tool.py
  • Lines: 1-54

Signature

from pydantic import BaseModel, Field
from crewai_tools import RagTool

class FixedPDFSearchToolSchema(BaseModel):
    query: str = Field(..., description="Mandatory query you want to use to search the PDF's content")

class PDFSearchToolSchema(FixedPDFSearchToolSchema):
    pdf: str = Field(..., description="File path or URL of a PDF file to be searched")

class PDFSearchTool(RagTool):
    name: str = "Search a PDF's content"
    description: str = "A tool that can be used to semantic search a query from a PDF's content."
    args_schema: type[BaseModel] = PDFSearchToolSchema
    pdf: str | None = None

    # Method bodies are elided; only the signatures are shown.
    def add(self, pdf: str) -> None: ...
    def _run(self, query: str, pdf: str | None = None,
             similarity_threshold: float | None = None, limit: int | None = None) -> str: ...

Import

from crewai_tools import PDFSearchTool

I/O Contract

Inputs

Name | Type | Required | Description
query | str | Yes | Search query to run against the PDF's content
pdf | str or None | No | File path or URL of the PDF file to search; required by the schema unless a PDF was set at initialization
similarity_threshold | float or None | No | Minimum similarity score for results
limit | int or None | No | Maximum number of results to return
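
To make the two optional retrieval parameters concrete, the following sketch applies a score floor and a result cap to a list of already-scored text chunks. It is an illustration only: select_results and the sample chunks are hypothetical, and the inclusive comparison and highest-score-first ordering are assumptions rather than documented RagTool behavior.

```python
from __future__ import annotations


def select_results(
    scored_chunks: list[tuple[str, float]],
    similarity_threshold: float | None = None,
    limit: int | None = None,
) -> list[tuple[str, float]]:
    """Keep chunks at or above the threshold, best first, capped at limit."""
    kept = [
        (text, score)
        for text, score in scored_chunks
        if similarity_threshold is None or score >= similarity_threshold
    ]
    # Assumption: a higher score means a closer match.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept if limit is None else kept[:limit]


# Made-up chunks and scores, purely for illustration.
chunks = [
    ("Revenue grew 12%.", 0.91),
    ("Table of contents", 0.20),
    ("Costs fell 3%.", 0.74),
]
top = select_results(chunks, similarity_threshold=0.5, limit=1)
```

Leaving both parameters as None returns every chunk in this sketch; in the tool itself, the defaults that apply are whatever RagTool is configured with.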

Outputs

Name | Type | Description
_run() return value | str | Relevant content retrieved from the PDF via similarity search

Usage Examples

Basic Usage

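from crewai_tools import PDFSearchTool

# Pre-configured PDF: the file is ingested at construction, so only a query is needed
fixed_tool = PDFSearchTool(pdf="path/to/document.pdf")
result = fixed_tool._run(query="What are the key findings?")

# Dynamic PDF: no file is set at construction, so the PDF is passed along with the query
dynamic_tool = PDFSearchTool()
result = dynamic_tool._run(query="What is the revenue?", pdf="path/to/report.pdf")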
