Implementation:CrewAIInc CrewAI PDF Search Tool
| Knowledge Sources | |
|---|---|
| Domains | Tools, RAG, Search |
| Last Updated | 2026-02-11 00:00 GMT |
Overview
PDFSearchTool provides semantic search capabilities for querying content within PDF files using RAG-based retrieval.
Description
PDFSearchTool extends RagTool and specializes it for PDF content. It defines two Pydantic schemas: PDFSearchToolSchema, which requires both a query and a PDF path or URL, and FixedPDFSearchToolSchema, which requires only a query and is used when a PDF is pre-configured. During initialization, a model_validator checks whether a pdf path was provided; if so, it calls self.add() to ingest the PDF into the knowledge base, updates the description to reference that specific PDF, and switches args_schema to the fixed schema. The add() method delegates to RagTool.add() with DataType.PDF_FILE. The _run() method first adds the PDF if one is passed at runtime, then delegates to RagTool._run() for similarity-based retrieval.
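The lifecycle described above can be illustrated with a simplified stand-in. This is a minimal sketch that uses only the standard library and imports neither Pydantic nor CrewAI. The names SketchPDFSearchTool, FixedSchema, FullSchema, and the ingested list are hypothetical, and the description text and search result are placeholders, not the library's actual strings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FixedSchema:
    """Stand-in for FixedPDFSearchToolSchema: only a query is required."""
    query: str


@dataclass
class FullSchema(FixedSchema):
    """Stand-in for PDFSearchToolSchema: a query plus the PDF to search."""
    pdf: str


class SketchPDFSearchTool:
    """Hypothetical model of the init-time behavior; not the CrewAI class."""

    def __init__(self, pdf: Optional[str] = None) -> None:
        self.description = "Searches a PDF's content."  # placeholder text
        self.args_schema = FullSchema  # by default the caller must name a PDF
        self.ingested = []  # stands in for the RAG knowledge base
        if pdf is not None:
            # Mirrors the validator: ingest, re-describe, lock the schema.
            self.add(pdf)
            self.description = f"Searches the content of {pdf}."
            self.args_schema = FixedSchema

    def add(self, pdf: str) -> None:
        # The real tool delegates to RagTool.add(); here we only record it.
        self.ingested.append(pdf)

    def run(self, query: str, pdf: Optional[str] = None) -> str:
        # A PDF passed at runtime is ingested before the search happens.
        if pdf is not None:
            self.add(pdf)
        return f"searched {self.ingested} for {query!r}"
```

With this stand-in, SketchPDFSearchTool(pdf="a.pdf") exposes FixedSchema, so a caller supplies only a query, while SketchPDFSearchTool() keeps FullSchema and expects the PDF on each call.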
Usage
Use this tool when agents need to perform semantic search over PDF documents, such as extracting information from reports, contracts, or research papers. A specific PDF can be locked at initialization time, or the agent can specify different PDFs at runtime.
Code Reference
Source Location
- Repository: CrewAI
- File: lib/crewai-tools/src/crewai_tools/tools/pdf_search_tool/pdf_search_tool.py
- Lines: 1-54
Signature
class FixedPDFSearchToolSchema(BaseModel):
    query: str = Field(..., description="Mandatory query you want to use to search the PDF's content")

class PDFSearchToolSchema(FixedPDFSearchToolSchema):
    pdf: str = Field(..., description="File path or URL of a PDF file to be searched")

class PDFSearchTool(RagTool):
    name: str = "Search a PDF's content"
    description: str = "A tool that can be used to semantic search a query from a PDF's content."
    args_schema: type[BaseModel] = PDFSearchToolSchema
    pdf: str | None = None

    def add(self, pdf: str) -> None
    def _run(self, query: str, pdf: str | None = None,
             similarity_threshold: float | None = None, limit: int | None = None) -> str
Import
from crewai_tools import PDFSearchTool
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| query | str | Yes | Search query to use against the PDF's content |
| pdf | str or None | No | File path or URL of a PDF file to be searched (optional if set at init) |
| similarity_threshold | float or None | No | Minimum similarity score for results |
| limit | int or None | No | Maximum number of results to return |
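To make the similarity_threshold and limit inputs concrete, here is a generic sketch of how such parameters are commonly applied to scored retrieval results. It is an assumption-laden illustration: the helper select_chunks and the (text, score) pair format are hypothetical, and the source does not state how RagTool compares scores or orders results internally.

```python
from typing import List, Optional, Tuple


def select_chunks(
    scored: List[Tuple[str, float]],
    similarity_threshold: Optional[float] = None,
    limit: Optional[int] = None,
) -> str:
    """Hypothetical helper: filter and cap scored chunks, then join them."""
    # Keep chunks at or above the threshold (assumed: higher score = closer).
    kept = [
        (text, score)
        for text, score in scored
        if similarity_threshold is None or score >= similarity_threshold
    ]
    # Best matches first, then cap the number of results if a limit is set.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    if limit is not None:
        kept = kept[:limit]
    # The tool's contract is a single string, so join the surviving chunks.
    return "\n".join(text for text, _ in kept)
```

Under these assumptions, leaving both parameters as None returns every chunk in score order, which matches their optional status in the table above.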
Outputs
| Name | Type | Description |
|---|---|---|
| _run() returns | str | Relevant content retrieved from the PDF via similarity search |
Usage Examples
Basic Usage
from crewai_tools import PDFSearchTool
# Pre-configured PDF
tool = PDFSearchTool(pdf="path/to/document.pdf")
result = tool._run(query="What are the key findings?")
# Dynamic PDF at runtime
tool = PDFSearchTool()
result = tool._run(query="What is the revenue?", pdf="path/to/report.pdf")