Implementation:CrewAIInc CrewAI OCR Tool
| Knowledge Sources | |
|---|---|
| Domains | Tools, OCR, Vision |
| Last Updated | 2026-02-11 00:00 GMT |
Overview
OCRTool extracts text from images by leveraging LLM vision capabilities rather than traditional OCR libraries.
Description
The OCRTool extends BaseTool and uses a Pydantic OCRToolSchema for input validation, accepting an image_path_url string that can be either a local file path or a remote URL. In its _run method, the tool determines the input type: for URLs (starting with "http"), the URL is passed directly; for local files, the image is read in binary mode and encoded to base64 via the static _encode_image method, then formatted as a data:image/jpeg;base64,... data URI. The tool constructs a two-message conversation: a system message instructing the LLM to act as an OCR specialist, and a user message containing the image as an image_url content block. The configured LLM (defaulting to gpt-4o with temperature 0.7) processes the image and returns extracted text. The tool supports both local and remote images, making it versatile for various document processing workflows.
Usage
Use this tool when a CrewAI agent needs to extract text from images, including scanned documents, screenshots, photographs of text, or any image containing readable content. The LLM used must support the vision feature.
Code Reference
Source Location
- Repository: CrewAI
- File: lib/crewai-tools/src/crewai_tools/tools/ocr_tool/ocr_tool.py
- Lines: 1-101
Signature
class OCRToolSchema(BaseModel):
image_path_url: str = Field(description="The image path or URL.")
class OCRTool(BaseTool):
name: str = "Optical Character Recognition Tool"
description: str = "This tool uses an LLM's API to extract text from an image file."
llm: LLM = Field(default_factory=lambda: LLM(model="gpt-4o", temperature=0.7))
args_schema: type[BaseModel] = OCRToolSchema
def _run(self, **kwargs) -> str: ...
@staticmethod
def _encode_image(image_path: str) -> str: ...
Import
from crewai_tools import OCRTool
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| image_path_url | str | Yes | Path to a local image file or URL of a remote image |
| llm | LLM | No | Language model instance for vision API calls; defaults to gpt-4o with temperature 0.7 |
Outputs
| Name | Type | Description |
|---|---|---|
| _run() returns | str | Extracted text from the image, or error message if no image path/URL provided |
Usage Examples
Local Image File
from crewai_tools import OCRTool
tool = OCRTool()
text = tool._run(image_path_url="/path/to/document.jpg")
Remote Image URL
from crewai_tools import OCRTool
tool = OCRTool()
text = tool._run(image_path_url="https://example.com/image.png")
Custom LLM
from crewai.llm import LLM
from crewai_tools import OCRTool
tool = OCRTool(llm=LLM(model="gpt-4o-mini", temperature=0.5))
text = tool._run(image_path_url="/path/to/receipt.jpg")