Implementation:CrewAIInc CrewAI OCR Tool

Knowledge Sources	CrewAI
Domains	Tools, OCR, Vision
Last Updated	2026-02-11 00:00 GMT

Overview

OCRTool extracts text from images by leveraging LLM vision capabilities rather than traditional OCR libraries.

Description

The OCRTool extends BaseTool and uses a Pydantic OCRToolSchema for input validation, accepting an image_path_url string that can be either a local file path or a remote URL. In its _run method, the tool determines the input type: for URLs (starting with "http"), the URL is passed directly; for local files, the image is read in binary mode and encoded to base64 via the static _encode_image method, then formatted as a data:image/jpeg;base64,... data URI. The tool constructs a two-message conversation: a system message instructing the LLM to act as an OCR specialist, and a user message containing the image as an image_url content block. The configured LLM (defaulting to gpt-4o with temperature 0.7) processes the image and returns extracted text. The tool supports both local and remote images, making it versatile for various document processing workflows.

Usage

Use this tool when a CrewAI agent needs to extract text from images, including scanned documents, screenshots, photographs of text, or any image containing readable content. The LLM used must support the vision feature.

Code Reference

Source Location

Repository: CrewAI
File: lib/crewai-tools/src/crewai_tools/tools/ocr_tool/ocr_tool.py
Lines: 1-101

Signature

class OCRToolSchema(BaseModel):
    image_path_url: str = Field(description="The image path or URL.")

class OCRTool(BaseTool):
    name: str = "Optical Character Recognition Tool"
    description: str = "This tool uses an LLM's API to extract text from an image file."
    llm: LLM = Field(default_factory=lambda: LLM(model="gpt-4o", temperature=0.7))
    args_schema: type[BaseModel] = OCRToolSchema

    def _run(self, **kwargs) -> str: ...

    @staticmethod
    def _encode_image(image_path: str) -> str: ...

Import

from crewai_tools import OCRTool

I/O Contract

Inputs

Name	Type	Required	Description
image_path_url	str	Yes	Path to a local image file or URL of a remote image
llm	LLM	No	Language model instance for vision API calls; defaults to gpt-4o with temperature 0.7

Outputs

Name	Type	Description
_run() returns	str	Extracted text from the image, or error message if no image path/URL provided

Usage Examples

Local Image File

from crewai_tools import OCRTool

tool = OCRTool()
text = tool._run(image_path_url="/path/to/document.jpg")

Remote Image URL

from crewai_tools import OCRTool

tool = OCRTool()
text = tool._run(image_path_url="https://example.com/image.png")

Custom LLM

from crewai.llm import LLM
from crewai_tools import OCRTool

tool = OCRTool(llm=LLM(model="gpt-4o-mini", temperature=0.5))
text = tool._run(image_path_url="/path/to/receipt.jpg")

Related Pages

Principle:CrewAIInc_CrewAI_Built_In_Tool_Selection

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment