Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:CrewAIInc CrewAI OCR Tool

From Leeroopedia
Knowledge Sources
Domains Tools, OCR, Vision
Last Updated 2026-02-11 00:00 GMT

Overview

OCRTool extracts text from images by leveraging LLM vision capabilities rather than traditional OCR libraries.

Description

The OCRTool extends BaseTool and uses a Pydantic OCRToolSchema for input validation, accepting an image_path_url string that can be either a local file path or a remote URL. In its _run method, the tool determines the input type: for URLs (starting with "http"), the URL is passed directly; for local files, the image is read in binary mode and encoded to base64 via the static _encode_image method, then formatted as a data:image/jpeg;base64,... data URI. The tool constructs a two-message conversation: a system message instructing the LLM to act as an OCR specialist, and a user message containing the image as an image_url content block. The configured LLM (defaulting to gpt-4o with temperature 0.7) processes the image and returns extracted text. The tool supports both local and remote images, making it versatile for various document processing workflows.

Usage

Use this tool when a CrewAI agent needs to extract text from images, including scanned documents, screenshots, photographs of text, or any image containing readable content. The LLM used must support the vision feature.

Code Reference

Source Location

  • Repository: CrewAI
  • File: lib/crewai-tools/src/crewai_tools/tools/ocr_tool/ocr_tool.py
  • Lines: 1-101

Signature

class OCRToolSchema(BaseModel):
    image_path_url: str = Field(description="The image path or URL.")

class OCRTool(BaseTool):
    name: str = "Optical Character Recognition Tool"
    description: str = "This tool uses an LLM's API to extract text from an image file."
    llm: LLM = Field(default_factory=lambda: LLM(model="gpt-4o", temperature=0.7))
    args_schema: type[BaseModel] = OCRToolSchema

    def _run(self, **kwargs) -> str: ...

    @staticmethod
    def _encode_image(image_path: str) -> str: ...

Import

from crewai_tools import OCRTool

I/O Contract

Inputs

Name Type Required Description
image_path_url str Yes Path to a local image file or URL of a remote image
llm LLM No Language model instance for vision API calls; defaults to gpt-4o with temperature 0.7

Outputs

Name Type Description
_run() returns str Extracted text from the image, or error message if no image path/URL provided

Usage Examples

Local Image File

from crewai_tools import OCRTool

tool = OCRTool()
text = tool._run(image_path_url="/path/to/document.jpg")

Remote Image URL

from crewai_tools import OCRTool

tool = OCRTool()
text = tool._run(image_path_url="https://example.com/image.png")

Custom LLM

from crewai.llm import LLM
from crewai_tools import OCRTool

tool = OCRTool(llm=LLM(model="gpt-4o-mini", temperature=0.5))
text = tool._run(image_path_url="/path/to/receipt.jpg")

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment