Implementation:BerriAI Litellm OCR API

From Leeroopedia
Property Value
sources litellm/ocr/main.py
domains OCR, Document Processing, Image Processing
last_updated 2026-02-15 16:00 GMT

Overview

The OCR API module provides a unified interface for optical character recognition, allowing extraction of text and structured content from documents and images through provider-specific OCR models.

Description

This module implements OCR through an ocr/aocr function pair decorated with @client. Input documents follow the Mistral OCR format, which supports two document types: document_url for PDFs and other documents, and image_url for images. The module validates the document format, resolves the provider and model via litellm.get_llm_provider(), and loads the provider's BaseOCRConfig. It then extracts OCR-specific parameters (such as include_image_base64, pages, and image_limit), maps them through the provider config, and delegates the HTTP call to BaseLLMHTTPHandler.ocr(). Base64-encoded document content is also supported for inline document processing.
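The document-format check described above can be pictured with a small sketch. This is an illustration only, not litellm's own validator: the helper name validate_document and the error messages are hypothetical, and the rule that the payload key matches the "type" value is inferred from the usage examples on this page.

```python
from typing import Dict

# Document types named in the description above.
SUPPORTED_TYPES = ("document_url", "image_url")


def validate_document(document: Dict[str, str]) -> str:
    """Return the URL carried by a Mistral-style document dict.

    Hypothetical helper for illustration; not litellm's own validator.
    """
    if not isinstance(document, dict):
        raise ValueError("document must be a dict")
    doc_type = document.get("type")
    if doc_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported document type: {doc_type!r}")
    # Assumption: the payload key has the same name as the type value.
    url = document.get(doc_type)
    if not url:
        raise ValueError(f"missing {doc_type!r} field")
    return url
```

Failing early on a malformed dict keeps the error local instead of surfacing it as a provider-side HTTP error.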

Usage

Import this module when you need to extract text content from PDFs, documents, or images through an OCR model. It follows the Mistral OCR API format and supports both sync and async operations.
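For inline (base64) documents, one plausible way to build the document dict is sketched below. The page only states that base64 content is supported; the data-URI form (data:<mime>;base64,<payload>) is an assumption based on the common Mistral OCR convention, and the helpers to_data_uri and inline_document are hypothetical names, not part of litellm.

```python
import base64
from typing import Dict


def to_data_uri(data: bytes, mime_type: str = "application/pdf") -> str:
    """Encode raw bytes as a base64 data URI (assumed inline format)."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def inline_document(data: bytes, mime_type: str = "application/pdf") -> Dict[str, str]:
    """Build a Mistral-style document dict from raw bytes.

    Images use the image_url type; everything else uses document_url.
    """
    key = "image_url" if mime_type.startswith("image/") else "document_url"
    return {"type": key, key: to_data_uri(data, mime_type)}
```

The resulting dict could then be passed as the document argument in place of a remote URL, provided the chosen provider accepts data URIs.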

Code Reference

Source Location

Property Value
Repository github.com/BerriAI/litellm
File litellm/ocr/main.py
Lines 302
Module litellm.ocr.main

Signature

@client
def ocr(
    model: str,
    document: Dict[str, str],
    api_key: Optional[str] = None,
    api_base: Optional[str] = None,
    timeout: Optional[Union[float, httpx.Timeout]] = None,
    custom_llm_provider: Optional[str] = None,
    extra_headers: Optional[Dict[str, Any]] = None,
    **kwargs,
) -> Union[OCRResponse, Coroutine[Any, Any, OCRResponse]]

@client
async def aocr(
    model: str,
    document: Dict[str, str],
    ...
) -> OCRResponse

Import

from litellm.ocr.main import ocr, aocr

I/O Contract

Inputs

Parameter | Type | Required | Description
model | str | Yes | The OCR model identifier (e.g., "mistral/mistral-ocr-latest")
document | Dict[str, str] | Yes | Document specification: a "type" key ("document_url" or "image_url") plus the matching URL field
api_key | Optional[str] | No | API key for the OCR provider
api_base | Optional[str] | No | API base URL override
timeout | Optional[Union[float, httpx.Timeout]] | No | Request timeout
custom_llm_provider | Optional[str] | No | Provider override; auto-detected from model when omitted
extra_headers | Optional[Dict[str, Any]] | No | Additional HTTP headers
include_image_base64 | via kwargs | No | Whether to include base64 image data in the response
pages | via kwargs | No | Specific pages to process
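Because the OCR-specific options arrive through **kwargs, they have to be separated from the remaining keyword arguments before being mapped by the provider config. A minimal sketch of that separation follows. The parameter names come from this page; the helper split_ocr_params and the choice to drop None values are assumptions for illustration, not litellm's actual logic.

```python
from typing import Any, Dict, Tuple

# OCR-specific parameter names mentioned on this page.
OCR_PARAM_NAMES = ("include_image_base64", "pages", "image_limit")


def split_ocr_params(kwargs: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split kwargs into (OCR-specific params, everything else).

    Hypothetical helper; None-valued OCR params are dropped so they are
    not forwarded to the provider.
    """
    ocr_params = {
        k: v for k, v in kwargs.items() if k in OCR_PARAM_NAMES and v is not None
    }
    other = {k: v for k, v in kwargs.items() if k not in OCR_PARAM_NAMES}
    return ocr_params, other
```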

Outputs

Output | Type | Description
Response | OCRResponse | Contains extracted pages with markdown content, model info, and usage data
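A common follow-up step is to merge the per-page markdown into a single string. The sketch below assumes only what the usage examples on this page show: a response object with a pages list whose items expose index and markdown. The function pages_to_markdown is a hypothetical helper, and the demo uses stand-in objects rather than a real OCRResponse.

```python
from types import SimpleNamespace


def pages_to_markdown(response, separator: str = "\n\n") -> str:
    """Join page markdown in page-index order.

    Works with any object exposing .pages, each with .index and .markdown.
    """
    pages = sorted(response.pages, key=lambda p: p.index)
    return separator.join(p.markdown for p in pages)


# Stand-in response for illustration (not a real OCRResponse).
fake_response = SimpleNamespace(
    pages=[
        SimpleNamespace(index=1, markdown="Second page"),
        SimpleNamespace(index=0, markdown="# Title"),
    ]
)
combined = pages_to_markdown(fake_response)
```

Sorting by index guards against pages arriving out of order, which the page does not rule out.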

Usage Examples

import litellm

# OCR with a PDF document URL
response = litellm.ocr(
    model="mistral/mistral-ocr-latest",
    document={
        "type": "document_url",
        "document_url": "https://arxiv.org/pdf/2201.04234"
    },
    include_image_base64=True,
)

for page in response.pages:
    print(f"Page {page.index}: {page.markdown[:100]}...")

# Async OCR with an image URL
import asyncio
import litellm

async def main():
    response = await litellm.aocr(
        model="mistral/mistral-ocr-latest",
        document={
            "type": "image_url",
            "image_url": "https://example.com/receipt.png"
        },
    )
    print(response)

asyncio.run(main())
