| Property | Value |
|---|---|
| sources | litellm/ocr/main.py |
| domains | OCR, Document Processing, Image Processing |
| last_updated | 2026-02-15 16:00 GMT |
Overview
The OCR API module provides a unified interface for optical character recognition, allowing extraction of text and structured content from documents and images through provider-specific OCR models.
Description
This module implements OCR through an ocr/aocr function pair decorated with @client. It accepts documents in the Mistral OCR format, which supports two document types: document_url for PDFs and other documents, and image_url for images. For each call, the module validates the input document format, resolves the provider and model via litellm.get_llm_provider(), loads the provider's BaseOCRConfig, extracts and maps OCR-specific parameters (such as include_image_base64, pages, and image_limit) through that config, and delegates the HTTP call to BaseLLMHTTPHandler.ocr(). Base64-encoded document content is also supported for inline document processing.
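The validation step described above can be sketched roughly as follows. This is an illustration only, not the code in litellm/ocr/main.py: the helper name `validate_ocr_document` and its error messages are hypothetical, and it assumes only the two document types named in the description.

```python
from typing import Dict

# Document types named in the description; each maps to the key
# that is expected to carry the URL for that type.
_URL_FIELD_BY_TYPE = {
    "document_url": "document_url",
    "image_url": "image_url",
}


def validate_ocr_document(document: Dict[str, str]) -> str:
    """Hypothetical helper: check a Mistral-style OCR document dict and return its URL."""
    if not isinstance(document, dict):
        raise ValueError("document must be a dict")
    doc_type = document.get("type")
    if doc_type not in _URL_FIELD_BY_TYPE:
        raise ValueError(f"unsupported document type: {doc_type!r}")
    url_field = _URL_FIELD_BY_TYPE[doc_type]
    url = document.get(url_field)
    if not url:
        raise ValueError(f"missing {url_field!r} field for type {doc_type!r}")
    return url
```

Checking the `type` key first keeps the error message specific: a caller who passes an unknown type is told so directly, rather than being told that a URL field is missing.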
Usage
Import this module when you need to extract text content from PDFs, documents, or images through an OCR model. It follows the Mistral OCR API format and supports both sync and async operations.
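For local files, the inline base64 support mentioned in the description can be used by wrapping the file bytes in a document dict. The sketch below assumes the provider accepts a `data:` URI in the URL field, which is how the Mistral OCR format is commonly used but is not confirmed by this page; the helper name `build_inline_document` is hypothetical.

```python
import base64
from typing import Dict


def build_inline_document(data: bytes, mime_type: str = "application/pdf") -> Dict[str, str]:
    """Hypothetical helper: wrap raw bytes as a base64 data URI in an OCR document dict."""
    encoded = base64.b64encode(data).decode("ascii")
    # Assumption: the provider accepts data URIs wherever it accepts URLs.
    data_uri = f"data:{mime_type};base64,{encoded}"
    if mime_type.startswith("image/"):
        return {"type": "image_url", "image_url": data_uri}
    return {"type": "document_url", "document_url": data_uri}
```

The returned dict can then be passed as the `document` argument of ocr or aocr in place of a remote URL.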
Code Reference
Source Location
Signature
@client
def ocr(
    model: str,
    document: Dict[str, str],
    api_key: Optional[str] = None,
    api_base: Optional[str] = None,
    timeout: Optional[Union[float, httpx.Timeout]] = None,
    custom_llm_provider: Optional[str] = None,
    extra_headers: Optional[Dict[str, Any]] = None,
    **kwargs,
) -> Union[OCRResponse, Coroutine[Any, Any, OCRResponse]]

@client
async def aocr(
    model: str,
    document: Dict[str, str],
    ...
) -> OCRResponse
Import
from litellm.ocr.main import ocr, aocr
I/O Contract
Inputs
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | str | Yes | The OCR model identifier (e.g., "mistral/mistral-ocr-latest") |
| document | Dict[str, str] | Yes | Document specification with type and URL field |
| api_key | Optional[str] | No | API key for the OCR provider |
| api_base | Optional[str] | No | API base URL override |
| timeout | Optional[Union[float, httpx.Timeout]] | No | Request timeout |
| custom_llm_provider | Optional[str] | No | Provider override; auto-detected from model |
| extra_headers | Optional[Dict[str, Any]] | No | Additional HTTP headers |
| include_image_base64 | via kwargs | No | Whether to include base64 image data in response |
| pages | via kwargs | No | Specific pages to process |
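Parameters marked "via kwargs" travel in `**kwargs` and are separated from the other keyword arguments before the request is built. A minimal sketch of that separation is shown here; the helper `split_ocr_params` is hypothetical, and the parameter list contains only the three names mentioned on this page, whereas the real set is defined by each provider's BaseOCRConfig.

```python
from typing import Any, Dict, Tuple

# Assumption: only the OCR-specific names mentioned on this page.
# The actual supported set depends on the provider config.
OCR_PARAM_NAMES = ("include_image_base64", "pages", "image_limit")


def split_ocr_params(kwargs: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Hypothetical helper: separate OCR-specific kwargs from everything else."""
    # Keep only OCR parameters that the caller actually set.
    ocr_params = {
        name: value
        for name, value in kwargs.items()
        if name in OCR_PARAM_NAMES and value is not None
    }
    # Everything else is passed through untouched.
    remaining = {
        name: value for name, value in kwargs.items() if name not in OCR_PARAM_NAMES
    }
    return ocr_params, remaining
```

Dropping `None` values means an unset option is omitted from the request rather than sent as an explicit null.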
Outputs
| Output | Type | Description |
|---|---|---|
| Response | OCRResponse | Contains extracted pages with markdown content, model info, and usage data |
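Since each page carries its own markdown, a common follow-up step is to stitch the pages back into one document. The sketch below relies only on the `pages`, `index`, and `markdown` attributes used elsewhere on this page; the helper name `combine_pages_markdown` is hypothetical, and a `SimpleNamespace` stands in for a real OCRResponse so the example runs without a network call.

```python
from types import SimpleNamespace
from typing import Any


def combine_pages_markdown(response: Any, separator: str = "\n\n") -> str:
    """Hypothetical helper: join per-page markdown in page-index order."""
    pages = sorted(response.pages, key=lambda page: page.index)
    return separator.join(page.markdown for page in pages)


# Stand-in object with the same attribute names as the response described above.
fake_response = SimpleNamespace(
    pages=[
        SimpleNamespace(index=1, markdown="Second page text"),
        SimpleNamespace(index=0, markdown="# Title"),
    ]
)
combined = combine_pages_markdown(fake_response)
```

Sorting by `index` guards against pages arriving out of order; if the provider already guarantees ordering, the sort is harmless.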
Usage Examples
import litellm

# OCR with a PDF document URL
response = litellm.ocr(
    model="mistral/mistral-ocr-latest",
    document={
        "type": "document_url",
        "document_url": "https://arxiv.org/pdf/2201.04234",
    },
    include_image_base64=True,
)
for page in response.pages:
    print(f"Page {page.index}: {page.markdown[:100]}...")

import asyncio
import litellm

# Async OCR with an image URL
async def main():
    response = await litellm.aocr(
        model="mistral/mistral-ocr-latest",
        document={
            "type": "image_url",
            "image_url": "https://example.com/receipt.png",
        },
    )
    print(response)

asyncio.run(main())
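
Because aocr is a coroutine, several documents can be processed concurrently. The following is a sketch under stated assumptions rather than a documented pattern: `ocr_many` and `max_concurrency` are hypothetical names, and the coroutine function is passed in as `aocr_fn` so the helper can be exercised with a stub instead of a live provider.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List


async def ocr_many(
    aocr_fn: Callable[..., Awaitable[Any]],
    model: str,
    documents: List[Dict[str, str]],
    max_concurrency: int = 4,
) -> List[Any]:
    """Hypothetical helper: run an aocr-style coroutine over many documents."""
    # The semaphore caps how many requests are in flight at once.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def _process(document: Dict[str, str]) -> Any:
        async with semaphore:
            return await aocr_fn(model=model, document=document)

    # asyncio.gather returns results in the same order as the inputs.
    return await asyncio.gather(*(_process(document) for document in documents))
```

With litellm installed and credentials configured, this could be called as `asyncio.run(ocr_many(litellm.aocr, "mistral/mistral-ocr-latest", documents))`. Bounding concurrency is a conservative choice that helps avoid provider rate limits.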
Related Pages