Workflow: Mistral AI Python Client OCR Document Processing
| Knowledge Sources | |
|---|---|
| Domains | LLMs, OCR, Document_Processing, Python_SDK |
| Last Updated | 2026-02-15 14:00 GMT |
Overview
End-to-end process for extracting text, tables, and images from documents using the Mistral AI OCR API.
Description
This workflow covers how to use the Mistral AI OCR API to process documents (PDFs and images) and extract structured content including text, tables, and embedded images. The OCR model (mistral-ocr-latest) accepts documents via URL or file upload. It returns page-by-page results containing extracted text in markdown format, detected tables, images with optional base64 encoding, and page dimension metadata. The Azure SDK wrapper (mistralai_azure) also supports OCR processing.
Usage
Execute this workflow when you need to digitize documents, extract text from scanned PDFs, process invoices, parse academic papers, or convert document images into machine-readable text. This is appropriate for document understanding pipelines, data extraction from forms, and any scenario requiring conversion of visual document content to structured text.
Execution Steps
Step 1: Install SDK and Configure Authentication
Install the mistralai Python package and configure the MISTRAL_API_KEY environment variable. The OCR API uses the same authentication as other Mistral endpoints.
Key considerations:
- Same installation and authentication as other Mistral API endpoints
- OCR is also available through the Azure SDK (mistralai_azure)
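Assuming pip and a POSIX shell, the setup can be sketched as follows (the key value is a placeholder, and the Azure package name is an assumption based on its import name):

```shell
# Install the official Python SDK (package name: mistralai)
pip install mistralai

# Authentication: the client reads MISTRAL_API_KEY from the environment
export MISTRAL_API_KEY="your-api-key-here"

# Azure users install the Azure wrapper (imported as mistralai_azure)
pip install mistralai-azure
```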
Step 2: Initialize the Mistral Client
Create a Mistral client instance. The OCR resource is available as client.ocr.
Key considerations:
- Use context manager pattern for resource management
- The same client instance can be used for chat, embeddings, and OCR
Step 3: Prepare Document Input
Specify the document to process either as a URL (for remote documents) or as an uploaded file. The document specification includes the source type, URL or file content, and an optional document name.
Key considerations:
- Documents can be specified by URL (type: document_url) or by file upload
- Supported formats include PDF and images
- The document_name parameter helps identify the document in results
- For file uploads, use the Files API to upload first, then reference the file ID
Step 4: Call the OCR API
Invoke the ocr.process() method with the document specification, OCR model identifier, and optional parameters such as include_image_base64 to request base64-encoded images in the response.
Key considerations:
- Use model="mistral-ocr-latest" for the OCR model
- Set include_image_base64=True to receive embedded images as base64 strings
- The document parameter is a dictionary with type, document_url (or file reference), and document_name
- Processing is synchronous; the response is returned when OCR is complete
Step 5: Process OCR Results
Parse the OCR response, which contains page-by-page results. Each page includes extracted markdown text, detected tables (with optional format specification), extracted images with bounding boxes, and page dimension metadata.
Key considerations:
- Response contains pages as a list of OCRPageObject entries
- Each page has: markdown text, images, tables, and dimensions
- Tables include bounding boxes and can be in markdown format
- Images include bounding box coordinates and optional base64 data
- Usage information includes document pages and tokens processed
- Use model_dump_json() to serialize the full response as JSON
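A sketch of walking the result; the sample dictionary below only mimics the page shape described above (markdown, images, dimensions) and is not real API output:

```python
# Stand-in for ocr_response.model_dump() from a real call.
sample_response = {
    "pages": [
        {
            "index": 0,
            "markdown": "# Invoice 42\n\n| Item | Qty |\n|---|---|\n| Pens | 3 |",
            "images": [
                {
                    "id": "img-0.jpeg",
                    "top_left_x": 120, "top_left_y": 80,
                    "bottom_right_x": 520, "bottom_right_y": 300,
                    "image_base64": None,  # set when include_image_base64=True
                }
            ],
            "dimensions": {"dpi": 200, "height": 2200, "width": 1700},
        }
    ],
    "model": "mistral-ocr-latest",
    "usage_info": {"pages_processed": 1, "doc_size_bytes": 12345},
}


def collect_markdown(response: dict) -> str:
    """Concatenate per-page markdown in page order."""
    pages = sorted(response["pages"], key=lambda p: p["index"])
    return "\n\n".join(p["markdown"] for p in pages)


full_text = collect_markdown(sample_response)
image_count = sum(len(p["images"]) for p in sample_response["pages"])
```

On a real response object the same fields are attributes (e.g. `response.pages[0].markdown`), and `response.model_dump_json()` serializes this structure as JSON.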