
Workflow:Mistralai Client Python OCR Document Processing

From Leeroopedia
Knowledge Sources
Domains LLMs, OCR, Document_Processing, Python_SDK
Last Updated 2026-02-15 14:00 GMT

Overview

End-to-end process for extracting text, tables, and images from documents using the Mistral AI OCR API.

Description

This workflow covers how to use the Mistral AI OCR API to process documents (PDFs and images) and extract structured content including text, tables, and embedded images. The OCR model (mistral-ocr-latest) accepts documents via URL or file upload and returns page-by-page results with extracted text in markdown format, detected tables, images with optional base64 encoding, and page dimension metadata. The Azure SDK wrapper also supports OCR processing.

Usage

Execute this workflow when you need to digitize documents, extract text from scanned PDFs, process invoices, parse academic papers, or convert document images into machine-readable text. This is appropriate for document understanding pipelines, data extraction from forms, and any scenario requiring conversion of visual document content to structured text.

Execution Steps

Step 1: Install SDK and Configure Authentication

Install the mistralai Python package and configure the MISTRAL_API_KEY environment variable. The OCR API uses the same authentication as other Mistral endpoints.

Key considerations:

  • Same installation and authentication as other Mistral API endpoints
  • OCR is also available through the Azure SDK (mistralai_azure)
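A minimal setup sketch for this step. The package name `mistralai` and the `MISTRAL_API_KEY` environment variable come from the step above; the `load_api_key` helper is illustrative, not part of the SDK, and simply fails fast when the key is missing.

```python
# Install the SDK first (shell): pip install mistralai
import os


def load_api_key(env=os.environ):
    """Read the Mistral API key from the environment; raise early if missing."""
    key = env.get("MISTRAL_API_KEY")
    if not key:
        raise RuntimeError("MISTRAL_API_KEY is not set")
    return key
```

Accepting the environment mapping as a parameter keeps the helper testable without touching the real process environment.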

Step 2: Initialize the Mistral Client

Create a Mistral client instance. The OCR resource is available as client.ocr.

Key considerations:

  • Use context manager pattern for resource management
  • The same client instance can be used for chat, embeddings, and OCR
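A sketch of client initialization, assuming the `mistralai` package is installed. The `make_client` factory is illustrative; the import is deferred into the function body so the sketch stays self-contained even where the SDK is absent.

```python
import os


def make_client():
    """Create a Mistral client; the same instance serves chat, embeddings, and OCR.

    Intended for the context-manager pattern:

        with make_client() as client:
            ...  # the OCR resource lives at client.ocr
    """
    # Deferred import: assumes `pip install mistralai` has been run.
    from mistralai import Mistral

    return Mistral(api_key=os.environ["MISTRAL_API_KEY"])
```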

Step 3: Prepare Document Input

Specify the document to process either as a URL (for remote documents) or as an uploaded file. The document specification includes the source type, URL or file content, and an optional document name.

Key considerations:

  • Documents can be specified by URL (type: document_url) or by file upload
  • Supported formats include PDF and images
  • The document_name parameter helps identify the document in results
  • For file uploads, use the Files API to upload first, then reference the file ID
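The document specification is a plain dictionary; a small builder for the URL case might look like the sketch below. The `document_spec` helper is illustrative, and the file-upload path (upload via the Files API, then reference the file ID) is deliberately not shown.

```python
def document_spec(url, name=None):
    """Build a document specification for a remote PDF or image.

    Returns a dict with type "document_url"; document_name is optional and
    helps identify the document in results.
    """
    spec = {"type": "document_url", "document_url": url}
    if name:
        spec["document_name"] = name
    return spec
```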

Step 4: Call the OCR API

Invoke the ocr.process() method with the document specification, OCR model identifier, and optional parameters such as include_image_base64 to request base64-encoded images in the response.

Key considerations:

  • Use model="mistral-ocr-latest" for the OCR model
  • Set include_image_base64=True to receive embedded images as base64 strings
  • The document parameter is a dictionary with type, document_url (or file reference), and document_name
  • Processing is synchronous; the response is returned when OCR is complete
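The call itself can be sketched as follows. The `run_ocr` wrapper is illustrative; `client` is assumed to be an initialized `mistralai.Mistral` instance, and `document` a spec dictionary as prepared in Step 3.

```python
def run_ocr(client, document, include_images=True):
    """Invoke OCR synchronously; returns the response when processing completes.

    `document` is a spec dict such as
    {"type": "document_url", "document_url": "...", "document_name": "..."}.
    """
    return client.ocr.process(
        model="mistral-ocr-latest",           # OCR model identifier
        document=document,
        include_image_base64=include_images,  # embed images as base64 strings
    )
```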

Step 5: Process OCR Results

Parse the OCR response which contains page-by-page results. Each page includes extracted markdown text, detected tables (with optional format specification), extracted images with bounding boxes, and page dimension metadata.

Key considerations:

  • Response contains pages as a list of OCRPageObject entries
  • Each page has: markdown text, images, tables, and dimensions
  • Tables include bounding boxes and can be in markdown format
  • Images include bounding box coordinates and optional base64 data
  • Usage information includes document pages and tokens processed
  • Use model_dump_json() to serialize the full response as JSON
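Given the page structure above, result handling reduces to iterating over `response.pages`. The two helpers below are illustrative sketches that rely only on the attributes named in this step (`pages`, `markdown`, `images`), so they work with any object exposing that shape.

```python
def pages_to_markdown(ocr_response):
    """Join per-page markdown text into a single document string."""
    return "\n\n".join(page.markdown for page in ocr_response.pages)


def extract_images(ocr_response):
    """Collect (page_index, image) pairs; each image carries bounding-box
    coordinates and, when requested, base64 data."""
    return [
        (i, img)
        for i, page in enumerate(ocr_response.pages)
        for img in page.images
    ]
```

For full serialization, the response object's `model_dump_json()` (mentioned above) dumps everything, including usage information, as JSON.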

Execution Diagram

GitHub URL

Workflow Repository