Heuristic:Mistralai Client python Stream File Uploads

From Leeroopedia
Knowledge Sources
Domains Optimization, File_Management
Last Updated 2026-02-15 14:00 GMT

Overview

Use file streams instead of byte arrays for large file uploads to avoid memory exhaustion.

Description

The Mistral SDK file upload endpoints accept both byte arrays and file streams. For large files (training data, documents for OCR), loading the entire file into memory as bytes can cause out-of-memory errors, especially when processing multiple files concurrently. The SDK documentation explicitly recommends using streams for large files.

Usage

Apply this heuristic when uploading files to the Mistral API, particularly for fine-tuning training datasets or large documents for OCR processing. Any file larger than a few megabytes should be uploaded via stream.

The Insight (Rule of Thumb)

  • Action: Use file stream objects (e.g., `open("file.jsonl", "rb")`) instead of reading the entire file into memory with `file.read()`.
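  • Value: When streaming, memory usage stays roughly constant regardless of file size, because the content is read incrementally instead of being held in a single buffer.
  • Trade-off: Minimal. Streaming is better for large files and equivalent for small files; the only requirement is that the file handle stays open until the upload call returns.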
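The memory claim in the list above can be checked locally without calling the API. The following is a minimal sketch using only the Python standard library: it compares the peak traced memory of reading a file in one call against reading it in fixed-size chunks, which is roughly what an HTTP client does when handed a file object. The helper names and the 64 KiB chunk size are assumptions made for illustration, not values taken from the SDK.

```python
import os
import tempfile
import tracemalloc

CHUNK_SIZE = 64 * 1024  # assumed chunk size, for illustration only


def make_sample_file(size_bytes):
    """Create a temporary file of the given size and return its path."""
    fd, path = tempfile.mkstemp(suffix=".jsonl")
    with os.fdopen(fd, "wb") as f:
        f.write(b"x" * size_bytes)
    return path


def peak_memory_full_read(path):
    """Peak traced memory (bytes) when the whole file is read at once."""
    tracemalloc.start()
    with open(path, "rb") as f:
        data = f.read()  # entire file held in one bytes object
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak


def peak_memory_chunked_read(path, chunk_size=CHUNK_SIZE):
    """Peak traced memory (bytes) when the file is consumed chunk by chunk."""
    tracemalloc.start()
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            total += len(chunk)  # each chunk is discarded after use
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak


if __name__ == "__main__":
    sample = make_sample_file(8 * 1024 * 1024)
    try:
        print("full read peak:   ", peak_memory_full_read(sample))
        print("chunked read peak:", peak_memory_chunked_read(sample))
    finally:
        os.remove(sample)
```

The full read peaks at no less than the file size, while the chunked read stays near the chunk size however large the file is.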

Reasoning

From the SDK README documentation:

"For endpoints that handle file uploads bytes arrays can also be used. However, using streams is recommended for large files."

Training datasets for fine-tuning can be hundreds of megabytes. Reading them entirely into memory creates unnecessary memory pressure and can get the process terminated in memory-constrained environments, particularly when several uploads run at once.
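A rough back-of-the-envelope estimate shows how quickly buffered uploads add up under concurrency. The sketch below is illustrative only: the file sizes, the worker count, and the 64 KiB per-upload buffer are made-up assumptions, and real overhead depends on the HTTP client.

```python
MB = 1024 * 1024
CHUNK_SIZE = 64 * 1024  # assumed per-upload read buffer, not an SDK value


def estimate_peak_bytes(file_sizes, workers, streaming):
    """Rough upper bound on file data held in memory at the same time."""
    in_flight = min(workers, len(file_sizes))
    if streaming:
        # Each in-flight upload holds about one chunk at a time.
        return in_flight * CHUNK_SIZE
    # Worst case for byte arrays: the largest files are all in flight together.
    return sum(sorted(file_sizes, reverse=True)[:in_flight])


# Hypothetical dataset sizes for illustration.
sizes = [300 * MB, 250 * MB, 120 * MB, 80 * MB]

buffered = estimate_peak_bytes(sizes, workers=3, streaming=False)
streamed = estimate_peak_bytes(sizes, workers=3, streaming=True)

print(buffered // MB, "MB held when loading bytes")   # 670 MB
print(streamed // 1024, "KB held when streaming")     # 192 KB
```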

Code Evidence

From `README.md` (file upload tip):

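import os

from mistralai import Mistral
from mistralai.models import File

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# Recommended: pass the open file object so the content is streamed
with open("training_data.jsonl", "rb") as f:
    uploaded = client.files.upload(
        file=File(file_name="training_data.jsonl", content=f)
    )

# NOT recommended for large files: read() loads the whole file into memory
with open("training_data.jsonl", "rb") as f:
    content = f.read()
uploaded = client.files.upload(
    file=File(file_name="training_data.jsonl", content=content)
)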
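When several files need uploading, the streaming pattern above can be wrapped so that each handle is opened just before its upload and closed right after. The following is a minimal sketch, not part of the SDK: `upload_streams` and its `upload_fn` parameter are hypothetical names, and `upload_fn` stands in for whatever call performs the actual upload.

```python
from pathlib import Path
from typing import BinaryIO, Callable, Iterable, List, TypeVar

T = TypeVar("T")


def upload_streams(
    paths: Iterable[str],
    upload_fn: Callable[[str, BinaryIO], T],
) -> List[T]:
    """Upload files one at a time, passing an open binary stream to upload_fn.

    upload_fn is a hypothetical callable taking (file_name, stream); it must
    finish consuming the stream before it returns, because the handle is
    closed as soon as the with-block exits.
    """
    results = []
    for raw_path in paths:
        path = Path(raw_path)
        with path.open("rb") as stream:
            results.append(upload_fn(path.name, stream))
    return results
```

With the SDK, `upload_fn` could plausibly be `lambda name, f: client.files.upload(file=File(file_name=name, content=f))`, mirroring the call shown above. Only one file handle is open at a time, and no file is ever read into memory in full by this helper.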
