Heuristic: Stream File Uploads in the Mistral AI Python Client
| Knowledge Sources | |
|---|---|
| Domains | Optimization, File_Management |
| Last Updated | 2026-02-15 14:00 GMT |
Overview
Use file streams instead of byte arrays for large file uploads to avoid memory exhaustion.
Description
The Mistral SDK file upload endpoints accept both byte arrays and file streams. For large files (training data, documents for OCR), loading the entire file into memory as bytes can cause out-of-memory errors, especially when processing multiple files concurrently. The SDK documentation explicitly recommends using streams for large files.
Usage
Apply this heuristic when uploading files to the Mistral API, particularly for fine-tuning training datasets or large documents for OCR processing. Any file larger than a few megabytes should be uploaded via stream.
The Insight (Rule of Thumb)
- Action: Use file stream objects (e.g., `open("file.jsonl", "rb")`) instead of reading the entire file into memory with `file.read()`.
- Value: Memory usage stays constant regardless of file size when streaming.
- Trade-off: Minimal. A stream can only be consumed once, so retrying a failed upload requires reopening (or seeking) the file; otherwise streaming is equivalent to byte arrays for small files and strictly better for large ones.
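The constant-memory claim is easy to demonstrate without the SDK at all. The sketch below uses only the standard library; `sha256_streaming` and `sha256_in_memory` are hypothetical helper names chosen for illustration. Both produce the same result, but the streaming version never holds more than one chunk in memory, regardless of file size.

```python
import hashlib
import os
import tempfile

def sha256_streaming(path, chunk_size=64 * 1024):
    """Read the file in fixed-size chunks; peak memory is ~chunk_size."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def sha256_in_memory(path):
    """Load the whole file first; peak memory is ~file size."""
    with open(path, "rb") as f:
        data = f.read()
    return hashlib.sha256(data).hexdigest()

# Demo: a ~1 MB scratch file stands in for a large training dataset.
fd, path = tempfile.mkstemp(suffix=".jsonl")
with os.fdopen(fd, "wb") as f:
    f.write(b'{"messages": []}\n' * 60_000)

streamed = sha256_streaming(path)
buffered = sha256_in_memory(path)
```

The same pattern applies to an upload: an HTTP client that accepts a file object reads and sends it chunk by chunk, so the process footprint stays near the chunk size rather than the file size.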
Reasoning
From the SDK README documentation:
"For endpoints that handle file uploads bytes arrays can also be used. However, using streams is recommended for large files."
Training datasets for fine-tuning can be hundreds of megabytes. Reading these entirely into memory creates unnecessary pressure and can cause process termination in memory-constrained environments.
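Concurrency multiplies the pressure: N simultaneous in-memory uploads peak at roughly N times the file size, while N streamed uploads peak at roughly N times the chunk size. A minimal standalone sketch of the streamed pattern, where the hypothetical `upload_one` stands in for a real `client.files.upload` call and simply consumes its stream in chunks:

```python
import concurrent.futures
import os
import tempfile

def upload_one(path, chunk_size=64 * 1024):
    """Stand-in for an SDK upload: each worker opens its own stream and
    holds at most chunk_size bytes at a time."""
    total = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            total += len(chunk)  # a real client would write chunks to the socket
    return path, total

# Demo: three small scratch files stand in for training datasets.
tmpdir = tempfile.mkdtemp()
paths = []
for i in range(3):
    p = os.path.join(tmpdir, f"data_{i}.jsonl")
    with open(p, "wb") as f:
        f.write(b'{"x": 1}\n' * 1000)
    paths.append(p)

# Each worker streams its own file; memory stays flat as files grow.
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(upload_one, paths))
```

Opening the stream inside the worker (rather than pre-reading all files before submitting) is what keeps the aggregate footprint bounded.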
Code Evidence
From `README.md` (file upload tip):
```python
# Recommended: pass an open file object so the SDK streams the upload
with open("training_data.jsonl", "rb") as f:
    uploaded = client.files.upload(
        file=File(file_name="training_data.jsonl", content=f)
    )

# NOT recommended for large files: loading the entire file into memory
content = open("training_data.jsonl", "rb").read()
uploaded = client.files.upload(
    file=File(file_name="training_data.jsonl", content=content)
)
```