Heuristic:EvolvingLMMs Lab Lmms eval Truncated Image Handling

Knowledge Sources	lmms-eval Dataset quality issue with HuggingFaceM4/NoCaps
Domains	Data_Quality, Debugging
Last Updated	2026-02-14 00:00 GMT

Overview

lmms-eval globally enables PIL's ImageFile.LOAD_TRUNCATED_IMAGES flag so that truncated image files in evaluation datasets do not crash an evaluation run.

Description

Some evaluation datasets hosted on HuggingFace Hub contain truncated image files. Notably, the HuggingFaceM4/NoCaps dataset has a truncated image in its test split. By default, PIL (Pillow) raises an OSError ("image file is truncated") when it decodes the pixel data of such a file. The lmms-eval framework sets ImageFile.LOAD_TRUNCATED_IMAGES = True globally at module load time, which instructs PIL to decode whatever data is available from a truncated file rather than raising an error. This prevents an entire evaluation run from crashing because of a single bad image.
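The default and relaxed behaviours can be reproduced without downloading the dataset. The following is a minimal sketch, assuming Pillow is installed. The helper names make_truncated_jpeg and loads_cleanly are hypothetical, and the synthetic image only stands in for a real truncated file. It is not code from lmms-eval.

```python
import io

from PIL import Image, ImageFile


def make_truncated_jpeg(side=128, keep_fraction=0.5):
    """Encode a textured grayscale image as JPEG and drop the tail of the file."""
    # A textured pattern keeps the compressed scan data much larger than the
    # JPEG headers, so the cut lands inside the pixel data.
    pixels = bytes((x * y) % 256 for y in range(side) for x in range(side))
    buf = io.BytesIO()
    Image.frombytes("L", (side, side), pixels).save(buf, format="JPEG")
    data = buf.getvalue()
    return data[: int(len(data) * keep_fraction)]


def loads_cleanly(data):
    """Return True if Pillow decodes the pixel data without raising OSError."""
    try:
        with Image.open(io.BytesIO(data)) as img:
            img.load()  # decoding happens here; Image.open only reads headers
        return True
    except OSError:
        return False


truncated = make_truncated_jpeg()

ImageFile.LOAD_TRUNCATED_IMAGES = False  # Pillow's default
strict_ok = loads_cleanly(truncated)

ImageFile.LOAD_TRUNCATED_IMAGES = True  # the setting lmms-eval applies
relaxed_ok = loads_cleanly(truncated)

print(f"default: {strict_ok}, with flag: {relaxed_ok}")
```

Because Image.open is lazy, the error in the default case surfaces only when load() (or an operation that triggers it, such as convert) runs, which is why the helper calls load() explicitly.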

Usage

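This heuristic is applied automatically: the flag is set globally when lmms_eval/api/task.py is imported, and it affects all image loading throughout the framework. Because the flag is a module-level attribute of PIL.ImageFile, it also applies to any other Pillow image loading in the same Python process. Users do not need to configure it.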

The Insight (Rule of Thumb)

Action: Set ImageFile.LOAD_TRUNCATED_IMAGES = True before any image loading.
Value: True (Pillow's default is False), set once at module import time.
Trade-off: Truncated images may produce incorrect visual features (partial or corrupted content), but this is preferable to crashing the entire evaluation run.

Reasoning

Evaluation datasets can contain thousands to millions of images, and quality control varies. A single truncated image should not invalidate an entire benchmark run. The dataset named in the code comment (HuggingFaceM4/NoCaps test split) is a widely used captioning benchmark, which makes this workaround necessary for practical evaluation. The impact on metrics is expected to be small: truncated images are rare, and a partially decoded image typically still lets the model produce some output, although that output may be less accurate for the affected sample.

Code evidence from lmms_eval/api/task.py:51-53:

# HuggingfaceM4/NoCaps contains truncated image in test split
# Include this inside code block to avoid error
ImageFile.LOAD_TRUNCATED_IMAGES = True
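The assumption that truncated images are rare can be checked for a given dataset. The following is a hedged sketch, assuming the images are available as encoded bytes. The names strict_loading and find_truncated are hypothetical and do not exist in lmms-eval or Pillow. It temporarily restores Pillow's strict default, records which images fail to decode, and then puts the flag back.

```python
import io
from contextlib import contextmanager

from PIL import Image, ImageFile


@contextmanager
def strict_loading():
    """Temporarily set LOAD_TRUNCATED_IMAGES to False, then restore it."""
    previous = ImageFile.LOAD_TRUNCATED_IMAGES
    ImageFile.LOAD_TRUNCATED_IMAGES = False
    try:
        yield
    finally:
        ImageFile.LOAD_TRUNCATED_IMAGES = previous


def find_truncated(encoded_images):
    """Return the indices of images that fail to decode under strict loading."""
    bad = []
    with strict_loading():
        for index, data in enumerate(encoded_images):
            try:
                with Image.open(io.BytesIO(data)) as img:
                    img.load()  # force a full decode of the pixel data
            except OSError:
                # Covers truncated files as well as unreadable ones.
                bad.append(index)
    return bad
```

The returned indices give a count of affected samples, which can be compared against the split size to judge how much the workaround could influence a reported metric.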
