Implementation: EvolvingLMMs-Lab lmms-eval Dataset Loading
| Knowledge Sources | |
|---|---|
| Domains | Data_Processing, Evaluation |
| Last Updated | 2026-02-14 00:00 GMT |
Overview
The concrete mechanism the lmms-eval framework provides for retrieving and preparing evaluation datasets.
Description
The ConfigurableTask.download() method is the primary entry point for dataset loading in lmms-eval. It wraps the HuggingFace datasets.load_dataset() call with retry logic, download configuration, and special handling for video datasets that may need to be fetched from YouTube or extracted from zip/tar archives.
After loading the raw dataset, the method optionally applies a process_docs function (specified in the task YAML via the !function directive) to each relevant split. It then creates a parallel dataset_no_image copy with all Image, Sequence[Image], and Audio columns removed, which is used for lightweight operations such as logging and serialization.
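The media-column removal can be illustrated with a small sketch. This is not the library's actual code: it models each split's feature schema as a plain dict of `(type, inner_type)` pairs, and `strip_media_columns` / `is_media_feature` are hypothetical helper names.

```python
# Illustrative sketch (not the actual lmms-eval implementation): decide which
# columns survive into dataset_no_image by inspecting declared feature types.

MEDIA_TYPES = {"Image", "Audio"}

def is_media_feature(feature_type, inner_type=None):
    """A column is media if it is Image/Audio, or a Sequence of Image/Audio."""
    if feature_type in MEDIA_TYPES:
        return True
    return feature_type == "Sequence" and inner_type in MEDIA_TYPES

def strip_media_columns(features):
    """Given {column_name: (type, inner_type)}, return the non-media columns."""
    return [
        name for name, (ftype, inner) in features.items()
        if not is_media_feature(ftype, inner)
    ]

# Hypothetical feature schema for one split:
features = {
    "question": ("Value", None),
    "answer": ("Value", None),
    "image": ("Image", None),
    "frames": ("Sequence", "Image"),
    "audio": ("Audio", None),
}
print(strip_media_columns(features))  # -> ['question', 'answer']
```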
The method is decorated with @retry from the tenacity library, configured to stop after 5 attempts or 60 seconds of total elapsed time (whichever comes first), with a fixed 2-second wait between attempts, ensuring robustness against transient network failures during dataset download.
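The OR-ed stop condition can be mimicked in pure Python. This is a minimal sketch of the semantics of `stop_after_attempt(5) | stop_after_delay(60)` with `wait_fixed(2)`, not tenacity itself; `retry_with_stops` is a hypothetical helper name.

```python
import time

def retry_with_stops(fn, max_attempts=5, max_delay=60.0, wait=2.0):
    """Retry fn until it succeeds, stopping after max_attempts attempts
    OR max_delay total seconds, whichever comes first (mirrors tenacity's
    stop_after_attempt(5) | stop_after_delay(60) with wait_fixed(2))."""
    start = time.monotonic()
    attempt = 0
    while True:
        attempt += 1
        try:
            return fn()
        except Exception:
            elapsed = time.monotonic() - start
            # The stop conditions are OR-ed: either one ends retrying.
            if attempt >= max_attempts or elapsed >= max_delay:
                raise
            time.sleep(wait)

# Usage: a flaky callable that succeeds on the third call.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient network failure")
    return "dataset loaded"

print(retry_with_stops(flaky, wait=0.01))  # -> dataset loaded
```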
Usage
Use this when defining a custom task. Set dataset_path and optionally dataset_name in your YAML configuration. The download is triggered automatically during ConfigurableTask.__init__(). For datasets requiring special handling (video downloads, local disk loading), use dataset_kwargs in the YAML.
Code Reference
Source Location
- Repository: lmms-eval
- File: lmms_eval/api/task.py
- Lines: 892-1103
Signature
```python
@retry(stop=(stop_after_attempt(5) | stop_after_delay(60)), wait=wait_fixed(2))
def download(self, dataset_kwargs=None) -> None:
```
Import
```python
from lmms_eval.api.task import ConfigurableTask
# download() is called internally during ConfigurableTask.__init__()
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| dataset_kwargs | Optional[dict] | No | Additional keyword arguments forwarded to datasets.load_dataset(). May contain special keys such as "video", "From_YouTube", "load_from_disk", "builder_script", "cache_dir", and "local_files_only". |
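How the special keys interact with datasets.load_dataset() can be sketched as follows. This assumes the special control keys are separated out before the remaining kwargs are forwarded; `split_dataset_kwargs` is a hypothetical helper, not the actual implementation.

```python
# Sketch (hypothetical helper): separate lmms-eval's special control keys
# from keyword arguments that are safe to forward to datasets.load_dataset().

SPECIAL_KEYS = {"video", "From_YouTube", "load_from_disk", "builder_script"}

def split_dataset_kwargs(dataset_kwargs):
    """Return (special, forwardable) dicts from the task's dataset_kwargs."""
    dataset_kwargs = dict(dataset_kwargs or {})
    special = {k: dataset_kwargs.pop(k)
               for k in list(dataset_kwargs) if k in SPECIAL_KEYS}
    return special, dataset_kwargs

special, forward = split_dataset_kwargs(
    {"video": True, "cache_dir": "MyVideoDataset/videos"}
)
print(special)  # -> {'video': True}
print(forward)  # -> {'cache_dir': 'MyVideoDataset/videos'}
```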
The following fields are read from the task's YAML configuration (via self.config and self.DATASET_PATH / self.DATASET_NAME):
| Name | Type | Required | Description |
|---|---|---|---|
| dataset_path | str | Yes | HuggingFace dataset repository identifier (e.g., "lmms-lab/MME") or a local path. |
| dataset_name | Optional[str] | No | Name of a specific subset/configuration within the dataset repository. |
| process_docs | Optional[Callable] | No | A function referenced via !function in YAML that transforms a Dataset split. Applied to each split after loading. |
| test_split | Optional[str] | No | Name of the test split (e.g., "test"). |
| training_split | Optional[str] | No | Name of the training split. |
| validation_split | Optional[str] | No | Name of the validation split. |
| fewshot_split | Optional[str] | No | Name of the fewshot split. |
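The per-split application of process_docs can be sketched with a standalone helper. This is an illustration of the described behavior, not the library's code: `apply_process_docs` is a hypothetical name, and plain lists stand in for Dataset splits.

```python
# Sketch: apply a process_docs transform to each configured split that is
# actually present, skipping unconfigured (None) split names.

def apply_process_docs(dataset, process_docs, split_names):
    """Transform each named split in-place; ignore None or missing splits."""
    for split in split_names:
        if split is not None and split in dataset:
            dataset[split] = process_docs(dataset[split])
    return dataset

# Example with plain lists standing in for Dataset splits:
ds = {"train": [1, 2], "test": [3, 4, 5]}
ds = apply_process_docs(ds, lambda rows: [r * 10 for r in rows],
                        ["train", "test", None])
print(ds["test"])  # -> [30, 40, 50]
```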
Outputs
| Name | Type | Description |
|---|---|---|
| self.dataset | datasets.DatasetDict | The loaded HuggingFace DatasetDict containing all splits with full media columns. |
| self.dataset_no_image | datasets.DatasetDict | A lightweight copy of the dataset with Image, Sequence[Image], and Audio columns removed. |
Usage Examples
Basic Example
```python
# In your task YAML file (e.g., lmms_eval/tasks/my_task/my_task.yaml):
# dataset_path: lmms-lab/MME
# dataset_name: null
# test_split: test
# process_docs: !function utils.my_preprocess

# The download happens automatically when the task is initialized;
# you do not call download() directly in normal usage.
# Equivalent internal call:
from lmms_eval.api.task import ConfigurableTask

task = ConfigurableTask(config={
    "task": "my_task",
    "dataset_path": "lmms-lab/MME",
    "test_split": "test",
    "output_type": "generate_until",
    "doc_to_text": "question",
    "doc_to_target": "answer",
})
# task.dataset is now a loaded DatasetDict
# task.dataset["test"] contains the test split documents
```
With process_docs Preprocessing
```python
# In utils.py alongside your YAML:
def my_preprocess(dataset):
    """Filter and augment the dataset."""
    # Keep only rows with a valid image
    dataset = dataset.filter(lambda x: x["image"] is not None)
    # Add a derived column
    dataset = dataset.map(lambda x: {
        "full_prompt": f"Question: {x['question']}\nAnswer:"
    })
    return dataset

# In your YAML:
# process_docs: !function utils.my_preprocess
```
With Video Dataset Loading
```yaml
# For datasets with video content, use dataset_kwargs in YAML:
dataset_path: lmms-lab/MyVideoDataset
dataset_kwargs:
  video: true
  cache_dir: MyVideoDataset/videos
```