Implementation: EvolvingLMMs-Lab lmms-eval ConfigurableTask.__init__
| Knowledge Sources | |
|---|---|
| Domains | Configuration, Task_Management |
| Last Updated | 2026-02-14 00:00 GMT |
Overview
Concrete tool for constructing a fully initialized evaluation task from a YAML configuration dictionary provided by the lmms-eval framework.
Description
ConfigurableTask.__init__() is the constructor that transforms a parsed YAML configuration dictionary into a fully operational evaluation task object. It performs the following operations in sequence:
1. Configuration resolution: If the class has a preconfigured CONFIG attribute, it is used as the base and optionally updated with the provided config dict. Otherwise, a new TaskConfig is created from the config dict.
2. Version extraction: The task version is extracted from metadata.version if present.
3. Model-specific preparation: Per-model overrides for generation kwargs, target kwargs, and lmms_eval-specific kwargs are resolved based on the model_name.
4. Output type validation: The output_type is validated against the set of supported types: "loglikelihood", "multiple_choice", "generate_until", "generate_until_multi_round".
5. Metric resolution: The metric_list from the YAML is resolved into callable metric functions, aggregation functions, and higher-is-better flags. Custom metrics referenced via !function are resolved to callables. Built-in metrics are looked up in the METRIC_REGISTRY.
6. Dataset download: The download() method is called to fetch the dataset from HuggingFace Hub.
7. Filter setup: Any filter pipelines specified in filter_list are constructed.
8. Document validation: A test document is processed through doc_to_text, doc_to_target, and optionally doc_to_choice to verify the configuration is valid.
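The sequence above can be sketched in simplified form. This is an illustrative stand-in, not the real implementation: `TaskConfig` and the supported output types mirror lmms-eval, but `SketchTask` and its bodies are hypothetical, and steps 3 and 5-8 are elided.

```python
# Simplified sketch of the __init__ sequence (steps 1, 2, and 4).
# SketchTask is a hypothetical stand-in for ConfigurableTask.
from dataclasses import dataclass, field
from typing import Optional

SUPPORTED_OUTPUT_TYPES = {
    "loglikelihood", "multiple_choice",
    "generate_until", "generate_until_multi_round",
}

@dataclass
class TaskConfig:
    task: Optional[str] = None
    output_type: str = "generate_until"
    metadata: dict = field(default_factory=dict)
    metric_list: list = field(default_factory=list)

class SketchTask:
    CONFIG = None  # subclasses may preconfigure this

    def __init__(self, config: Optional[dict] = None,
                 model_name: Optional[str] = None):
        # 1. Configuration resolution: reuse a preconfigured CONFIG,
        #    otherwise build a TaskConfig from the parsed YAML dict.
        if self.CONFIG is not None:
            self.config = self.CONFIG
        else:
            self.config = TaskConfig(**(config or {}))
        # 2. Version extraction from metadata.version, if present.
        self.VERSION = self.config.metadata.get("version")
        # 4. Output type validation against the supported set.
        if self.config.output_type not in SUPPORTED_OUTPUT_TYPES:
            raise ValueError(
                f"Unsupported output_type: {self.config.output_type}")
        self.OUTPUT_TYPE = self.config.output_type
        # Steps 3, 5-8 (model-specific overrides, metric resolution,
        # dataset download, filters, document validation) would follow.

task = SketchTask(config={"task": "mme",
                          "output_type": "generate_until",
                          "metadata": {"version": 0.0}})
```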
Usage
Use this when the framework instantiates tasks from YAML configurations. You typically do not call ConfigurableTask.__init__() directly; it is invoked by TaskManager._load_individual_task_or_group() during task loading. However, understanding its behavior is essential for debugging task configuration issues.
Code Reference
Source Location
- Repository: lmms-eval
- File: lmms_eval/api/task.py (lines 699-815)
Signature
class ConfigurableTask(Task):
VERSION = "Yaml"
OUTPUT_TYPE = None
CONFIG = None
def __init__(
self,
data_dir=None,
cache_dir=None,
download_mode=None,
config: Optional[dict] = None,
model_name: Optional[str] = None,
) -> None:
Import
from lmms_eval.api.task import ConfigurableTask
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| data_dir | Optional[str] | No | Path to a local folder containing the task's data files, for manually downloaded datasets. |
| cache_dir | Optional[str] | No | Directory for reading/writing the task dataset cache. Defaults to the HuggingFace cache directory. |
| download_mode | Optional[datasets.DownloadMode] | No | Controls how pre-existing downloads and data are handled (reuse, force redownload, etc.). |
| config | Optional[dict] | Yes (unless CONFIG is set) | A dictionary of task configuration fields matching the TaskConfig dataclass schema, typically parsed from a YAML file. |
| model_name | Optional[str] | No | Name of the model being evaluated, used to select model-specific configuration overrides. |
Outputs
| Name | Type | Description |
|---|---|---|
| ConfigurableTask instance | ConfigurableTask | A fully initialized task object with the dataset loaded, metrics resolved, filters configured, and prompt functions ready. Key attributes: self.dataset, self.task_docs, self._metric_fn_list, self._aggregation_list, self._higher_is_better, self.OUTPUT_TYPE. |
Usage Examples
Basic Example
# A minimal YAML configuration for MME:
# task: mme
# dataset_path: lmms-lab/MME
# test_split: test
# output_type: generate_until
# doc_to_visual: !function utils.mme_doc_to_visual
# doc_to_text: !function utils.mme_doc_to_text
# doc_to_target: "answer"
# generation_kwargs:
# max_new_tokens: 16
# temperature: 0
# process_results: !function utils.mme_process_results
# metric_list:
# - metric: mme_perception_score
# aggregation: !function utils.mme_aggregate_results
# higher_is_better: true
# This YAML is loaded and passed as a dict to ConfigurableTask:
from lmms_eval.api.task import ConfigurableTask
config = {
"task": "mme",
"dataset_path": "lmms-lab/MME",
"test_split": "test",
"output_type": "generate_until",
"doc_to_text": "question",
"doc_to_target": "answer",
"generation_kwargs": {
"max_new_tokens": 16,
"temperature": 0,
},
}
task = ConfigurableTask(config=config, model_name="llava")
print(task.OUTPUT_TYPE) # "generate_until"
print(len(task.task_docs)) # Number of test documents
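Step 5 of the constructor (metric resolution) can also be sketched in isolation. The registries and `resolve_metrics` helper below are hypothetical stand-ins under the assumption described above: a value parsed from a YAML `!function` tag arrives as a callable, while plain metric names are looked up in a registry.

```python
# Illustrative sketch of metric_list resolution. METRIC_REGISTRY and
# AGGREGATION_REGISTRY here are stand-ins for lmms-eval's real registries.
def exact_match(gold, pred):
    return float(gold == pred)

METRIC_REGISTRY = {"exact_match": exact_match}
AGGREGATION_REGISTRY = {"mean": lambda xs: sum(xs) / len(xs)}

def resolve_metrics(metric_list):
    fns, aggs, higher = {}, {}, {}
    for entry in metric_list:
        name = entry["metric"]
        # Built-in metrics are looked up by name in the registry.
        fns[name] = METRIC_REGISTRY[name]
        # An aggregation from a !function tag would already be callable;
        # plain strings are resolved through the aggregation registry.
        agg = entry.get("aggregation", "mean")
        aggs[name] = agg if callable(agg) else AGGREGATION_REGISTRY[agg]
        higher[name] = entry.get("higher_is_better", True)
    return fns, aggs, higher

fns, aggs, higher = resolve_metrics([
    {"metric": "exact_match", "aggregation": "mean",
     "higher_is_better": True},
])
```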
With Template Inheritance
# _default_template.yaml (base config):
# output_type: generate_until
# generation_kwargs:
# max_new_tokens: 1024
# temperature: 0
# my_task.yaml (child config):
# include: _default_template.yaml
# task: my_task
# dataset_path: my-org/my-dataset
# test_split: test
# doc_to_text: !function utils.format_prompt
# doc_to_target: "answer"
# metric_list:
# - metric: exact_match
# aggregation: mean
# higher_is_better: true
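The effect of `include` can be mimicked with a plain dict merge: the child config's keys override the base template's. This is a minimal sketch of the resulting merged configuration, not the actual YAML-loading code, and it omits the `!function` and `metric_list` fields.

```python
# The base template's fields, as if parsed from _default_template.yaml.
base = {
    "output_type": "generate_until",
    "generation_kwargs": {"max_new_tokens": 1024, "temperature": 0},
}

# The child config's own fields (the `include` key itself is consumed
# by the loader and does not appear in the merged result).
child = {
    "task": "my_task",
    "dataset_path": "my-org/my-dataset",
    "test_split": "test",
    "doc_to_target": "answer",
}

# Child keys win on conflict; base keys fill in everything else.
merged = {**base, **child}
```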