Implementation: EvolvingLMMs-Lab lmms-eval ConfigurableTask.__init__
| Knowledge Sources | |
|---|---|
| Domains | Configuration, Task_Management |
| Last Updated | 2026-02-14 00:00 GMT |
Overview
Concrete tool for constructing a fully initialized evaluation task from a YAML configuration dictionary provided by the lmms-eval framework.
Description
ConfigurableTask.__init__() is the constructor that transforms a parsed YAML configuration dictionary into a fully operational evaluation task object. It performs the following operations in sequence:
1. Configuration resolution: If the class has a preconfigured CONFIG attribute, it is used as the base and optionally updated with the provided config dict. Otherwise, a new TaskConfig is created from the config dict.
2. Version extraction: The task version is extracted from metadata.version if present.
3. Model-specific preparation: Per-model overrides for generation kwargs, target kwargs, and lmms_eval-specific kwargs are resolved based on the model_name.
4. Output type validation: The output_type is validated against the set of supported types: "loglikelihood", "multiple_choice", "generate_until", "generate_until_multi_round".
5. Metric resolution: The metric_list from the YAML is resolved into callable metric functions, aggregation functions, and higher-is-better flags. Custom metrics referenced via !function are resolved to callables. Built-in metrics are looked up in the METRIC_REGISTRY.
6. Dataset download: The download() method is called to fetch the dataset from HuggingFace Hub.
7. Filter setup: Any filter pipelines specified in filter_list are constructed.
8. Document validation: A test document is processed through doc_to_text, doc_to_target, and optionally doc_to_choice to verify the configuration is valid.
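The sequence above can be sketched in simplified form. This is an illustrative stand-in, not the real implementation: `TaskConfig` and the supported output types mirror lmms-eval, but `SketchTask` and its bodies are hypothetical, and steps 3 and 5-8 are elided.

```python
# Simplified sketch of the __init__ sequence (steps 1, 2, and 4).
# SketchTask is a hypothetical stand-in for ConfigurableTask.
from dataclasses import dataclass, field
from typing import Optional

SUPPORTED_OUTPUT_TYPES = {
    "loglikelihood", "multiple_choice",
    "generate_until", "generate_until_multi_round",
}

@dataclass
class TaskConfig:
    task: Optional[str] = None
    output_type: str = "generate_until"
    metadata: dict = field(default_factory=dict)
    metric_list: list = field(default_factory=list)

class SketchTask:
    CONFIG = None  # subclasses may preconfigure this

    def __init__(self, config: Optional[dict] = None,
                 model_name: Optional[str] = None):
        # 1. Configuration resolution: reuse a preconfigured CONFIG,
        #    otherwise build a TaskConfig from the parsed YAML dict.
        if self.CONFIG is not None:
            self.config = self.CONFIG
        else:
            self.config = TaskConfig(**(config or {}))
        # 2. Version extraction from metadata.version, if present.
        self.VERSION = self.config.metadata.get("version")
        # 4. Output type validation against the supported set.
        if self.config.output_type not in SUPPORTED_OUTPUT_TYPES:
            raise ValueError(
                f"Unsupported output_type: {self.config.output_type}")
        self.OUTPUT_TYPE = self.config.output_type
        # Steps 3, 5-8 (model-specific overrides, metric resolution,
        # dataset download, filters, document validation) would follow.

task = SketchTask(config={"task": "mme",
                          "output_type": "generate_until",
                          "metadata": {"version": 0.0}})
```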
Usage
Use this when the framework instantiates tasks from YAML configurations. You typically do not call ConfigurableTask.__init__() directly; it is invoked by TaskManager._load_individual_task_or_group() during task loading. However, understanding its behavior is essential for debugging task configuration issues.
Code Reference
Source Location
- Repository: lmms-eval
- File: lmms_eval/api/task.py (lines 699-815)
Signature
class ConfigurableTask(Task):
VERSION = "Yaml"
OUTPUT_TYPE = None
CONFIG = None
def __init__(
self,
data_dir=None,
cache_dir=None,
download_mode=None,
config: Optional[dict] = None,
model_name: Optional[str] = None,
) -> None:
Import
from lmms_eval.api.task import ConfigurableTask
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| data_dir | Optional[str] | No | Path to a local folder containing the task's data files, for manually downloaded datasets. |
| cache_dir | Optional[str] | No | Directory for reading/writing the task dataset cache. Defaults to the HuggingFace cache directory. |
| download_mode | Optional[datasets.DownloadMode] | No | Controls how pre-existing downloads and data are handled (reuse, force redownload, etc.). |
| config | Optional[dict] | Yes (unless CONFIG is set) | A dictionary of task configuration fields matching the TaskConfig dataclass schema, typically parsed from a YAML file. |
| model_name | Optional[str] | No | Name of the model being evaluated, used to select model-specific configuration overrides. |
Outputs
| Name | Type | Description |
|---|---|---|
| ConfigurableTask instance | ConfigurableTask | A fully initialized task object with the dataset loaded, metrics resolved, filters configured, and prompt functions ready. Key attributes: self.dataset, self.task_docs, self._metric_fn_list, self._aggregation_list, self._higher_is_better, self.OUTPUT_TYPE. |
Usage Examples
Basic Example
# A minimal YAML configuration for MME:
# task: mme
# dataset_path: lmms-lab/MME
# test_split: test
# output_type: generate_until
# doc_to_visual: !function utils.mme_doc_to_visual
# doc_to_text: !function utils.mme_doc_to_text
# doc_to_target: "answer"
# generation_kwargs:
# max_new_tokens: 16
# temperature: 0
# process_results: !function utils.mme_process_results
# metric_list:
# - metric: mme_perception_score
# aggregation: !function utils.mme_aggregate_results
# higher_is_better: true
# This YAML is loaded and passed as a dict to ConfigurableTask:
from lmms_eval.api.task import ConfigurableTask
config = {
"task": "mme",
"dataset_path": "lmms-lab/MME",
"test_split": "test",
"output_type": "generate_until",
"doc_to_text": "question",
"doc_to_target": "answer",
"generation_kwargs": {
"max_new_tokens": 16,
"temperature": 0,
},
}
task = ConfigurableTask(config=config, model_name="llava")
print(task.OUTPUT_TYPE) # "generate_until"
print(len(task.task_docs)) # Number of test documents
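Step 5 of the constructor (metric resolution) can also be sketched in isolation. The registries and `resolve_metrics` helper below are hypothetical stand-ins under the assumption described above: a value parsed from a YAML `!function` tag arrives as a callable, while plain metric names are looked up in a registry.

```python
# Illustrative sketch of metric_list resolution. METRIC_REGISTRY and
# AGGREGATION_REGISTRY here are stand-ins for lmms-eval's real registries.
def exact_match(gold, pred):
    return float(gold == pred)

METRIC_REGISTRY = {"exact_match": exact_match}
AGGREGATION_REGISTRY = {"mean": lambda xs: sum(xs) / len(xs)}

def resolve_metrics(metric_list):
    fns, aggs, higher = {}, {}, {}
    for entry in metric_list:
        name = entry["metric"]
        # Built-in metrics are looked up by name in the registry.
        fns[name] = METRIC_REGISTRY[name]
        # An aggregation from a !function tag would already be callable;
        # plain strings are resolved through the aggregation registry.
        agg = entry.get("aggregation", "mean")
        aggs[name] = agg if callable(agg) else AGGREGATION_REGISTRY[agg]
        higher[name] = entry.get("higher_is_better", True)
    return fns, aggs, higher

fns, aggs, higher = resolve_metrics([
    {"metric": "exact_match", "aggregation": "mean",
     "higher_is_better": True},
])
```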
With Template Inheritance
# _default_template.yaml (base config):
# output_type: generate_until
# generation_kwargs:
# max_new_tokens: 1024
# temperature: 0
# my_task.yaml (child config):
# include: _default_template.yaml
# task: my_task
# dataset_path: my-org/my-dataset
# test_split: test
# doc_to_text: !function utils.format_prompt
# doc_to_target: "answer"
# metric_list:
# - metric: exact_match
# aggregation: mean
# higher_is_better: true
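The effect of `include` can be mimicked with a plain dict merge: the child config's keys override the base template's. This is a minimal sketch of the resulting merged configuration, not the actual YAML-loading code, and it omits the `!function` and `metric_list` fields.

```python
# The base template's fields, as if parsed from _default_template.yaml.
base = {
    "output_type": "generate_until",
    "generation_kwargs": {"max_new_tokens": 1024, "temperature": 0},
}

# The child config's own fields (the `include` key itself is consumed
# by the loader and does not appear in the merged result).
child = {
    "task": "my_task",
    "dataset_path": "my-org/my-dataset",
    "test_split": "test",
    "doc_to_target": "answer",
}

# Child keys win on conflict; base keys fill in everything else.
merged = {**base, **child}
```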