Principle: EvolvingLMMs-Lab lmms-eval YAML Task Configuration
| Knowledge Sources | |
|---|---|
| Domains | Configuration, Task_Management |
| Last Updated | 2026-02-14 00:00 GMT |
Overview
Evaluation tasks should be specified declaratively through structured configuration files rather than through imperative code, enabling rapid task creation without modifying framework internals.
Description
Declarative configuration is a central design principle of lmms-eval. Rather than requiring users to write Python classes for every new benchmark, the framework allows tasks to be defined entirely through YAML files. A single YAML file specifies everything the framework needs to know: where to find the data, how to construct prompts, what output format to expect, and how to score results.
This approach has several advantages:
Accessibility: Researchers who are not framework developers can add new benchmarks by writing YAML and, optionally, small utility functions. No understanding of the evaluation loop internals is required.
Composability: YAML configurations support template inheritance via the include directive. A base template can define shared settings (generation parameters, common metric configurations), and individual task YAMLs can override only the fields that differ. This reduces duplication across related benchmarks.
Transparency: Because the configuration is a flat, human-readable file, it is easy to review, version-control, and share. The exact evaluation protocol for any task can be understood by reading its YAML.
The YAML configuration maps directly to the TaskConfig dataclass, which defines all supported fields. The most important fields fall into several categories:
Data fields: dataset_path, dataset_name, dataset_kwargs, test_split, validation_split, training_split, fewshot_split.
Prompt fields: doc_to_text, doc_to_visual, doc_to_target, doc_to_choice, doc_to_messages. These can be column names (strings), Jinja2 templates, or !function references to Python callables.
Output and generation fields: output_type (one of "generate_until", "loglikelihood", "multiple_choice", "generate_until_multi_round"), generation_kwargs (temperature, max tokens, etc.).
Metric fields: metric_list (a list of metric configurations), process_results (a custom result processing function).
Model-specific fields: lmms_eval_specific_kwargs, model_specific_generation_kwargs, model_specific_target_kwargs for per-model overrides.
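To illustrate how these field categories combine, here is a minimal sketch of a task YAML. The task name, dataset path, prompt template, and utility function names are hypothetical; actual values depend on the benchmark being defined:

```yaml
# Hypothetical task configuration; field names follow TaskConfig,
# but the dataset and utility names are illustrative only.
task: my_vqa_benchmark
dataset_path: example-org/my-vqa-dataset
test_split: test
output_type: generate_until
doc_to_visual: !function utils.doc_to_visual
doc_to_text: "Question: {{question}}\nAnswer:"
doc_to_target: "answer"
generation_kwargs:
  max_new_tokens: 64
  temperature: 0.0
metric_list:
  - metric: exact_match
    aggregation: mean
    higher_is_better: true
```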
The !function YAML tag is a custom constructor that resolves a string like utils.my_function to the actual Python callable at load time by importing from the task's companion utils.py module.
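The resolution pattern behind the tag can be sketched as follows. This is a simplified illustration, not lmms-eval's actual constructor (which also handles importing from the task's own directory); `resolve_function` is a hypothetical name:

```python
import importlib

def resolve_function(tag_value):
    """Resolve a '!function module.attr' style reference to a callable.

    Sketch of the pattern: split the string into a module path and an
    attribute name, import the module, and fetch the attribute.
    """
    module_name, func_name = tag_value.rsplit(".", 1)
    module = importlib.import_module(module_name)
    return getattr(module, func_name)
```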
Usage
Use YAML task configuration whenever you create a new evaluation task. Start by identifying the closest existing task YAML as a template, copy it into a new task directory, and modify the fields to match your new benchmark. For benchmarks that require custom prompt construction or result processing, implement the necessary functions in a utils.py file and reference them with !function directives.
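A companion utils.py typically contains small, pure functions keyed to the dataset's schema. The sketch below assumes hypothetical document fields ("question", "answer", "image"); real signatures and fields vary by benchmark:

```python
# Sketch of a task's companion utils.py; field names are illustrative.

def doc_to_text(doc, lmms_eval_specific_kwargs=None):
    """Build the textual prompt for one document."""
    return f"Question: {doc['question']}\nAnswer:"

def doc_to_visual(doc):
    """Return the list of images associated with one document."""
    return [doc["image"]]

def process_results(doc, results):
    """Score a model response against the gold answer (exact match)."""
    prediction = results[0].strip().lower()
    target = doc["answer"].strip().lower()
    return {"exact_match": 1.0 if prediction == target else 0.0}
```

These functions are then referenced from the task YAML, e.g. `doc_to_text: !function utils.doc_to_text`.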
Theoretical Basis
The YAML configuration system implements a mapping from declarative specification to an executable task object:
YAML File --> TaskConfig Dataclass --> ConfigurableTask Instance
The TaskConfig dataclass defines the schema:
from dataclasses import dataclass
from typing import Callable, Union

@dataclass
class TaskConfig(dict):
    task: str = None
    dataset_path: str = None
    dataset_name: str = None
    output_type: str = "generate_until"
    doc_to_text: Union[Callable, str] = None
    doc_to_visual: Union[Callable, str] = None
    doc_to_target: Union[Callable, str] = None
    doc_to_messages: Callable = None
    process_results: Union[Callable, str] = None
    metric_list: list = None
    generation_kwargs: dict = None
    # ... additional fields
The resolution of !function references follows the pattern:
"!function utils.my_func" --> import utils from task directory --> getattr(utils, "my_func")
Template inheritance works through the include key:
child_config = merge(load(include_path), child_yaml_fields)
where child fields take precedence over included base fields, following a last-writer-wins merge strategy.
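The last-writer-wins semantics can be demonstrated with a shallow merge. This is a sketch of the strategy only; a real loader may merge nested mappings such as generation_kwargs differently, and `merge_configs` is a hypothetical name:

```python
def merge_configs(base, child):
    """Last-writer-wins merge: child fields override base fields.

    Shallow-merge sketch; nested dicts from the child replace the
    base's wholesale rather than being merged recursively.
    """
    merged = dict(base)
    merged.update(child)
    return merged

# Base template shared by related tasks, overridden per task.
base = {"output_type": "generate_until",
        "generation_kwargs": {"max_new_tokens": 32}}
child = {"task": "my_task",
         "generation_kwargs": {"max_new_tokens": 128}}
config = merge_configs(base, child)
```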