Principle:EvolvingLMMs Lab Lmms eval Task Selection
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Task_Management |
| Last Updated | 2026-02-14 00:00 GMT |
Overview
Task selection is the process of resolving user-supplied task name patterns into a concrete set of evaluation task configurations that the framework will execute.
Description
Evaluation frameworks typically support hundreds of benchmark tasks, each defined by a YAML configuration file specifying dataset paths, output types, metrics, and prompt templates. Users need a way to select subsets of these tasks without listing every individual task name. Task selection addresses this by supporting glob-style pattern matching against a pre-built task index.
The lmms-eval framework organizes tasks into three categories:
- Tasks -- Individual benchmark configurations (e.g., mmmu_val, mme).
- Groups -- Named collections of tasks, defined in group YAML configs, whose results are aggregated.
- Tags -- Labels attached to tasks in their YAML configs, enabling cross-cutting selections (e.g., all vision-language tasks).
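Groups and tags both ultimately stand for a set of concrete tasks. The sketch below shows one way such names could be expanded into leaf tasks. The index contents, the group name mmmu_all, and the tag name vision_qa are hypothetical. The assumption that groups may nest other groups is also only illustrative and is not taken from lmms-eval.

```python
from typing import Dict, Set

# Hypothetical in-memory index: name -> entry. Groups and tags list their
# members in "task_list"; plain tasks have no members.
INDEX: Dict[str, dict] = {
    "mmmu_val": {"type": "task"},
    "mmmu_test": {"type": "task"},
    "mme": {"type": "task"},
    "mmmu_all": {"type": "group", "task_list": ["mmmu_val", "mmmu_test"]},
    "vision_qa": {"type": "tag", "task_list": ["mmmu_val", "mme"]},
}


def expand(name: str, index: Dict[str, dict]) -> Set[str]:
    """Resolve a task, group, or tag name into the set of leaf task names it covers."""
    entry = index[name]
    if entry["type"] in ("task", "python_task"):
        return {name}
    leaves: Set[str] = set()
    for member in entry["task_list"]:
        # Recurse so that a group listing another group still yields leaf tasks.
        leaves |= expand(member, index)
    return leaves
```

Here, expand("vision_qa", INDEX) returns {"mmmu_val", "mme"}. A task can therefore be reached through a tag even though it is defined in a different directory from the other tagged tasks.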
The TaskManager class is responsible for:
- Indexing -- Walking the lmms_eval/tasks/ directory tree (and any user-supplied include_path) to discover and classify all YAML configuration files.
- Matching -- Taking user-supplied patterns (e.g., mmmu*) and expanding them against the full task index using fnmatch glob semantics.
- Loading -- Instantiating ConfigurableTask or ConfigurableMessagesTask objects from the matched YAML configs.
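The discovery half of the indexing step can be sketched with pathlib. The function name discover_yaml_files is hypothetical. This sketch looks only for the .yaml suffix and silently skips roots that do not exist; both behaviors are assumptions, and the real TaskManager may handle them differently.

```python
from pathlib import Path
from typing import Iterable, List


def discover_yaml_files(roots: Iterable[str]) -> List[Path]:
    """Collect every *.yaml file under each root directory, recursively."""
    found: List[Path] = []
    for root in roots:
        root_path = Path(root)
        if not root_path.is_dir():
            continue  # assumption: a missing include path is skipped, not an error
        # rglob descends into every subdirectory (e.g. one folder per benchmark).
        found.extend(sorted(root_path.rglob("*.yaml")))
    return found
```

A call such as discover_yaml_files(["lmms_eval/tasks", "/path/to/custom/tasks"]) would cover both the built-in tasks and an include_path directory in a single pass.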
Usage
Use task selection whenever:
- You want to run a specific subset of benchmarks: --tasks mmmu,mme.
- You want to run all tasks matching a pattern: --tasks "mmmu*".
- You want to include tasks from a custom directory: --include_path /path/to/custom/tasks.
- You are building tooling that needs to enumerate available tasks programmatically.
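The --tasks value is a comma-separated list in which each entry may be a name or a glob pattern. The helper below is a hypothetical sketch of splitting that value into patterns before matching. It strips whitespace and drops empty entries, which is an assumption about how lenient such parsing should be.

```python
from typing import List


def split_task_arg(arg: str) -> List[str]:
    """Split a comma-separated --tasks value into individual name patterns."""
    # Strip whitespace around each entry and drop empties (e.g. a trailing comma).
    return [part.strip() for part in arg.split(",") if part.strip()]
```

Quoting "mmmu*" on the command line matters. Without the quotes, the shell may expand the glob against files in the current directory before the framework ever sees the pattern.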
Theoretical Basis
Task selection follows a two-phase resolution pattern:
Phase 1 -- Indexing:
For each YAML file in the task directories:
- Parse the YAML in "simple" mode (without resolving !function constructors).
- Classify it as one of: task, python_task, group, or tag.
- Store it in a dictionary mapping name -> {type, yaml_path, task_list}.
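The sketch below shows the classification and storage steps, starting from configs that are already parsed. Using plain dicts avoids a YAML dependency. The rules are hypothetical stand-ins rather than lmms-eval's actual logic: a "class" key marks a python_task, a "group" key with a list-valued "task" marks a group, and a "tag" key on a task adds that task to the tag's task_list.

```python
from typing import Dict


def classify(config: dict) -> str:
    """Assign an index type to a simple-mode config (illustrative rules only)."""
    if "class" in config:
        return "python_task"  # assumed marker for a Python-defined task
    if "group" in config and isinstance(config.get("task"), list):
        return "group"  # a named list of member tasks
    return "task"


def build_index(configs: Dict[str, dict]) -> Dict[str, dict]:
    """Build name -> {type, yaml_path, task_list} from {yaml_path: config}."""
    index: Dict[str, dict] = {}
    for yaml_path, config in configs.items():
        kind = classify(config)
        name = config["group"] if kind == "group" else config["task"]
        index[name] = {
            "type": kind,
            "yaml_path": yaml_path,
            "task_list": list(config["task"]) if kind == "group" else [],
        }
        # Tags have no YAML file of their own; each tag collects the tasks carrying it.
        tags = config.get("tag", [])
        for tag in ([tags] if isinstance(tags, str) else tags):
            entry = index.setdefault(
                tag, {"type": "tag", "yaml_path": None, "task_list": []}
            )
            entry["task_list"].append(name)
    return index
```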
Phase 2 -- Matching:
Given a list of user-supplied patterns P = [p1, p2, ...] and the set of all indexed task names T:
matched = sorted({t in T : exists p in P such that fnmatch(t, p)})
This uses Python's fnmatch.filter() which supports *, ?, and [seq] wildcards, consistent with Unix shell glob semantics.
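The matching formula above maps directly onto fnmatch.filter. In the sketch below, match_tasks is a hypothetical name and the task list is made up. The set union removes duplicates when several patterns hit the same task, and sorting gives a deterministic order.

```python
import fnmatch
from typing import Iterable, List


def match_tasks(patterns: Iterable[str], task_names: Iterable[str]) -> List[str]:
    """Return the sorted, de-duplicated task names matching any glob pattern."""
    names = list(task_names)
    matched = set()
    for pattern in patterns:
        # fnmatch.filter keeps the names that match this single pattern.
        matched.update(fnmatch.filter(names, pattern))
    return sorted(matched)


NAMES = ["mme", "mmmu_val", "mmmu_test", "mmbench_en", "mmbench_cn"]
```

With these names, "mmmu*" selects both mmmu splits, "mmbench_??" uses ? to match exactly two characters, and "mm[eb]*" uses a [seq] class. Note that fnmatch.filter normalizes case with os.path.normcase, so matching is case-insensitive on Windows. fnmatch.fnmatchcase is the always case-sensitive alternative.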
After matching, a duplicate check ensures no task appears in multiple competing groups, which would create ambiguity in how group-level overrides (like num_fewshot) are applied.
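A duplicate check of this kind can be sketched by counting how many selected groups list each task. The function names and group names below are hypothetical. The sketch considers only group membership lists; the framework's exact conflict rule is not described here and may differ.

```python
from collections import Counter
from typing import Dict, List


def find_conflicts(groups: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each task listed by more than one selected group to those groups."""
    counts = Counter(task for members in groups.values() for task in set(members))
    return {
        task: sorted(g for g, members in groups.items() if task in members)
        for task in sorted(counts)
        if counts[task] > 1
    }


def check_no_duplicates(groups: Dict[str, List[str]]) -> None:
    """Raise if any task would receive overrides from competing groups."""
    conflicts = find_conflicts(groups)
    if conflicts:
        raise ValueError(f"Tasks selected via multiple groups: {conflicts}")
```

For example, if group_a lists mme and mmmu_val and group_b lists only mmmu_val, then mmmu_val is reported. The two groups could set different num_fewshot values for it.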