Principle:EvolvingLMMs Lab Lmms eval Task Selection

Knowledge Sources	lmms-eval
Domains	Evaluation, Task_Management
Last Updated	2026-02-14 00:00 GMT

Overview

Task selection is the process of resolving user-supplied task name patterns into a concrete set of evaluation task configurations that the framework will execute.

Description

Evaluation frameworks typically support hundreds of benchmark tasks, each defined by a YAML configuration file specifying dataset paths, output types, metrics, and prompt templates. Users need a way to select subsets of these tasks without listing every individual task name. Task selection addresses this by supporting glob-style pattern matching against a pre-built task index.

The lmms-eval framework organizes tasks into three categories:

Tasks -- Individual benchmark configurations (e.g., mmmu_val, mme).
Groups -- Named collections of tasks defined in group YAML configs that aggregate results.
Tags -- Labels attached to tasks in their YAML configs, enabling cross-cutting selections (e.g., all vision-language tasks).

The TaskManager class is responsible for:

Indexing -- Walking the lmms_eval/tasks/ directory tree (and any user-supplied include_path) to discover and classify all YAML configuration files.
Matching -- Taking user-supplied patterns (e.g., mmmu*) and expanding them against the full task index using fnmatch glob semantics.
Loading -- Instantiating ConfigurableTask or ConfigurableMessagesTask objects from the matched YAML configs.

Usage

Use task selection whenever:

You want to run a specific subset of benchmarks: --tasks mmmu,mme.
You want to run all tasks matching a pattern: --tasks "mmmu*".
You want to include tasks from a custom directory: --include_path /path/to/custom/tasks.
You are building tooling that needs to enumerate available tasks programmatically.

Theoretical Basis

Task selection follows a two-phase resolution pattern:

Phase 1 -- Indexing:

For each YAML file in the task directories:

Parse the YAML in "simple" mode (without resolving !function constructors).
Classify as one of: task, python_task, group, or tag.
Store in a dictionary mapping name -> {type, yaml_path, task_list}.
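
The indexing steps above can be sketched as follows. This is a minimal illustration, not the lmms-eval implementation: it takes configs that have already been parsed into dictionaries, and the key names used for classification ("class", "group", "tag", "task") and the helper names classify_config and build_index are assumptions made for the example.

```python
from typing import Any


def classify_config(config: dict[str, Any]) -> str:
    """Classify a parsed YAML config (assumed rules, for illustration only)."""
    if "class" in config:
        # Assumption: a config that names a Python class is a python_task.
        return "python_task"
    if "group" in config and isinstance(config.get("task"), list):
        # Assumption: a group config lists its member tasks under "task".
        return "group"
    return "task"


def build_index(configs: dict[str, dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Build name -> {type, yaml_path, task_list} from yaml_path -> parsed config."""
    index: dict[str, dict[str, Any]] = {}
    for yaml_path, config in configs.items():
        kind = classify_config(config)
        if kind == "group":
            index[config["group"]] = {
                "type": "group",
                "yaml_path": yaml_path,
                "task_list": list(config["task"]),
            }
            continue
        name = config["task"]
        index[name] = {"type": kind, "yaml_path": yaml_path}
        # Tags have no YAML file of their own; they accumulate member tasks.
        tags = config.get("tag", [])
        if isinstance(tags, str):
            tags = [tags]
        for tag in tags:
            entry = index.setdefault(
                tag, {"type": "tag", "yaml_path": None, "task_list": []}
            )
            entry["task_list"].append(name)
    return index
```

Because tags are collected while tasks are indexed, a tag entry only ever lists tasks that were actually discovered on disk.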

Phase 2 -- Matching:

Given a list of user-supplied patterns P = [p1, p2, ...] and the set of all indexed task names T:

matched = sorted({t in T : exists p in P such that fnmatch(t, p)})

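This uses Python's fnmatch.filter(), which supports the Unix shell-style wildcards *, ?, [seq], and [!seq]. Unlike filename globbing in a shell, fnmatch does not treat path separators or leading dots specially, so * matches any run of characters in a task name.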
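
The matching rule can be reproduced with the standard library alone. The sketch below uses a hypothetical helper, match_patterns, and an illustrative list of task names; it shows the set-union-then-sort behavior described above and is not the framework's own code.

```python
import fnmatch


def match_patterns(patterns, all_names):
    """Return the sorted union of names matched by any of the patterns."""
    matched = set()
    for pattern in patterns:
        # fnmatch.filter returns the subset of all_names matching one pattern.
        matched.update(fnmatch.filter(all_names, pattern))
    return sorted(matched)


# Illustrative task names; a real index would come from the indexing phase.
names = ["mme", "mmmu_test", "mmmu_val", "mmbench_en", "vqav2_val"]

print(match_patterns(["mmmu*", "mme"], names))
# ['mme', 'mmmu_test', 'mmmu_val']
```

Collecting matches in a set means a name matched by several patterns appears only once in the result.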

After matching, a duplicate check verifies that no task appears in more than one competing group. Such an overlap would make it ambiguous which group-level override (such as num_fewshot) applies to the shared task.
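
One way to picture the duplicate check is as an inverted index from task to owning groups. The following is a sketch under that assumption; the function name find_conflicts and the input shape (group name mapped to a list of task names) are hypothetical, and how lmms-eval reports a conflict is not specified here.

```python
from collections import defaultdict


def find_conflicts(groups):
    """Return {task: [groups...]} for every task claimed by more than one group."""
    owners = defaultdict(set)
    for group_name, task_list in groups.items():
        for task in task_list:
            owners[task].add(group_name)
    # Only tasks owned by two or more distinct groups are ambiguous.
    return {task: sorted(gs) for task, gs in owners.items() if len(gs) > 1}
```

A caller could treat a non-empty result as an error and ask the user to narrow the selection before any task is loaded.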

Related Pages

Implemented By

Implementation:EvolvingLMMs_Lab_Lmms_eval_TaskManager_Match_Tasks
