Principle:EvolvingLMMs Lab Lmms eval Task Selection

From Leeroopedia
Knowledge Sources
Domains Evaluation, Task_Management
Last Updated 2026-02-14 00:00 GMT

Overview

Task selection is the process of resolving user-supplied task name patterns into a concrete set of evaluation task configurations that the framework will execute.

Description

Evaluation frameworks typically support hundreds of benchmark tasks, each defined by a YAML configuration file specifying dataset paths, output types, metrics, and prompt templates. Users need a way to select subsets of these tasks without listing every individual task name. Task selection addresses this by supporting glob-style pattern matching against a pre-built task index.

The lmms-eval framework organizes the names in its task index into three categories:

  • Tasks -- Individual benchmark configurations (e.g., mmmu_val, mme).
  • Groups -- Named collections of tasks, defined in group YAML configs, whose results are aggregated together.
  • Tags -- Labels attached to tasks in their YAML configs, enabling cross-cutting selections (e.g., all vision-language tasks).

The TaskManager class is responsible for:

  1. Indexing -- Walking the lmms_eval/tasks/ directory tree (and any user-supplied include_path) to discover and classify all YAML configuration files.
  2. Matching -- Taking user-supplied patterns (e.g., mmmu*) and expanding them against the full task index using fnmatch glob semantics.
  3. Loading -- Instantiating ConfigurableTask or ConfigurableMessagesTask objects from the matched YAML configs.

Usage

Use task selection whenever:

  • You want to run a specific subset of benchmarks: --tasks mmmu,mme.
  • You want to run all tasks matching a pattern: --tasks "mmmu*".
  • You want to include tasks from a custom directory: --include_path /path/to/custom/tasks.
  • You are building tooling that needs to enumerate available tasks programmatically.
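As a rough illustration of the first two cases, a comma-separated --tasks value can be split into patterns and expanded against a list of known names. This is a minimal sketch using only the standard library; the helpers split_task_arg and resolve, the sample task names, and the reporting of patterns that match nothing are all hypothetical and are not taken from lmms-eval.

```python
import fnmatch


def split_task_arg(task_arg):
    """Split a comma-separated --tasks value into individual patterns."""
    return [part.strip() for part in task_arg.split(",") if part.strip()]


def resolve(task_arg, known_names):
    """Return (sorted matched names, patterns that matched nothing)."""
    matched, unmatched = set(), []
    for pattern in split_task_arg(task_arg):
        hits = fnmatch.filter(known_names, pattern)
        if hits:
            matched.update(hits)
        else:
            unmatched.append(pattern)
    return sorted(matched), unmatched


# Hypothetical index contents, for illustration only.
known = ["mme", "mmmu_test", "mmmu_val"]
selected, missing = resolve("mmmu*,mme,typo_task", known)
print(selected)  # ['mme', 'mmmu_test', 'mmmu_val']
print(missing)   # ['typo_task']
```

Surfacing unmatched patterns separately is one possible design choice: it lets a tool warn about a misspelled name instead of silently running fewer tasks.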

Theoretical Basis

Task selection follows a two-phase resolution pattern:

Phase 1 -- Indexing:

For each YAML file in the task directories:

  1. Parse the YAML in "simple" mode (without resolving !function constructors).
  2. Classify as one of: task, python_task, group, or tag.
  3. Store in a dictionary mapping name -> {type, yaml_path, task_list}.
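The indexing steps above can be sketched as follows, starting from configs that have already been parsed into dictionaries. This is an illustrative sketch, not the framework's code: the classification rules (a class key marks a python_task, a list-valued task key marks a group, a string-valued task key marks a task), the handling of the tag key, and the sample paths are assumptions made for the example.

```python
def classify(config):
    """Assumed rules for deciding what kind of entry a parsed YAML describes."""
    if "class" in config:
        return "python_task"
    if isinstance(config.get("task"), list):
        return "group"
    if isinstance(config.get("task"), str):
        return "task"
    return None  # not a recognizable task config


def build_index(parsed_configs):
    """Map name -> {type, yaml_path, task_list} from (yaml_path, config) pairs."""
    index = {}
    for yaml_path, config in parsed_configs:
        kind = classify(config)
        if kind is None:
            continue
        if kind == "group":
            index[config["group"]] = {
                "type": "group",
                "yaml_path": yaml_path,
                "task_list": list(config["task"]),
            }
            continue
        name = config["task"]
        index[name] = {"type": kind, "yaml_path": yaml_path, "task_list": []}
        tags = config.get("tag", [])
        if isinstance(tags, str):
            tags = [tags]
        for tag in tags:
            # A tag has no YAML file of its own; it only collects member tasks.
            entry = index.setdefault(
                tag, {"type": "tag", "yaml_path": None, "task_list": []}
            )
            entry["task_list"].append(name)
    return index


# Hypothetical parsed configs, for illustration only.
example_configs = [
    ("tasks/mme/mme.yaml", {"task": "mme", "tag": "vision_language"}),
    ("tasks/mmmu/mmmu_val.yaml", {"task": "mmmu_val", "tag": ["vision_language"]}),
    ("tasks/mmmu/mmmu.yaml", {"group": "mmmu", "task": ["mmmu_val"]}),
]
index = build_index(example_configs)
print(index["vision_language"]["task_list"])  # ['mme', 'mmmu_val']
```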

Phase 2 -- Matching:

Given a list of user-supplied patterns P = [p1, p2, ...] and the set of all indexed task names T:

matched = sorted({t in T : exists p in P such that fnmatch(t, p)})

The match is computed with Python's fnmatch.filter(), which supports the *, ?, [seq], and [!seq] wildcards of Unix shell-style globbing.
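The set expression above translates almost directly into Python. The sketch below shows how each wildcard behaves; the function name match_tasks and the task names in T are assumptions chosen for the example, and only the standard-library fnmatch module is used.

```python
import fnmatch


def match_tasks(patterns, task_names):
    """Return the sorted union of all names matched by any pattern."""
    matched = set()
    for pattern in patterns:
        matched.update(fnmatch.filter(task_names, pattern))
    return sorted(matched)


# Hypothetical indexed names T, for illustration only.
T = ["mme", "mmmu_test", "mmmu_val", "vqav2_val"]

star = match_tasks(["mmmu*"], T)         # * matches any run of characters
question = match_tasks(["mm?"], T)       # ? matches exactly one character
bracket = match_tasks(["*_[tv]*"], T)    # [seq] matches one character from seq
union = match_tasks(["mm?", "*_val"], T)  # several patterns are unioned

print(star)      # ['mmmu_test', 'mmmu_val']
print(question)  # ['mme']
print(bracket)   # ['mmmu_test', 'mmmu_val', 'vqav2_val']
print(union)     # ['mme', 'mmmu_val', 'vqav2_val']
```

Collecting hits in a set before sorting means a name matched by several patterns appears only once in the result.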

After matching, a duplicate check ensures no task appears in multiple competing groups, which would create ambiguity in how group-level overrides (like num_fewshot) are applied.
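One way such a duplicate check could work is to record, for every task, which of the selected groups contain it, and then flag any task with more than one owner. The sketch below is a hypothetical illustration of that idea; the function find_conflicts and the group names are invented for the example and do not describe how lmms-eval implements the check.

```python
from collections import defaultdict


def find_conflicts(selected_groups):
    """Return {task: [groups]} for tasks claimed by more than one group."""
    owners = defaultdict(list)
    for group_name, members in selected_groups.items():
        for task in members:
            owners[task].append(group_name)
    return {task: groups for task, groups in owners.items() if len(groups) > 1}


# Hypothetical selection: two groups that both include mmmu_val.
conflicts = find_conflicts(
    {
        "group_a": ["mme", "mmmu_val"],
        "group_b": ["mmmu_val"],
    }
)
print(conflicts)  # {'mmmu_val': ['group_a', 'group_b']}
```

A caller could raise an error when the returned dictionary is non-empty, since it would otherwise be unclear which group's num_fewshot override applies to the shared task.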
