Implementation:EvolvingLMMs Lab Lmms eval Task Utility Interface

From Leeroopedia
Knowledge Sources
Domains: Data_Processing, Task_Management
Last Updated: 2026-02-14 00:00 GMT

Overview

A concrete interface, provided by the lmms-eval framework, for implementing task-specific data-extraction and result-processing functions.

Description

The lmms-eval framework defines a set of method interfaces on ConfigurableTask that dispatch to user-provided utility functions. These methods handle three types of resolution for their corresponding config values:

String resolution: If the config value is a string that matches a dataset column name, the method returns doc[column_name] directly. If it does not match a column name, it is treated as a Jinja2-style template and rendered with the document as context via utils.apply_template().

Callable resolution: If the config value was loaded from a !function YAML directive, it is a Python callable. The method invokes it, passing the document dict and optionally lmms_eval_specific_kwargs if configured.

Integer resolution: For doc_to_text and doc_to_target, an integer value is returned as-is, typically used as an index into choices.
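The three resolution paths can be sketched as a single dispatch function. This is a minimal illustration, not the code in lmms_eval/api/task.py: the names resolve and render_template are hypothetical, render_template is a tiny stand-in for utils.apply_template() that only handles {{ name }} placeholders, and column membership is checked against the document's keys rather than the dataset's declared features.

```python
import re
from typing import Any, Callable, Optional, Union


def render_template(template: str, doc: dict) -> str:
    """Hypothetical stand-in for utils.apply_template(): fill {{ name }} from doc."""
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: str(doc[m.group(1)]), template)


def resolve(
    config_value: Union[int, str, Callable],
    doc: dict,
    extra_kwargs: Optional[dict] = None,
) -> Any:
    """Resolve one config value against one document (illustrative only)."""
    # Integer resolution: returned as-is, e.g. an index into choices.
    if isinstance(config_value, int):
        return config_value
    # String resolution: column lookup first, template rendering otherwise.
    if isinstance(config_value, str):
        if config_value in doc:
            return doc[config_value]
        return render_template(config_value, doc)
    # Callable resolution: pass the extra kwargs only when they are configured.
    if callable(config_value):
        if extra_kwargs is not None:
            return config_value(doc, extra_kwargs)
        return config_value(doc)
    raise TypeError(f"unsupported config value: {config_value!r}")


doc = {"question": "What colour is the car?", "answer": 2}

by_column = resolve("question", doc)
by_template = resolve("Q: {{question}} A:", doc)
by_callable = resolve(
    lambda d, kw: kw["pre_prompt"] + d["question"],
    doc,
    {"pre_prompt": "Answer briefly. "},
)
by_index = resolve(2, doc)
```

Checking the column name before falling back to template rendering means a plain column reference never pays the cost of template parsing, which is consistent with the order described above.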

The ChatMessage protocol (defined in lmms_eval/protocol.py) provides the structured message format used by doc_to_messages. Messages consist of a role ("user", "system", "assistant") and a list of content items, each typed as text, image, video, or audio. The ChatMessages class provides methods to convert between different message formats: to_hf_messages() for HuggingFace-style messages, to_openai_messages() for OpenAI API format, and extract_media() to separate media URLs from the message stream.
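To make the message structure concrete, the sketch below models it with plain dataclasses instead of the Pydantic models in lmms_eval/protocol.py. Content, Message, extract_media and to_openai_style are hypothetical stand-ins; the (images, videos, audios) return order and the handling of media types are assumptions, and the real methods may behave differently. The OpenAI-style output uses the text and image_url content parts from the OpenAI chat format.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass
class Content:
    type: str   # "text", "image", "video" or "audio"
    value: Any  # the text itself, or a media URL / object


@dataclass
class Message:
    role: str   # "user", "system" or "assistant"
    content: List[Content] = field(default_factory=list)


def extract_media(messages: List[Message]) -> Tuple[list, list, list]:
    """Collect media items by type; the return order is an assumption."""
    images, videos, audios = [], [], []
    buckets = {"image": images, "video": videos, "audio": audios}
    for message in messages:
        for item in message.content:
            if item.type in buckets:
                buckets[item.type].append(item.value)
    return images, videos, audios


def to_openai_style(messages: List[Message]) -> List[dict]:
    """Convert text and image items to OpenAI-style content parts.

    Video and audio items are skipped in this sketch.
    """
    converted = []
    for message in messages:
        parts = []
        for item in message.content:
            if item.type == "text":
                parts.append({"type": "text", "text": item.value})
            elif item.type == "image":
                parts.append({"type": "image_url", "image_url": {"url": item.value}})
        converted.append({"role": message.role, "content": parts})
    return converted


messages = [
    Message(
        role="user",
        content=[
            Content("image", "https://example.com/cat.png"),
            Content("text", "What animal is this?"),
        ],
    )
]

images, videos, audios = extract_media(messages)
openai_messages = to_openai_style(messages)
```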

Usage

Use these interfaces when implementing a custom task's utils.py. Each function you write must conform to the expected signature. Reference them in your YAML with !function utils.function_name. The framework will call your functions automatically during the evaluation loop.
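One plausible way a reference such as utils.function_name could be turned into a callable is to load the named module from the directory that holds the task's YAML file. The sketch below illustrates that idea with the standard library only; load_function is a hypothetical helper, and the framework's actual loader may differ in details such as caching or error handling.

```python
import importlib.util
import tempfile
from pathlib import Path
from typing import Callable


def load_function(task_dir: Path, reference: str) -> Callable:
    """Resolve "module.function" against a .py file in task_dir (hypothetical helper)."""
    module_name, function_name = reference.rsplit(".", 1)
    module_path = task_dir / f"{module_name}.py"
    spec = importlib.util.spec_from_file_location(module_name, module_path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {module_path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, function_name)


# Demonstration with a throwaway task directory containing a one-function utils.py.
with tempfile.TemporaryDirectory() as tmp:
    task_dir = Path(tmp)
    (task_dir / "utils.py").write_text(
        "def my_doc_to_text(doc):\n"
        "    return doc['question']\n"
    )
    doc_to_text = load_function(task_dir, "utils.my_doc_to_text")
    prompt = doc_to_text({"question": "What is shown?"})
```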

Code Reference

Source Location

  • Repository: lmms-eval
  • File: lmms_eval/api/task.py (lines 1324-1414 for doc_to_text, doc_to_target, doc_to_visual), lmms_eval/protocol.py (lines 17-178 for ChatMessage protocol)

Signature

# In ConfigurableTask (lmms_eval/api/task.py):
def doc_to_text(self, doc: dict) -> Union[int, str]:
    ...

def doc_to_target(self, doc: dict) -> Union[int, str, list]:
    ...

def doc_to_visual(self, doc: dict) -> Union[int, str, list]:
    ...

# In protocol.py:
class ChatMessage(BaseModel):
    role: Literal["user", "system", "assistant"]
    content: List[ChatContent]

class ChatMessages(BaseModel):
    messages: List[ChatMessage]

    def extract_media(self) -> Tuple[list, list, list]:
        ...

    def to_hf_messages(self, video_kwargs=None) -> list:
        ...

    def to_openai_messages(self, video_kwargs=None) -> list:
        ...

Import

# For implementing utility functions, no special import is needed.
# Your utils.py is automatically imported by the framework.

# For using the ChatMessage protocol:
from lmms_eval.protocol import ChatMessage, ChatMessages
from lmms_eval.protocol import ChatTextContent, ChatImageContent
from lmms_eval.protocol import ChatVideoContent, ChatAudioContent

I/O Contract

Inputs

Name | Type | Required | Description
doc | dict | Yes | A single dataset document represented as a dictionary. Keys correspond to dataset column names, values to the row's column values. For multimodal datasets, image columns contain PIL Image objects and video columns contain file paths.
lmms_eval_specific_kwargs | Optional[dict] | No | Model-specific keyword arguments (e.g., pre_prompt, post_prompt) configured in YAML under lmms_eval_specific_kwargs. Passed as the second argument to doc_to_text and doc_to_visual when available.
results | List[str] | Yes (for process_results) | A list of model output strings. Typically contains a single element for non-multi-round tasks.

Outputs

Name | Type | Description
doc_to_visual return | List[Union[PIL.Image.Image, str]] | A list of visual media items (PIL Images for image tasks, file path strings for video tasks). Always a list, even for single-image tasks.
doc_to_text return | str | The text prompt string to send to the model.
doc_to_messages return | List[ChatMessage] | A list of structured chat messages with typed content (text, image, video, audio).
process_results return | Dict[str, Any] | A dictionary mapping metric names to their per-sample values. Keys must match metric names in the task's metric_list.
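Two parts of this contract are easy to violate by accident: doc_to_visual must return a list, and the keys returned by process_results must appear in the task's metric_list. The helpers below are a hypothetical sketch of checks you might run in your own tests while developing a utils.py; they are not part of lmms-eval.

```python
from typing import Any, Dict, Iterable


def check_doc_to_visual(value: Any) -> list:
    """Raise if a doc_to_visual result is not a list (hypothetical helper)."""
    if not isinstance(value, list):
        raise TypeError(f"doc_to_visual must return a list, got {type(value).__name__}")
    return value


def check_process_results(result: Any, metric_names: Iterable[str]) -> Dict[str, Any]:
    """Raise if process_results returns keys absent from the metric list."""
    if not isinstance(result, dict):
        raise TypeError(f"process_results must return a dict, got {type(result).__name__}")
    unknown = sorted(set(result) - set(metric_names))
    if unknown:
        raise KeyError(f"metrics not declared in metric_list: {unknown}")
    return result


# A single video path still has to be wrapped in a list.
visuals = check_doc_to_visual(["clip_0001.mp4"])

# Keys line up with the declared metric names, so this passes through unchanged.
metrics = check_process_results({"exact_match": 1.0}, ["exact_match"])
```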

Usage Examples

Basic Example

# utils.py for a simple VQA task
def my_doc_to_visual(doc):
    """Extract the image from the document."""
    return [doc["image"].convert("RGB")]


def my_doc_to_text(doc, lmms_eval_specific_kwargs=None):
    """Construct the prompt from the question field."""
    question = doc["question"]
    if lmms_eval_specific_kwargs:
        pre = lmms_eval_specific_kwargs.get("pre_prompt", "")
        post = lmms_eval_specific_kwargs.get("post_prompt", "")
        return f"{pre}{question}{post}"
    return question


def my_process_results(doc, results):
    """Score the model output against the ground truth."""
    pred = results[0].strip().lower()
    target = doc["answer"].strip().lower()
    score = 1.0 if pred == target else 0.0
    return {"exact_match": score}

With ChatMessage Protocol

# utils.py for a chat-based task using doc_to_messages
from lmms_eval.protocol import (
    ChatMessage,
    ChatTextContent,
    ChatImageContent,
)


def my_doc_to_messages(doc):
    """Build a structured chat message with image and text."""
    return [
        ChatMessage(
            role="user",
            content=[
                ChatImageContent(url=doc["image"]),
                ChatTextContent(text=doc["question"]),
            ],
        )
    ]

MME Example (Real-World)

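# From lmms_eval/tasks/mme/utils.py
# Note: parse_pred_ans and PERCEPTION_CATEGORIES are used below but not shown
# in this excerpt; they are assumed to be defined elsewhere in the module.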
def mme_doc_to_visual(doc):
    return [doc["image"].convert("RGB")]


def mme_doc_to_text(doc, lmms_eval_specific_kwargs=None):
    # Guard against the default None so the membership tests below cannot raise.
    lmms_eval_specific_kwargs = lmms_eval_specific_kwargs or {}
    question = doc["question"].strip()
    if "pre_prompt" in lmms_eval_specific_kwargs:
        question = f"{lmms_eval_specific_kwargs['pre_prompt']}{question}"
    if "post_prompt" in lmms_eval_specific_kwargs:
        question = f"{question}{lmms_eval_specific_kwargs['post_prompt']}"
    return question


def mme_process_results(doc, results):
    pred = results[0]
    pred_ans = parse_pred_ans(pred)
    gt_ans = doc["answer"].lower().strip().replace(".", "")
    score = 1.0 if pred_ans == gt_ans else 0.0
    category = doc["category"]
    key = "mme_perception_score" if category in PERCEPTION_CATEGORIES else "mme_cognition_score"
    return {key: {"question_id": doc["question_id"], "category": category, "score": score}}
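Because mme_process_results returns a record rather than a bare number, an aggregation step has to turn the collected records into a score. The sketch below assumes the MME benchmark's convention of scoring each category as accuracy plus accuracy+ (the share of images whose questions are all answered correctly), scaled by 100, and assumes that questions about the same image share a question_id. aggregate_mme is a hypothetical name, and the aggregation function shipped in lmms_eval/tasks/mme/utils.py may differ.

```python
from collections import defaultdict
from typing import Dict, List


def aggregate_mme(records: List[dict]) -> float:
    """Sum (accuracy + accuracy+) * 100 over categories (illustrative only)."""
    # category -> question_id -> scores of the questions that share that id
    by_category: Dict[str, Dict[str, List[float]]] = defaultdict(lambda: defaultdict(list))
    for record in records:
        by_category[record["category"]][record["question_id"]].append(record["score"])

    total = 0.0
    for groups in by_category.values():
        all_scores = [score for scores in groups.values() for score in scores]
        accuracy = sum(all_scores) / len(all_scores)
        # accuracy+: an image counts only if every one of its questions is correct.
        fully_correct = sum(1 for scores in groups.values() if all(s == 1.0 for s in scores))
        accuracy_plus = fully_correct / len(groups)
        total += (accuracy + accuracy_plus) * 100
    return total


records = [
    {"question_id": "color/a.jpg", "category": "color", "score": 1.0},
    {"question_id": "color/a.jpg", "category": "color", "score": 1.0},
    {"question_id": "color/b.jpg", "category": "color", "score": 1.0},
    {"question_id": "color/b.jpg", "category": "color", "score": 0.0},
]

# accuracy = 3/4, accuracy+ = 1/2, so the category contributes (0.75 + 0.5) * 100.
perception_score = aggregate_mme(records)
```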

Related Pages

Implements Principle
