Principle:EvolvingLMMs Lab Lmms eval Model Type Selection

Knowledge Sources	lmms-eval
Domains	Model_Architecture, Evaluation
Last Updated	2026-02-14 00:00 GMT

Overview

Choosing between the simple and chat model protocols determines how evaluation tasks provide inputs to a multimodal model.

Description

When integrating a custom model into the lmms-eval framework, the first architectural decision is whether the model follows the simple protocol or the chat protocol. This choice is expressed through a single boolean class attribute, is_simple, defined on the lmms base class.

Simple models (is_simple = True) receive raw visual data and textual context through the legacy task interface. Tasks call doc_to_visual to supply images, videos, or audio as raw objects, and the model's generate_until method receives a tuple of (contexts, gen_kwargs, doc_to_visual, doc_id, task, split). This protocol is well-suited for models that handle their own prompt formatting and media preprocessing.

Chat models (is_simple = False) receive structured ChatMessages objects through the doc_to_messages task interface. The model's generate_until method receives a tuple of (ctx, doc_to_messages, gen_kwargs, doc_id, task, split). This protocol aligns with modern conversational model APIs that expect role-annotated message lists with interleaved media content.
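The two request shapes can be sketched side by side. The following is a minimal, self-contained illustration, not the lmms-eval implementation: LmmsBase, DOCS, MySimpleModel, and MyChatModel are hypothetical stand-ins, requests are modeled as plain tuples in the orders given above, and the way each callable is applied to a looked-up document is an assumption.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical in-memory dataset, indexed as DOCS[task][split][doc_id].
DOCS: Dict[str, Dict[str, List[Dict[str, Any]]]] = {
    "demo_task": {
        "test": [{"question": "Describe the image.", "image": "<raw image object>"}]
    }
}


class LmmsBase:
    """Hypothetical stand-in for the lmms base class (not the real class)."""

    is_simple: bool = True  # assumed default, for this sketch only

    def generate_until(self, requests: List[Tuple[Any, ...]]) -> List[str]:
        raise NotImplementedError


class MySimpleModel(LmmsBase):
    is_simple = True  # simple protocol: raw media plus text context

    def generate_until(self, requests):
        outputs = []
        for contexts, gen_kwargs, doc_to_visual, doc_id, task, split in requests:
            doc = DOCS[task][split][doc_id]
            visuals = doc_to_visual(doc)  # raw image/video/audio objects
            # A real model would build its own prompt and preprocess media here.
            outputs.append(f"simple|{contexts}|{len(visuals)}")
        return outputs


class MyChatModel(LmmsBase):
    is_simple = False  # chat protocol: structured, role-annotated messages

    def generate_until(self, requests):
        outputs = []
        for ctx, doc_to_messages, gen_kwargs, doc_id, task, split in requests:
            doc = DOCS[task][split][doc_id]
            messages = doc_to_messages(doc)  # list of role-annotated messages
            outputs.append(f"chat|{len(messages)}")
        return outputs
```

Note that the two tuples differ in order as well as content: the simple tuple carries gen_kwargs second and the visual callable third, while the chat tuple carries the message callable second and gen_kwargs third, so unpacking with the wrong protocol fails or silently swaps fields.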

During validation, the framework checks that the is_simple value declared on the loaded class agrees with the type the model was resolved as. If a model is resolved as "chat" but its class declares is_simple = True, a TypeError is raised. Likewise, if it is resolved as "simple" but its class declares is_simple = False, validation fails. This prevents silent mismatches between what the task provides and what the model expects.
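The consistency check described above might look roughly like the sketch below. The function name validate_model_class and the error messages are hypothetical, and raising TypeError for the "simple" mismatch is an assumption (the text only states that validation fails in that case).

```python
def validate_model_class(model_cls: type, resolved_type: str) -> None:
    """Raise if the class's is_simple flag contradicts the resolved type.

    Illustrative only: the name, messages, and the exception used for the
    "simple" mismatch are assumptions, not the framework's actual code.
    """
    is_simple = model_cls.is_simple
    if resolved_type == "chat" and is_simple:
        raise TypeError(
            f"{model_cls.__name__} resolved as 'chat' but declares is_simple = True"
        )
    if resolved_type == "simple" and not is_simple:
        raise TypeError(
            f"{model_cls.__name__} resolved as 'simple' but declares is_simple = False"
        )


class ChatOnly:
    is_simple = False


class SimpleOnly:
    is_simple = True
```

Failing at validation time, before any request is built, is what keeps a chat task from handing message objects to a model that expects raw visuals.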

The model type also influences which tasks are compatible. The evaluator reads lm.is_simple after instantiation and passes either "simple" or "chat" as the task_type when building the task dictionary. This ensures that each task formats its documents using the correct protocol for the loaded model.
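As a rough illustration of that hand-off, the sketch below derives the task type from an instantiated model and threads it into a task dictionary. Both select_task_type and build_task_dict are hypothetical helpers with an invented return shape; only the lm.is_simple attribute and the "simple"/"chat" values come from the description above.

```python
from typing import Dict, Iterable


def select_task_type(lm) -> str:
    """Map the model's is_simple flag to the task_type string."""
    return "simple" if lm.is_simple else "chat"


def build_task_dict(task_names: Iterable[str], task_type: str) -> Dict[str, dict]:
    """Hypothetical builder: records which protocol each task should format for."""
    if task_type not in ("simple", "chat"):
        raise ValueError(f"unknown task_type: {task_type!r}")
    return {name: {"task_type": task_type} for name in task_names}


class DemoChatModel:
    """Hypothetical model instance exposing only the flag the evaluator reads."""

    is_simple = False


lm = DemoChatModel()
task_dict = build_task_dict(["demo_task"], select_task_type(lm))
```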

Usage

Set is_simple = True on your model class when:

Your model expects raw image/video/audio objects and handles its own prompt construction.
You are wrapping a model that does not follow a conversational message format.
You want maximum control over how multimodal inputs are preprocessed.

Set is_simple = False on your model class when:

Your model uses a chat-style API with structured messages (e.g., HuggingFace chat templates, OpenAI message format).
The model benefits from the ChatMessages abstraction that provides to_hf_messages() and to_openai_messages() conversion.
You want interleaved text and media content delivered in a standardized message structure.

Theoretical Basis

The simple/chat distinction maps to two fundamental model interface paradigms in multimodal AI:

Completion-style interface (simple):

Input = (text_prompt, [image_1, image_2, ...], generation_kwargs)
Output = generated_text

Conversational-style interface (chat):

Input = [
    {"role": "system", "content": [...]},
    {"role": "user", "content": [text, image, video, ...]},
]
Output = generated_text

The framework resolves which interface to use through a two-step process:

Registry resolution: The ModelRegistryV2.resolve() method checks whether a chat class path exists for the model. If it does (and force_simple is not set), the model resolves as type "chat". Otherwise it resolves as "simple".
Class validation: _validate_model_class() confirms that the loaded class's is_simple attribute matches the resolved type, preventing runtime mismatches.

This design allows a single model ID to support both protocols by registering separate simple_class_path and chat_class_path entries in the model manifest, with the chat variant preferred by default.
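The resolution rule and the dual-registration idea can be sketched as follows. This is an assumption-laden illustration rather than the ModelRegistryV2 code: the ModelManifest dataclass, the free function resolve, its return shape, the error raised when no class path is registered, and the example class paths are all hypothetical. Only the field names simple_class_path and chat_class_path, the force_simple flag, and the chat-preferred rule come from the text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ModelManifest:
    """Hypothetical manifest entry for one model ID."""

    model_id: str
    simple_class_path: Optional[str] = None
    chat_class_path: Optional[str] = None


def resolve(manifest: ModelManifest, force_simple: bool = False) -> Tuple[str, str]:
    """Return (resolved_type, class_path), preferring chat unless forced simple."""
    if manifest.chat_class_path is not None and not force_simple:
        return "chat", manifest.chat_class_path
    if manifest.simple_class_path is None:
        # Assumed behavior: the source does not say what happens in this case.
        raise ValueError(f"{manifest.model_id}: no simple class path registered")
    return "simple", manifest.simple_class_path


# One model ID registered with both variants (paths are made up).
dual = ModelManifest(
    model_id="my_model",
    simple_class_path="my_pkg.simple.MyModel",
    chat_class_path="my_pkg.chat.MyModel",
)
# A model that only ships a simple implementation.
legacy = ModelManifest(model_id="legacy_model", simple_class_path="my_pkg.simple.Legacy")
```

With this shape, resolve(dual) picks the chat variant, resolve(dual, force_simple=True) falls back to the simple one, and resolve(legacy) resolves as simple because no chat path exists. The class loaded from the returned path would then go through the is_simple validation step.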

Related Pages

Implemented By

Implementation:EvolvingLMMs_Lab_Lmms_eval_Lmms_Is_Simple
