Principle:EvolvingLMMs Lab Lmms eval Model Type Selection
| Knowledge Sources | |
|---|---|
| Domains | Model_Architecture, Evaluation |
| Last Updated | 2026-02-14 00:00 GMT |
Overview
Choosing between the simple and chat model protocols determines how evaluation tasks provide inputs to a multimodal model.
Description
When integrating a custom model into the lmms-eval framework, the first architectural decision is whether the model follows the simple protocol or the chat protocol. This choice is expressed through a single boolean class attribute, is_simple, defined on the lmms base class.
Simple models (is_simple = True) receive raw visual data and textual context through the legacy task interface. Tasks call doc_to_visual to supply images, videos, or audio as raw objects, and the model's generate_until method receives a tuple of (contexts, gen_kwargs, doc_to_visual, doc_id, task, split). This protocol is well-suited for models that handle their own prompt formatting and media preprocessing.
Chat models (is_simple = False) receive structured ChatMessages objects through the doc_to_messages task interface. The model's generate_until method receives a tuple of (ctx, doc_to_messages, gen_kwargs, doc_id, task, split). This protocol aligns with modern conversational model APIs that expect role-annotated message lists with interleaved media content.
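The two request layouts can be sketched with stand-in classes. This is a minimal illustration, not the framework's own code: Request, BaseModel, EchoSimpleModel, and EchoChatModel are hypothetical names, and the way the callables are invoked (with doc_id) is an assumption made only to keep the example runnable. Only the is_simple attribute and the two tuple orderings are taken from the description above.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class Request:
    """Hypothetical stand-in for a framework request; args holds the tuple."""
    args: Tuple[Any, ...]


class BaseModel:
    """Hypothetical stand-in for the lmms base class."""
    is_simple: bool = True

    def generate_until(self, requests: List[Request]) -> List[str]:
        raise NotImplementedError


class EchoSimpleModel(BaseModel):
    is_simple = True  # simple protocol: raw visuals plus text context

    def generate_until(self, requests: List[Request]) -> List[str]:
        outputs = []
        for req in requests:
            # Simple ordering: gen_kwargs is second, doc_to_visual is third.
            contexts, gen_kwargs, doc_to_visual, doc_id, task, split = req.args
            visuals = doc_to_visual(doc_id)  # assumed call shape, illustration only
            outputs.append(f"simple:{contexts}:{len(visuals)} visual(s)")
        return outputs


class EchoChatModel(BaseModel):
    is_simple = False  # chat protocol: structured messages

    def generate_until(self, requests: List[Request]) -> List[str]:
        outputs = []
        for req in requests:
            # Chat ordering: doc_to_messages is second, gen_kwargs is third.
            ctx, doc_to_messages, gen_kwargs, doc_id, task, split = req.args
            messages = doc_to_messages(doc_id)  # assumed call shape, illustration only
            outputs.append(f"chat:{len(messages)} message(s)")
        return outputs


simple_req = Request((
    "Describe the image.", {"max_new_tokens": 16},
    lambda doc_id: ["<image>"], 0, "demo_task", "test",
))
chat_req = Request((
    "Describe the image.",
    lambda doc_id: [{"role": "user", "content": "Describe the image."}],
    {"max_new_tokens": 16}, 0, "demo_task", "test",
))

simple_out = EchoSimpleModel().generate_until([simple_req])
chat_out = EchoChatModel().generate_until([chat_req])
```

Note that gen_kwargs sits in a different position in the two tuples, so unpacking a request with the wrong ordering would silently swap a dictionary and a callable.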
The framework enforces consistency between the declared type and the resolved model class during validation. If a model is resolved as "chat" but its class declares is_simple = True, a TypeError is raised. Similarly, if resolved as "simple" but is_simple = False, validation fails. This prevents silent mismatches between what the task provides and what the model expects.
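A consistency check of this kind might look like the sketch below. The function name validate_model_class and the demo classes are hypothetical, and raising TypeError for both mismatch directions is an assumption: the description above only names TypeError explicitly for the chat case.

```python
def validate_model_class(model_cls: type, resolved_type: str) -> None:
    """Hypothetical check: the declared is_simple must match the resolved type."""
    if resolved_type not in ("simple", "chat"):
        raise ValueError(f"unknown model type: {resolved_type!r}")

    declared_simple = bool(model_cls.is_simple)
    expected_simple = resolved_type == "simple"

    if declared_simple != expected_simple:
        # Assumption: both mismatch directions raise TypeError.
        raise TypeError(
            f"{model_cls.__name__} declares is_simple={declared_simple} "
            f"but was resolved as {resolved_type!r}"
        )


class DemoSimple:
    is_simple = True


class DemoChat:
    is_simple = False


# Matching pairs pass silently.
validate_model_class(DemoSimple, "simple")
validate_model_class(DemoChat, "chat")

# A mismatched pair is rejected before any task data reaches the model.
try:
    validate_model_class(DemoSimple, "chat")
    mismatch_detected = False
except TypeError:
    mismatch_detected = True
```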
The model type also influences which tasks are compatible. The evaluator reads lm.is_simple after instantiation and passes either "simple" or "chat" as the task_type when building the task dictionary. This ensures that each task formats its documents using the correct protocol for the loaded model.
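The selection step can be illustrated as follows. This is a sketch under stated assumptions: task_type_for and build_task_dict are hypothetical helpers, and the returned dictionary merely records which document formatter each task would use, rather than building real task objects.

```python
from typing import Dict, List


def task_type_for(lm) -> str:
    """Map the instantiated model's is_simple flag to a task_type string."""
    return "simple" if lm.is_simple else "chat"


def build_task_dict(task_names: List[str], task_type: str) -> Dict[str, str]:
    """Hypothetical builder: notes the formatter each task would call."""
    formatter = {"simple": "doc_to_visual", "chat": "doc_to_messages"}[task_type]
    return {name: formatter for name in task_names}


class DemoSimpleModel:
    is_simple = True


class DemoChatModel:
    is_simple = False


simple_tasks = build_task_dict(["demo_task"], task_type_for(DemoSimpleModel()))
chat_tasks = build_task_dict(["demo_task"], task_type_for(DemoChatModel()))
```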
Usage
Set is_simple = True on your model class when:
- Your model expects raw image/video/audio objects and handles its own prompt construction.
- You are wrapping a model that does not follow a conversational message format.
- You want maximum control over how multimodal inputs are preprocessed.
Set is_simple = False on your model class when:
- Your model uses a chat-style API with structured messages (e.g., HuggingFace chat templates, OpenAI message format).
- The model benefits from the ChatMessages abstraction that provides to_hf_messages() and to_openai_messages() conversion.
- You want interleaved text and media content delivered in a standardized message structure.
Theoretical Basis
The simple/chat distinction maps to two fundamental model interface paradigms in multimodal AI:
Completion-style interface (simple):
Input = (text_prompt, [image_1, image_2, ...], generation_kwargs)
Output = generated_text
Conversational-style interface (chat):
Input = [
    {"role": "system", "content": [...]},
    {"role": "user", "content": [text, image, video, ...]},
]
Output = generated_text
The framework resolves which interface to use through a two-step process:
- Registry resolution: The ModelRegistryV2.resolve() method checks whether a chat class path exists for the model. If it does (and force_simple is not set), the model resolves as type "chat". Otherwise it resolves as "simple".
- Class validation: _validate_model_class() confirms that the loaded class's is_simple attribute matches the resolved type, preventing runtime mismatches.
This design allows a single model ID to support both protocols by registering separate simple_class_path and chat_class_path entries in the model manifest, with the chat variant preferred by default.
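The resolution rule, with chat preferred unless force_simple is set, can be sketched as below. ManifestEntry, the resolve function, and the class paths are hypothetical stand-ins and do not reproduce the actual ModelRegistryV2 signature. Only the field names simple_class_path and chat_class_path and the preference order come from the text above; the error raised when no simple class is registered is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ManifestEntry:
    """Hypothetical manifest record for one model ID."""
    model_id: str
    simple_class_path: Optional[str] = None
    chat_class_path: Optional[str] = None


def resolve(entry: ManifestEntry, force_simple: bool = False) -> Tuple[str, str]:
    """Return (model_type, class_path), preferring the chat variant."""
    if entry.chat_class_path is not None and not force_simple:
        return "chat", entry.chat_class_path
    if entry.simple_class_path is None:
        # Assumption: a missing simple class is treated as an error.
        raise ValueError(f"{entry.model_id} has no simple class registered")
    return "simple", entry.simple_class_path


# Hypothetical class paths, for illustration only.
both = ManifestEntry(
    model_id="demo_model",
    simple_class_path="my_pkg.simple.DemoSimple",
    chat_class_path="my_pkg.chat.DemoChat",
)
simple_only = ManifestEntry(
    model_id="legacy_model",
    simple_class_path="my_pkg.simple.Legacy",
)

default_choice = resolve(both)
forced_choice = resolve(both, force_simple=True)
legacy_choice = resolve(simple_only)
```

In this sketch the same model ID yields the chat class by default and the simple class only when explicitly forced, which matches the preference order described above.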