Principle:EvolvingLMMs Lab Lmms eval Model Type Selection

From Leeroopedia
Knowledge Sources
Domains: Model_Architecture, Evaluation
Last Updated: 2026-02-14 00:00 GMT

Overview

Choosing between the simple and chat model protocols determines how evaluation tasks provide inputs to a multimodal model.

Description

When integrating a custom model into the lmms-eval framework, the first architectural decision is whether the model follows the simple protocol or the chat protocol. This choice is expressed through a single boolean class attribute, is_simple, defined on the lmms base class.

Simple models (is_simple = True) receive raw visual data and textual context through the legacy task interface. Tasks call doc_to_visual to supply images, videos, or audio as raw objects, and each request passed to the model's generate_until method carries the argument tuple (contexts, gen_kwargs, doc_to_visual, doc_id, task, split). This protocol suits models that handle their own prompt formatting and media preprocessing.

Chat models (is_simple = False) receive structured ChatMessages objects through the doc_to_messages task interface. Each request passed to the model's generate_until method carries the argument tuple (ctx, doc_to_messages, gen_kwargs, doc_id, task, split). The field order differs from the simple protocol: gen_kwargs is the third element here, not the second. This protocol aligns with modern conversational model APIs that expect role-annotated message lists with interleaved media content.
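
The two argument layouts can be sketched as follows. This is a minimal, self-contained illustration and not the framework's own code: BaseModel, Request, MySimpleModel, and MyChatModel are hypothetical stand-ins for the lmms base class and its request objects, the default value of is_simple is assumed, and calling doc_to_visual or doc_to_messages with a bare doc_id is a simplification.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class Request:
    """Hypothetical stand-in for a framework request; only `args` matters here."""
    args: Tuple[Any, ...]


class BaseModel:
    """Hypothetical stand-in for the lmms base class."""
    is_simple: bool = True  # assumed default for this sketch

    def generate_until(self, requests: List[Request]) -> List[str]:
        raise NotImplementedError


class MySimpleModel(BaseModel):
    is_simple = True

    def generate_until(self, requests: List[Request]) -> List[str]:
        outputs = []
        for req in requests:
            # Simple protocol: text context plus a callable that yields raw media.
            contexts, gen_kwargs, doc_to_visual, doc_id, task, split = req.args
            visuals = doc_to_visual(doc_id)  # simplified call signature
            outputs.append(f"simple:{contexts}:{len(visuals)} visual(s)")
        return outputs


class MyChatModel(BaseModel):
    is_simple = False

    def generate_until(self, requests: List[Request]) -> List[str]:
        outputs = []
        for req in requests:
            # Chat protocol: gen_kwargs is the third field, not the second.
            ctx, doc_to_messages, gen_kwargs, doc_id, task, split = req.args
            messages = doc_to_messages(doc_id)  # simplified call signature
            outputs.append(f"chat:{len(messages)} message(s)")
        return outputs
```

Unpacking a simple-protocol tuple inside a chat model (or the reverse) would silently swap gen_kwargs with a callable, which is why the flag is declared once on the class.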

The framework enforces consistency between the resolved model type and the flag declared on the model class during validation. If a model is resolved as "chat" but its class declares is_simple = True, a TypeError is raised. Likewise, a model resolved as "simple" whose class declares is_simple = False fails validation. This prevents silent mismatches between what the task provides and what the model expects.
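
The consistency check can be sketched in a few lines. This is an assumption-laden illustration, not the framework's implementation: validate_model_class is a hypothetical helper, and raising TypeError in both mismatch directions is a simplification (the text above only names TypeError for the "chat" case).

```python
def validate_model_class(model_cls: type, resolved_type: str) -> None:
    """Raise TypeError when the class's is_simple flag contradicts the resolved type."""
    if resolved_type not in ("simple", "chat"):
        raise ValueError(f"unknown model type: {resolved_type!r}")

    expected_simple = resolved_type == "simple"
    declared_simple = bool(model_cls.is_simple)

    if declared_simple != expected_simple:
        raise TypeError(
            f"{model_cls.__name__} resolved as {resolved_type!r} "
            f"but declares is_simple = {declared_simple}"
        )
```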

The model type also influences which tasks are compatible. The evaluator reads lm.is_simple after instantiation and passes either "simple" or "chat" as the task_type when building the task dictionary. This ensures that each task formats its documents using the correct protocol for the loaded model.
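
A rough sketch of that hand-off is shown below. The names task_type_for and build_task_dict are hypothetical, as is the shape of the returned dictionary; only the mapping from is_simple to "simple" or "chat" comes from the description above.

```python
from typing import Dict, List


def task_type_for(lm) -> str:
    """Map the instantiated model's is_simple flag to the task_type string."""
    return "simple" if lm.is_simple else "chat"


def build_task_dict(task_names: List[str], task_type: str) -> Dict[str, Dict[str, str]]:
    """Hypothetical stand-in: record which document formatter each task would use."""
    formatter = "doc_to_visual" if task_type == "simple" else "doc_to_messages"
    return {
        name: {"task_type": task_type, "formatter": formatter}
        for name in task_names
    }
```

Because the flag is read from the instance after construction, the task dictionary always reflects the class that was actually loaded, not the one that was requested.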

Usage

Set is_simple = True on your model class when:

  • Your model expects raw image/video/audio objects and handles its own prompt construction.
  • You are wrapping a model that does not follow a conversational message format.
  • You want maximum control over how multimodal inputs are preprocessed.

Set is_simple = False on your model class when:

  • Your model uses a chat-style API with structured messages (e.g., HuggingFace chat templates, OpenAI message format).
  • The model benefits from the ChatMessages abstraction that provides to_hf_messages() and to_openai_messages() conversion.
  • You want interleaved text and media content delivered in a standardized message structure.

Theoretical Basis

The simple/chat distinction maps to two fundamental model interface paradigms in multimodal AI:

Completion-style interface (simple):

Input = (text_prompt, [image_1, image_2, ...], generation_kwargs)
Output = generated_text

Conversational-style interface (chat):

Input = [
    {"role": "system", "content": [...]},
    {"role": "user", "content": [text, image, video, ...]},
]
Output = generated_text
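
To make the relationship between the two shapes concrete, the sketch below collapses a conversational input into a completion-style (text_prompt, media) pair. The content-item schema used here (a "type" key, with "text" or "data" holding the payload) is an assumption for illustration and is not the ChatMessages format; flatten_messages is a hypothetical helper.

```python
from typing import Any, Dict, List, Tuple

Message = Dict[str, Any]


def flatten_messages(messages: List[Message]) -> Tuple[str, List[Any]]:
    """Collapse role-annotated messages into a (text_prompt, media) pair."""
    text_parts: List[str] = []
    media: List[Any] = []
    for message in messages:
        for item in message["content"]:
            if item["type"] == "text":
                text_parts.append(item["text"])
            else:
                # Any non-text item (image, video, audio) is kept as raw media.
                media.append(item["data"])
    return "\n".join(text_parts), media
```

The conversion loses the role labels and the position of each media item relative to the text, which is the information the chat protocol preserves.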

The framework resolves which interface to use through a two-step process:

  1. Registry resolution: The ModelRegistryV2.resolve() method checks whether a chat class path exists for the model. If it does (and force_simple is not set), the model resolves as type "chat". Otherwise it resolves as "simple".
  2. Class validation: _validate_model_class() confirms that the loaded class's is_simple attribute matches the resolved type, preventing runtime mismatches.

This design allows a single model ID to support both protocols by registering separate simple_class_path and chat_class_path entries in the model manifest, with the chat variant preferred by default.
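
The registry-resolution step can be sketched as below. This is a simplified model under stated assumptions, not ModelRegistryV2 itself: ModelManifest and resolve are hypothetical names, the class paths in the usage lines are invented, and raising ValueError when no class path is registered is a choice made for this sketch.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ModelManifest:
    """Hypothetical manifest entry holding one or both class paths for a model ID."""
    model_id: str
    simple_class_path: Optional[str] = None
    chat_class_path: Optional[str] = None


def resolve(manifest: ModelManifest, force_simple: bool = False) -> Tuple[str, str]:
    """Return (model_type, class_path), preferring the chat variant by default."""
    if manifest.chat_class_path and not force_simple:
        return "chat", manifest.chat_class_path
    if not manifest.simple_class_path:
        raise ValueError(f"no usable class path registered for {manifest.model_id!r}")
    return "simple", manifest.simple_class_path


# Usage with invented paths: one model ID, two registered variants.
both = ModelManifest("my_model", "my_pkg.simple.MyModel", "my_pkg.chat.MyModel")
print(resolve(both))                     # chat variant is preferred
print(resolve(both, force_simple=True))  # falls back to the simple variant
```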
