Implementation:EvolvingLMMs Lab Lmms eval Lmms Generate Until
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Model_Inference |
| Last Updated | 2026-02-14 00:00 GMT |
Overview
A concrete tool, provided by the lmms-eval framework, for dispatching evaluation requests to a model for inference. It supports both generation and log-likelihood tasks.
Description
The lmms class in lmms_eval/api/model.py is the abstract base class for all model implementations in the framework. It defines three abstract methods -- generate_until, loglikelihood, and generate_until_multi_round -- that every concrete model must implement.
The generate_until method receives a list of Instance objects and must return a list of generated strings. Each Instance's args contains the prompt context, generation kwargs (including stopping sequences, temperature, and sampling parameters), and a reference to the visual input loader.
The loglikelihood method receives a list of Instance objects, each containing a context-continuation pair, and returns log probabilities with greedy-match indicators.
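To make the return contract concrete, the sketch below derives a (logprob, is_greedy) pair from per-token log-probability distributions. The helper `score_continuation` and the toy distributions are hypothetical and not part of lmms-eval; a real backend would obtain these values from its model's logits.

```python
from typing import Dict, List, Tuple


def score_continuation(
    step_logprobs: List[Dict[str, float]],
    continuation: List[str],
) -> Tuple[float, bool]:
    """Hypothetical helper: sum the log-probability of each continuation
    token and check whether every token was also the argmax choice."""
    total = 0.0
    is_greedy = True
    for dist, token in zip(step_logprobs, continuation):
        total += dist[token]
        # The greedy match fails as soon as one target token is not the argmax.
        if max(dist, key=dist.get) != token:
            is_greedy = False
    return total, is_greedy


# Toy distributions over a three-token vocabulary, one per continuation step.
steps = [
    {"a": -0.1, "b": -2.5, "c": -4.0},
    {"a": -1.5, "b": -0.4, "c": -3.0},
]
greedy_score = score_continuation(steps, ["a", "b"])
non_greedy_score = score_continuation(steps, ["a", "c"])
```

The second call scores a continuation whose last token is not the most likely one, so its greedy-match indicator is False even though a log probability is still returned.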
The generate_until_multi_round method extends generation to multi-round dialogs where subsequent prompts can depend on the model's previous outputs.
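The dependency between rounds can be illustrated with a small driver loop. This is a minimal sketch under assumptions: `run_multi_round`, `echo_model`, and `two_round_prompts` are hypothetical names, and the way lmms-eval actually passes round information through the Instance arguments is not shown here.

```python
from typing import Callable, List, Optional


def run_multi_round(
    generate: Callable[[str], str],
    next_prompt: Callable[[List[str]], Optional[str]],
    max_rounds: int = 5,
) -> str:
    """Hypothetical driver: ask for a prompt, generate, and repeat until
    the prompt callback signals completion by returning None."""
    outputs: List[str] = []
    for _ in range(max_rounds):
        prompt = next_prompt(outputs)
        if prompt is None:
            break
        outputs.append(generate(prompt))
    # Only the final round's text is returned for this request.
    return outputs[-1] if outputs else ""


def echo_model(prompt: str) -> str:
    # Stand-in for real inference.
    return f"answer({prompt})"


def two_round_prompts(previous: List[str]) -> Optional[str]:
    # The second prompt is built from the first round's output.
    if not previous:
        return "describe the image"
    if len(previous) == 1:
        return f"refine: {previous[0]}"
    return None


final_text = run_multi_round(echo_model, two_round_prompts)
```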
The class also provides infrastructure for:
- Caching -- A JSONL-based caching mechanism (LMMS_EVAL_USE_CACHE) that stores and retrieves responses to avoid redundant inference.
- Distributed execution -- Rank and world-size tracking for multi-GPU evaluation.
- Argument parsing -- The create_from_arg_string() classmethod for instantiating models from CLI argument strings.
- Memory management -- The clean() method for freeing GPU memory after inference.
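The caching idea can be sketched as an append-only JSONL file that is re-read on startup. The class name `JsonlResponseCache`, the record fields `doc_id` and `response`, and the keying scheme are assumptions for illustration; the framework's actual file layout and the handling of the LMMS_EVAL_USE_CACHE toggle are not reproduced here.

```python
import json
import os
import tempfile
from typing import Dict, Optional


class JsonlResponseCache:
    """Hypothetical cache: one JSON object per line, keyed by doc_id."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.entries: Dict[str, str] = {}
        # Load any responses written by a previous run.
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                for line in f:
                    if line.strip():
                        record = json.loads(line)
                        self.entries[str(record["doc_id"])] = record["response"]

    def get(self, doc_id: int) -> Optional[str]:
        return self.entries.get(str(doc_id))

    def put(self, doc_id: int, response: str) -> None:
        self.entries[str(doc_id)] = response
        # Append immediately so an interrupted run keeps its finished work.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"doc_id": doc_id, "response": response}) + "\n")


cache_path = os.path.join(tempfile.mkdtemp(), "responses.jsonl")
cache = JsonlResponseCache(cache_path)
cache.put(7, "a red bicycle")

# A fresh instance re-reads the file, so earlier responses are reused.
reloaded = JsonlResponseCache(cache_path)
hit = reloaded.get(7)
miss = reloaded.get(8)
```

A model would consult `get` before running inference for a document and call `put` afterwards, so only cache misses reach the GPU.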
Usage
Use these methods when:
- You are implementing a new model backend and need to conform to the evaluation interface.
- You are running an evaluation and the evaluator dispatches requests via getattr(lm, reqtype)(reqs).
- You need to understand the expected input/output contract for model inference.
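The getattr-based dispatch can be tried without a real model. In the sketch below, `FakeRequest` and `FakeModel` are hypothetical stand-ins for Instance and an lmms subclass, and the grouping step is an assumption about how requests of mixed types might be batched before dispatch.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class FakeRequest:
    """Hypothetical stand-in for Instance with only the fields used here."""
    request_type: str
    arguments: tuple
    resps: List[Any] = field(default_factory=list)


class FakeModel:
    """Hypothetical model exposing methods named after request types."""

    def generate_until(self, requests: List[FakeRequest]) -> List[str]:
        return [f"gen:{r.arguments[0]}" for r in requests]

    def loglikelihood(self, requests: List[FakeRequest]) -> List[tuple]:
        return [(-1.0, True) for _ in requests]


requests = [
    FakeRequest("generate_until", ("What is shown?",)),
    FakeRequest("loglikelihood", ("The sky is", " blue")),
    FakeRequest("generate_until", ("Count the cars.",)),
]

# Group by request type, then look up the matching method by name.
by_type: Dict[str, List[FakeRequest]] = defaultdict(list)
for req in requests:
    by_type[req.request_type].append(req)

lm = FakeModel()
for reqtype, reqs in by_type.items():
    resps = getattr(lm, reqtype)(reqs)
    # Responses come back in request order, so zip pairs them up.
    for resp, req in zip(resps, reqs):
        req.resps.append(resp)
```

Because the method is resolved by name, a backend only has to implement methods whose names match the request types it will receive.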
Code Reference
Source Location
- Repository: lmms-eval
- File: lmms_eval/api/model.py
- Lines: 253-270 (generate_until), 225-250 (loglikelihood), 272-289 (generate_until_multi_round)
Signature
class lmms(abc.ABC):
    is_simple: bool = True

    @abc.abstractmethod
    def generate_until(self, requests: list) -> List[str]:
        """Generate greedily until a stopping sequence.

        :param requests: list[Instance]
            Each Instance's args contains
            (context, generation_kwargs, doc_to_visual,
            doc_id, task, split).
        :return: list[str]
            A list of generated continuations.
        """
        pass

    @abc.abstractmethod
    def loglikelihood(
        self, requests: List[Instance]
    ) -> List[Tuple[float, bool]]:
        """Compute log-likelihood of generating a continuation
        from a context.

        :param requests: list[Instance]
            Each Instance's args contains
            (context, continuation, doc_to_visual,
            doc_id, task, split).
        :return: list[tuple[float, bool]]
            (logprob, is_greedy) pairs.
        """
        pass

    @abc.abstractmethod
    def generate_until_multi_round(
        self, requests: list
    ) -> List[str]:
        """Multi-round dialog generation.

        :param requests: list[Instance]
        :return: list[str]
        """
        pass

    @classmethod
    def create_from_arg_string(
        cls: Type[T],
        arg_string: str,
        additional_config: Optional[dict] = None,
    ) -> T:
        """Create model instance from key=value argument string."""
        ...
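As an illustration of what a key=value constructor could do, the sketch below parses an argument string and forwards the result to `__init__`. The comma separator, the helper `parse_arg_string`, the class `ToyModel`, and the way `additional_config` is merged are all assumptions; the framework's own parsing rules (for example, type coercion) may differ.

```python
from typing import Any, Dict, Optional


def parse_arg_string(arg_string: str) -> Dict[str, str]:
    """Hypothetical parser for strings such as 'pretrained=foo,device=cuda'."""
    args: Dict[str, str] = {}
    for pair in arg_string.split(","):
        pair = pair.strip()
        if not pair:
            continue
        key, _, value = pair.partition("=")
        args[key.strip()] = value.strip()
    return args


class ToyModel:
    def __init__(self, pretrained: str, device: str = "cpu", **kwargs: Any) -> None:
        self.pretrained = pretrained
        self.device = device
        self.extra = kwargs

    @classmethod
    def create_from_arg_string(
        cls, arg_string: str, additional_config: Optional[dict] = None
    ) -> "ToyModel":
        kwargs: Dict[str, Any] = dict(parse_arg_string(arg_string))
        # Entries in additional_config are merged in, skipping None values.
        for key, value in (additional_config or {}).items():
            if value is not None:
                kwargs[key] = value
        return cls(**kwargs)


toy = ToyModel.create_from_arg_string(
    "pretrained=my-org/my-model,device=cuda", {"batch_size": 4}
)
```

Note that values parsed from the string stay as strings in this sketch, so a constructor that expects numbers would need to convert them itself.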
Import
from lmms_eval.api.model import lmms
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| requests | list[Instance] | Yes | List of Instance objects containing prompts, generation kwargs, visual input loaders, and metadata |
| request.arguments[0] | str | Yes | The prompt context string (or message list for chat models) |
| request.arguments[1] | dict | Yes | Generation kwargs including until (stop sequences), do_sample, temperature |
| request.arguments[2] | Callable | Yes | doc_to_visual function that loads visual inputs for the document |
| request.arguments[3] | int | Yes | Document ID within the evaluation split |
| request.arguments[4] | str | Yes | Task name string |
| request.arguments[5] | str | Yes | Split name (e.g., "test", "validation") |
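The six positional arguments can be unpacked in one statement. The sketch below uses a hypothetical toy dataset and a stand-in `doc_to_visual`; the task -> split -> document layout of `task_dict` is an assumption made for illustration.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical toy dataset laid out as task -> split -> list of documents.
task_dict: Dict[str, Dict[str, List[Dict[str, Any]]]] = {
    "toy_vqa": {"test": [{"image": "img_0.png"}, {"image": "img_1.png"}]}
}


def doc_to_visual(doc: Dict[str, Any]) -> List[str]:
    # Stand-in loader: returns file names rather than decoded images.
    return [doc["image"]]


arguments: Tuple[str, dict, Callable, int, str, str] = (
    "What is in the picture?",
    {"until": ["\n\n"], "do_sample": False, "temperature": 0.0},
    doc_to_visual,
    1,
    "toy_vqa",
    "test",
)

# Positional order follows the Inputs table: indices 0 through 5.
context, gen_kwargs, to_visual, doc_id, task, split = arguments
visuals = to_visual(task_dict[task][split][doc_id])
stop_sequences = gen_kwargs.get("until", [])
```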
Outputs
| Name | Type | Description |
|---|---|---|
| generate_until return | List[str] | List of generated text continuations, one per request |
| loglikelihood return | List[Tuple[float, bool]] | List of (log_probability, is_greedy) tuples, one per request |
| generate_until_multi_round return | List[str] | List of final-round generated text continuations |
Usage Examples
Basic Example
from lmms_eval.api.model import lmms
from lmms_eval.api.instance import Instance
# Assuming a model instance `lm` and a list of requests `cloned_reqs` already exist.
# Dispatch is done via the evaluator:
reqtype = "generate_until"
resps = getattr(lm, reqtype)(cloned_reqs)
# Each response is appended to the request
for resp, req in zip(resps, cloned_reqs):
    req.resps.append(resp)
Implementing a New Model
from typing import List, Tuple

from lmms_eval.api.instance import Instance
from lmms_eval.api.model import lmms


class MyCustomModel(lmms):
    is_simple = True

    def __init__(self, pretrained: str, **kwargs):
        super().__init__()
        # Load your model here; load_model is a placeholder for your own loader.
        self.model = load_model(pretrained)

    def generate_until(self, requests: list) -> List[str]:
        results = []
        for req in requests:
            # Unpack the six positional arguments listed in the I/O contract.
            context, gen_kwargs, doc_to_visual, doc_id, task, split = req.arguments
            # task_dict is assumed to be set by the evaluator and indexed
            # by task, then split, then document ID.
            visuals = doc_to_visual(self.task_dict[task][split][doc_id])
            output = self.model.generate(context, visuals, **gen_kwargs)
            results.append(output)
        return results

    def loglikelihood(
        self, requests: List[Instance]
    ) -> List[Tuple[float, bool]]:
        # Implement log-likelihood computation
        ...

    def generate_until_multi_round(self, requests: list) -> List[str]:
        # Implement multi-round generation
        ...