
Implementation:EvolvingLMMs Lab Lmms eval Lmms Generate Until

From Leeroopedia
Knowledge Sources
Domains Evaluation, Model_Inference
Last Updated 2026-02-14 00:00 GMT

Overview

A concrete tool, provided by the lmms-eval framework, for dispatching evaluation requests to a model for inference. It supports generation and log-likelihood tasks.

Description

The lmms class in lmms_eval/api/model.py is the abstract base class for all model implementations in the framework. It defines three abstract methods -- generate_until, loglikelihood, and generate_until_multi_round -- that every concrete model must implement.

The generate_until method receives a list of Instance objects and must return a list of generated strings, one per request. Each Instance's args holds, in order, the prompt context, the generation kwargs (including stopping sequences, temperature, and sampling parameters), the doc_to_visual function that loads visual inputs, the document ID, the task name, and the split name.

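The loglikelihood method receives a list of Instance objects, each containing a context-continuation pair, and returns one (logprob, is_greedy) pair per request: the log probability of the continuation given the context, and a flag indicating whether the continuation matches what greedy decoding would produce.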
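To make the (logprob, is_greedy) contract concrete, the sketch below scores one continuation from per-step log-probability rows. It is a pure-Python illustration and not the framework's code: score_continuation, step_logprobs, and continuation_ids are hypothetical names, and a real backend would derive the rows from its model's logits.

```python
from typing import Sequence, Tuple


def score_continuation(
    step_logprobs: Sequence[Sequence[float]],
    continuation_ids: Sequence[int],
) -> Tuple[float, bool]:
    """Return (summed log-prob, is_greedy) for one continuation.

    step_logprobs[i] is the log-probability row over the vocabulary
    at continuation position i (hypothetical input format).
    """
    total = 0.0
    is_greedy = True
    for row, token_id in zip(step_logprobs, continuation_ids):
        # Add the log-probability assigned to the reference token.
        total += row[token_id]
        # Greedy decoding would pick the highest-scoring token.
        greedy_id = max(range(len(row)), key=row.__getitem__)
        if greedy_id != token_id:
            is_greedy = False
    return total, is_greedy
```

The flag is False as soon as any reference token differs from the argmax at its position, even if the summed log probability is high.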

The generate_until_multi_round method extends generation to multi-round dialogs where subsequent prompts can depend on the model's previous outputs.
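The dependency between rounds can be sketched as a simple loop. This page does not describe the framework's actual per-round protocol, so everything here is an assumption: run_rounds, generate, next_prompt, and max_rounds are hypothetical names, and next_prompt returning None is an invented stop signal.

```python
from typing import Callable, List, Optional


def run_rounds(
    generate: Callable[[str], str],
    first_prompt: str,
    next_prompt: Callable[[List[str]], Optional[str]],
    max_rounds: int = 8,
) -> List[str]:
    """Return the model output of every round, in order."""
    outputs: List[str] = []
    prompt: Optional[str] = first_prompt
    while prompt is not None and len(outputs) < max_rounds:
        outputs.append(generate(prompt))
        # The next prompt is built from all outputs so far;
        # None (a hypothetical convention) ends the dialog.
        prompt = next_prompt(outputs)
    return outputs
```

A caller that only needs the final-round text, as in the output contract of generate_until_multi_round, would take the last element of the returned list.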

The class also provides infrastructure for:

  • Caching -- A JSONL-based caching mechanism (LMMS_EVAL_USE_CACHE) that stores and retrieves responses to avoid redundant inference.
  • Distributed execution -- Rank and world-size tracking for multi-GPU evaluation.
  • Argument parsing -- The create_from_arg_string() classmethod for instantiating models from CLI argument strings.
  • Memory management -- The clean() method for freeing GPU memory after inference.
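As an illustration of the caching idea in the list above, here is a minimal JSONL response cache. It is a sketch under stated assumptions, not the framework's implementation: the class name, the record fields "key" and "response", and the key format are all hypothetical, and the LMMS_EVAL_USE_CACHE switch is not modeled.

```python
import json
import os
from typing import Callable, Dict


class JsonlResponseCache:
    """Append-only cache with one JSON record per line (hypothetical)."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._store: Dict[str, str] = {}
        # Reload earlier responses so a restarted run can skip them.
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as fh:
                for line in fh:
                    if line.strip():
                        record = json.loads(line)
                        self._store[record["key"]] = record["response"]

    def get_or_compute(self, key: str, compute: Callable[[], str]) -> str:
        """Return the cached response, running inference only on a miss."""
        if key in self._store:
            return self._store[key]
        response = compute()
        self._store[key] = response
        # Persist immediately so an interrupted run keeps its progress.
        with open(self.path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps({"key": key, "response": response}) + "\n")
        return response
```

Appending one line per response keeps writes cheap and leaves the file readable even if the process stops partway through an evaluation.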

Usage

Use these methods when:

  • You are implementing a new model backend and need to conform to the evaluation interface.
  • You are running an evaluation and the evaluator dispatches requests via getattr(lm, reqtype)(reqs).
  • You need to understand the expected input/output contract for model inference.
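The getattr-based dispatch mentioned above can be tried without the framework installed. The following sketch uses stand-ins: StubRequest and EchoModel are hypothetical classes that mimic only the fields used here (request_type, arguments, resps), and dispatch is an illustrative helper, not the evaluator's code.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class StubRequest:
    """Stand-in for an Instance; only the fields used below."""
    request_type: str
    arguments: Tuple[Any, ...]
    resps: List[Any] = field(default_factory=list)


class EchoModel:
    """Stand-in model exposing two of the interface methods."""

    def generate_until(self, requests: list) -> List[str]:
        return [f"echo: {req.arguments[0]}" for req in requests]

    def loglikelihood(self, requests: list) -> List[Tuple[float, bool]]:
        return [(0.0, True) for _ in requests]


def dispatch(lm: Any, requests: List[StubRequest]) -> None:
    """Group requests by type, then call the same-named method on lm."""
    grouped: Dict[str, List[StubRequest]] = defaultdict(list)
    for req in requests:
        grouped[req.request_type].append(req)
    for reqtype, reqs in grouped.items():
        resps = getattr(lm, reqtype)(reqs)
        # Responses come back in request order, one per request.
        for resp, req in zip(resps, reqs):
            req.resps.append(resp)
```

Because the method is looked up by name, a model backend only needs to expose methods whose names match the request types it is asked to serve.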

Code Reference

Source Location

  • Repository: lmms-eval
  • File: lmms_eval/api/model.py
  • Lines: 253-270 (generate_until), 225-250 (loglikelihood), 272-289 (generate_until_multi_round)

Signature

class lmms(abc.ABC):
    is_simple: bool = True

    @abc.abstractmethod
    def generate_until(self, requests: list) -> List[str]:
        """Generate greedily until a stopping sequence.

        :param requests: list[Instance]
            Each Instance's args contains
            (context, generation_kwargs, doc_to_visual,
             doc_id, task, split).
        :return: list[str]
            A list of generated continuations.
        """
        pass

    @abc.abstractmethod
    def loglikelihood(
        self, requests: List[Instance]
    ) -> List[Tuple[float, bool]]:
        """Compute log-likelihood of generating a continuation
        from a context.

        :param requests: list[Instance]
            Each Instance's args contains
            (context, continuation, doc_to_visual,
             doc_id, task, split).
        :return: list[tuple[float, bool]]
            (logprob, is_greedy) pairs.
        """
        pass

    @abc.abstractmethod
    def generate_until_multi_round(
        self, requests: list
    ) -> List[str]:
        """Multi-round dialog generation.

        :param requests: list[Instance]
        :return: list[str]
        """
        pass

    @classmethod
    def create_from_arg_string(
        cls: Type[T],
        arg_string: str,
        additional_config: Optional[dict] = None,
    ) -> T:
        """Create model instance from key=value argument string."""
        ...
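The body of create_from_arg_string is elided above. As a rough idea of how a key=value argument string can become constructor arguments, consider the sketch below. It assumes comma-separated pairs and keeps every value as a string; parse_arg_string and ToyModel are hypothetical, and the framework's own parsing may differ.

```python
from typing import Any, Dict, Optional


def parse_arg_string(arg_string: str) -> Dict[str, str]:
    """Split 'k1=v1,k2=v2' into a dict (assumed format)."""
    args: Dict[str, str] = {}
    for pair in arg_string.split(","):
        pair = pair.strip()
        if not pair:
            continue
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {pair!r}")
        args[key.strip()] = value.strip()
    return args


class ToyModel:
    """Hypothetical model used only to show the classmethod pattern."""

    def __init__(self, pretrained: str, device: str = "cpu", **extra: Any):
        self.pretrained = pretrained
        self.device = device
        self.extra = extra

    @classmethod
    def create_from_arg_string(
        cls, arg_string: str, additional_config: Optional[dict] = None
    ) -> "ToyModel":
        merged: Dict[str, Any] = dict(parse_arg_string(arg_string))
        # Entries set to None in additional_config are skipped.
        for key, value in (additional_config or {}).items():
            if value is not None:
                merged[key] = value
        return cls(**merged)
```

For example, ToyModel.create_from_arg_string("pretrained=my/model,device=cuda:0") yields an instance whose device attribute is the string "cuda:0".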

Import

from lmms_eval.api.model import lmms

I/O Contract

Inputs

Name | Type | Required | Description
requests | list[Instance] | Yes | List of Instance objects containing prompts, generation kwargs, visual input loaders, and metadata
request.arguments[0] | str | Yes | The prompt context string (or message list for chat models)
request.arguments[1] | dict | Yes | Generation kwargs including until (stop sequences), do_sample, temperature
request.arguments[2] | Callable | Yes | doc_to_visual function that loads visual inputs for the document
request.arguments[3] | int | Yes | Document ID within the evaluation split
request.arguments[4] | str | Yes | Task name string
request.arguments[5] | str | Yes | Split name (e.g., "test", "validation")
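The six positional arguments listed above can be unpacked in one statement. The sketch below does so against a stand-in: StubInstance and describe_request are hypothetical, and the nested task_dict layout (task, then split, then document index) is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class StubInstance:
    """Stand-in for an Instance carrying only the argument tuple."""
    arguments: Tuple[str, Dict[str, Any], Callable[[dict], list], int, str, str]


def describe_request(
    req: StubInstance,
    task_dict: Dict[str, Dict[str, List[dict]]],
) -> Dict[str, Any]:
    """Unpack one request and resolve its visual inputs."""
    context, gen_kwargs, doc_to_visual, doc_id, task, split = req.arguments
    # Assumed layout: task name -> split name -> list of documents.
    doc = task_dict[task][split][doc_id]
    return {
        "prompt": context,
        "stop": gen_kwargs.get("until", []),
        "visuals": doc_to_visual(doc),
        "where": f"{task}/{split}/{doc_id}",
    }
```

Unpacking by name rather than by index keeps a model implementation readable and makes a missing or extra tuple element fail loudly at the unpacking line.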

Outputs

Name | Type | Description
generate_until return | List[str] | List of generated text continuations, one per request
loglikelihood return | List[Tuple[float, bool]] | List of (log_probability, is_greedy) tuples, one per request
generate_until_multi_round return | List[str] | List of final-round generated text continuations

Usage Examples

Basic Example

# Assumes the evaluator has already created a model instance `lm`
# and a list of Instance objects `cloned_reqs` that share one
# request type.
reqtype = "generate_until"

# Dispatch by method name
resps = getattr(lm, reqtype)(cloned_reqs)

# Append each response to the request that produced it
for resp, req in zip(resps, cloned_reqs):
    req.resps.append(resp)

Implementing a New Model

from lmms_eval.api.model import lmms
from lmms_eval.api.instance import Instance
from typing import List, Tuple

class MyCustomModel(lmms):
    is_simple = True

    def __init__(self, pretrained: str, **kwargs):
        super().__init__()
        # load_model is a placeholder for your own loading code
        self.model = load_model(pretrained)

    def generate_until(
        self, requests: list
    ) -> List[str]:
        results = []
        for req in requests:
            # Unpack the six-element argument tuple
            (context, gen_kwargs, doc_to_visual,
             doc_id, task, split) = req.arguments
            # Look up the document by task, split, and ID,
            # then load its visual inputs
            doc = self.task_dict[task][split][doc_id]
            visuals = doc_to_visual(doc)
            # self.model.generate stands in for your model's API
            output = self.model.generate(
                context, visuals, **gen_kwargs
            )
            results.append(output)
        return results

    def loglikelihood(
        self, requests: List[Instance]
    ) -> List[Tuple[float, bool]]:
        # Implement log-likelihood computation
        ...

    def generate_until_multi_round(
        self, requests: list
    ) -> List[str]:
        # Implement multi-round generation
        ...

Related Pages

Implements Principle
