Implementation:Open compass VLMEvalKit Generate Inner

Field	Value
source	VLMEvalKit
domain	Vision, Model_Architecture, Software_Design

Overview

Interface specification for the user-implemented inference method that VLM adapters must define to process multimodal input.

Description

generate_inner() is an abstract method defined on both BaseModel (vlmeval/vlm/base.py:L45-47) and BaseAPI (vlmeval/api/base.py:L46-57). BaseModel subclasses implement generate_inner(self, message: List[Dict], dataset: Optional[str] = None) -> str and return the prediction string. BaseAPI subclasses implement generate_inner(self, inputs, **kwargs) -> Tuple[int, str, str] and return (ret_code, answer, log), where ret_code=0 means success. In both cases the input is a list of dicts, each with a 'type' key ('text', 'image', or 'video') and a 'value' key (the text content, or the file path of the image or video).
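
To make the message format concrete, here is a minimal sketch of an interleaved message list and a helper that splits it by type. The helper name split_message and the file path are hypothetical and are not part of VLMEvalKit; only the 'type'/'value' layout comes from the description above.

```python
from typing import Dict, List, Tuple

# A hypothetical interleaved message: each dict has a 'type' and a 'value'.
message: List[Dict[str, str]] = [
    {"type": "image", "value": "/tmp/example_cat.jpg"},  # placeholder path
    {"type": "text", "value": "What animal is shown?"},
    {"type": "text", "value": "Answer with a single word."},
]


def split_message(message: List[Dict[str, str]]) -> Tuple[str, List[str], List[str]]:
    """Illustrative helper: separate the text prompt from media file paths."""
    texts = [m["value"] for m in message if m["type"] == "text"]
    images = [m["value"] for m in message if m["type"] == "image"]
    videos = [m["value"] for m in message if m["type"] == "video"]
    # Text segments are joined in their original order.
    return "\n".join(texts), images, videos


prompt, images, videos = split_message(message)
```

An adapter's generate_inner would typically start with a step like this before handing the prompt and media paths to the underlying model.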

Usage

Every VLM adapter must implement this method. This is a Pattern Doc: there is no single concrete implementation, because each model adapter provides its own.

Code Reference

Source: vlmeval/vlm/base.py, Lines: L45-47 (BaseModel abstract); vlmeval/api/base.py, Lines: L46-57 (BaseAPI abstract)
Import: (abstract — implemented by subclasses)

Signature:

# For local VLM adapters (BaseModel):
@abstractmethod
def generate_inner(self, message: List[Dict], dataset: Optional[str] = None) -> str:
    """
    Args:
        message: List of dicts with 'type' and 'value' keys.
                 Types: 'text', 'image', 'video'
                 Values: text content, image file path, video file path
        dataset: Optional dataset name for special handling.
    Returns:
        Prediction string.
    """
    raise NotImplementedError

# For API model adapters (BaseAPI):
@abstractmethod
def generate_inner(self, inputs, **kwargs) -> Tuple[int, str, str]:
    """
    Args:
        inputs: Preprocessed message list.
        **kwargs: Additional arguments.
    Returns:
        (ret_code, answer, log) where ret_code=0 means success.
    """

I/O Contract

Direction	Description
Inputs	message/inputs — `List[Dict]` with `type`/`value` keys
Outputs	BaseModel: `str`; BaseAPI: `Tuple[int, str, str]`
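
The BaseAPI return tuple lets a caller tell success from failure without catching exceptions. The following is a sketch of how a caller might consume that contract with a simple retry loop. The names call_with_retry and FlakyBackend are hypothetical stand-ins written for this example; they are not taken from VLMEvalKit's own code, and the library's actual retry behavior may differ.

```python
from typing import Callable, Dict, List, Tuple

Message = List[Dict[str, str]]


def call_with_retry(
    generate_inner: Callable[..., Tuple[int, str, str]],
    inputs: Message,
    retries: int = 3,
) -> Tuple[str, List[str]]:
    """Call generate_inner until ret_code == 0 or attempts run out."""
    logs: List[str] = []
    for _ in range(retries):
        ret_code, answer, log = generate_inner(inputs)
        logs.append(log)
        if ret_code == 0:  # 0 means success per the contract above
            return answer, logs
    return "", logs  # every attempt failed


class FlakyBackend:
    """Stand-in API adapter that fails on the first call, then succeeds."""

    def __init__(self) -> None:
        self.calls = 0

    def generate_inner(self, inputs: Message, **kwargs) -> Tuple[int, str, str]:
        self.calls += 1
        if self.calls == 1:
            return -1, "", "simulated timeout"
        text = " ".join(m["value"] for m in inputs if m["type"] == "text")
        return 0, f"echo: {text}", "ok"


backend = FlakyBackend()
answer, logs = call_with_retry(
    backend.generate_inner, [{"type": "text", "value": "hello"}]
)
```

Returning the log string alongside the answer keeps failure details available to the caller even when a later attempt succeeds.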

Usage Examples

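import requests

from vlmeval.api.base import BaseAPI
from vlmeval.vlm.base import BaseModel


# Example: simple local VLM adapter
class SimpleVLM(BaseModel):
    def generate_inner(self, message, dataset=None):
        # Join all text segments into one prompt and collect the image paths.
        prompt = "\n".join(m['value'] for m in message if m['type'] == 'text')
        images = [m['value'] for m in message if m['type'] == 'image']
        # self.model.predict is a placeholder for the wrapped model's own inference call.
        return self.model.predict(prompt, images)


# Example: simple API adapter
class SimpleAPI(BaseAPI):
    def generate_inner(self, inputs, **kwargs):
        try:
            # self.endpoint and the "answer" response field are placeholders for a real service.
            resp = requests.post(self.endpoint, json={"messages": inputs}, timeout=60)
            resp.raise_for_status()  # treat HTTP error statuses as failures
            return 0, resp.json()["answer"], "ok"
        except Exception as e:
            # A non-zero ret_code signals failure; the log carries the error text.
            return -1, "", str(e)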
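
The optional dataset argument lets an adapter adjust its prompt per benchmark. The sketch below illustrates one way to do that. DatasetAwareAdapter, the SUFFIXES table, and the dataset name "toy_mcq" are all hypothetical; the class is a stand-in that does not subclass BaseModel, so it runs without VLMEvalKit installed, and it returns the built prompt instead of calling a model.

```python
from typing import Dict, List, Optional


class DatasetAwareAdapter:
    """Stand-in adapter; a real one would subclass BaseModel."""

    # Hypothetical mapping from dataset name to an extra instruction.
    SUFFIXES = {"toy_mcq": "Answer with the option letter only."}

    def generate_inner(
        self, message: List[Dict[str, str]], dataset: Optional[str] = None
    ) -> str:
        prompt = "\n".join(m["value"] for m in message if m["type"] == "text")
        suffix = self.SUFFIXES.get(dataset)
        if suffix:
            # Append the dataset-specific instruction to the prompt.
            prompt = f"{prompt}\n{suffix}"
        # A real adapter would run the model here; this sketch returns the prompt.
        return prompt


adapter = DatasetAwareAdapter()
msg = [{"type": "text", "value": "Which is larger? A. 2 B. 3"}]
plain = adapter.generate_inner(msg)
tuned = adapter.generate_inner(msg, dataset="toy_mcq")
```

Keeping dataset optional with a default of None means the same adapter still works when no benchmark-specific handling applies.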

Related Pages

Principle:Open_compass_VLMEvalKit_Generate_Inner_Interface
