
Implementation:Open compass VLMEvalKit BaseModel

From Leeroopedia
Field Value
source VLMEvalKit
domain Vision, Model_Architecture, Software_Design

Overview

Abstract base class for all local VLM adapters in VLMEvalKit, providing the unified inference interface.

Description

BaseModel in vlmeval/vlm/base.py defines the contract for local VLM adapters. It provides:

  • generate(message, dataset), which validates and preprocesses the input messages, then delegates to the abstract generate_inner(message, dataset)
  • chat(messages, dataset) for multi-turn conversations, with automatic dropping of the oldest turns on failure
  • use_custom_prompt(dataset), which returns False by default
  • build_prompt(line, dataset), an abstract hook for custom per-dataset prompt formatting

Class attributes INTERLEAVE=False and allowed_types=['text', 'image', 'video'] declare capabilities. Helper methods like message_to_promptimg() and message_to_promptvideo() assist in format conversion.
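The turn-dropping behaviour of chat() described above can be sketched as a standalone retry loop. This is an illustration of the described behaviour, not the library's actual code; the function and message shapes here are simplified stand-ins.

```python
# Illustrative sketch (not VLMEvalKit source): chat() is described as
# dropping turns and retrying when a generation attempt fails.
def chat_with_turn_dropping(generate_fn, messages, fail_msg='Failed to obtain answer'):
    """Try the full history first; on failure, drop the oldest turn and retry."""
    while messages:
        try:
            return generate_fn(messages)
        except Exception:
            messages = messages[1:]  # drop the oldest turn
    return fail_msg

# Toy generator that only succeeds once the history fits a 2-turn budget.
def toy_generate(msgs):
    if len(msgs) > 2:
        raise RuntimeError('context too long')
    return f'answered with {len(msgs)} turn(s)'

result = chat_with_turn_dropping(toy_generate, ['t1', 't2', 't3', 't4'])
# The two oldest turns are dropped before generation succeeds.
```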

Usage

Subclass BaseModel when adding a new local VLM to VLMEvalKit. At a minimum, subclasses must implement generate_inner().

Code Reference

  • Source: vlmeval/vlm/base.py, Lines: L6-221
  • Import: from vlmeval.vlm.base import BaseModel

Signature:

class BaseModel:
    INTERLEAVE = False
    allowed_types = ['text', 'image', 'video']

    def __init__(self):
        self.dump_image_func = None

    def use_custom_prompt(self, dataset: str) -> bool: ...

    @abstractmethod
    def build_prompt(self, line, dataset: str): ...

    @abstractmethod
    def generate_inner(self, message: List[Dict], dataset: Optional[str] = None) -> str: ...

    def generate(self, message, dataset=None) -> str: ...

    def chat(self, messages: List[Dict], dataset=None) -> str: ...

I/O Contract

Direction  Description
Inputs     message: List[Dict] with keys 'type' ('text' / 'image' / 'video') and 'value'; dataset: optional dataset name string
Outputs    generate() returns a str prediction; chat() returns a str response or a failure message
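A minimal sketch of this input contract, assuming the key and type checks that generate() is described as performing. The helper name validate_message is hypothetical, for illustration only.

```python
# Hypothetical standalone sketch of the validation generate() performs
# on incoming messages (not the library's actual implementation).
ALLOWED_TYPES = ['text', 'image', 'video']

def validate_message(message, allowed_types=ALLOWED_TYPES):
    """Check that every item is a dict with 'type'/'value' and an allowed type."""
    assert isinstance(message, list), 'message must be a list of dicts'
    for item in message:
        assert isinstance(item, dict) and 'type' in item and 'value' in item
        assert item['type'] in allowed_types, f"unsupported type: {item['type']}"
    return True

validate_message([
    dict(type='image', value='demo.jpg'),
    dict(type='text', value='Describe the image.'),
])
```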

Usage Examples

from vlmeval.vlm.base import BaseModel

class MyVLM(BaseModel):
    INTERLEAVE = True  # Supports interleaved image-text input

    def __init__(self, model_path):
        super().__init__()
        # Load your model here
        self.model = load_model(model_path)

    def generate_inner(self, message, dataset=None):
        # Convert message format and run inference
        prompt = ""
        images = []
        for msg in message:
            if msg['type'] == 'text':
                prompt += msg['value']
            elif msg['type'] == 'image':
                images.append(msg['value'])
        return self.model.generate(prompt, images)

    def build_prompt(self, line, dataset):
        # Optional: custom prompt for specific datasets
        return [
            dict(type='image', value=line['image']),
            dict(type='text', value=f"Question: {line['question']}")
        ]
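To see the message flow end to end, the subclass above can be exercised with a stubbed model. load_model, _StubModel, and this standalone MyVLM mirror the example for illustration; they are not part of VLMEvalKit.

```python
# Hypothetical invocation sketch: how VLMEvalKit-style message dicts flow
# through a subclass's generate_inner(). The stub replaces real model loading
# so the snippet is self-contained.
class _StubModel:
    def generate(self, prompt, images):
        return f"[{len(images)} image(s)] {prompt}"

def load_model(path):  # stand-in for real model loading
    return _StubModel()

class MyVLM:  # standalone mirror of the subclass above, without BaseModel
    def __init__(self, model_path):
        self.model = load_model(model_path)

    def generate_inner(self, message, dataset=None):
        prompt, images = "", []
        for msg in message:
            if msg['type'] == 'text':
                prompt += msg['value']
            elif msg['type'] == 'image':
                images.append(msg['value'])
        return self.model.generate(prompt, images)

vlm = MyVLM('dummy/path')
out = vlm.generate_inner([
    dict(type='image', value='cat.jpg'),
    dict(type='text', value='What animal is this?'),
])
```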
