Implementation: OpenCompass VLMEvalKit BaseModel
| Field | Value |
|---|---|
| source | VLMEvalKit |
| domain | Vision, Model_Architecture, Software_Design |
Overview
Abstract base class for all local VLM adapters in VLMEvalKit, providing the unified inference interface.
Description
BaseModel in vlmeval/vlm/base.py defines the contract for local VLM adapters. It provides:
- generate(message, dataset), which validates and preprocesses input messages, then delegates to the abstract generate_inner(message, dataset)
- chat(messages, dataset) for multi-turn conversation, with automatic turn-dropping on failure
- use_custom_prompt(dataset), returning False by default
- Abstract build_prompt(line, dataset) for custom prompt formatting
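The validate-then-delegate pattern described above can be sketched as follows. This is a standalone illustration of the template-method design, not the actual VLMEvalKit code; names mirror the source, but the validation details are assumptions.

```python
# Minimal sketch: generate() checks the message list, then delegates to
# the subclass's generate_inner(). Illustrative only.
from abc import abstractmethod
from typing import Dict, List, Optional


class SketchBaseModel:
    allowed_types = ['text', 'image', 'video']

    def check_content(self, message: List[Dict]) -> bool:
        # Every item must be a dict with a recognized 'type' and a 'value'.
        return all(
            isinstance(m, dict) and m.get('type') in self.allowed_types and 'value' in m
            for m in message
        )

    def generate(self, message: List[Dict], dataset: Optional[str] = None) -> str:
        assert self.check_content(message), 'Invalid message format'
        return self.generate_inner(message, dataset)

    @abstractmethod
    def generate_inner(self, message: List[Dict], dataset: Optional[str] = None) -> str: ...


class EchoModel(SketchBaseModel):
    def generate_inner(self, message, dataset=None):
        # Toy "inference": echo back the concatenated text parts.
        return ' '.join(m['value'] for m in message if m['type'] == 'text')


print(EchoModel().generate([dict(type='text', value='hello')]))  # hello
```

The base class owns validation, so every adapter gets it for free; subclasses only supply the model-specific inference step.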
Class attributes INTERLEAVE=False and allowed_types=['text', 'image', 'video'] declare capabilities. Helper methods like message_to_promptimg() and message_to_promptvideo() assist in format conversion.
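The kind of conversion these helpers perform can be sketched as below: flattening an interleaved message list into a single prompt string plus a list of image paths. This is a hedged, standalone illustration; the real message_to_promptimg() lives in vlmeval/vlm/base.py and may differ in its placeholder handling and signature.

```python
# Sketch of message-to-prompt conversion: text parts are concatenated,
# images are collected, and a model-specific placeholder token marks
# where each image appeared. Illustrative only.
from typing import Dict, List, Tuple


def message_to_promptimg_sketch(message: List[Dict]) -> Tuple[str, List[str]]:
    prompt_parts, images = [], []
    for item in message:
        if item['type'] == 'text':
            prompt_parts.append(item['value'])
        elif item['type'] == 'image':
            images.append(item['value'])
            prompt_parts.append('<image>')  # placeholder token; model-specific
    return ''.join(prompt_parts), images


prompt, images = message_to_promptimg_sketch([
    dict(type='image', value='cat.jpg'),
    dict(type='text', value='What animal is this?'),
])
# prompt == '<image>What animal is this?', images == ['cat.jpg']
```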
Usage
Subclass this class when adding a new local VLM to VLMEvalKit. At minimum, implement generate_inner().
Code Reference
- Source: vlmeval/vlm/base.py, lines 6-221
- Import: from vlmeval.vlm.base import BaseModel
Signature:
```python
class BaseModel:
    INTERLEAVE = False
    allowed_types = ['text', 'image', 'video']

    def __init__(self):
        self.dump_image_func = None

    def use_custom_prompt(self, dataset: str) -> bool: ...

    @abstractmethod
    def build_prompt(self, line, dataset: str): ...

    @abstractmethod
    def generate_inner(self, message: List[Dict], dataset: Optional[str] = None) -> str: ...

    def generate(self, message, dataset=None) -> str: ...

    def chat(self, messages: List[Dict], dataset=None) -> str: ...
```
I/O Contract
| Direction | Description |
|---|---|
| Inputs | message — List[Dict] with keys 'type' ('text'/'image'/'video') and 'value'; dataset — optional dataset name string |
| Outputs | generate() returns a str prediction; chat() returns a str response or a failure message |
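A concrete message that satisfies the input contract above looks like this. The file path is hypothetical.

```python
# One valid input message: a list of dicts, each with 'type' in
# {'text', 'image', 'video'} and a 'value' payload.
message = [
    dict(type='image', value='/data/demo/cat.jpg'),  # hypothetical path
    dict(type='text', value='Describe this image.'),
]

# The contract every item must satisfy:
assert all(m['type'] in ('text', 'image', 'video') and 'value' in m for m in message)
```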
Usage Examples
```python
from vlmeval.vlm.base import BaseModel


class MyVLM(BaseModel):
    INTERLEAVE = True  # supports interleaved image-text input

    def __init__(self, model_path):
        super().__init__()
        # Load your model here
        self.model = load_model(model_path)

    def generate_inner(self, message, dataset=None):
        # Convert the message format and run inference
        prompt = ""
        images = []
        for msg in message:
            if msg['type'] == 'text':
                prompt += msg['value']
            elif msg['type'] == 'image':
                images.append(msg['value'])
        return self.model.generate(prompt, images)

    def build_prompt(self, line, dataset):
        # Optional: custom prompt for specific datasets
        return [
            dict(type='image', value=line['image']),
            dict(type='text', value=f"Question: {line['question']}"),
        ]
```
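The automatic turn-dropping that the description attributes to chat() can be sketched as below: on failure, retry with the oldest turns removed until a call succeeds or no context remains. This is a standalone sketch of that retry idea, not the actual implementation in vlmeval/vlm/base.py; chat_with_turn_dropping and flaky_call are hypothetical names.

```python
# Sketch of turn-dropping: try the full history first, then progressively
# drop the oldest turns and retry. Illustrative only.
from typing import Callable, Dict, List


def chat_with_turn_dropping(call: Callable[[List[Dict]], str],
                            messages: List[Dict]) -> str:
    for start in range(len(messages)):
        try:
            return call(messages[start:])
        except Exception:
            continue  # drop one more leading turn and retry
    return 'Chat failed: no usable context remained.'


# A toy backend that only accepts short histories (hypothetical):
def flaky_call(history):
    if len(history) > 2:
        raise RuntimeError('context too long')
    return 'ok'
```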