Principle:Open compass VLMEvalKit VLM Adapter Pattern
| Field | Value |
|---|---|
| source | [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) |
| domain | Vision, Model_Architecture, Software_Design |
| last_updated | 2026-02-14 00:00 GMT |
Overview
An adapter pattern that provides a unified interface for invoking diverse Vision-Language Model architectures through a common base class contract.
Description
VLMEvalKit defines BaseModel as the abstract base class for all local VLM adapters. Every VLM architecture (InternVL, LLaVA, Qwen2-VL, MiniCPM, etc.) implements a subclass that adapts the model's specific API to the framework's uniform interface. The key contract is:
- generate_inner(message, dataset) for single-turn inference
- Optional use_custom_prompt(dataset) and build_prompt(line, dataset) for model-specific prompt formatting
- Optional chat_inner(messages, dataset) for multi-turn support
The generate() method handles input preprocessing and validation before delegating to generate_inner(). Class attributes INTERLEAVE and allowed_types declare model capabilities.
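The flow described above can be sketched in a few lines. This is a minimal illustration, not the framework's actual code: SketchBaseModel, its _preprocess helper, and EchoModel are hypothetical names, and the message format (a list of dicts with type and value keys) is an assumption made for the example.

```python
from abc import ABC, abstractmethod


class SketchBaseModel(ABC):
    """Simplified stand-in for the framework's BaseModel (not the real class)."""

    INTERLEAVE = False                 # whether interleaved image/text input is supported
    allowed_types = ["text", "image"]  # content types this adapter accepts

    def generate(self, message, dataset=None):
        # Normalize the input, validate each item, then delegate to the subclass.
        message = self._preprocess(message)
        for item in message:
            if item["type"] not in self.allowed_types:
                raise ValueError(f"Unsupported input type: {item['type']}")
        return self.generate_inner(message, dataset)

    @staticmethod
    def _preprocess(message):
        # Hypothetical helper: turn a bare string into the assumed list-of-dicts form.
        if isinstance(message, str):
            return [{"type": "text", "value": message}]
        return list(message)

    @abstractmethod
    def generate_inner(self, message, dataset=None):
        """Model-specific single-turn inference, supplied by each adapter."""


class EchoModel(SketchBaseModel):
    """Hypothetical adapter: joins the text parts and counts the images."""

    def generate_inner(self, message, dataset=None):
        text = " ".join(m["value"] for m in message if m["type"] == "text")
        n_images = sum(m["type"] == "image" for m in message)
        return f"{text} [{n_images} image(s)]"
```

Callers only ever use generate(); the subclass never repeats the validation logic, and an input type missing from allowed_types is rejected before the model is invoked.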
Usage
Use when integrating a new local VLM into VLMEvalKit. Subclass BaseModel, implement generate_inner() at minimum, and optionally override prompt building and chat methods.
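As a rough sketch of what such an integration might look like, the example below wraps a toy backend with its own calling convention. ToyBackend, ToyAdapter, the dataset name ToyMCQ, the keys of line, and the role/content layout of chat messages are all assumptions made for illustration; in VLMEvalKit the adapter would subclass BaseModel rather than stand alone.

```python
class ToyBackend:
    """Hypothetical third-party model with its own calling convention."""

    def answer(self, image_paths, question):
        return f"{len(image_paths)} image(s); Q: {question}"


class ToyAdapter:
    """Sketch of an adapter; in VLMEvalKit this would subclass BaseModel."""

    INTERLEAVE = False
    allowed_types = ["text", "image"]

    def __init__(self):
        self.backend = ToyBackend()

    def use_custom_prompt(self, dataset):
        # Opt in to custom prompts only for a (hypothetical) dataset name.
        return dataset == "ToyMCQ"

    def build_prompt(self, line, dataset=None):
        # `line` is assumed to be a dict-like record with these two keys.
        prompt = f"{line['question']}\nAnswer with the option letter only."
        return [
            {"type": "image", "value": line["image_path"]},
            {"type": "text", "value": prompt},
        ]

    def generate_inner(self, message, dataset=None):
        # Translate the uniform message format into the backend's own API.
        images = [m["value"] for m in message if m["type"] == "image"]
        text = "\n".join(m["value"] for m in message if m["type"] == "text")
        return self.backend.answer(images, text)

    def chat_inner(self, messages, dataset=None):
        # Assumed multi-turn format: [{"role": ..., "content": [items]}].
        # This toy version answers only the latest turn.
        return self.generate_inner(messages[-1]["content"], dataset)
```

Only generate_inner() is required here; the prompt and chat methods are shown to illustrate where model-specific formatting and multi-turn handling would live.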
Theoretical Basis
- Adapter pattern (GoF): wraps diverse VLM interfaces behind a uniform contract.
- Template Method pattern: generate() defines the algorithm skeleton (validate -> preprocess -> delegate to generate_inner()), letting subclasses override the core inference step.