Principle:Open compass VLMEvalKit VLM Adapter Pattern

From Leeroopedia
Field         Value
source        VLMEvalKit (https://github.com/open-compass/VLMEvalKit)
domain        Vision, Model_Architecture, Software_Design
last_updated  2026-02-14 00:00 GMT

Overview

An adapter pattern that provides a unified interface for invoking diverse Vision-Language Model architectures through a common base class contract.

Description

VLMEvalKit defines BaseModel as the abstract base class for all local VLM adapters. Every VLM architecture (InternVL, LLaVA, Qwen2-VL, MiniCPM, etc.) implements a subclass that adapts the model's specific API to the framework's uniform interface. The key contract is:

  1. generate_inner(message, dataset) for single-turn inference
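  2. Optional use_custom_prompt(dataset), which reports whether the model supplies its own prompt for a given dataset, and build_prompt(line, dataset), which builds that model-specific prompt from a dataset record (line)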
  3. Optional chat_inner(messages, dataset) for multi-turn support
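The three-part contract can be sketched in plain Python. This is a simplified stand-in, not VLMEvalKit's source: the class name BaseModelSketch, the toy EchoVLM subclass, the "ToyVQA" dataset name, and the message shape (a list of {'type': ..., 'value': ...} dicts) are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class BaseModelSketch(ABC):
    """Hypothetical, simplified stand-in for a VLM adapter base class."""

    def use_custom_prompt(self, dataset):
        # Default: the framework builds the prompt for every dataset.
        return False

    def build_prompt(self, line, dataset):
        # Only meaningful when use_custom_prompt(dataset) returns True.
        raise NotImplementedError

    @abstractmethod
    def generate_inner(self, message, dataset=None):
        """Single-turn inference; every adapter must implement this."""

    def chat(self, messages, dataset=None):
        # Multi-turn support is opt-in: only adapters defining chat_inner can chat.
        if not hasattr(self, "chat_inner"):
            raise NotImplementedError(f"{type(self).__name__} has no chat_inner()")
        return self.chat_inner(messages, dataset)


class EchoVLM(BaseModelSketch):
    """Toy adapter (hypothetical): 'answers' by echoing the text parts it receives."""

    def use_custom_prompt(self, dataset):
        # Hypothetical rule: customize prompts only for a dataset named "ToyVQA".
        return dataset == "ToyVQA"

    def build_prompt(self, line, dataset):
        # `line` is one dataset record; here assumed to be a dict with 'image' and 'question'.
        return [
            {"type": "image", "value": line["image"]},
            {"type": "text", "value": f"Answer briefly: {line['question']}"},
        ]

    def generate_inner(self, message, dataset=None):
        return " ".join(m["value"] for m in message if m["type"] == "text")


model = EchoVLM()
line = {"image": "cat.jpg", "question": "What animal is this?"}
if model.use_custom_prompt("ToyVQA"):
    msg = model.build_prompt(line, "ToyVQA")
else:
    # Fallback: a generic prompt built by the caller.
    msg = [{"type": "image", "value": line["image"]},
           {"type": "text", "value": line["question"]}]
print(model.generate_inner(msg, "ToyVQA"))  # Answer briefly: What animal is this?
```

Because only generate_inner() is abstract, a minimal adapter stays small, and the prompt and chat hooks are added only by models that need them.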

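The generate() method validates and normalizes the input message before delegating to generate_inner(). The class attributes INTERLEAVE and allowed_types declare model capabilities: INTERLEAVE indicates whether the model accepts interleaved image-text input, and allowed_types lists the content types (e.g., text, image) the model accepts.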
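The validate-then-delegate flow of generate() can be sketched as a Template Method. The normalization rules below (plain strings become text items, a single item is wrapped in a list) and the TextOnlyVLM subclass are assumptions for illustration, not VLMEvalKit's exact checks; INTERLEAVE is declared but not consulted in this sketch.

```python
from abc import ABC, abstractmethod


class BaseModelSketch(ABC):
    # Capability flags named in the text; this sketch only enforces allowed_types.
    INTERLEAVE = False                  # declared for callers; unused below
    allowed_types = ["text", "image"]   # content types this adapter accepts

    @staticmethod
    def _normalize(message):
        """Coerce a str, a single dict, or a list of str/dict into a list of dicts."""
        if isinstance(message, (str, dict)):
            message = [message]
        if not isinstance(message, list) or not message:
            raise ValueError(f"Invalid input: {message!r}")
        items = []
        for m in message:
            if isinstance(m, str):
                items.append({"type": "text", "value": m})
            elif isinstance(m, dict) and "type" in m and "value" in m:
                items.append(m)
            else:
                raise ValueError(f"Invalid message item: {m!r}")
        return items

    def generate(self, message, dataset=None):
        # Template method: fixed skeleton (normalize -> check types -> delegate).
        items = self._normalize(message)
        for m in items:
            if m["type"] not in self.allowed_types:
                raise ValueError(f"{type(self).__name__} does not accept {m['type']!r} input")
        return self.generate_inner(items, dataset)

    @abstractmethod
    def generate_inner(self, message, dataset=None):
        """Model-specific inference on an already-validated list of dicts."""


class TextOnlyVLM(BaseModelSketch):
    allowed_types = ["text"]  # narrower capability than the base default

    def generate_inner(self, message, dataset=None):
        return f"{len(message)} text part(s): " + " | ".join(m["value"] for m in message)


model = TextOnlyVLM()
print(model.generate("Describe the scene."))  # 1 text part(s): Describe the scene.
print(model.generate(["Hello", {"type": "text", "value": "world"}]))  # 2 text part(s): Hello | world
```

Subclasses never repeat the input checks; overriding allowed_types is enough to make generate() reject content the model cannot handle.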

Usage

Use when integrating a new local VLM into VLMEvalKit. Subclass BaseModel, implement generate_inner() at minimum, and optionally override prompt building and chat methods.
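A minimal integration might look like the following sketch. FakeNativeVLM and its answer() method are hypothetical stand-ins for a new model's own inference API; the adapter's only job is to translate the uniform message list into that native call inside generate_inner().

```python
from abc import ABC, abstractmethod


class BaseModelSketch(ABC):
    """Minimal stand-in for the base class; only the required hook is shown."""

    INTERLEAVE = False
    allowed_types = ["text", "image"]

    @abstractmethod
    def generate_inner(self, message, dataset=None):
        ...


class FakeNativeVLM:
    """Hypothetical third-party model with its own, non-uniform API."""

    def answer(self, question: str, image_paths: list, max_new_tokens: int = 32) -> str:
        # Character cap stands in for a real token limit in this toy.
        return f"[{len(image_paths)} image(s)] {question}"[:max_new_tokens]


class MyNewVLM(BaseModelSketch):
    """Adapter (hypothetical): maps the uniform message list onto FakeNativeVLM.answer()."""

    def __init__(self, max_new_tokens: int = 32):
        self.model = FakeNativeVLM()  # a real adapter would load weights here
        self.max_new_tokens = max_new_tokens

    def generate_inner(self, message, dataset=None):
        # Split the uniform message into the native call's separate arguments.
        question = "\n".join(m["value"] for m in message if m["type"] == "text")
        images = [m["value"] for m in message if m["type"] == "image"]
        return self.model.answer(question, images, max_new_tokens=self.max_new_tokens)


adapter = MyNewVLM()
msg = [{"type": "image", "value": "dog.png"}, {"type": "text", "value": "What is shown?"}]
print(adapter.generate_inner(msg))  # [1 image(s)] What is shown?
```

Keeping all model-specific translation inside generate_inner() is what lets an evaluation loop call every model the same way.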

Theoretical Basis

Adapter pattern (GoF): wraps diverse VLM interfaces behind a uniform contract. Template Method pattern: generate() defines the algorithm skeleton (validate -> preprocess -> delegate to generate_inner()), letting subclasses override the core inference step.
