Principle:Open compass VLMEvalKit Generate Inner Interface
| Field | Value |
|---|---|
| source | [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) |
| domain | Vision, Model_Architecture, Software_Design |
| last_updated | 2026-02-14 00:00 GMT |
Overview
An abstract method that every model adapter must implement to run single-turn VLM inference on preprocessed multimodal input.
Description
The generate_inner() method is the core extension point for VLM adapters in VLMEvalKit. For local models (BaseModel subclasses), it receives a list of message dicts (each with a type such as 'text' or 'image' and a corresponding value), already validated and preprocessed by generate(), and must return the prediction as a string. For API models (BaseAPI subclasses), it receives the same kind of input but must return a (ret_code, answer, log) tuple. This separation of concerns lets adapter authors focus only on model-specific inference logic. The base classes handle input validation and preprocessing and, for API models, the retry loop and the interpretation of the returned status code.
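The division of labour for local models can be illustrated with a minimal sketch. SketchBaseModel and CountingAdapter are hypothetical names, not VLMEvalKit classes. The normalization rules are a deliberate simplification (VLMEvalKit's own preprocessing is richer), and the dataset argument assumes a generate_inner(message, dataset=None) signature. The point is only that generate() owns validation, while generate_inner() sees a clean list of type/value dicts and returns a string.

```python
from typing import Any, Dict, List, Optional, Union

Message = List[Dict[str, Any]]


class SketchBaseModel:
    """Hypothetical stand-in for BaseModel; not VLMEvalKit's actual code."""

    allowed_types = ("text", "image")

    def generate(self, message: Union[str, Dict[str, Any], List[Any]],
                 dataset: Optional[str] = None) -> str:
        # Wrap a single str or dict so every input becomes a list.
        if isinstance(message, (str, dict)):
            message = [message]
        normalized: Message = []
        for item in message:
            if isinstance(item, str):
                # Simplification: bare strings are always treated as text here.
                normalized.append({"type": "text", "value": item})
            elif isinstance(item, dict) and "type" in item and "value" in item:
                normalized.append(item)
            else:
                raise ValueError(f"Invalid message item: {item!r}")
        # Reject modalities this model does not declare support for.
        for item in normalized:
            if item["type"] not in self.allowed_types:
                raise ValueError(f"Unsupported input type: {item['type']}")
        # Delegate the model-specific step to the adapter.
        return self.generate_inner(normalized, dataset)

    def generate_inner(self, message: Message, dataset: Optional[str] = None) -> str:
        raise NotImplementedError("adapters must implement generate_inner()")


class CountingAdapter(SketchBaseModel):
    """Toy adapter; a real one would load the images and run a VLM forward pass."""

    def generate_inner(self, message: Message, dataset: Optional[str] = None) -> str:
        images = [m["value"] for m in message if m["type"] == "image"]
        prompt = "\n".join(m["value"] for m in message if m["type"] == "text")
        return f"{len(images)} image(s); prompt={prompt!r}"


model = CountingAdapter()
print(model.generate([
    {"type": "image", "value": "demo.jpg"},
    {"type": "text", "value": "What is shown?"},
]))  # prints: 1 image(s); prompt='What is shown?'
```

Because the adapter never sees a bare string or an unsupported modality, its body can stay focused on turning the message list into model inputs.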
Usage
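Implement this method in every new VLM adapter. For BaseModel subclasses, return the prediction as a str. For BaseAPI subclasses, return an (int, str, str) tuple of (ret_code, answer, log), where a ret_code of 0 signals success and any non-zero value signals a failed attempt.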
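The reason the API contract returns a status code rather than raising on every failure is that the base class can decide whether to try again. The sketch below assumes a simplified retry loop: SketchBaseAPI, FlakyAPI, and the retry, wait, and fail_msg parameters are illustrative, and VLMEvalKit's actual BaseAPI.generate() may differ in details such as logging, backoff, and which answers it accepts.

```python
import random
import time
from typing import Any, Dict, List, Tuple

Message = List[Dict[str, Any]]


class SketchBaseAPI:
    """Hypothetical, simplified stand-in for BaseAPI's retry loop."""

    def __init__(self, retry: int = 3, wait: float = 0.0,
                 fail_msg: str = "Failed to obtain answer via API."):
        self.retry = retry        # maximum number of attempts
        self.wait = wait          # base for the randomized sleep between attempts
        self.fail_msg = fail_msg  # returned when every attempt fails

    def generate_inner(self, inputs: Message, **kwargs: Any) -> Tuple[int, str, str]:
        raise NotImplementedError("API adapters must implement generate_inner()")

    def generate(self, message: Message, **kwargs: Any) -> str:
        for _ in range(self.retry):
            try:
                # log carries diagnostics; this sketch does not use it.
                ret_code, answer, log = self.generate_inner(message, **kwargs)
                if ret_code == 0 and answer:
                    return answer  # success: stop retrying
            except NotImplementedError:
                raise  # a missing adapter method is a bug, not a transient failure
            except Exception:
                pass  # any other exception counts as a failed attempt
            # Randomized sleep before the next attempt (0 to 2 * wait seconds).
            time.sleep(random.random() * self.wait * 2)
        return self.fail_msg


class FlakyAPI(SketchBaseAPI):
    """Toy API adapter whose first call fails; a real one would call a remote endpoint."""

    def __init__(self, **kwargs: Any):
        super().__init__(**kwargs)
        self.calls = 0

    def generate_inner(self, inputs: Message, **kwargs: Any) -> Tuple[int, str, str]:
        self.calls += 1
        if self.calls == 1:
            return 1, "", "simulated transient error"  # non-zero ret_code: failure
        text = " ".join(m["value"] for m in inputs if m["type"] == "text")
        return 0, f"answer to: {text}", "ok"


api = FlakyAPI(retry=3)
print(api.generate([{"type": "text", "value": "2+2?"}]), api.calls)
# prints: answer to: 2+2? 2
```

The adapter only reports what happened on a single attempt; how many attempts to make and what to return after the last one is decided once, in the base class.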
Theoretical Basis
Template Method pattern: the base class's generate() defines the overall algorithm and delegates the single variable step to generate_inner(). Every adapter therefore shares the same preprocessing and error handling.
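The pattern itself can be shown independently of VLMEvalKit. This sketch uses Python's abc module so that a subclass which omits generate_inner() cannot be instantiated at all; TemplateBase, Incomplete, and Complete are hypothetical names, and whether VLMEvalKit's own base classes enforce the contract at instantiation time is not stated in this page.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List

Message = List[Dict[str, Any]]


class TemplateBase(ABC):
    """Hypothetical base class illustrating the Template Method pattern."""

    def generate(self, message: Message) -> str:
        # Invariant step owned by the base class: validate the input.
        if not isinstance(message, list) or not message:
            raise ValueError("message must be a non-empty list")
        # Variable step supplied by each adapter.
        return self.generate_inner(message)

    @abstractmethod
    def generate_inner(self, message: Message) -> str:
        """Model-specific inference; every concrete adapter must override this."""


class Incomplete(TemplateBase):
    pass  # forgot to implement generate_inner()


class Complete(TemplateBase):
    def generate_inner(self, message: Message) -> str:
        return str(len(message))


try:
    Incomplete()  # ABC refuses to instantiate a class with abstract methods
except TypeError as err:
    print("rejected:", err)

print(Complete().generate([{"type": "text", "value": "hi"}]))  # prints: 1
```

Enforcing the hook with abstractmethod moves a forgotten implementation from a runtime NotImplementedError during evaluation to a TypeError when the adapter is constructed.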