Implementation:Open compass VLMEvalKit Generate Inner
| Field | Value |
|---|---|
| source | VLMEvalKit |
| domain | Vision, Model_Architecture, Software_Design |
Overview
Interface specification for the user-implemented inference method that VLM adapters must define to process multimodal input.
Description
generate_inner() is an abstract method defined in both BaseModel (vlmeval/vlm/base.py:L45-47) and BaseAPI (vlmeval/api/base.py:L46-57). For BaseModel subclasses, the signature is generate_inner(self, message: List[Dict], dataset: Optional[str] = None) -> str. For BaseAPI subclasses, the signature is generate_inner(self, inputs, **kwargs) -> Tuple[int, str, str]. The message list contains dicts with 'type' (text/image/video) and 'value' (content string or file path).
Usage
Every VLM adapter must implement this method. This is a Pattern Doc — there is no single concrete implementation, as each model adapter provides its own.
Code Reference
- Source:
vlmeval/vlm/base.py, Lines: L45-47 (BaseModel abstract);vlmeval/api/base.py, Lines: L46-57 (BaseAPI abstract) - Import: (abstract — implemented by subclasses)
Signature:
# For local VLM adapters (BaseModel):
@abstractmethod
def generate_inner(self, message: List[Dict], dataset: Optional[str] = None) -> str:
"""
Args:
message: List of dicts with 'type' and 'value' keys.
Types: 'text', 'image', 'video'
Values: text content, image file path, video file path
dataset: Optional dataset name for special handling.
Returns:
Prediction string.
"""
raise NotImplementedError
# For API model adapters (BaseAPI):
@abstractmethod
def generate_inner(self, inputs, **kwargs) -> Tuple[int, str, str]:
"""
Args:
inputs: Preprocessed message list.
**kwargs: Additional arguments.
Returns:
(ret_code, answer, log) where ret_code=0 means success.
"""
I/O Contract
| Direction | Description |
|---|---|
| Inputs | message/inputs — List[Dict] with type/value keys
|
| Outputs | BaseModel: str; BaseAPI: Tuple[int, str, str]
|
Usage Examples
# Example: Simple local VLM adapter
class SimpleVLM(BaseModel):
def generate_inner(self, message, dataset=None):
prompt = "\n".join(m['value'] for m in message if m['type'] == 'text')
images = [m['value'] for m in message if m['type'] == 'image']
return self.model.predict(prompt, images)
# Example: Simple API adapter
class SimpleAPI(BaseAPI):
def generate_inner(self, inputs, **kwargs):
try:
resp = requests.post(self.endpoint, json={"messages": inputs})
return 0, resp.json()["answer"], "ok"
except Exception as e:
return -1, "", str(e)