Implementation:Open compass VLMEvalKit GLMVisionWrapper
| Field | Value |
|---|---|
| source | VLMEvalKit |
| domain | Vision, API_Integration |
Overview
GLMVisionWrapper provides a VLMEvalKit API adapter for Zhipu AI GLM vision-language models.
Description
GLMVisionWrapper inherits from BaseAPI and uses the ZhipuAI Python SDK to communicate with Zhipu's chat completion API. It encodes images to base64 for transmission, adds dataset-specific prompts (e.g., yes/no guidance for HallusionBench and POPE), and supports configurable max token output. Authentication uses the GLMV_API_KEY environment variable.
Usage
Use this adapter when evaluating Zhipu GLM vision models through the ZhipuAI API (obtainable at bigmodel.cn).
Code Reference
- Source:
vlmeval/api/glm_vision.py, Lines: L1-77 - Import:
from vlmeval.api.glm_vision import GLMVisionWrapper
Signature:
class GLMVisionWrapper(BaseAPI):
def __init__(self, model, retry=5, key=None, verbose=True,
system_prompt=None, max_tokens=4096, proxy=None,
**kwargs): ...
def generate_inner(self, inputs, **kwargs): ...
I/O Contract
| Direction | Description |
|---|---|
| Inputs | message — text/image/video content list; model-specific params via kwargs |
| Outputs | generate() returns str prediction; generate_inner() returns (int, str, str) tuple |
Usage Examples
# Example instantiation
model = GLMVisionWrapper(model='glm-4v')
response = model.generate(message)