Implementation:Open compass VLMEvalKit DoubaoVLWrapper
| Field | Value |
|---|---|
| source | VLMEvalKit |
| domain | Vision, API_Integration |
Overview
DoubaoVLWrapper provides a VLMEvalKit API adapter for ByteDance Doubao vision-language models via the Volcengine Ark API.
Description
DoubaoVLWrapper inherits from BaseAPI and uses the OpenAI-compatible client to communicate with the Volcengine Ark API endpoint. It supports model-specific endpoint configuration through environment variables (DOUBAO_VL_ENDPOINT_*), handles image dumping and base64 encoding, and provides custom prompt construction for MCQ datasets. Authentication requires the DOUBAO_VL_KEY environment variable.
Usage
Use this adapter when evaluating Doubao vision-language models (such as Doubao-1.5-vision-pro) through the Volcengine Ark API.
Code Reference
- Source:
vlmeval/api/doubao_vl_api.py, Lines: L1-210 - Import:
from vlmeval.api.doubao_vl_api import DoubaoVLWrapper
Signature:
class DoubaoVLWrapper(BaseAPI):
def __init__(self, model='', retry=5, verbose=True, system_prompt=None,
temperature=0, timeout=60, max_tokens=4096,
api_base='https://ark.cn-beijing.volces.com/api/v3',
**kwargs): ...
def generate_inner(self, inputs, **kwargs): ...
I/O Contract
| Direction | Description |
|---|---|
| Inputs | message — text/image/video content list; model-specific params via kwargs |
| Outputs | generate() returns str prediction; generate_inner() returns (int, str, str) tuple |
Usage Examples
# Example instantiation
model = DoubaoVLWrapper(model='Doubao-1.5-vision-pro')
response = model.generate(message)